Blab Information Extract
Extract structured data from documents/images using Upstage Information Extraction
Overview
This node extracts structured data from documents or images using the Upstage Information Extraction API. It supports input as either binary data from a previous node or an image URL. Users can provide a JSON schema or a full response format to guide the extraction process. The node is useful for automating data extraction from various document types, such as invoices, receipts, or forms, enabling integration of extracted data into workflows.
Use Case Examples
- Extract structured data from scanned invoices by providing a JSON schema to parse key fields like invoice number, date, and total amount.
- Use an image URL of a receipt to extract itemized purchase information using the recommended information-extract model.
- Generate a JSON schema from a sample document image to use in subsequent extraction operations.
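
As a rough illustration of the invoice use case above, a target schema might look like the sketch below. The field names (`invoice_number`, `invoice_date`, `total_amount`) are hypothetical and chosen only for this example; neither the node nor the API prescribes them.

```python
import json

# Hypothetical schema for the invoice use case. The field names are
# illustrative assumptions, not names required by the node or the API.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "invoice_date": {"type": "string", "description": "Issue date as printed"},
        "total_amount": {"type": "number", "description": "Grand total"},
    },
    "required": ["invoice_number", "invoice_date", "total_amount"],
}

# Serialize to the JSON text that would be pasted into the schema property.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

Short descriptions on each property are optional here; they are included on the assumption that they help guide the extraction.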
Properties
| Name | Meaning |
|---|---|
| Input Type | Specifies whether the input is binary data from a previous node or an image URL. |
| Binary Property | Name of the binary property containing the file, used when input type is binary. |
| Image URL | URL of the image to process, used when input type is URL. |
| Model | The model to use for information extraction. Currently only 'information-extract' is supported. |
| Schema Input Type | Determines how the JSON schema is provided: either as schema only or full response format. |
| Schema Name | Name for the JSON schema in the response format, used when schema input type is 'schema'. |
| JSON Schema (object) | The target JSON schema object for extraction, used when schema input type is 'schema'. |
| Full Response Format JSON | Complete response format JSON including type, json_schema, name, and schema, used when schema input type is 'full'. |
| Pages per Chunk | Number of pages per chunk when a document is split for processing. Recommended for documents with 30+ pages. Set to 0 to disable chunking. |
| Return | Specifies what to return: extracted JSON only, schema JSON only, or full response. |
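
The two schema input types differ only in how much of the response format the user supplies. The sketch below shows one plausible way a schema name and a bare schema could be combined into a full response format. The helper `build_response_format` is hypothetical, and the exact nesting (a `"json_schema"` type value with `name` and `schema` under `json_schema`) is an assumption based on the fields listed in the table, so check it against the API documentation before relying on it.

```python
import json
from typing import Any


def build_response_format(schema_name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a bare JSON schema in a full response format object.

    Assumed layout: type / json_schema / name / schema, as listed in the
    'Full Response Format JSON' property description.
    """
    return {
        "type": "json_schema",
        "json_schema": {"name": schema_name, "schema": schema},
    }


# Hypothetical schema name and fields, for illustration only.
receipt_schema = {
    "type": "object",
    "properties": {"store_name": {"type": "string"}},
}
response_format = build_response_format("receipt_schema", receipt_schema)
print(json.dumps(response_format, indent=2))
```

With schema input type 'schema', only the inner schema object and the schema name are entered; with 'full', the whole wrapped object is entered as a single JSON value.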
Output
JSON
- extracted - The extracted structured data as JSON.
- model - The model used for extraction.
- usage - API usage information.
- full_response - The full response from the information extraction API.
- json_schema - The JSON schema used for extraction (when returning schema).
- schema_type - The type of schema returned (when generating schema).
- raw - Raw schema data (when generating schema).
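
Which of these fields is present depends on the Return setting, so downstream code may need to handle more than one shape. The following is a minimal sketch of that idea. The helper `pick_result`, the sample item, and the lookup order are all assumptions made for illustration, not behavior documented for the node.

```python
from typing import Any


def pick_result(item: dict[str, Any]) -> Any:
    """Return the most specific payload found in an output item.

    Checks 'extracted', then 'json_schema', then 'full_response'.
    This order is an assumption chosen for the example.
    """
    for key in ("extracted", "json_schema", "full_response"):
        if item.get(key) is not None:
            return item[key]
    raise KeyError("output item contains none of the expected fields")


# Hypothetical output item; the values are invented for illustration.
sample_item = {
    "extracted": {"invoice_number": "INV-001"},
    "model": "information-extract",
}
print(pick_result(sample_item))
```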
Dependencies
- Upstage Information Extraction API
- An API key credential for authentication
Troubleshooting
- Ensure the binary property name matches the actual binary data property in the input when using binary input type; otherwise, an error 'No binary data found in property' will occur.
- When using image URL input type, ensure the URL is valid and accessible; missing or invalid URLs will cause errors.
- Invalid JSON schema or full response format JSON will cause parsing errors; verify the JSON structure and correct any syntax issues.
- For large documents, use the 'Pages per Chunk' option to improve performance and avoid timeouts or memory issues.
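
Parsing errors from a malformed schema can be caught before the node runs. The sketch below checks a full response format string for valid JSON syntax and for the fields named in the Properties table. The function `check_response_format` is a hypothetical helper, and the required nesting is an assumption rather than a confirmed API rule.

```python
import json


def check_response_format(text: str) -> list[str]:
    """Return a list of problems found in a full response format string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the position so the syntax error is easy to locate.
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}, column {exc.colno}"]

    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    if "type" not in data:
        problems.append("missing 'type'")
    json_schema = data.get("json_schema")
    if not isinstance(json_schema, dict):
        problems.append("missing 'json_schema' object")
    else:
        # Assumed layout: 'name' and 'schema' live under 'json_schema'.
        for key in ("name", "schema"):
            if key not in json_schema:
                problems.append(f"missing 'json_schema.{key}'")
    return problems


print(check_response_format('{"type": "json_schema", "json_schema": {"name": "doc"}}'))
```

An empty list means the string passed these checks. It does not guarantee that the API will accept the schema itself.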
Links
- Upstage Information Extraction API - Official API documentation for the Upstage Information Extraction service.