Blab Information Extract

Extract structured data from documents/images using Upstage Information Extraction

Actions

- Extract Information
- Generate Schema

Overview

This node extracts structured data from documents or images using the Upstage Information Extraction API. It supports input as either binary data from a previous node or an image URL. Users can provide a JSON schema or a full response format to guide the extraction process. The node is useful for automating data extraction from various document types, such as invoices, receipts, or forms, enabling integration of extracted data into workflows.

Use Case Examples

Extract structured data from scanned invoices by providing a JSON schema to parse key fields like invoice number, date, and total amount.
Use an image URL of a receipt to extract itemized purchase information using the recommended information-extract model.
Generate a JSON schema from a sample document image to use in subsequent extraction operations.
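As a minimal sketch of the invoice use case, a JSON schema could look like the one below. The field names (`invoice_number`, `invoice_date`, `total_amount`) and their descriptions are illustrative assumptions, not names required by the node or the API.

```python
import json

# Hypothetical schema for the invoice use case; field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "invoice_date": {"type": "string", "description": "Issue date as printed"},
        "total_amount": {"type": "number", "description": "Total amount due"},
    },
}

# Serialize to JSON text that could be pasted into a schema field.
schema_text = json.dumps(invoice_schema, indent=2)
print(schema_text)
```

Serializing with `json.dumps` rather than writing the JSON by hand avoids the syntax slips (trailing commas, single quotes) that lead to parsing errors.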

Properties

Name	Meaning
Input Type	Specifies whether the input is binary data from a previous node or an image URL.
Binary Property	Name of the binary property containing the file, used when input type is binary.
Image URL	URL of the image to process, used when input type is URL.
Model	The model to use for information extraction. Currently only 'information-extract' is supported.
Schema Input Type	Determines how the JSON schema is provided: either as schema only or full response format.
Schema Name	Name for the JSON schema in the response format, used when schema input type is 'schema'.
JSON Schema (object)	The target JSON schema object for extraction, used when schema input type is 'schema'.
Full Response Format JSON	Complete response format JSON including type, json_schema, name, and schema, used when schema input type is 'full'.
Pages per Chunk	Number of pages per chunk, used to improve performance on long documents. Recommended for documents with 30 or more pages. Set to 0 to disable chunking.
Return	Specifies what to return: extracted JSON only, schema JSON only, or full response.
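The relationship between the two schema input types can be sketched as follows. The table lists `type`, `json_schema`, `name`, and `schema` as the parts of the full response format. The nesting shown here, and the value `"json_schema"` for `type`, follow the common OpenAI-style convention and are assumptions rather than something this page confirms. The helper name `build_response_format` is hypothetical.

```python
import json


def build_response_format(schema_name: str, schema: dict) -> dict:
    """Wrap a bare JSON schema in a full response format object.

    The nesting is an assumed shape, based on the field names the
    property table mentions (type, json_schema, name, schema).
    """
    return {
        "type": "json_schema",
        "json_schema": {"name": schema_name, "schema": schema},
    }


# Example: the 'schema' input type supplies these two pieces separately.
bare_schema = {
    "type": "object",
    "properties": {"total_amount": {"type": "number"}},
}
full_format = build_response_format("document_schema", bare_schema)
print(json.dumps(full_format, indent=2))
```

Under this assumption, choosing 'schema' with a Schema Name and a JSON Schema gives the same result as pasting the printed object into Full Response Format JSON.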

Output

JSON

extracted - The extracted structured data as JSON.
model - The model used for extraction.
usage - API usage information.
full_response - The full response from the information extraction API.
json_schema - The JSON schema used for extraction (when returning schema).
schema_type - The type of schema returned (when generating schema).
raw - Raw schema data (when generating schema).
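Downstream code typically needs to pick the right field out of an output item. The sketch below assumes that only some of the fields listed above are present, depending on the Return setting and the action. Which fields appear together is not documented here, so the lookup order is an assumption, and `pick_result` is a hypothetical helper.

```python
from typing import Any


def pick_result(item_json: dict) -> Any:
    """Return the most specific payload found in a node output item.

    Checks the documented field names in an assumed order of preference.
    """
    for key in ("extracted", "json_schema", "full_response"):
        if key in item_json:
            return item_json[key]
    raise KeyError("no known output field found in item")


# Made-up sample item for illustration; real values come from the node.
sample_item = {
    "extracted": {"invoice_number": "INV-001"},
    "model": "information-extract",
}
print(pick_result(sample_item))
```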

Dependencies

Upstage Information Extraction API
An API key credential for authentication

Troubleshooting

Ensure the binary property name matches the actual binary data property in the input when using binary input type; otherwise, an error 'No binary data found in property' will occur.
When using image URL input type, ensure the URL is valid and accessible; missing or invalid URLs will cause errors.
Invalid JSON schema or full response format JSON will cause parsing errors; verify the JSON structure and correct any syntax issues.
For large documents, use the 'Pages per Chunk' option to improve performance and avoid timeouts or memory issues.
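Two of the checks above can be run before the node executes. The sketch below validates schema text with the standard `json` module and estimates how many chunks a given 'Pages per Chunk' value produces. It assumes each chunk holds up to that many consecutive pages; the node's actual chunking internals are not described on this page. Both function names are hypothetical.

```python
import json
import math


def parse_schema_text(text: str) -> dict:
    """Parse schema JSON, reporting the position of any syntax error."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(parsed, dict):
        raise ValueError("schema must be a JSON object")
    return parsed


def chunk_count(total_pages: int, pages_per_chunk: int) -> int:
    """Estimate the number of chunks; 0 means chunking is disabled."""
    if pages_per_chunk <= 0:
        return 1  # whole document processed in one pass
    return math.ceil(total_pages / pages_per_chunk)


print(parse_schema_text('{"type": "object", "properties": {}}'))
print(chunk_count(45, 10))
```

For example, a 45-page document with 10 pages per chunk yields 5 chunks under this assumption, the last one holding the remaining 5 pages.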
