h2oGPTe

h2oGPTe is an AI-powered search assistant that helps your internal teams answer questions using large volumes of documents, websites, and workplace content.

Overview

This node operation creates a job that processes a document, either by generating a summary or by performing extraction or transformation tasks on its content. It is useful for automating the summarization or analysis of large documents, such as research papers, reports, or manuals, with AI-powered language models. For example, you can create a job that condenses a lengthy PDF report into key points, or one that extracts structured data from a document for further processing.

Properties

Name: Meaning
Document ID: The string identifier of the document to be processed. Required; it specifies which document the job operates on.
Additional Options: A collection of optional parameters to customize the processing job:
- Guardrails Settings: JSON object specifying guardrails or PII detection settings to control sensitive information handling during processing.
- Image Batch Final Prompt: A prompt string used to reduce all answers for each image batch when using vision models.
- Image Batch Image Prompt: A prompt string applied to each image batch for vision models.
- Keep Intermediate Results: Boolean flag indicating whether to keep intermediate results during processing. If false, further LLM calls are applied until one global summary is produced.
- Llm: The name of the Large Language Model (LLM) to use for processing.
- Llm Args: JSON map of arguments sent to the LLM with the query, e.g., a temperature setting to modulate randomness.
- Max Num Chunks: Maximum number of chunks from the document to send to the summarizer. Zero means no limit.
- Meta Data To Include: JSON map with flags indicating which pieces of document metadata to include as context for the summarization or extraction.
- Pages: JSON list specifying particular pages (1-based indexing) of the document to use for processing.
- Pre Prompt Summary: A prompt string placed before each large piece of text to summarize.
- Prompt Summary: A prompt string placed after each large piece of text to summarize.
- Sampling Strategy: Strategy for sampling chunks if the document has more chunks than max_num_chunks. Options include "auto", "uniform", "first", "first+last", etc.
- Schema: Optional JSON schema to guide JSON generation during processing.
- Summary ID: Identifier requested for the output document summary.
- System Prompt: System prompt string providing overall context to the model during processing.
- Timeout: Time in seconds to allow the request to run. Default is 86400 seconds (24 hours).
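
As a rough illustration of how these options might be assembled before they are handed to the node, the sketch below parses the JSON-valued options and drops the ones left empty. The helper `build_options` is hypothetical. The names `guardrails_settings`, `llm_args`, `meta_data_to_include`, `pages`, `schema`, and `max_num_chunks` appear on this page; the remaining snake_case keys (`sampling_strategy`, `keep_intermediate_results`, `timeout`) are assumed spellings of the display names and should be checked against the API documentation.

```python
import json

# Options that this page describes as holding JSON.
JSON_OPTIONS = {"guardrails_settings", "llm_args", "meta_data_to_include", "pages", "schema"}


def build_options(raw: dict) -> dict:
    """Hypothetical helper: parse JSON-valued options and skip unset ones."""
    options = {}
    for name, value in raw.items():
        if value is None or value == "":
            continue  # option left empty, so leave it out entirely
        if name in JSON_OPTIONS and isinstance(value, str):
            value = json.loads(value)  # raises json.JSONDecodeError on invalid JSON
        options[name] = value
    return options


options = build_options({
    "llm_args": '{"temperature": 0.2}',   # JSON map of LLM arguments
    "pages": "[1, 2, 5]",                 # 1-based page numbers
    "max_num_chunks": 0,                  # zero means no limit
    "sampling_strategy": "auto",          # assumed key name
    "keep_intermediate_results": False,   # assumed key name
    "timeout": 86400,                     # assumed key name; seconds (24 hours)
    "schema": "",                         # left empty, so it is dropped
})
```

Parsing the JSON strings up front means a malformed value fails before any job is created, rather than surfacing later as an API error.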

Output

The node outputs the full response from the API call that creates the document processing job. The main output field is json, which contains details about the created job, including its unique job ID and status. This allows downstream nodes or workflows to track the job progress or retrieve results once processing completes.

This operation focuses on creating the job rather than returning processed content directly, even when processing involves images or other binary data.
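
A downstream step could pull the job identifier and status out of the node's output along these lines. This is only a sketch: the `json` wrapper follows the output description above, but the inner key names `id` and `status` are assumptions, as is the sample item, since this page does not list the exact response fields.

```python
def read_job(item: dict) -> tuple:
    """Return (job_id, status) from one output item of the node."""
    job = item.get("json", {})
    job_id = job.get("id")        # assumed key name for the unique job ID
    status = job.get("status")    # assumed key name for the job status
    if job_id is None:
        raise ValueError("output item does not contain a job ID")
    return job_id, status


# Made-up sample item, shaped like the description above.
sample_item = {"json": {"id": "job-123", "status": "queued"}}
job_id, status = read_job(sample_item)
```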

Dependencies

  • Requires an API key credential for authentication with the external service.
  • The base URL for API requests is configured via credentials.
  • The node depends on the external document processing API endpoint /documents/process_job to create the processing job.
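
To make the three dependencies concrete, here is a minimal sketch of how a request to the job endpoint might be put together from the credentials. Only the path `/documents/process_job` comes from this page. The Bearer authorization scheme, the POST method, sending `document_id` in a JSON body, and the example base URL are all assumptions; the request is built but never sent.

```python
import json
import urllib.request


def build_job_request(base_url: str, api_key: str, document_id: str, options=None):
    """Build (but do not send) a request that would create a processing job."""
    url = base_url.rstrip("/") + "/documents/process_job"
    body = {"document_id": document_id, **(options or {})}  # assumed body layout
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",  # assumed HTTP method
    )


# Placeholder base URL and key, for illustration only.
request = build_job_request("https://h2ogpte.example.com/api/v1/", "test-key", "doc-1")
```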

Troubleshooting

  • Missing Document ID: The operation requires a valid document ID. Ensure the document exists and the ID is correctly provided.
  • Timeouts: Long-running jobs may exceed the configured timeout (86400 seconds by default). Increase the Timeout option as needed.
  • Invalid JSON in Additional Options: Properties like guardrails_settings, llm_args, meta_data_to_include, pages, and schema expect valid JSON. Invalid JSON will cause errors.
  • Model Selection Issues: Specifying an unsupported or incorrect LLM name in llm may result in errors. Use valid model names supported by the API.
  • Permission Errors: Ensure the API key has sufficient permissions to access and process the specified document.
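
Two of the problems above, a missing document ID and invalid JSON in an option, can be caught before the job is submitted. The check below is a sketch of that idea; `preflight` is a hypothetical helper and not part of the node, and it assumes the JSON options arrive as strings.

```python
import json

# JSON-valued options named in the list above.
JSON_FIELDS = ("guardrails_settings", "llm_args", "meta_data_to_include", "pages", "schema")


def preflight(document_id: str, options: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not document_id or not document_id.strip():
        problems.append("document_id is missing")
    for name in JSON_FIELDS:
        value = options.get(name)
        if isinstance(value, str) and value.strip():
            try:
                json.loads(value)
            except json.JSONDecodeError as exc:
                problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems


# Unquoted key makes llm_args invalid JSON; the document ID is also empty.
problems = preflight("", {"llm_args": "{temperature: 0.2}", "pages": "[1, 2]"})
```

Model names and permissions cannot be verified locally this way; those errors only appear once the API is called.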

Links and References

  • Refer to the external API documentation for /documents/process_job for detailed information on job creation and parameters.
  • Consult the AI model provider's documentation for supported LLM names and argument options.
  • Review best practices for document chunking and summarization prompts to optimize results.
