h2oGPTe

h2oGPTe is an AI-powered search assistant that answers your internal teams' questions using knowledge gleaned from large volumes of documents, websites, and workplace content.

Overview

This node operation creates a job to add files from AWS S3 storage into a specified document collection. It is designed for batch ingestion of documents stored in S3 buckets or directories, enabling automated processing and indexing within the target collection. This is useful for organizations that maintain large volumes of documents in AWS S3 and want to integrate them into their internal knowledge bases or AI-powered search systems.

Typical use cases include:

  • Automatically ingesting new or updated files from S3 into a document management system.
  • Bulk importing datasets or document archives stored in S3 for further analysis or querying.
  • Integrating cloud storage with AI-driven document processing workflows.

For example, a company could schedule this node to create ingestion jobs that pull monthly reports stored in an S3 bucket into their searchable document collection.

Properties

  • Collection ID: String ID of the collection to add the ingested documents into. This identifies the target collection where files will be added.
  • URLs: The path, or list of paths, of S3 files or directories to ingest. Specifies which files or folders in S3 should be processed.
  • Additional Options: Optional parameters that customize ingestion behavior:
    - Audio Input Language: Language of audio files; default is "auto" for automatic detection.
    - Chunk By Page: Boolean flag indicating whether each page should be treated as a separate chunk. If true, keep_tables_as_one_chunk is ignored.
    - Credentials: JSON object containing S3 credentials. If omitted, only public buckets are accessible.
    - Gen Doc Questions: Boolean flag to auto-generate sample questions for each document using a large language model (LLM).
    - Gen Doc Summaries: Boolean flag to auto-generate document summaries using an LLM.
    - Handwriting Check: Boolean flag to check pages for handwriting and use specialized models if found.
    - Ingest Mode: Ingestion mode: "standard" (files are ingested for retrieval-augmented generation) or "agent_only" (bypasses standard ingestion).
    - Keep Tables As One Chunk: Boolean flag indicating whether tables identified by the parser should be kept as a single chunk.
    - Metadata: JSON object with metadata to associate with the ingested documents.
    - Ocr Model: Method used to extract text from images with AI-enabled OCR models. Default is "auto".
    - Region: AWS region name used when interacting with AWS services.
    - Tesseract Lang: Language code to use when the OCR model is "tesseract".
    - Timeout: Timeout in seconds for the ingestion job request. Zero means no timeout.
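Assembled into a request body, the properties above might look like the following sketch. The field names here mirror the property labels and are assumptions for illustration, not a confirmed h2oGPTe API schema:

```python
# Hypothetical sketch of the job-creation payload assembled from the
# node's properties. Field names are assumptions based on the property
# labels above, not a confirmed API schema.

def build_s3_ingest_payload(collection_id, urls, **options):
    """Build a job-creation request body from the node's properties."""
    payload = {
        "collection_id": collection_id,  # target collection for the documents
        "urls": urls,                    # S3 file or directory paths
    }
    # Optional parameters, with the defaults described above.
    defaults = {
        "audio_input_language": "auto",
        "chunk_by_page": False,
        "gen_doc_questions": False,
        "gen_doc_summaries": False,
        "handwriting_check": False,
        "ingest_mode": "standard",
        "keep_tables_as_one_chunk": False,
        "ocr_model": "auto",
        "timeout": 0,                    # zero means no timeout
    }
    for key, default in defaults.items():
        payload[key] = options.get(key, default)
    # Credentials, metadata, region, and tesseract_lang have no meaningful
    # defaults, so they are passed through only when supplied.
    for key in ("credentials", "metadata", "region", "tesseract_lang"):
        if key in options:
            payload[key] = options[key]
    return payload

payload = build_s3_ingest_payload(
    "col_123",                            # placeholder collection ID
    ["s3://reports-bucket/monthly/"],     # placeholder S3 directory
    region="us-east-1",
)
```
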

Output

The node outputs the response from the API call that creates the ingestion job. The output JSON typically contains details about the created job, such as its unique identifier, status, and any relevant metadata. This allows users to track the progress or manage the ingestion job after creation.

If the ingestion job involves binary data (e.g., files), the node handles it accordingly, but the primary output is the job creation confirmation and metadata.
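In a downstream workflow step, the job identifier is typically the one field you need to keep. The response fields shown here ("id", "status") are illustrative assumptions; inspect the actual node output in your workflow to confirm the shape:

```python
# Hypothetical job-creation response; the exact field names returned by
# the API are assumptions for illustration.
response = {
    "id": "job_abc123",
    "status": "queued",
    "collection_id": "col_123",
}

def job_reference(resp):
    """Extract the identifier and status needed to poll or manage the job."""
    return resp.get("id"), resp.get("status")

job_id, status = job_reference(response)
```
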

Dependencies

  • Requires access to the external API endpoint that manages document collections and ingestion jobs.
  • Needs valid API authentication credentials configured in n8n to authorize requests.
  • For accessing private S3 buckets, appropriate AWS credentials must be provided in the "Credentials" property.
  • AWS region configuration may be necessary depending on the S3 bucket location.
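For private buckets, the "Credentials" property takes a JSON object. The key names below follow common AWS terminology (access key ID, secret access key, optional session token); the exact names the node expects are an assumption to verify against your deployment:

```python
import json

# Sketch of the "Credentials" JSON object for a private S3 bucket.
# Key names follow standard AWS terminology but are assumptions here;
# the values are placeholders, not real credentials.
credentials = {
    "access_key_id": "AKIA-EXAMPLE",
    "secret_access_key": "<secret>",
    "session_token": "<optional-sts-token>",
}
credentials_json = json.dumps(credentials)
```
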

Troubleshooting

  • Authentication Errors: Ensure that the API key credential is correctly configured and has permissions to create ingestion jobs.
  • Access Denied to S3 Buckets: Verify that the provided S3 credentials have sufficient permissions to read the specified files or directories.
  • Timeouts: If the ingestion job creation takes too long, increase the "Timeout" value or check network connectivity.
  • Invalid Paths: Confirm that the "URLs" property contains valid S3 paths. Incorrect or inaccessible paths will cause failures.
  • Unsupported File Types: Some file formats might not be supported by the ingestion process; check documentation for supported types.
  • Handwriting Check Issues: Enabling handwriting check requires specialized models; ensure these are available in your environment.
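For transient timeouts or network errors, wrapping the job-creation call in a retry with exponential backoff can help distinguish flaky connectivity from genuine failures. The `create_job` callable below is a stand-in for whatever performs the actual API request:

```python
import time

def create_with_retry(create_job, attempts=3, base_delay=2.0):
    """Retry a job-creation call with exponential backoff.

    `create_job` is a stand-in for the actual API request; any exception
    it raises is treated as transient until the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return create_job()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            time.sleep(base_delay * (2 ** attempt))

# Demonstration with a flaky stand-in that fails once, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient")
    return {"id": "job_abc123"}

result = create_with_retry(flaky, base_delay=0.01)
```
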

This summary is based on static analysis of the node's properties and routing configuration for the "Document Ingestion" resource and the "Creates a Job to Add Files From the AWS S3 Storage Into a Collection" operation.
