
h2oGPTe

h2oGPTe is an AI-powered search assistant for your internal teams to answer questions gleaned from large volumes of documents, websites and workplace content.


Overview

The "Ingest Uploaded Document" operation of the Document Ingestion resource allows users to ingest documents that have already been uploaded into a specified collection. This process involves taking one or more uploaded document IDs and adding their content into a target collection for further processing, indexing, or retrieval.

This operation is useful when documents are first uploaded separately (e.g., via an upload interface or API) and then need to be ingested for search, analysis, or AI-powered querying. For example, after uploading PDF files or audio transcripts, you can use this operation to ingest those files into a knowledge base collection to enable semantic search or question answering.

Practical examples:

  • Ingesting scanned contracts or reports into a legal document collection.
  • Adding transcribed audio files into a media archive collection.
  • Importing research papers into a scientific literature database.

Properties

Name | Meaning
Upload IDs | ID(s) of the uploaded document(s) to ingest. Multiple IDs can be provided as a string.
Collection ID | The unique identifier of the collection where the ingested documents will be added.
Additional Options | A set of optional parameters to customize ingestion behavior:
- Audio Input Language | Language code for audio files; default is "auto" for automatic detection.
- Chunk By Page | Boolean flag indicating whether each page should be treated as a separate chunk. If true, keep_tables_as_one_chunk is ignored.
- Gen Doc Questions | Boolean flag to auto-generate sample questions for each document using a large language model (LLM).
- Gen Doc Summaries | Boolean flag to auto-generate summaries for each document using an LLM.
- Handwriting Check | Boolean flag to check pages for handwriting and use specialized models if any is found.
- Ingest Mode | Mode of ingestion: "standard" (default) for regular ingestion suitable for retrieval-augmented generation (RAG), or "agent_only", which bypasses standard ingestion.
- Keep Tables As One Chunk | Boolean flag indicating whether tables identified by the parser should be kept as a single chunk.
- Metadata | JSON object containing metadata to associate with the document.
- Ocr Model | Method used to extract text from images with AI-enabled OCR models; default is "auto".
- Permissions | String listing the usernames that have permission to access the document.
- Restricted | Boolean flag indicating whether the document should be restricted to certain users only.
- Tesseract Lang | Language code used when the OCR model is set to "tesseract".
- Timeout | Timeout in seconds for the ingestion request; 0 means no timeout.
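
The options above can be gathered into one parameter dictionary before a request is made. The following is a minimal sketch in Python, not the node's actual implementation. The snake_case keys are assumed from the option labels (only keep_tables_as_one_chunk and ingest_mode appear verbatim on this page), and build_ingest_options is a hypothetical helper.

```python
from typing import Any, Dict

# Defaults stated in the properties table; other options have no documented default.
# Key names are assumptions derived from the option labels.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "audio_input_language": "auto",
    "ingest_mode": "standard",
    "ocr_model": "auto",
}


def build_ingest_options(**overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    options = {**DOCUMENTED_DEFAULTS, **overrides}
    # The table states that keep_tables_as_one_chunk is ignored when
    # chunk_by_page is true, so drop it to make the effective settings explicit.
    if options.get("chunk_by_page"):
        options.pop("keep_tables_as_one_chunk", None)
    return options


opts = build_ingest_options(
    chunk_by_page=True,
    keep_tables_as_one_chunk=True,
    timeout=0,  # 0 means no timeout
)
print(opts)
```

Dropping the ignored flag is a design choice made here for readability; sending both flags should be equivalent if the backend behaves as the table describes.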

Output

The output of this operation is the full HTTP response from the ingestion API endpoint. The main data returned is typically a JSON object representing the status or result of the ingestion job, such as confirmation of successful ingestion or details about the ingested documents.

If the ingestion involves binary data (e.g., files), it is handled during upload prior to this step, so this operation focuses on triggering ingestion rather than returning binary content.

Dependencies

  • Requires an API key credential for authentication with the backend service.
  • The node sends a POST request to an endpoint structured as /uploads/{upload_ids}/ingest with query parameters and body as per the additional options.
  • Proper configuration of the API URL and credentials in n8n is necessary.
  • The ingestion backend must support the specified ingestion modes and options.
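
To make the request shape concrete, here is a rough sketch of how a call to /uploads/{upload_ids}/ingest might be assembled in Python. Several details are assumptions rather than facts from this page: the base URL is a placeholder, the API key is sent as a Bearer token, multiple upload IDs are joined with commas, and every option is sent as a query parameter (the page does not say which options go in the query and which in the body). The request is only constructed, not sent.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_ingest_request(
    base_url: str,
    api_key: str,
    upload_ids: List[str],
    collection_id: str,
    options: Dict[str, Any],
) -> Request:
    """Build (but do not send) a POST request for the ingest endpoint."""
    # Assumption: multiple upload IDs are passed as one comma-separated path segment.
    ids = quote(",".join(upload_ids), safe=",")
    query = {"collection_id": collection_id}
    for key, value in options.items():
        # Assumption: booleans are serialized as lowercase "true"/"false".
        query[key] = str(value).lower() if isinstance(value, bool) else str(value)
    url = f"{base_url.rstrip('/')}/uploads/{ids}/ingest?{urlencode(query)}"
    # Assumption: the API key is accepted as a Bearer token.
    return Request(url, method="POST", headers={"Authorization": f"Bearer {api_key}"})


req = build_ingest_request(
    "https://h2ogpte.example.com/api/v1/",  # placeholder base URL
    "secret-key",
    ["up-1", "up-2"],
    "col-1",
    {"ingest_mode": "standard", "chunk_by_page": True},
)
print(req.get_method(), req.full_url)
```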

Troubleshooting

  • Invalid Upload IDs: If the provided upload IDs do not exist or are incorrect, the API may return an error. Verify that the upload IDs are correct and correspond to previously uploaded documents.
  • Collection Not Found: Ensure the collection ID exists and is accessible with the provided credentials.
  • Timeouts: If ingestion takes too long, consider increasing the timeout property or checking backend performance.
  • Permission Errors: If the user lacks permission to add documents to the collection or access the uploaded files, errors will occur. Confirm user permissions and roles.
  • Incorrect Option Values: Passing invalid values for options like ingest_mode, ocr_model, or language codes may cause failures. Use documented valid values.
  • Handwriting Check Failures: Enabling the handwriting check requires specialized models; if these are unavailable, ingestion might fail or fall back.
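
Several of the issues above can be caught before a request is sent. The sketch below shows one possible pre-flight check in Python. It is illustrative only: preflight_check is a hypothetical helper, the option keys are assumed snake_case forms of the labels, and only the two ingest modes named on this page are treated as valid.

```python
from typing import Any, Dict, List

# The two ingest modes documented on this page.
VALID_INGEST_MODES = {"standard", "agent_only"}


def preflight_check(
    upload_ids: List[str], collection_id: str, options: Dict[str, Any]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    if not upload_ids or any(not uid.strip() for uid in upload_ids):
        problems.append("upload_ids must contain at least one non-empty ID")
    if not collection_id.strip():
        problems.append("collection_id is required")
    mode = options.get("ingest_mode", "standard")
    if mode not in VALID_INGEST_MODES:
        problems.append(f"ingest_mode {mode!r} is not one of {sorted(VALID_INGEST_MODES)}")
    timeout = options.get("timeout", 0)
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout < 0:
        problems.append("timeout must be a non-negative number of seconds (0 means no timeout)")
    # tesseract_lang is only used when the OCR model is set to "tesseract".
    if options.get("tesseract_lang") and options.get("ocr_model") != "tesseract":
        problems.append("tesseract_lang has no effect unless ocr_model is 'tesseract'")
    return problems


problems = preflight_check(["up-1"], "col-1", {"ingest_mode": "fast", "timeout": -5})
for problem in problems:
    print(problem)
```

A check like this cannot confirm that an upload ID or collection actually exists, or that the user has permission; those errors can only come back from the API itself.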

Links and References

  • No direct external links are provided in the source code.
  • Users should refer to the backend API documentation for detailed information on ingestion options and supported features.
  • For OCR and language options, consult relevant AI model documentation or service provider guides.
