Mineru Custom Service

Use the Mineru API to customize the parsing of PDF documents, supporting local file path input.

Overview

This node integrates with the Mineru API to parse and analyze documents from a given file URL. It supports multiple document formats including PDF, Word (DOC, DOCX), PowerPoint (PPT, PPTX), and common image formats (PNG, JPG, JPEG). The node downloads the specified file and sends it to the Mineru API server for processing, returning structured parsed data.

The node supports two versions of the Mineru API (v1 and v2), each offering different parsing options and output details. Users can customize parsing methods (automatic, OCR, or text-based), enable or disable features like formula and table recognition, specify page ranges for analysis, and choose which parts of the parsed data to return (e.g., layout info, images, markdown).

Common scenarios:

  • Extracting structured content from PDFs or scanned documents for further automation.
  • Parsing presentations or reports to convert them into machine-readable formats.
  • Using OCR to extract text from images or scanned files.
  • Enabling formula and table recognition in scientific or financial documents.
  • Generating markdown or JSON outputs for documentation or integration with other systems.

Practical example:
A user wants to process a PDF report hosted online, extracting tables and formulas while receiving the results as markdown. They configure the node with API version v2, set the file URL, enable table and formula recognition, and request markdown output. The node downloads the file, sends it to the Mineru API, and returns the parsed markdown content for use in downstream workflows.
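The configuration in the example above can be sketched as a small helper that turns the chosen v2 options into form fields. This is a minimal sketch: the function name and the field names (`parse_method`, `formula_enable`, `table_enable`, `return_md`) are assumptions modeled on the node's property names, not confirmed Mineru API parameters, so check them against your server before relying on them.

```python
from typing import Dict


def build_v2_form_fields(
    enable_formula: bool = True,
    enable_table: bool = True,
    return_markdown: bool = True,
    parse_method: str = "auto",
) -> Dict[str, str]:
    """Map the node's v2 options to form fields (hypothetical field names)."""

    def flag(value: bool) -> str:
        # Multipart form values are strings, so booleans are serialized here.
        return "true" if value else "false"

    return {
        "parse_method": parse_method,
        "formula_enable": flag(enable_formula),
        "table_enable": flag(enable_table),
        "return_md": flag(return_markdown),
    }


# Options from the practical example: tables and formulas on, markdown out.
fields = build_v2_form_fields()
```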


Properties

| Name | Meaning |
| --- | --- |
| API Version | Select Mineru API version: v1 or v2. Determines available parsing options and output format. |
| File URL | URL address of the file to be parsed. Supports PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, and JPEG formats. |
| API Server Address | Mineru API server base URL (e.g., http://localhost:8000). |
| Output Directory | Directory path where analysis results will be saved (used internally by the API). |
| Analysis Method | (v1 only) Parsing method: Automatic, OCR, or TXT. |
| Whether To Generate JSON And MD Files | (v1 only) Boolean flag to write parsed data to JSON and Markdown files. |
| Return Layout Information | (v1 only) Boolean flag to include parsed PDF layout information in the response. |
| Return Document Information | (v1 only) Boolean flag to include parsed PDF document metadata. |
| Return Content List | (v1 only) Boolean flag to include a list of parsed content elements. |
| Return Image | (v1 only) Boolean flag to include extracted images from the document. |
| Document Language | (v2 only) Language setting for the document: Chinese, English, or Automatic detection. |
| Backend Engine | (v2 only) Backend engine to process the document. Currently only Pipeline is available. |
| Parsing Method | (v2 only) Parsing method: Automatic, OCR, or TXT. |
| Enable Formula Recognition | (v2 only) Boolean flag to enable formula recognition functionality. |
| Enable Table Recognition | (v2 only) Boolean flag to enable table recognition functionality. |
| Return Markdown | (v2 only) Boolean flag to return results in Markdown format. |
| Return Middle JSON | (v2 only) Boolean flag to return intermediate processing JSON data. |
| Return Model Output | (v2 only) Boolean flag to return raw model output data. |
| Return Content List | (v2 only) Boolean flag to include a content list in the response. |
| Return Image | (v2 only) Boolean flag to include extracted images in the response. |
| Start Page ID | (v2 only) Starting page index for analysis (zero-based). |
| End Page ID | (v2 only) Ending page index for analysis. |
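
Because many properties apply to only one API version, a caller may want to filter options by version and sanity-check the page range before sending a request. The sketch below illustrates one way to do that. The snake_case option keys and both function names are hypothetical labels for the properties listed above, and the rule that the end page must not precede the start page is an assumption, not documented behavior.

```python
from typing import Any, Dict

# Option names grouped by API version, following the property list above.
# The snake_case keys are hypothetical labels, not confirmed API parameters.
V1_OPTIONS = {
    "analysis_method", "write_json_md_files", "return_layout",
    "return_info", "return_content_list", "return_images",
}
V2_OPTIONS = {
    "language", "backend", "parse_method", "formula_enable", "table_enable",
    "return_md", "return_middle_json", "return_model_output",
    "return_content_list", "return_images", "start_page_id", "end_page_id",
}


def select_options(api_version: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the options that apply to the chosen API version."""
    if api_version == "v1":
        allowed = V1_OPTIONS
    elif api_version == "v2":
        allowed = V2_OPTIONS
    else:
        raise ValueError(f"Unsupported API version: {api_version!r}")
    return {key: value for key, value in options.items() if key in allowed}


def validate_page_range(start_page_id: int, end_page_id: int) -> None:
    """Check a zero-based page range (assumes end must not precede start)."""
    if start_page_id < 0:
        raise ValueError("Start Page ID must be zero or greater")
    if end_page_id < start_page_id:
        raise ValueError("End Page ID must not be smaller than Start Page ID")
```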

Output

The node outputs an array of items, each containing a json object with the following structure:

  • apiVersion: The Mineru API version used (v1 or v2).
  • fileUrl: The original file URL provided.
  • fileName: The filename derived from the URL or HTTP headers.
  • fileSize: Size of the downloaded file in bytes.
  • serverUrl: The Mineru API server address used.
  • requestParams: Object containing the parameters sent to the API (reflecting user input).
  • response: Parsed data returned by the Mineru API. This varies depending on API version and options selected; it may include:
    • Parsed text content.
    • Layout information.
    • Document metadata.
    • Extracted images (if requested).
    • Markdown formatted results (v2).
    • Intermediate JSON or raw model output (optional in v2).
  • timestamp: ISO timestamp when the processing occurred.

If the node encounters an error during processing and "Continue On Fail" is enabled, it outputs an item containing the error message and the associated fileUrl.
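
The output shape described above can be sketched as follows. This is an illustrative reconstruction, not the node's actual code. The helper names are hypothetical, the `error` key in the failure item is an assumption, and preferring the Content-Disposition header over the URL path when deriving the file name is one plausible ordering; the fallback name `document` is also invented for the sketch.

```python
import posixpath
from datetime import datetime, timezone
from email.message import Message
from typing import Any, Dict, Optional
from urllib.parse import unquote, urlparse


def derive_file_name(file_url: str, content_disposition: Optional[str] = None) -> str:
    """Use the Content-Disposition filename if present, else the URL path."""
    if content_disposition:
        header = Message()
        header["Content-Disposition"] = content_disposition
        name = header.get_filename()
        if name:
            return name
    name = posixpath.basename(unquote(urlparse(file_url).path))
    return name or "document"  # hypothetical fallback name


def build_success_item(
    api_version: str, file_url: str, file_name: str, file_size: int,
    server_url: str, request_params: Dict[str, Any], response: Any,
) -> Dict[str, Any]:
    """Assemble one output item with the fields listed above."""
    return {"json": {
        "apiVersion": api_version,
        "fileUrl": file_url,
        "fileName": file_name,
        "fileSize": file_size,
        "serverUrl": server_url,
        "requestParams": request_params,
        "response": response,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }}


def build_error_item(file_url: str, error: Exception) -> Dict[str, Any]:
    """Item emitted when "Continue On Fail" is enabled (assumed key names)."""
    return {"json": {"error": str(error), "fileUrl": file_url}}
```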

Binary Data:
The node does not output binary data directly. Extracted images are included within the JSON response from the API if requested.


Dependencies

  • Requires access to the Mineru API server endpoint specified by the user.
  • The node performs HTTP GET requests to download the file from the provided URL.
  • The node performs HTTP POST requests to the Mineru API /file_parse endpoint with multipart form data.
  • No internal credential types are exposed; users must ensure the Mineru API server is accessible and properly configured.
  • Network connectivity and appropriate permissions to download the file URL are required.
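
The two HTTP steps listed above (GET the file, then POST it to /file_parse as multipart form data) can be sketched with the Python standard library. This is a rough illustration under stated assumptions, not the node's implementation: the function names, the `files` form field name, and the fallback file name are hypothetical, and the 3600-second timeout mirrors the 1-hour timeout mentioned under Troubleshooting.

```python
import uuid
from typing import Dict, Tuple
from urllib.request import Request, urlopen

TIMEOUT_SECONDS = 3600  # the 1-hour request timeout noted under Troubleshooting


def encode_multipart(
    fields: Dict[str, str], file_name: str, file_bytes: bytes,
    file_field: str = "files",  # assumed name of the upload field
) -> Tuple[bytes, str]:
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_file(file_url: str, server_url: str, fields: Dict[str, str]) -> bytes:
    """Download the file, then POST it to the server's /file_parse endpoint."""
    with urlopen(file_url, timeout=TIMEOUT_SECONDS) as download:
        file_bytes = download.read()
    file_name = file_url.rstrip("/").rsplit("/", 1)[-1] or "document"
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    request = Request(
        server_url.rstrip("/") + "/file_parse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return response.read()
```

Calling `parse_file("https://example.com/report.pdf", "http://localhost:8000", {"parse_method": "auto"})` would return the raw response bytes, provided both URLs are reachable from the machine running the code.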

Troubleshooting

  • File URL empty or invalid:
    Error: "File URL cannot be empty" or failure downloading the file.
    Resolution: Ensure the file URL is correctly specified and publicly accessible or accessible from the n8n environment.

  • API Server Address empty or unreachable:
    Error: "API server address cannot be empty" or network timeout.
    Resolution: Verify the Mineru API server URL is correct and reachable from n8n.

  • File download failure:
    Error: "Failed to download file: ..."
    Resolution: Check network connectivity, URL correctness, and that the file exists at the URL.

  • Empty file content:
    Error: "File is empty or cannot be downloaded"
    Resolution: Confirm the file at the URL is not empty and accessible.

  • JSON parse errors on API response:
    Behavior: The node attempts to parse the API response as JSON but falls back gracefully if parsing fails.
    Resolution: If unexpected data is returned, verify the Mineru API server status and compatibility.

  • Timeouts:
    Requests have a 1-hour timeout. Large files or slow networks might cause delays. Consider splitting large documents or improving network conditions.
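
The checks behind these messages can be sketched as below. The error strings are the ones quoted above; the function names are hypothetical, and returning the raw text when the response is not valid JSON is only one plausible reading of "falls back gracefully".

```python
import json
from typing import Any


def validate_inputs(file_url: str, server_url: str) -> None:
    """Reject empty inputs with the messages quoted in the list above."""
    if not file_url or not file_url.strip():
        raise ValueError("File URL cannot be empty")
    if not server_url or not server_url.strip():
        raise ValueError("API server address cannot be empty")


def check_file_content(file_bytes: bytes) -> None:
    """Reject a download that produced no bytes."""
    if not file_bytes:
        raise ValueError("File is empty or cannot be downloaded")


def parse_api_response(raw_text: str) -> Any:
    """Return parsed JSON, or the raw text if it is not valid JSON (assumed)."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return raw_text
```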

