
LLM文档转换

An LLM document processing node that converts documents into a format friendly to large language models.

Overview

The "LLM文档转换" (LLM Document Conversion) node converts documents in a range of formats into Markdown text through an external LLM document conversion service. Supported file types include PDF, Word documents, Excel spreadsheets, PowerPoint presentations, CSV files, HTML, XML, and JSON. The resulting Markdown is suited to LLM consumption or further text-based processing.

A common use case is automating the conversion of complex documents into clean Markdown for documentation, content management, or integration with systems that prefer Markdown input. For example, converting an XML file into Markdown allows users to easily read and edit structured data in a human-friendly format.

Specifically, the "XML转Markdown" (XML to Markdown) operation under the "文件转Markdown" (File to Markdown) resource converts XML files into Markdown text.

Properties

Name | Meaning
文件字段名 (File Field Name) | The name of the input field that holds the document to convert. Supported formats include pdf, doc, docx, ppt, pptx, xlsx, html, csv, xml, json, etc. Default: "data".
返回Markdown文本 (Return Markdown Text) | Boolean. When enabled, the converted Markdown text is returned. When disabled, only the URL of the converted document is returned. Default: true.
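
To make the two properties and their defaults concrete, here is a minimal Python sketch. It is not the node's implementation: the names `ConversionOptions`, `file_field_name`, `return_markdown_text`, and `is_listed_format` are hypothetical, and the format set only covers the extensions named above (the original list ends with "etc.", so it is not exhaustive).

```python
from dataclasses import dataclass

# Extensions named in the property description. The source list ends with
# "etc.", so the service may accept more formats than these.
LISTED_FORMATS = {
    "pdf", "doc", "docx", "ppt", "pptx", "xlsx", "html", "csv", "xml", "json",
}


@dataclass
class ConversionOptions:
    # Hypothetical field names mirroring 文件字段名 and 返回Markdown文本,
    # with the defaults stated in the table.
    file_field_name: str = "data"
    return_markdown_text: bool = True


def is_listed_format(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in LISTED_FORMATS
```

A check like `is_listed_format("feed.xml")` can run before the node is invoked, so an unlisted format is flagged early rather than after a failed conversion.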

Output

The node outputs JSON data containing the result of the conversion. When the "返回Markdown文本" property is enabled, the output includes the Markdown text representation of the input document. If this option is disabled, the output provides a URL pointing to the converted document instead.

Any binary data the node handles (for example, the original file or the converted document) is passed along in the output, but the primary result is the Markdown text.
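
A downstream step usually has to handle both output shapes: Markdown text or a URL. The sketch below shows one way to do that in Python. The keys `markdown` and `url` are assumed names for illustration only; the source does not state the actual field names in the output JSON.

```python
from typing import Any, Dict, Tuple


def read_conversion_result(item: Dict[str, Any]) -> Tuple[str, str]:
    """Return ("markdown", text) or ("url", link) for one output item.

    The keys "markdown" and "url" are assumptions, not confirmed field names.
    """
    markdown = item.get("markdown")
    if isinstance(markdown, str):
        return ("markdown", markdown)
    url = item.get("url")
    if isinstance(url, str):
        return ("url", url)
    raise KeyError("output item has neither 'markdown' nor 'url'")
```

Returning a tagged pair keeps the caller explicit about which case it received, rather than guessing from the string's content.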

Dependencies

  • Requires an API key credential for accessing the external LLM document conversion service.
  • The node uses a base URL configured via credentials to send requests to the conversion API.
  • No other external dependencies are indicated in the static code.
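
As an illustration of how a base URL and an API key from credentials might be combined into a request, here is a hedged Python sketch using only the standard library. The `convert` path, the `Bearer` authorization scheme, and the content type are all assumptions; the source does not document the conversion API's endpoints or authentication header.

```python
from urllib.parse import urljoin
from urllib.request import Request


def build_conversion_request(base_url: str, api_key: str, payload: bytes) -> Request:
    """Build (but do not send) a POST request to the conversion service."""
    # "convert" is a hypothetical path; substitute the real endpoint.
    endpoint = urljoin(base_url.rstrip("/") + "/", "convert")
    return Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            # Assumed scheme; the real service may expect a different header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )
```

Building the request separately from sending it makes the URL and headers easy to inspect when diagnosing credential problems.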

Troubleshooting

  • Common issues:

    • Incorrect or missing file field name may cause the node to fail to locate the input document.
    • Unsupported file formats might not convert properly.
    • API authentication errors occur when the required API key credential is missing or invalid.
    • Network or service availability issues when calling the external conversion API.
  • Error messages:

    • Authentication failures typically indicate missing or incorrect API credentials.
    • File format errors suggest the input file is not supported or corrupted.
    • Timeout or network errors imply connectivity problems with the external service.

To resolve these, verify the input file field name matches the actual input, ensure the file format is supported, confirm API credentials are correctly configured, and check network connectivity.
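
The error categories above can be sketched as a small triage helper in Python. The mapping from HTTP status codes to categories is an assumption made for illustration; the source does not say which codes the conversion service returns, and `diagnose_failure` and `ADVICE` are hypothetical names.

```python
from typing import Optional

# Remediation hints taken from the troubleshooting advice above.
ADVICE = {
    "authentication": "Confirm the API credentials are correctly configured.",
    "file format": "Ensure the file format is supported and the file is not corrupted.",
    "network": "Check network connectivity to the external service.",
    "unknown": "Verify that the file field name matches the actual input.",
}


def diagnose_failure(status: Optional[int], timed_out: bool = False) -> str:
    """Map a failed call to one of the troubleshooting categories.

    Assumed mapping: no response or a timeout means a network problem,
    401/403 means authentication, 400/415/422 means a file format problem.
    """
    if timed_out or status is None:
        return "network"
    if status in (401, 403):
        return "authentication"
    if status in (400, 415, 422):
        return "file format"
    return "unknown"
```

For example, `ADVICE[diagnose_failure(401)]` yields the credentials hint, which can be logged next to the raw error message.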
