
LLM文档转换 (LLM Document Conversion)

LLM document processing node that converts documents into a format suited to large language models.

Overview

The "LLM文档转换" (LLM Document Conversion) node converts documents in various formats into Markdown text. Its "PDF转Markdown" (PDF to Markdown) operation, under the "文件转Markdown" (File to Markdown) resource, converts PDF files into Markdown. This is useful when you need PDF content in a lightweight, editable format, for example when preparing documentation, notes, or content for static site generators.

Practical examples include:

  • Converting research papers or reports in PDF into Markdown for easier editing.
  • Extracting meeting notes or presentations saved as PDFs into Markdown for integration with note-taking apps.
  • Automating content migration workflows where source documents are PDFs but target systems require Markdown input.

Properties

  • 文件字段名 (File Field Name): The name of the input field that holds the document to convert. Supported formats include pdf, doc, docx, ppt, pptx, xlsx, html, csv, and others.
  • 返回Markdown文本 (Return Markdown Text): Boolean option that controls whether the converted Markdown text is returned. If disabled, only the URL of the converted document is returned.
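The two properties can be pictured as a small parameter object. The TypeScript sketch below is hypothetical: the keys `fileFieldName` and `returnMarkdownText`, the field name `"data"`, and the helpers `extensionOf` and `isListedFormat` are illustrative names, not the node's actual identifiers. The extension set only mirrors the formats named in the property description, which ends with "etc.", so a format missing from it is not necessarily unsupported.

```typescript
// Hypothetical shape of the node's two parameters (names are illustrative only).
interface ConversionParams {
  fileFieldName: string;       // 文件字段名: name of the input field holding the file
  returnMarkdownText: boolean; // 返回Markdown文本: true = text, false = URL only
}

// Formats named in the property description. The source list ends with "etc.",
// so this set is NOT exhaustive: absence here does not prove a format is unsupported.
const LISTED_FORMATS = new Set([
  "pdf", "doc", "docx", "ppt", "pptx", "xlsx", "html", "csv",
]);

// Returns the lowercase extension of a file name, or "" if it has none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
}

// True if the file's extension is one of the formats explicitly listed above.
function isListedFormat(fileName: string): boolean {
  return LISTED_FORMATS.has(extensionOf(fileName));
}

// Example parameter set: read the file from a field named "data" (assumed name)
// and ask for the Markdown text in the output.
const exampleParams: ConversionParams = {
  fileFieldName: "data",
  returnMarkdownText: true,
};
```

A check like `isListedFormat` is best treated as an early warning rather than a hard gate, since the service may accept formats the description does not name.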

Output

The node outputs JSON containing the conversion result. When "返回Markdown文本" (Return Markdown Text) is enabled, the output includes the Markdown text extracted from the input file. When it is disabled, the output instead contains a URL pointing to the converted Markdown document.

The source does not show whether the node also emits binary data. If it does, that output would typically hold the converted document as a file.
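Because the output shape depends on the 返回Markdown文本 toggle, a downstream step has to branch on which result it received. The sketch below assumes the result arrives under hypothetical JSON keys `markdown` and `url`; the source does not name the real keys, so check an actual execution's output before relying on them. `resolveOutput` is an illustrative helper, not part of the node.

```typescript
// Normalized view of the node's result (illustrative type, not the node's own).
type Resolved =
  | { kind: "text"; markdown: string } // 返回Markdown文本 enabled
  | { kind: "link"; url: string };     // 返回Markdown文本 disabled

// Inspects one output item's JSON. The keys "markdown" and "url" are ASSUMED names.
function resolveOutput(json: Record<string, unknown>): Resolved {
  const markdown = json["markdown"];
  if (typeof markdown === "string") {
    return { kind: "text", markdown };
  }
  const url = json["url"];
  if (typeof url === "string") {
    return { kind: "link", url };
  }
  // Neither field is present: surface this instead of passing on an empty result.
  throw new Error("Output contains neither Markdown text nor a document URL");
}
```

Failing loudly when neither key is present makes a misconfigured toggle or a changed response format visible immediately instead of producing empty Markdown further down the workflow.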

Dependencies

  • Requires an API key credential for accessing the LLM document processing service.
  • The node configuration must include the base URL of the LLM document conversion API.
  • The node depends on the bundled module ToMarkdownDescription, which defines the operations and fields for Markdown conversion.
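To illustrate how the API key and base URL combine into a call, here is a sketch that only plans the request without sending it. Everything service-specific is an assumption: the path `/pdf-to-markdown`, the `return_text` query parameter, and the `Bearer` authorization scheme are placeholders, not the real API. `ConversionCredentials` and `planConversionRequest` are hypothetical names.

```typescript
// Hypothetical credential shape: an API key plus the service's base URL.
interface ConversionCredentials {
  apiKey: string;
  baseUrl: string; // e.g. "https://converter.example.com/api" (placeholder)
}

// Description of the HTTP call, kept separate from actually sending it.
interface RequestPlan {
  method: "POST";
  url: string;
  headers: Record<string, string>;
}

// Builds the request description. The path, query parameter, and Bearer scheme
// are ASSUMPTIONS for illustration; consult the service's own API documentation.
function planConversionRequest(
  creds: ConversionCredentials,
  returnText: boolean,
): RequestPlan {
  if (!creds.apiKey) {
    throw new Error("Missing API key credential");
  }
  if (!creds.baseUrl) {
    throw new Error("Missing base URL for the conversion service");
  }
  const base = creds.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    url: `${base}/pdf-to-markdown?return_text=${returnText}`,
    headers: { Authorization: `Bearer ${creds.apiKey}` },
  };
}
```

Validating both credential fields before any network call turns the "missing or incorrect API credentials" failure into an immediate, clearly worded error.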

Troubleshooting

  • Common issues:

    • An incorrect or missing file field name prevents the node from locating the input document.
    • Unsupported file formats may not convert properly; ensure the input file is one of the supported types.
    • API authentication errors occur if the required API key credential is missing or invalid.
    • Network or connectivity problems can interrupt calls to the external LLM document conversion service.
  • Error messages:

    • Authentication failures usually indicate missing or incorrect API credentials.
    • File format errors suggest the input file is not supported or corrupted.
    • Timeout or HTTP errors imply network problems or service unavailability.

To resolve these, verify the input file field name, confirm the file format, check API credentials, and ensure network connectivity.
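The checklist above can be encoded as a small triage function. This is a hypothetical sketch: `FailureInfo`, `diagnose`, and `HINTS` are illustrative names, and the status-code mapping (401/403 for authentication, 415/422 for file-format problems, 5xx or timeouts for network or service trouble) follows general HTTP conventions rather than documented behavior of this particular service.

```typescript
type Diagnosis = "missing-field" | "auth" | "file-format" | "network-or-service" | "unknown";

// Facts gathered about a failed run (hypothetical structure).
interface FailureInfo {
  fieldPresent: boolean; // did the input item contain the configured file field?
  httpStatus?: number;   // status returned by the conversion service, if any
  timedOut?: boolean;    // true if the request never completed
}

// Maps a failure to a likely cause using conventional HTTP status meanings (assumed).
function diagnose(info: FailureInfo): Diagnosis {
  if (!info.fieldPresent) return "missing-field";
  if (info.timedOut) return "network-or-service";
  const status = info.httpStatus;
  if (status === undefined) return "unknown";
  if (status === 401 || status === 403) return "auth";
  if (status === 415 || status === 422) return "file-format";
  if (status >= 500) return "network-or-service";
  return "unknown";
}

// One remediation hint per diagnosis, matching the checklist in the text.
const HINTS: Record<Diagnosis, string> = {
  "missing-field": "Verify the 文件字段名 value matches the incoming item's file field.",
  auth: "Check that the API key credential is configured and valid.",
  "file-format": "Confirm the file is a supported format and is not corrupted.",
  "network-or-service": "Check connectivity and whether the conversion service is available.",
  unknown: "Inspect the full error message returned by the node.",
};
```

Checking the file field first mirrors the order of the checklist: a missing field makes any HTTP status irrelevant, since the request should never have been sent.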

Links and References

  • The source code provides no direct external links.
  • For more information, consult the documentation of the LLM document conversion API used by this node.
