
LLM文档转换

LLM document processing node: converts documents into an LLM-friendly format.

Overview

The "LLM文档转换" (LLM Document Conversion) node converts documents in various formats into Markdown text. Specifically, the "HTML转Markdown" (HTML to Markdown) operation of the "文件转Markdown" (File to Markdown) resource converts HTML files into Markdown. It is useful when you want to extract Markdown from HTML documents, for example to prepare documentation, blog posts, or notes in a clean, readable format.

Practical examples include:

  • Converting saved HTML reports or web pages into Markdown for easier editing.
  • Transforming HTML email content into Markdown for integration with Markdown-based systems.
  • Automating the conversion of HTML export files from other software into Markdown for publishing.

Properties

Name Meaning
文件字段名 (File Field Name) The name of the input field that holds the document to convert. Supported formats include pdf, doc, docx, ppt, pptx, xlsx, html, csv, etc. For this operation, the field must contain an HTML file.
返回Markdown文本 (Return Markdown Text) Boolean. When true, the node returns the Markdown text directly; when false, it returns only the URL of the converted document.
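The TypeScript sketch below illustrates what 文件字段名 does: it names the key under which an input item carries the file to convert. It assumes an n8n-style item with `json` and `binary` maps; the interfaces and the `resolveInputFile` helper are hypothetical stand-ins, not the node's actual source.

```typescript
// Hypothetical stand-ins for an n8n-style input item (assumption, not the node's real types).
interface BinaryFile {
  fileName?: string;
  mimeType: string;
  data: string; // file content, e.g. base64-encoded
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryFile>;
}

// Hypothetical parameter object mirroring the two properties above.
interface ConversionParams {
  fileFieldName: string;       // 文件字段名
  returnMarkdownText: boolean; // 返回Markdown文本
}

// Look up the file the node would convert; throws if the field name matches nothing.
function resolveInputFile(item: InputItem, params: ConversionParams): BinaryFile {
  const file = item.binary?.[params.fileFieldName];
  if (file === undefined) {
    // List the fields that do exist so a wrong 文件字段名 is easy to spot.
    const available = Object.keys(item.binary ?? {}).join(", ") || "none";
    throw new Error(
      `No file found in field "${params.fileFieldName}" (available: ${available})`,
    );
  }
  return file;
}
```

If the item stores its HTML under `data` but the property is set to `file`, the lookup fails with a message listing `data`, which is the "Incorrect file field name" case described under Troubleshooting.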

Output

The node outputs JSON. If "返回Markdown文本" is enabled, the JSON contains the converted Markdown text; if it is disabled, the JSON contains a URL pointing to the converted Markdown document instead of the raw text.
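A minimal sketch of how the 返回Markdown文本 toggle changes the output JSON. The response type, the `markdown` and `url` keys, and both helper functions are hypothetical placeholders; the service's real field names are not documented here.

```typescript
// Hypothetical service response; `markdown` and `url` are placeholder field names.
interface ConversionResponse {
  markdown: string;
  url: string;
}

// The output carries either the Markdown text or the document URL, never both.
type NodeOutput = { markdown: string } | { url: string };

// Shape the node's JSON output according to the 返回Markdown文本 toggle.
function shapeOutput(res: ConversionResponse, returnMarkdownText: boolean): NodeOutput {
  return returnMarkdownText ? { markdown: res.markdown } : { url: res.url };
}

// Downstream logic can branch on which key is present.
function describe(output: NodeOutput): string {
  return "markdown" in output
    ? `inline Markdown (${output.markdown.length} chars)`
    : `Markdown document at ${output.url}`;
}
```

Branching on the key present, rather than on the toggle value, keeps downstream steps correct even if the toggle is changed later in the workflow.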

If the node also returns the converted document as a file, that Markdown file is available through the node's binary output.

Dependencies

  • Requires an API key credential for accessing the LLM document conversion service.
  • The node communicates with an external API endpoint specified by the base URL configured in the credentials.
  • No additional environment variables are explicitly required beyond the API authentication setup.

Troubleshooting

  • Common issues:

    • Incorrect file field name: Ensure the "文件字段名" matches the actual input file field containing the HTML document.
    • Unsupported file format: Although multiple formats are supported, the input must be a valid HTML file for this operation.
    • API authentication errors: Verify that the API key credential is correctly configured and has the necessary permissions.
    • Network or API endpoint errors: Check connectivity and the correctness of the base URL in credentials.
  • Error messages:

    • Authentication failures typically indicate invalid or missing API credentials.
    • File processing errors may occur if the input file is corrupted or not properly formatted as HTML.
    • Timeout or network errors suggest connectivity issues with the external API service.

Resolving these usually involves verifying input properties, ensuring correct credentials, and confirming network access.
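Several of the input issues above can be caught before the API call. The sketch below is a hypothetical pre-flight check, not part of the node itself; it assumes the file is described by a MIME type and an optional file name, and treats it as HTML if either one says so.

```typescript
// Hypothetical file descriptor (assumption): MIME type plus optional file name.
interface BinaryFile {
  fileName?: string;
  mimeType: string;
}

// Returns a list of problems; an empty list means the input looks like HTML.
function checkHtmlInput(file: BinaryFile | undefined, fieldName: string): string[] {
  const problems: string[] = [];
  if (file === undefined) {
    problems.push(`No file in field "${fieldName}"; check 文件字段名.`);
    return problems;
  }
  // Drop parameters such as "; charset=utf-8" before comparing MIME types.
  const mime = (file.mimeType.split(";")[0] ?? "").trim().toLowerCase();
  const name = (file.fileName ?? "").toLowerCase();
  const htmlByMime = mime === "text/html" || mime === "application/xhtml+xml";
  const htmlByName = name.endsWith(".html") || name.endsWith(".htm");
  if (!htmlByMime && !htmlByName) {
    problems.push(`File "${file.fileName ?? "(unnamed)"}" (${file.mimeType}) does not look like HTML.`);
  }
  return problems;
}
```

A check like this does not detect corrupted HTML; it only rules out a missing field or an obviously wrong file type before any credentials or network access come into play.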
