LlamaParse

Parse PDF files and get their content in markdown!

Overview

The LlamaParse node parses PDF files and returns their content as markdown. It is useful when you need the content of a PDF in an editable, web-friendly text form for further processing, analysis, or display.

Common scenarios include:

Extracting textual content from reports, invoices, or manuals stored as PDFs.
Converting PDF documentation into markdown for integration with static site generators or content management systems.
Automating the ingestion of PDF data into workflows that require markdown input.

For example, you could use this node to parse a PDF user manual and then feed the markdown output into a knowledge base system.

Properties

Name	Meaning
File Path	The full path to the PDF file you want to parse. Example: `/User/user/Desktop/file.pdf`

Output

The node outputs an array of JSON objects. Each object represents one parsed segment of the PDF and contains that segment's content as markdown.

The node produces no binary output; it returns only the extracted markdown text.
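
Because the output arrives as separate segments, a downstream step often needs to join them into one markdown document. The sketch below shows one way to do that. The `ParsedSegment` interface and its `text` field are assumptions made for illustration; inspect the node's actual output to find the real field name before relying on it.

```typescript
// Hypothetical shape of one output item. The field name `text` is an
// assumption, not something confirmed by the node's documentation.
interface ParsedSegment {
  text: string;
}

// Join the markdown of all segments into one document, skipping empty segments.
function joinSegments(segments: ParsedSegment[]): string {
  return segments
    .map((segment) => segment.text.trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}

// Made-up example data standing in for the node's output.
const example: ParsedSegment[] = [
  { text: "# Manual\n" },
  { text: "" },
  { text: "Step 1: plug in the device." },
];

const combined = joinSegments(example);
console.log(combined);
```

Skipping empty segments keeps the combined document free of stray blank sections, which also helps when a PDF parses only partially.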

Dependencies

Requires an API key credential for the external service that performs the PDF parsing.
Uses the llamaindex library internally to perform the parsing operation.
The node expects the PDF file to be accessible at the specified file path on the system where n8n is running.
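
Since the file must be readable on the machine where n8n runs, it can help to check the path before handing it to the node, for example in a script or a preceding code step. The following is a minimal sketch using Node.js built-ins; `checkPdfPath` is a hypothetical helper and not part of the node, and the specific checks (absolute path, `.pdf` extension, read permission) are assumptions about what is worth validating.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns a description of the problem,
// or null if the path looks usable.
function checkPdfPath(filePath: string): string | null {
  // The File Path property asks for a full path.
  if (!path.isAbsolute(filePath)) {
    return "path is not absolute";
  }
  if (path.extname(filePath).toLowerCase() !== ".pdf") {
    return "file does not have a .pdf extension";
  }
  try {
    // Throws if the file is missing or the current process cannot read it.
    fs.accessSync(filePath, fs.constants.R_OK);
  } catch {
    return "file does not exist or is not readable";
  }
  return null;
}

console.log(checkPdfPath("reports/file.pdf"));
console.log(checkPdfPath("/path/that/does/not/exist/file.pdf"));
```

Note that the check has to run as the same user and on the same host (or container) as n8n to be meaningful, since permissions and mounted paths can differ.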

Troubleshooting

File Not Found: If the specified file path is incorrect or inaccessible, the node will fail to load the PDF. Ensure the path is correct and the file permissions allow reading.
Invalid API Key: If the provided API key credential is missing or invalid, the parsing request will fail. Verify that the API key is correctly configured in n8n credentials.
Parsing Errors: If the PDF is corrupted or uses unsupported features, the parser might return incomplete or empty markdown content.
Index Out of Range Error: The node's compiled code appears to contain a bug: the loop over the parsed segments pushes r[e].toJSON() where r[s].toJSON() seems to be intended, so the results are indexed with the wrong variable. When multiple segments are returned, this can cause runtime errors or incorrect output. If you see unexpected results, check for an updated version of the node or report the issue.
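
To make the indexing problem concrete, the sketch below reproduces the pattern in readable form. It is not the node's source code: the `Segment` class, the function names, and the assumption that the wrong variable is an outer item index are all hypothetical, chosen only to show how indexing with the wrong variable behaves.

```typescript
// Hypothetical stand-in for a parsed result object that exposes toJSON().
class Segment {
  constructor(private readonly text: string) {}

  toJSON(): { text: string } {
    return { text: this.text };
  }
}

// Buggy pattern: the loop counter is `i`, but the array is indexed with
// `itemIndex` (assumed here to be an unrelated outer index).
function collectBuggy(results: Segment[], itemIndex: number): Array<{ text: string }> {
  const out: Array<{ text: string }> = [];
  for (let i = 0; i < results.length; i++) {
    out.push(results[itemIndex].toJSON());
  }
  return out;
}

// Fixed pattern: index with the loop counter.
function collectFixed(results: Segment[]): Array<{ text: string }> {
  const out: Array<{ text: string }> = [];
  for (let i = 0; i < results.length; i++) {
    out.push(results[i].toJSON());
  }
  return out;
}

const results = [new Segment("a"), new Segment("b"), new Segment("c")];
console.log(collectBuggy(results, 0)); // the first segment repeated three times
console.log(collectFixed(results)); // each segment once, in order
```

Under these assumptions the buggy version either repeats one segment or, when the outer index is past the end of the array, throws a TypeError because `toJSON` is called on `undefined`. Both match the symptoms described above.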

Links and References

Markdown
PDF Parsing Concepts
Documentation for the external PDF parsing service (refer to your API provider's docs)
