Overview
The LlamaParse node parses a PDF file and returns its content as markdown. It is useful when you need PDF content as editable, web-friendly text for further processing, analysis, or display.
Common scenarios include:
- Extracting textual content from reports, invoices, or manuals stored as PDFs.
- Converting PDF documentation into markdown for integration with static site generators or content management systems.
- Automating the ingestion of PDF data into workflows that require markdown input.
For example, you could use this node to parse a PDF user manual and then feed the markdown output into a knowledge base system.
Properties
| Name | Meaning |
|---|---|
| File Path | The full path to the PDF file you want to parse. Example: /User/user/Desktop/file.pdf |
Output
The node outputs an array of JSON objects, one per parsed segment of the PDF. Each object contains the markdown for that segment.
The node produces no binary output; it returns only the extracted markdown text.
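As a rough illustration of consuming this output in a later step, the sketch below joins the segments into a single markdown string. The `ParsedSegment` type and its `text` field are assumptions made for the example; the field names of the actual output objects are not specified here and may differ.

```typescript
// Hypothetical shape of one output item; the real field names may differ.
interface ParsedSegment {
  text: string; // assumed to hold the markdown for one segment
}

// Join all non-empty segments into one markdown document,
// separated by blank lines.
function joinSegments(segments: ParsedSegment[]): string {
  return segments
    .map((segment) => segment.text.trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}

const example: ParsedSegment[] = [
  { text: "# User Manual\n" },
  { text: "" },
  { text: "## Installation\n\nRun the installer." },
];

const markdown = joinSegments(example);
console.log(markdown);
```

Empty segments are dropped so that a blank page in the PDF does not leave stray gaps in the combined text.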
Dependencies
- Requires an API key credential for the external service that provides the PDF parsing.
- Uses the `llamaindex` library internally to perform the parsing operation.
- The node expects the PDF file to be accessible at the specified file path on the system where n8n is running.
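Because the file must be readable on the machine running n8n, it can help to verify the path before calling the node, for example from a script on that same machine. The following is a minimal sketch using only Node.js built-ins; `checkPdfPath` is a hypothetical helper and is not part of the node.

```typescript
import { accessSync, constants } from "node:fs";
import { extname } from "node:path";

// Returns null when the path looks usable, otherwise a description of the problem.
function checkPdfPath(filePath: string): string | null {
  // Cheap sanity check on the extension before touching the file system.
  if (extname(filePath).toLowerCase() !== ".pdf") {
    return `Not a .pdf file: ${filePath}`;
  }
  try {
    // Throws if the file does not exist or the current user cannot read it.
    accessSync(filePath, constants.R_OK);
    return null;
  } catch {
    return `File missing or not readable: ${filePath}`;
  }
}

console.log(checkPdfPath("notes.txt"));
```

The check runs with the permissions of the current process, so it is only meaningful when executed as the same user that runs n8n.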
Troubleshooting
- File Not Found: If the specified file path is incorrect or inaccessible, the node will fail to load the PDF. Ensure the path is correct and the file permissions allow reading.
- Invalid API Key: If the provided API key credential is missing or invalid, the parsing request will fail. Verify that the API key is correctly configured in n8n credentials.
- Parsing Errors: If the PDF is corrupted or uses unsupported features, the parser might return incomplete or empty markdown content.
- Index Out of Range Error: There is a potential bug in the code where the loop pushes `r[e].toJSON()` instead of `r[s].toJSON()`. This could cause runtime errors or incorrect output if multiple segments are returned. Users encountering unexpected results should check for updates or report this issue.
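To make the indexing issue concrete, here is a sketch of what the loop is presumably meant to do: index the results with the loop counter itself. This is not the node's source code; the `Segment` interface, `collectSegments`, and `makeSegment` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for a parsed segment; only toJSON() matters here.
interface Segment {
  toJSON(): Record<string, unknown>;
}

// Collect the JSON form of every segment, indexing with the loop counter `s`.
function collectSegments(segments: Segment[]): Record<string, unknown>[] {
  const output: Record<string, unknown>[] = [];
  for (let s = 0; s < segments.length; s++) {
    // Indexing with any variable other than `s` would repeat one segment
    // or read an element that does not exist.
    output.push(segments[s].toJSON());
  }
  return output;
}

// Small factory so the example can run without the real parser.
const makeSegment = (text: string): Segment => ({ toJSON: () => ({ text }) });

const result = collectSegments([makeSegment("# Page 1"), makeSegment("# Page 2")]);
console.log(result);
```

With a fixed or unrelated index, a multi-segment PDF would yield repeated copies of one segment or a call on `undefined`, which matches the symptoms described above.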
Links and References
- Markdown
- PDF Parsing Concepts
- Documentation for the external PDF parsing service (refer to your API provider's docs)