Overview
The LlamaParse node parses a PDF file and returns its content as Markdown. Use it when you need to turn PDF documents into editable, web-friendly Markdown text for further processing, analysis, or display.
Common scenarios include:
- Extracting textual content from PDF reports or manuals to integrate into documentation systems.
- Converting PDFs into markdown for easier editing or publishing on markdown-supported platforms.
- Automating the extraction of data from PDF files in workflows that require text manipulation or transformation.
For example, you could use this node to parse a PDF invoice and then process the extracted markdown text to generate summaries or feed it into other tools.
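As a sketch of that kind of downstream processing, the function below collects the Markdown headings from all parsed items into an indented outline, which could then feed a summary step. It assumes each item stores its Markdown under a `text` key in `json`. That field name, the `ParsedItem` type, the `extractOutline` helper, and the sample invoice text are all hypothetical and are not taken from the node itself.

```typescript
// Hypothetical shape of one output item; the `text` field name is an assumption.
interface ParsedItem {
  json: { text: string };
}

// Collect Markdown ATX headings (lines starting with one to six '#') from all items
// into a flat, indented outline, e.g. as input for a summarization step.
function extractOutline(items: ParsedItem[]): string[] {
  const outline: string[] = [];
  for (const item of items) {
    for (const line of item.json.text.split("\n")) {
      const match = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
      if (match) {
        const depth = match[1].length;
        outline.push(`${"  ".repeat(depth - 1)}- ${match[2]}`);
      }
    }
  }
  return outline;
}

// Made-up sample input standing in for parsed invoice pages.
const items: ParsedItem[] = [
  { json: { text: "# Invoice 1042\nBilled to: ACME\n## Line items\n| Item | Qty |" } },
  { json: { text: "## Totals\nTotal due: 120.00" } },
];
console.log(extractOutline(items).join("\n"));
```

Working on headings only keeps the step cheap and independent of how the parser splits the document into items.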
Properties
| Name | Meaning |
|---|---|
| File Path | The full path to the PDF file you want to parse. Example: /User/user/Desktop/file.pdf |
Output
The node outputs an array of items, one per parsed segment of the PDF, each holding that segment's content converted to Markdown. How the document is divided into segments is determined by the underlying parser.
The Markdown is stored in each item's `json` field, so downstream nodes can consume or transform the extracted text directly.
No binary data output is produced by this node.
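A minimal sketch of this output shape, assuming the parser returns one object with a `text` property per segment. `NodeItem` is a simplified stand-in for n8n's item type, and `toNodeItems` together with the `segment` and `text` keys are hypothetical names chosen for illustration, not the node's documented fields.

```typescript
// Simplified stand-in for an n8n output item; real items may also carry `binary`,
// but this node never sets it.
interface NodeItem {
  json: Record<string, unknown>;
}

// Assumed minimal shape of one parsed segment returned by the parser.
interface ParsedSegment {
  text: string;
}

// Emit one item per parsed segment, keeping the segment's position in the document.
// The `segment` and `text` keys are illustrative, not the node's actual field names.
function toNodeItems(segments: ParsedSegment[]): NodeItem[] {
  return segments.map((segment, index) => ({
    json: { segment: index, text: segment.text },
  }));
}

const items = toNodeItems([{ text: "# Report\nIntro..." }, { text: "## Results\n..." }]);
console.log(JSON.stringify(items, null, 2));
```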
Dependencies
- Requires an external API key credential for authentication with the Llama Cloud service.
- Uses the `llamaindex` library internally to perform the parsing operation.
- The node expects access to the local filesystem path specified in the "File Path" property to read the PDF file.
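The way these dependencies fit together can be sketched with the parser injected behind an interface, so the credential check and the handoff of the local file path are visible and testable without network access. `MarkdownPdfParser`, `ParserFactory`, `parsePdfToMarkdown`, and the `apiKey` credential field are hypothetical. The real node delegates to the `llamaindex` library, whose reader class and options should be checked in that library's documentation.

```typescript
// Hypothetical interface for the parsing backend; the real node delegates to the
// `llamaindex` library, whose reader class and options should be checked in its docs.
interface MarkdownPdfParser {
  loadData(filePath: string): Promise<{ text: string }[]>;
}

// Hypothetical factory that builds a parser from the API key stored in the credential.
type ParserFactory = (apiKey: string) => MarkdownPdfParser;

async function parsePdfToMarkdown(
  filePath: string,
  credential: { apiKey?: string },
  makeParser: ParserFactory,
): Promise<string[]> {
  // Fail early with a clear message instead of waiting for the service to reject the call.
  const apiKey = credential.apiKey?.trim();
  if (!apiKey) {
    throw new Error("Missing Llama Cloud API key in the node credential");
  }
  // In the real node the file is read from this local path, so n8n needs read access to it.
  const segments = await makeParser(apiKey).loadData(filePath);
  return segments.map((segment) => segment.text);
}

// Usage with a stub parser, so the flow runs without network access or a real key.
const stubFactory: ParserFactory = (key) => ({
  loadData: async (path) => [{ text: `# ${path} (key ${key})` }],
});
parsePdfToMarkdown("/tmp/report.pdf", { apiKey: " llx-demo " }, stubFactory).then((texts) =>
  console.log(texts),
);
```

Injecting the parser keeps the credential and path handling separate from the network call, which makes the failure modes in the troubleshooting list easier to isolate.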
Troubleshooting
- File Not Found or Access Denied: Ensure the file path is correct and readable by the n8n process. File or directory permission issues can prevent the node from reading the file.
- Invalid API Key or Authentication Failure: Verify that the API key credential for the external parsing service is correctly configured and valid.
- Parsing Errors: If the PDF is corrupted or uses unsupported formats, the parser may fail or return incomplete results.
- Empty Output: Confirm that the PDF actually contains extractable text; scanned images without OCR will not yield markdown content.
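Several of these failures can be caught before any API call is made. Below is a minimal preflight sketch, assuming a Node.js runtime. `checkPdfFile`, `statOrNull`, and `readHeader` are hypothetical helpers, not part of the node. The check confirms that the path is a readable, non-empty regular file starting with the `%PDF-` header. It cannot detect corruption deeper in the file, and it cannot tell whether the pages contain extractable text.

```typescript
import { Stats, closeSync, openSync, readSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Return file metadata, or null if the path does not exist or cannot be stat'ed.
function statOrNull(filePath: string): Stats | null {
  try {
    return statSync(filePath);
  } catch {
    return null;
  }
}

// Read up to `length` leading bytes, or null if the file cannot be opened or read.
function readHeader(filePath: string, length: number): Buffer | null {
  let fd: number | undefined;
  try {
    fd = openSync(filePath, "r");
    const header = Buffer.alloc(length);
    const bytesRead = readSync(fd, header, 0, length, 0);
    return header.subarray(0, bytesRead);
  } catch {
    return null;
  } finally {
    if (fd !== undefined) closeSync(fd);
  }
}

// Hypothetical preflight check: returns a problem description, or null if the file looks like a PDF.
function checkPdfFile(filePath: string): string | null {
  const stats = statOrNull(filePath);
  if (stats === null) return `File not found or not accessible: ${filePath}`;
  if (!stats.isFile()) return `Not a regular file: ${filePath}`;
  if (stats.size === 0) return `File is empty: ${filePath}`;

  // Well-formed PDFs begin with "%PDF-"; this catches wrong file types, not deeper corruption.
  const header = readHeader(filePath, 5);
  if (header === null) return `Cannot read file (check permissions): ${filePath}`;
  if (header.toString("latin1") !== "%PDF-") return `Missing %PDF- header: ${filePath}`;
  return null;
}

// Usage: a file with a PDF header passes; a text file renamed to .pdf does not.
const good = join(tmpdir(), "preflight-good.pdf");
const bad = join(tmpdir(), "preflight-bad.pdf");
writeFileSync(good, "%PDF-1.7\n");
writeFileSync(bad, "not a pdf");
console.log(checkPdfFile(good)); // null
console.log(checkPdfFile(bad)); // Missing %PDF- header: <path>
```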
Links and References
- Markdown Format
- Llama Cloud documentation: consult it for API key setup and usage details of the external parsing service.