Overview
The LlamaParse node parses PDF files and returns their content as markdown. It is useful when you need PDF documents in an editable, processable text format for further automation, analysis, or integration with other tools.
Common scenarios:
- Extracting text content from PDF reports for automated processing.
- Converting PDF manuals or documentation into markdown for publishing or editing.
- Automating data extraction workflows where PDFs are input files.
Practical example:
You have a folder of PDF invoices and want to extract the textual content to feed into a database or generate summaries. This node can parse each PDF file and output its content as markdown, which can then be processed downstream.
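As a rough illustration of the invoice scenario, the sketch below collects the PDF files in a folder so that each path can be passed to the node one at a time. The helper name `listPdfFiles` is hypothetical and not part of the node; it only uses Node.js built-ins and assumes the folder is readable by the process running it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not part of the LlamaParse node): returns the absolute
// paths of the PDF files directly inside `folder`, sorted for a stable order.
// Subfolders are not searched, and directories named "*.pdf" are skipped.
function listPdfFiles(folder: string): string[] {
  return fs
    .readdirSync(folder, { withFileTypes: true })
    .filter(
      (entry) =>
        entry.isFile() && path.extname(entry.name).toLowerCase() === ".pdf",
    )
    .map((entry) => path.resolve(folder, entry.name))
    .sort();
}
```

Each returned path could then be supplied as the File Path property, for example from an upstream node that emits one item per file.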
Properties
| Name | Meaning |
|---|---|
| File Path | The full path to the PDF file you want to parse. Example: /User/user/Desktop/file.pdf |
Output
The node outputs an array of JSON objects, each representing parsed content from the PDF file in markdown format. Each item corresponds to a segment or page of the PDF converted into markdown text.
- The `json` output field contains these markdown representations.
- No binary data output is produced by this node.
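To make the output shape more concrete, here is a minimal sketch of how downstream code might combine the items into one markdown document. The exact key inside `json` that holds the markdown is not stated here, so the `text` key below is an assumption; inspect the node's real output before relying on it. `ParsedItem` and `joinMarkdown` are hypothetical names.

```typescript
// Assumed item shape: each output item carries its data under `json`.
// The `text` key is a guess for illustration, not a documented field name.
interface ParsedItem {
  json: { text?: string };
}

// Joins the markdown of all items into one document, skipping items whose
// markdown is missing or blank.
function joinMarkdown(items: ParsedItem[], separator = "\n\n"): string {
  return items
    .map((item) => (item.json.text ?? "").trim())
    .filter((chunk) => chunk.length > 0)
    .join(separator);
}
```

Joining with a blank line keeps markdown blocks from adjacent segments or pages from running together.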
Dependencies
- Requires an external parsing service, accessed with an API key credential.
- Uses the `llamaindex` library internally to perform the parsing.
- The node expects the user to provide a valid API key credential for authentication with the external parsing service.
- The file must be accessible at the specified path on the system where n8n runs.
Troubleshooting
- File not found or inaccessible: Ensure the file path is correct and that the n8n process has permission to read the file.
- Invalid or missing API key: The node requires a valid API key credential. Verify that the credential is set up correctly in n8n.
- Parsing errors: If the PDF is corrupted or in an unsupported format, the parsing may fail. Try opening the PDF manually to confirm it is valid.
- Empty output: If the PDF contains no extractable text or is image-based only, the output may be empty or minimal.
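The first and third checks above can be approximated before the node runs. The sketch below is a hypothetical pre-flight helper, not something the node provides: it confirms the path is a readable regular file and that it starts with the `%PDF-` marker that PDF files normally begin with. It cannot detect every corrupted file, and it says nothing about whether the PDF contains extractable text.

```typescript
import * as fs from "node:fs";

type Preflight = { ok: true } | { ok: false; reason: string };

// Hypothetical pre-flight check, run on the machine where n8n runs.
function preflightPdf(filePath: string): Preflight {
  try {
    // Throws if the file is missing or the process lacks read permission.
    fs.accessSync(filePath, fs.constants.R_OK);
    if (!fs.statSync(filePath).isFile()) {
      return { ok: false, reason: "not a regular file" };
    }
  } catch {
    return { ok: false, reason: "file not found or not readable" };
  }

  // Read only the first five bytes and compare them with the PDF marker.
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    if (bytesRead < 5 || header.toString("latin1") !== "%PDF-") {
      return { ok: false, reason: "missing %PDF- header" };
    }
  } finally {
    fs.closeSync(fd);
  }
  return { ok: true };
}
```

A failed check points to the file itself rather than to the API key or the parsing service, which narrows down where to look.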
Links and References
- Markdown format
- Documentation for the external parsing service (not provided here; refer to your API provider's docs)
- n8n documentation on working with credentials