Read PDF Form data

Reads a PDF Form and extracts its content

Overview

The Read PDF Form data node for n8n reads a PDF file from a specified binary property and extracts its form data and metadata. It is useful in workflows that process PDF forms, for example extracting filled-in values from digital forms, automating document processing, or archiving form responses.

Practical examples:

  • Extracting user-submitted data from PDF application forms.
  • Automating the collection of survey results stored in PDF format.
  • Integrating with document management systems to index PDF form content.

Properties

Name | Type | Meaning
Binary Property | String | Name of the binary property from which to read the PDF file. This must match the property name under which the PDF file is attached to the input item.
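
To illustrate what "matching the property name" means, here is a minimal sketch of an input item and a lookup by binary property name. The `BinaryEntry` and `InputItem` interfaces, the `findBinaryEntry` helper, and its error text are hypothetical and written for this example; they are modelled on the usual n8n item layout (a `json` object plus an optional `binary` map with inline base64 content) and are not taken from the node's source.

```typescript
// Hypothetical, simplified shapes modelled on an n8n item; not imported from n8n.
interface BinaryEntry {
  data: string; // base64-encoded file content (assumes inline storage)
  mimeType: string;
  fileName?: string;
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Looks up the binary entry stored under `propertyName`.
// Throws when the item has no such property (illustrative error text).
function findBinaryEntry(item: InputItem, propertyName: string): BinaryEntry {
  const entry = item.binary ? item.binary[propertyName] : undefined;
  if (entry === undefined) {
    throw new Error('No binary property "' + propertyName + '" on the input item');
  }
  return entry;
}

// Example item: the PDF is attached under the property name "data",
// so "Binary Property" would need to be set to "data".
const sampleItem: InputItem = {
  json: {},
  binary: {
    data: {
      data: "JVBERi0xLjc=", // base64 of "%PDF-1.7", a placeholder rather than a real PDF
      mimeType: "application/pdf",
      fileName: "form.pdf",
    },
  },
};

const found = findBinaryEntry(sampleItem, "data");
```

If the PDF were attached under another name, such as `attachment_0`, the same lookup with `"data"` would throw, which corresponds to the missing binary data case described under Troubleshooting.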

Output

The node outputs an object in the json field with the following structure:

{
  "numpages": <number>,         // Total number of pages in the PDF
  "numrender": <number>,        // Number of pages rendered (always 0 in this implementation)
  "info": { ... },              // General PDF information (e.g., title, author, etc.)
  "metadata": { ... },          // Additional metadata if available
  "text": "<string>",           // Concatenated text content from the PDF (empty in this implementation)
  "version": "<string>",        // Version of the pdfjs library used
  "formData": { ... }           // Extracted form fields and their values, if present
}
If an error occurs and "Continue On Fail" is enabled, the output is instead:

{
  "error": "<error message>"
}

Binary Data:
The node passes through the original binary data unchanged in the output's binary property.
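
For downstream code (for example in a Code node), the two output shapes described above can be modelled as a union type. The following is a sketch only: the type names, the `isErrorOutput` guard, and the `listFilledFields` helper are hypothetical, and the value types inside `info`, `metadata`, and `formData` are assumptions, since the node's output only shows them as `{ ... }`.

```typescript
// Hypothetical types mirroring the documented output fields.
interface PdfFormResult {
  numpages: number;
  numrender: number;
  info: Record<string, unknown>;     // value types are an assumption
  metadata: Record<string, unknown>; // value types are an assumption
  text: string;
  version: string;
  formData: Record<string, unknown>; // field name -> value; value type is an assumption
}

interface PdfFormError {
  error: string;
}

type PdfFormOutput = PdfFormResult | PdfFormError;

// Type guard: true for the "Continue On Fail" error shape.
function isErrorOutput(output: PdfFormOutput): output is PdfFormError {
  return "error" in output;
}

// Returns the names of form fields whose value is neither empty nor missing.
// Returns an empty list for error outputs and for PDFs without form fields.
function listFilledFields(output: PdfFormOutput): string[] {
  if (isErrorOutput(output)) {
    return [];
  }
  const formData = output.formData;
  return Object.keys(formData).filter((fieldName) => {
    const value = formData[fieldName];
    return value !== null && value !== undefined && value !== "";
  });
}

// Example values shaped like the documented output (contents are made up).
const sampleOutput: PdfFormOutput = {
  numpages: 2,
  numrender: 0,
  info: {},
  metadata: {},
  text: "",
  version: "unknown",
  formData: { applicant: "Ada", email: "" },
};

const filledFields = listFilledFields(sampleOutput);
```

Checking for the `error` key first keeps the rest of the workflow from reading `formData` on an item that only carries an error message.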

Dependencies

  • External Library: Uses pdfjs-dist for PDF parsing.
  • No external API keys required.
  • The PDF file must be provided as a binary property on the input item.

Troubleshooting

Common Issues:

  • Missing Binary Data: If the specified binary property does not exist, the node will throw an error indicating it cannot find the PDF file.
  • Corrupted or Unsupported PDF: If the PDF cannot be parsed, an error will be thrown.
  • Form Data Not Present: If the PDF does not contain form fields, the formData object will be empty.

Error Messages:

  • "Cannot find binary data": Ensure the input item contains the correct binary property.
  • "Failed to parse PDF": Check that the uploaded file is a valid PDF and not corrupted.

How to resolve:

  • Double-check the property name in "Binary Property".
  • Verify the input file is a valid, non-corrupted PDF.
  • Make sure the PDF actually contains form fields if you expect formData to be populated.
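
As a rough pre-check for the "valid PDF" step, the file's leading bytes can be compared against the `%PDF-` signature that PDF files start with. This is a sketch, not part of the node: the `looksLikePdf` helper is hypothetical, and passing the check does not guarantee that the file parses or that it contains form fields.

```typescript
// Returns true when the bytes begin with the ASCII signature "%PDF-".
// This is a strict check; it rejects files with any leading bytes before the header.
function looksLikePdf(bytes: Uint8Array): boolean {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) {
    return false;
  }
  return signature.every((code, index) => bytes[index] === code);
}

// "%PDF-1" as raw bytes.
const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31]);
// "PK\x03\x04", the start of a ZIP archive, for contrast.
const zipHeader = new Uint8Array([0x50, 0x4b, 0x03, 0x04]);

const pdfCheck = looksLikePdf(pdfHeader);
const zipCheck = looksLikePdf(zipHeader);
```

A file that fails this check was most likely mislabelled or truncated before it reached the node, which helps separate upload problems from genuine parsing failures.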
