Read PDF With Password

Reads a PDF and extracts its content

Overview

The "Read PDF With Password" node reads a PDF file from an input binary property, optionally using a password if the PDF is protected, and extracts its content (such as text and metadata). This node is useful in workflows where you need to process or analyze the contents of PDF documents, including those that are password-protected.

Practical examples:

  • Extracting text from uploaded PDF reports for further processing.
  • Automating data extraction from password-protected invoices or statements.
  • Integrating with document management systems to index PDF content.

Properties

Name            | Type   | Meaning
Binary Property | String | Name of the binary property from which to read the PDF file.
Password        | String | Password for the PDF, if it is password-protected; leave blank otherwise.

Output

  • json: Contains the extracted content from the PDF. The structure typically includes:
    • text: The full text extracted from the PDF.
    • Other fields may include metadata such as number of pages, info, etc., depending on the PDF and the underlying library's output.
  • binary: The original binary data is preserved and passed through.
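As an illustration of how a downstream step might consume this output, here is a minimal TypeScript sketch. Only the text field is documented above; the numpages and info field names are assumptions based on what pdf-parse usually returns, and the PdfJson, OutputItem, and summarize names are hypothetical. The helper therefore treats everything except text as optional.

```typescript
// Hypothetical shape of one output item. Only `text` is documented;
// `numpages` and `info` are assumed field names and may be absent.
interface PdfJson {
  text: string;
  numpages?: number;
  info?: Record<string, unknown>;
  [key: string]: unknown;
}

interface OutputItem {
  json: PdfJson;
  binary?: Record<string, unknown>;
}

// Build a small summary without assuming any optional field exists.
function summarize(item: OutputItem): { pages: number | null; words: number } {
  const text = typeof item.json.text === "string" ? item.json.text : "";
  // Split on whitespace and drop empty fragments to count words.
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;
  const pages =
    typeof item.json.numpages === "number" ? item.json.numpages : null;
  return { pages, words };
}
```

Returning null for a missing page count, rather than 0, keeps "unknown" distinguishable from "empty document".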

If an error occurs and "Continue On Fail" is enabled, the node does not abort the workflow. Instead, it emits an item of the following form, along with a reference to the input item that failed:

{
  "error": "Error message here"
}
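When "Continue On Fail" is enabled, a later step usually has to separate successful extractions from failures. The sketch below shows one way to do that in TypeScript, assuming a failed item is identified by a string-valued error field in its json, as in the example above. The ResultItem type and the splitByError helper are hypothetical names, not part of the node.

```typescript
// Hypothetical minimal item shape: only the `json` part matters here.
interface ResultItem {
  json: Record<string, unknown>;
}

interface Failure {
  index: number;   // position of the item in the node's output
  message: string; // the error message reported by the node
}

// Partition items into successful extractions and failures.
// Assumption: a failure is any item whose json.error is a string.
function splitByError(items: ResultItem[]): {
  ok: ResultItem[];
  failed: Failure[];
} {
  const ok: ResultItem[] = [];
  const failed: Failure[] = [];
  items.forEach((item, index) => {
    const err = item.json.error;
    if (typeof err === "string") {
      failed.push({ index, message: err });
    } else {
      ok.push(item);
    }
  });
  return { ok, failed };
}
```

Recording the index alongside the message makes it possible to trace a failure back to the input item that caused it.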

Dependencies

  • External Library: Uses the pdf-parse npm package for PDF parsing.
  • No external API keys or services required.

Troubleshooting

Common issues:

  • Incorrect Binary Property Name: If the specified binary property does not exist, the node will throw an error.
  • Wrong Password: If the provided password is incorrect for a protected PDF, extraction will fail with an error message.
  • Corrupted or Unsupported PDF: If the PDF file is corrupted or uses unsupported features, extraction may fail.

Error messages:

  • "No binary data property '...' exists on item!" – Check that the binary property name matches the input.
  • "Incorrect Password" or similar – Ensure the correct password is provided for protected PDFs.
  • "Failed to parse PDF" – The file may be corrupted or not a valid PDF.
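The missing binary property error can often be caught before the node runs. As a rough sketch, the TypeScript helper below checks each input item for the configured property and lists the property names that are actually present. It assumes items carry an optional binary map whose entries may include a mimeType field; InputItem and checkBinaryProperty are hypothetical names used only for illustration.

```typescript
// Hypothetical input item shape: `binary` maps property names to file entries.
interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, { mimeType?: string; fileName?: string }>;
}

// Return one human-readable problem per item that would likely fail.
function checkBinaryProperty(
  items: InputItem[],
  propertyName: string
): string[] {
  const problems: string[] = [];
  items.forEach((item, index) => {
    const entry = item.binary?.[propertyName];
    if (entry === undefined) {
      // List what is available so a misspelled property name is easy to spot.
      const available = Object.keys(item.binary ?? {});
      problems.push(
        `item ${index}: no binary property '${propertyName}' ` +
          `(available: ${available.join(", ") || "none"})`
      );
    } else if (
      entry.mimeType !== undefined &&
      entry.mimeType !== "application/pdf"
    ) {
      problems.push(
        `item ${index}: '${propertyName}' has MIME type ${entry.mimeType}, ` +
          `expected application/pdf`
      );
    }
  });
  return problems;
}
```

An empty result means every item has the property and none declares a non-PDF MIME type. It does not guarantee that the file itself is a valid PDF or that the password is correct.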
