
PDF.co Api

Generate PDFs, extract data from PDFs, and split, merge, and convert PDFs. Fill PDF forms, add text and images to PDFs, and much more with PDF.co!

Overview

This node enables searching for specific text within a PDF document accessible via a URL. It is useful when you need to extract or locate information inside PDFs without manually opening them, such as scanning contracts for keywords, verifying the presence of certain terms in reports, or automating data extraction workflows.

For example, you can provide a URL to a PDF invoice and search for the company name or invoice number. The node supports advanced options like using regular expressions for complex search patterns, restricting the search to specific pages, and handling password-protected PDFs.
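Here is a small local illustration in Python of the difference between a plain-text query and a regular-expression query. It runs against a hypothetical snippet of invoice text (not a real PDF) and uses Python's `re` module. PDF.co performs the actual search on its side, and its regex dialect may differ from Python's in edge cases.

```python
import re

# Hypothetical text standing in for what a PDF invoice might contain.
# The real search happens in the PDF service; this only illustrates the idea.
INVOICE_TEXT = """ACME Corporation
Invoice Number: INV-204817
Date: 2024-03-05
Total Due: $1,250.00"""


def find_matches(text: str, query: str, use_regex: bool) -> list[str]:
    """Return every match of `query` in `text`.

    With use_regex=False the query is escaped, so characters such as '.'
    or '$' are matched literally, like an ordinary text search.
    """
    pattern = query if use_regex else re.escape(query)
    return [m.group(0) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    # Plain-text search for the company name.
    print(find_matches(INVOICE_TEXT, "ACME Corporation", use_regex=False))
    # Regex search for any invoice number: "INV-" followed by six digits.
    print(find_matches(INVOICE_TEXT, r"INV-\d{6}", use_regex=True))
```

The last point is why the Use Regular Expressions switch matters. A query such as `$1,250.00` matches literally as plain text. As a regex it matches nothing, because `$` is treated as an end-of-text anchor.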

Properties

| Name | Meaning |
| --- | --- |
| PDF URL | The URL of the PDF file to search. |
| Search Query | The text string or pattern to find in the PDF document. |
| Use Regular Expressions | Whether to interpret the search query as a regular expression, for more flexible and complex matching. |
| Pages | Comma-separated list of page numbers to limit the search to specific pages. Leave empty to search all pages. |
| File Name (Advanced) | The name for the output file generated by the search operation. |
| Webhook URL (Advanced) | A callback URL or webhook endpoint to which the output data is sent asynchronously. |
| Output Links Expiration (In Minutes) (Advanced) | How long, in minutes, the output link remains valid. Defaults to 60 minutes. |
| Inline (Advanced) | Whether to return the output directly in the response (true) or only a link to download it (false). |
| Word Matching Mode (Advanced) | How words are matched: Smart Match (default, intelligent matching), Exact Match (strict matching), or None (no special word matching). |
| Password (Advanced) | Password for opening password-protected PDF files. |
| HTTP Username (Advanced) | Username for HTTP authentication, if the PDF URL requires it. |
| HTTP Password (Advanced) | Password for HTTP authentication, if the PDF URL requires it. |
| Custom Profiles (Advanced) | JSON string with custom API options (profiles) for fine-tuning behavior. See the PDF.co API documentation for the available profile settings. |
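The sketch below shows one way to collect the properties above and turn them into a request body. The field names (`url`, `searchString`, `regexSearch`, `pages`, `callback`, and so on) and the Word Matching Mode spellings are assumptions modeled on PDF.co's public find-text endpoint. `SearchTextOptions` and `build_find_payload` are hypothetical helpers, not part of the node. Check the PDF.co API reference before relying on any of them.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# ASSUMPTION: API spellings of the Word Matching Mode options.
WORD_MATCHING_MODES = {
    "Smart Match": "SmartMatch",
    "Exact Match": "ExactMatch",
    "None": "None",
}


@dataclass
class SearchTextOptions:
    """Hypothetical container for the node properties listed above."""
    pdf_url: str
    search_query: str
    use_regex: bool = False
    pages: str = ""                           # empty string searches all pages
    file_name: Optional[str] = None
    webhook_url: Optional[str] = None
    expiration_minutes: int = 60              # documented default
    inline: Optional[bool] = None             # None leaves the service default
    word_matching_mode: Optional[str] = None  # None leaves the default (Smart Match)
    pdf_password: Optional[str] = None
    http_username: Optional[str] = None
    http_password: Optional[str] = None
    custom_profiles: Optional[str] = None     # JSON string


def build_find_payload(opts: SearchTextOptions) -> dict[str, Any]:
    """Map node properties to an (assumed) PDF.co find-text request body."""
    if opts.custom_profiles is not None:
        json.loads(opts.custom_profiles)  # raises json.JSONDecodeError on malformed JSON
    payload: dict[str, Any] = {
        "url": opts.pdf_url,
        "searchString": opts.search_query,
        "regexSearch": opts.use_regex,
        "expiration": opts.expiration_minutes,
    }
    optional = {
        "pages": opts.pages or None,
        "name": opts.file_name,
        "callback": opts.webhook_url,
        "inline": opts.inline,
        "wordMatchingMode": (
            WORD_MATCHING_MODES[opts.word_matching_mode]
            if opts.word_matching_mode is not None
            else None
        ),
        "password": opts.pdf_password,
        "httpusername": opts.http_username,
        "httppassword": opts.http_password,
        "profiles": opts.custom_profiles,
    }
    # Send only the properties the user actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    opts = SearchTextOptions(
        pdf_url="https://example.com/invoice.pdf",
        search_query=r"INV-\d{6}",
        use_regex=True,
        pages="1,2",
    )
    print(build_find_payload(opts))
```

Unset advanced properties are left out of the payload rather than sent as empty values. The service can then apply its own defaults. Custom Profiles is parsed once up front so that malformed JSON fails before any request is made.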

Output

The node outputs a JSON object containing the results of the search. This typically includes details about each match, such as its page number, its position, and the matched text. If Inline is enabled, the results are included directly in the output. Otherwise, the output contains a link to download the result file, which expires after the Output Links Expiration period (60 minutes by default).

The node's main output is these JSON search results. If binary data is produced (for example, a PDF with the matches highlighted), it is returned as binary data alongside them.
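The exact layout of the result JSON isn't spelled out here, so the following Python sketch assumes a shape. An inline result is assumed to have a `body` list whose entries carry `pageIndex` and `text`. A non-inline result is assumed to carry only a `url`. These field names resemble PDF.co's find-text responses, but they are assumptions to verify against real output. Both helpers are hypothetical.

```python
from collections import defaultdict
from typing import Any


def summarize_search_result(response: dict[str, Any]) -> list[tuple[int, str]]:
    """Return (page, matched text) pairs from an inline search result.

    ASSUMED shape: {"body": [{"pageIndex": int, "text": str, ...}, ...]}.
    A non-inline result is assumed to hold only a download "url".
    """
    if "body" not in response:
        raise ValueError(
            f"Result is not inline; download it from {response.get('url')!r}"
        )
    return [(match["pageIndex"], match["text"]) for match in response["body"]]


def group_by_page(matches: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Collect matched snippets per page, preserving their order."""
    pages: dict[int, list[str]] = defaultdict(list)
    for page, text in matches:
        pages[page].append(text)
    return dict(pages)


if __name__ == "__main__":
    # Hypothetical inline result, not real service output.
    sample = {
        "body": [
            {"pageIndex": 0, "text": "INV-204817"},
            {"pageIndex": 0, "text": "ACME Corporation"},
            {"pageIndex": 2, "text": "ACME Corporation"},
        ]
    }
    print(group_by_page(summarize_search_result(sample)))
```

Raising on a non-inline result keeps the two Inline modes explicit. A downstream step then has to either enable Inline or fetch the linked file before it can read the matches.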

Dependencies

  • Requires access to the PDF file via a publicly accessible URL or one accessible with provided HTTP credentials.
  • Uses the external PDF.co API service to perform the search operation.
  • Requires a PDF.co API key credential configured in n8n to authenticate requests.
  • Optionally sends results to a webhook URL for asynchronous processing.
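For readers wiring this up outside n8n, here is a rough Python sketch of the underlying API call. It assumes PDF.co's REST conventions: a POST to `https://api.pdf.co/v1/pdf/find` with the API key in an `x-api-key` header. Treat both as assumptions to confirm in the PDF.co documentation. Inside n8n the key comes from the configured credential, whereas here it is simply passed in. `make_find_request` and `send_find_request` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any

# ASSUMPTION: endpoint and header name follow PDF.co's public REST API.
PDFCO_FIND_URL = "https://api.pdf.co/v1/pdf/find"


def make_find_request(api_key: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a text search."""
    return urllib.request.Request(
        PDFCO_FIND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_find_request(request: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = make_find_request(
        "YOUR_API_KEY",  # placeholder; never hard-code a real key
        {"url": "https://example.com/invoice.pdf", "searchString": "ACME"},
    )
    print(req.get_method(), req.full_url)
```

Building and sending the request are split so that the request can be inspected, or tested, without network access or a real key.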

Troubleshooting

  • Common issues:

    • Invalid or inaccessible PDF URL: Ensure the URL is correct and reachable from the external PDF service (see Dependencies), not only from the n8n environment.
    • Incorrect HTTP credentials: Verify username and password if the PDF URL requires authentication.
    • Password-protected PDFs: Provide the correct password to access encrypted documents.
    • Malformed regular expressions: When using regex search, ensure the pattern syntax is valid.
    • Page list format: Use comma-separated integers without spaces (for example, 1,2,5) in the Pages property.
  • Error messages:

    • "Failed to fetch PDF": Check network connectivity and URL correctness.
    • "Authentication failed": Verify HTTP username and password.
    • "Invalid password for PDF": Confirm the PDF password is correct.
    • "Regex pattern error": Review and correct the regular expression syntax.
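Several of these problems can be caught before the request is sent. The hypothetical Python helper below checks three things: the URL format, the Pages format described above, and whether a regex compiles. It cannot tell whether the URL is actually reachable. Python's `re` syntax is also only an approximation of whatever regex engine the service uses, so a pattern that passes here may still be rejected.

```python
import re
from urllib.parse import urlparse

# Comma-separated integers, no spaces (e.g. "1,2,5").
PAGES_PATTERN = re.compile(r"\d+(?:,\d+)*")


def preflight_errors(pdf_url: str, query: str, use_regex: bool, pages: str = "") -> list[str]:
    """Return human-readable problems found in the inputs (empty list = none found)."""
    errors: list[str] = []
    parsed = urlparse(pdf_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append(f"PDF URL is not an http(s) URL: {pdf_url!r}")
    if not query:
        errors.append("Search Query is empty")
    if pages and not PAGES_PATTERN.fullmatch(pages):
        errors.append(f"Pages must be comma-separated integers without spaces: {pages!r}")
    if use_regex and query:
        try:
            re.compile(query)
        except re.error as exc:
            errors.append(f"Regular expression does not compile: {exc}")
    return errors


if __name__ == "__main__":
    # A space in the page list and an unbalanced parenthesis: two problems.
    for problem in preflight_errors("https://example.com/a.pdf", r"INV-(\d", True, "1, 2"):
        print(problem)
```

Collecting all problems at once, rather than stopping at the first, lets a workflow report every input mistake in a single run.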

Resolving these usually involves verifying input parameters and ensuring proper access rights.
