pdf-page-extract-next

An n8n node for extracting text and images from PDF pages

Package Information


Latest Version: 1.0.1

Author: Arsalan Mughal

Available Nodes

PDF Page Extract Next

Extract text and images from a specific page of a PDF

Documentation

n8n-nodes-pdf-page-extract-next

A community n8n node that extracts text and images from a specific page of a PDF file.

Installation

Open your n8n instance
Go to Settings → Community Nodes
Click Install
Enter the package name: n8n-nodes-pdf-page-extract-next
Click Install and wait for the process to complete
Restart n8n when prompted

Once installed, the PDF Page Extract Next node will appear in your node panel.

Operations

Extract Pages With Images

Extracts text and all embedded images from a single page of a PDF.

Parameter	Type	Default	Description
Binary Property	string	`data`	Name of the binary property on the input item that holds the PDF file
Page Number	number	`1`	The 1-based number of the page to extract content from
Image Timeout (ms)	number	`5000`	Maximum time to wait for image extraction on the page before its images are skipped
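The Image Timeout parameter, together with the note that text is still returned when the timeout is exceeded, suggests a per-page race between image extraction and a timer. The sketch below is an assumption about how such a timeout could work, not the node's actual source. `withTimeout`, `extractImages`, and `demo` are hypothetical names, and on timeout the sketch falls back to an empty image list so that text extraction is unaffected.

```javascript
// Hypothetical sketch (NOT the node's actual source): race a promise against a
// timer and fall back to a default value if the timer wins.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    // Resolve rather than reject, so a slow extraction is skipped instead of failing
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for an image extraction step that takes delayMs to finish
function extractImages(delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(["img1", "img2"]), delayMs));
}

async function demo() {
  const fast = await withTimeout(extractImages(10), 5000, []); // finishes in time
  const slow = await withTimeout(extractImages(200), 50, []);  // times out, falls back
  return { fast, slow };
}
```

Resolving with a fallback instead of rejecting matches the documented behavior that a timeout does not raise an error.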

Output

JSON

```json
{
  "pageNumber": 3,
  "text": "Extracted text content from page 3..."
}
```

Binary

Each image found on the page is output as a separate binary property named `page_{n}_image_{k}`, where n is the page number and k numbers the images on that page starting from 1:

Binary Key	Description
`page_3_image_1`	First image on page 3
`page_3_image_2`	Second image on page 3

If no images are found on the page, the binary output will be empty.
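Many downstream n8n nodes read a single binary property (often `data`), and sorting the image keys as plain strings would put `page_3_image_10` before `page_3_image_2`. The hypothetical helper below (not part of the package) fans one page item out into one item per image, in numeric image order, with each image moved to a single binary property. It assumes only the documented `page_{n}_image_{k}` naming and ignores any other binary properties that may be present.

```javascript
// Hypothetical helper (not part of the package): turn one page item into one
// item per image, ordered by image index, each image under `targetKey`.
function splitImages(item, targetKey = "data") {
  const pattern = /^page_(\d+)_image_(\d+)$/; // documented page_{n}_image_{k} naming
  return Object.entries(item.binary || {})
    .map(([key, file]) => ({ key, file, match: pattern.exec(key) }))
    .filter(({ match }) => match !== null) // keep only image keys
    .sort((a, b) => Number(a.match[2]) - Number(b.match[2])) // numeric, not string, order
    .map(({ key, file }) => ({
      json: { pageNumber: item.json.pageNumber, imageKey: key },
      binary: { [targetKey]: file },
    }));
}
```

In a Code node set to Run Once for All Items, it could be applied as `return $input.all().flatMap((item) => splitImages(item));`.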

Usage

Extract a single page

Add a node that provides a PDF as binary data (e.g. Read/Write Files from Disk, HTTP Request, Google Drive)
Connect it to PDF Page Extract Next
Set Binary Property to the name of the binary field holding your PDF (default: data)
Set Page Number to the page you want to extract
Run the node. The output contains the page text in JSON and any images as binary properties

Extract all pages using a loop

To process every page of a PDF:

Load the PDF using any file node
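Add a Code node (mode: Run Once for All Items) that generates one item per page and copies the PDF binary onto each item, so the extract node can still find the file (replace 20 with your PDF's total pages):

```javascript
const totalPages = 20; // replace with your PDF's total page count
const pdf = $input.first(); // the incoming item that holds the PDF binary
return Array.from({ length: totalPages }, (_, i) => ({ json: { page: i + 1 }, binary: pdf.binary }));
```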

Connect the Code node to PDF Page Extract Next
Set Page Number to the expression {{ $json.page }}
Each input item produces the text and images for its own page independently
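Once every page has passed through PDF Page Extract Next, you may want the text back as one document. The hypothetical helper below (not part of the package) sorts items by the `pageNumber` field from the JSON output and joins their `text`. The blank-line separator is an arbitrary choice, and items without a `text` string are skipped.

```javascript
// Hypothetical helper (not part of the package): merge per-page outputs
// into a single text, in page order.
function combinePages(items, separator = "\n\n") {
  return items
    .map((item) => item.json)
    .filter((json) => typeof json.text === "string") // skip items without text
    .sort((a, b) => a.pageNumber - b.pageNumber)
    .map((json) => json.text)
    .join(separator);
}
```

In a Code node set to Run Once for All Items, it could be used as `return [{ json: { text: combinePages($input.all()) } }];`.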

Notes

Pages with no images will still return text successfully
If image extraction exceeds the timeout, the page's images are skipped, but its text is still returned and the node continues without error
All images are output in image/png format
Enable Continue On Fail in node settings to handle errors gracefully without stopping the workflow
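With Continue On Fail enabled, failed items keep flowing through the workflow instead of stopping it. Many n8n nodes represent such a failure as an item whose JSON carries an `error` field. This package's exact failure shape is not documented here, so treat that as an assumption. A minimal sketch that separates successful pages from failed ones, using the hypothetical name `partitionResults`:

```javascript
// Hypothetical helper: split items into successes and failures, ASSUMING a
// failed item carries an `error` field in its JSON (a common n8n convention,
// not confirmed for this package).
function partitionResults(items) {
  const ok = [];
  const failed = [];
  for (const item of items) {
    const isFailure = Boolean(item.json) && item.json.error !== undefined;
    (isFailure ? failed : ok).push(item);
  }
  return { ok, failed };
}
```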

Author

Arsalan Mughal

GitHub: @arslanmughal99
Repository: n8n-nodes-pdf-page-extract-next

License

MIT
