Package Information
Downloads: 6 weekly / 33 monthly
Latest Version: 1.0.1
Author: Arsalan Mughal
Documentation
n8n-nodes-pdf-page-extract-next
A community n8n node that extracts text and images from a specific page of a PDF file.
Installation
- Open your n8n instance
- Go to Settings → Community Nodes
- Click Install
- Enter the package name: n8n-nodes-pdf-page-extract-next
- Click Install and wait for the process to complete
- Restart n8n when prompted
Once installed, the PDF Page Extract Next node will appear in your node panel.
Operations
Extract Pages With Images
Extracts text and all embedded images from a single page of a PDF.
| Parameter | Type | Default | Description |
|---|---|---|---|
| Binary Property | string | data | Name of the binary property on the input item that holds the PDF file |
| Page Number | number | 1 | The page number to extract content from |
| Image Timeout (ms) | number | 5000 | How long to wait for image extraction per page before skipping |
Output
JSON
{
"pageNumber": 3,
"text": "Extracted text content from page 3..."
}
Binary
Every image found on the page is output as a separate binary property named page_{n}_image_{k}, where n is the page number and k is the image's position on that page, counting from 1:
| Binary Key | Description |
|---|---|
| page_3_image_1 | First image on page 3 |
| page_3_image_2 | Second image on page 3 |
If no images are found on the page, the binary output will be empty.
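The naming scheme above makes it straightforward to pick out a page's images in a downstream Code node. The following is a minimal sketch, not part of the package: `listPageImages` and `sampleItem` are hypothetical names, and the sample item only mimics the output shape described above, with the binary payloads left out.

```javascript
// Hypothetical helper: find the binary keys that follow the
// page_{n}_image_{k} pattern for one page, ordered by image index.
function listPageImages(binary, pageNumber) {
  const pattern = new RegExp(`^page_${pageNumber}_image_(\\d+)$`);
  return Object.keys(binary ?? {})
    .map((key) => ({ key, match: pattern.exec(key) }))
    .filter(({ match }) => match !== null)
    .map(({ key, match }) => ({ key, index: Number(match[1]) }))
    .sort((a, b) => a.index - b.index);
}

// Sample item shaped like the documented output (assumed data, payloads omitted).
const sampleItem = {
  json: { pageNumber: 3, text: "Extracted text content from page 3..." },
  binary: {
    page_3_image_2: { mimeType: "image/png" },
    page_3_image_1: { mimeType: "image/png" },
  },
};

const images = listPageImages(sampleItem.binary, sampleItem.json.pageNumber);
console.log(images.map((image) => image.key));
```

When a page has no images, the helper returns an empty array, which matches the empty binary output described above.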
Usage
Extract a single page
- Add a node that provides a PDF as binary data (e.g. Read/Write Files from Disk, HTTP Request, Google Drive)
- Connect it to PDF Page Extract Next
- Set Binary Property to the name of the binary field holding your PDF (default: data)
- Set Page Number to the page you want to extract
- Run the node. The output contains the page text in JSON and any images as binary properties
Extract all pages using a loop
To process every page of a PDF:
- Load the PDF using any file node
- Add a Code node to generate page numbers (replace 20 with your PDF's total pages):
const totalPages = 20;
return Array.from({ length: totalPages }, (_, i) => ({ json: { page: i + 1 } }));
- Connect the Code node to PDF Page Extract Next
- Set Page Number to {{ $json.page }}
- Each execution returns the text and images for that page independently
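Because each page comes back as its own item, a later step may want to stitch the text back together. The sketch below shows one possible way to do that. It assumes every item carries the `pageNumber` and `text` fields shown in the Output section; `combinePages` and the sample `pages` array are hypothetical and not part of the package.

```javascript
// Hypothetical helper: order the per-page items by page number and
// join their text with a blank line between pages.
function combinePages(items) {
  return [...items] // copy so the caller's array is not reordered
    .sort((a, b) => a.json.pageNumber - b.json.pageNumber)
    .map((item) => item.json.text)
    .join("\n\n");
}

// Sample items in the documented output shape (assumed data, out of order on purpose).
const pages = [
  { json: { pageNumber: 2, text: "Second page" } },
  { json: { pageNumber: 1, text: "First page" } },
];

const fullText = combinePages(pages);
console.log(fullText);
```

In an n8n Code node set to run once for all items, the same function could be fed the incoming items, for example `return [{ json: { text: combinePages($input.all()) } }];`.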
Notes
- Pages with no images will still return text successfully
- If image extraction exceeds the timeout, text is still returned and the node continues without error
- All images are output in image/png format
- Enable Continue On Fail in node settings to handle errors gracefully without stopping the workflow
Author
Arsalan Mughal
- GitHub: @arslanmughal99
- Repository: n8n-nodes-pdf-page-extract-next