n8n-nodes-mineru

📖 Introduction
n8n-nodes-mineru is an n8n community node package that integrates the MinerU document parsing API, giving you free and comprehensive document parsing inside your n8n workflows. It parses PDF, Word, and PowerPoint files as well as images, and automatically recognizes text, tables, formulas, and image content.
✨ Key Features
- 🚀 Multi-format Support: Supports PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG and other formats
- 🧠 Intelligent Recognition: Automatically recognizes text, tables, formulas, and images in documents
- 🌐 Dual Service Modes: Supports both online API service and local self-deployed service
- 📊 Multiple Output Formats: Supports Markdown, JSON, DOCX, HTML, LaTeX and other output formats
- 🔧 Flexible Configuration: Provides rich parameter configuration options to meet different scenario requirements
- 🌍 Multi-language Support: Supports Chinese, English, and automatic language detection
📦 Included Nodes
1. MinerU Node
- Function: Parses documents through the MinerU online API service
- Features: Automatically creates a task, waits for the result, and returns the parsed output as a ZIP file
- Use Case: Users who want to use the official API service
2. MinerU Custom Service Node
- Function: Connects to a self-deployed MinerU API server
- Features: Supports local file upload and offers more custom configuration options
- Use Case: Users who self-deploy MinerU or need more control
🛠️ Installation
Method 1: Install via n8n Community Nodes
- Open n8n interface
- Go to Settings > Community Nodes
- Click Install Community Node
- Enter package name: n8n-nodes-mineru
- Click Install
Method 2: Install via npm
# Execute in n8n root directory
npm install n8n-nodes-mineru
Method 3: Manual Installation
# Clone repository
git clone https://github.com/opendatalab/awsome-mineru.git
cd awsome-mineru/n8n-nodes-mineru
# Install dependencies
npm install
# Build project
npm run build
# Link to n8n (for development environment)
npm link
🔑 Credential Configuration
MinerU API Credentials
- Create new credentials in n8n
- Select the MinerU API credential type
- Enter your API Token
- Save the credentials
Get API Token:
- Visit the MinerU official website
- Register an account and obtain an API Token
📋 Usage Guide
MinerU Node Usage
- Add Node: Add the "MinerU" node to your workflow
- Configure Credentials: Select the MinerU API credentials you created
- Set Parameters:
  - Document URL: Link to the document to be parsed (required)
  - Enable OCR: Whether to enable image text recognition
  - Enable Formula Recognition: Whether to recognize mathematical formulas
  - Enable Table Recognition: Whether to recognize table structures
  - Document Language: Select the main language of the document
  - Extra Export Format: Select any additional output formats you need
  - Model Version: Select the MinerU model version to use
- Execute Node: The node automatically creates a parsing task and waits for it to complete
- Get Results: Once parsing completes, the node returns a ZIP file containing all results
MinerU Custom Service Node Usage
- Deploy Service: Deploy a MinerU API server first
- Add Node: Add the "MinerU Custom Service" node to your workflow
- Configure Parameters:
  - API Version: Select V1 or V2
  - File URL: Link to the document to be parsed
  - API Server Address: Your MinerU server address
  - Output Directory: Output directory for parsing results
  - Configure the remaining parameters according to the selected API version
- Execute Node: The node calls your server directly to parse the document
🔧 Parameter Description
Common Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Document URL | String | - | URL address of the document to be parsed |
| Enable OCR | Boolean | false | Whether to enable optical character recognition |
| Enable Formula Recognition | Boolean | true | Whether to recognize mathematical formulas |
| Enable Table Recognition | Boolean | true | Whether to recognize table structures |
| Document Language | Option | Chinese | Main language of the document |
MinerU Node Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Data ID | String | - | Optional data identifier |
| Page Range | String | - | Specify the page range to parse |
| Extra Export Format | Multi-select | [] | Additional output formats besides default |
| Polling Interval | Number | 5 | Interval time to check task status (seconds) |
| Maximum Wait Time | Number | 10 | Maximum time to wait for task completion (minutes) |
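How Polling Interval and Maximum Wait Time interact can be sketched as follows. This is an illustration only, not the node's actual implementation: `maxPollAttempts`, `pollUntilDone`, the `TaskState` values, and the `checkStatus` callback are hypothetical names, and only the defaults (5 seconds, 10 minutes) come from the table above.

```typescript
// Hypothetical task states; the real MinerU API may report different values.
type TaskState = "pending" | "running" | "done" | "failed";

// Number of status checks that fit into the wait budget.
// Defaults mirror the table above: 5-second interval, 10-minute maximum wait.
function maxPollAttempts(intervalSeconds = 5, maxWaitMinutes = 10): number {
  if (intervalSeconds <= 0 || maxWaitMinutes <= 0) {
    throw new RangeError("interval and maximum wait must be positive");
  }
  return Math.max(1, Math.floor((maxWaitMinutes * 60) / intervalSeconds));
}

// Default sleep based on setTimeout; injectable so callers can skip real waiting.
const sleepMs = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkStatus until the task is done, has failed, or the budget runs out.
async function pollUntilDone(
  checkStatus: () => Promise<TaskState>, // assumed caller-supplied status lookup
  intervalSeconds = 5,
  maxWaitMinutes = 10,
  sleep: (ms: number) => Promise<void> = sleepMs,
): Promise<void> {
  const attempts = maxPollAttempts(intervalSeconds, maxWaitMinutes);
  for (let i = 0; i < attempts; i++) {
    const state = await checkStatus();
    if (state === "done") return;
    if (state === "failed") throw new Error("parsing task failed");
    await sleep(intervalSeconds * 1000);
  }
  throw new Error(`task did not finish within ${maxWaitMinutes} minute(s)`);
}
```

Under these assumptions the default settings allow at most 120 status checks, so raising Maximum Wait Time (or lowering Polling Interval) increases how many checks happen before a timeout is reported.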
Custom Service Node Specific Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| API Server Address | String | http://localhost:8000 | MinerU server address |
| Output Directory | String | ./output | Output directory for parsing results |
| Backend Engine | Option | pipeline | Processing engine type |
| Return Markdown | Boolean | true | Whether to return Markdown format results |
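As a rough sketch, the defaults in the table above can be modeled as a typed object that user-supplied values are layered on top of. The field names `serverUrl` and `returnMd` follow Example 2 in this document; `outputDir`, `backend`, and the helper `withCustomServiceDefaults` are hypothetical names chosen for illustration and may not match the node's internal field names.

```typescript
// "serverUrl" and "returnMd" follow Example 2; "outputDir" and "backend"
// are hypothetical names for the Output Directory and Backend Engine fields.
interface CustomServiceParams {
  serverUrl: string;
  outputDir: string;
  backend: string;
  returnMd: boolean;
}

// Defaults copied from the parameter table above.
const CUSTOM_SERVICE_DEFAULTS: CustomServiceParams = {
  serverUrl: "http://localhost:8000",
  outputDir: "./output",
  backend: "pipeline",
  returnMd: true,
};

// Overlay user-supplied values on top of the documented defaults.
function withCustomServiceDefaults(
  overrides: Partial<CustomServiceParams> = {},
): CustomServiceParams {
  return { ...CUSTOM_SERVICE_DEFAULTS, ...overrides };
}

// Only the server address changes; everything else keeps its default.
const params = withCustomServiceDefaults({
  serverUrl: "http://your-mineru-server:8000",
});
console.log(params);
```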
🌟 Usage Examples
Example 1: Parse PDF Document and Extract Text
{
  "nodes": [
    {
      "name": "MinerU",
      "type": "n8n-nodes-mineru.mineru",
      "parameters": {
        "url": "https://example.com/document.pdf",
        "isOcr": true,
        "enableFormula": true,
        "enableTable": true,
        "language": "ch",
        "extraFormats": ["docx", "html"]
      }
    }
  ]
}
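If you generate workflow JSON from a script, a node entry like the one in Example 1 could be assembled with a small helper. This is a sketch under assumptions: `buildMinerUNode` and the two interfaces are hypothetical, the parameter names and the node type string are taken from Example 1, the defaults follow the parameter tables, and `"ch"` is assumed to be the code for the default language, Chinese.

```typescript
// Parameter names are taken from Example 1; the types are an assumption.
interface MinerUParameters {
  url: string;
  isOcr: boolean;
  enableFormula: boolean;
  enableTable: boolean;
  language: string;
  extraFormats: string[];
}

interface WorkflowNode {
  name: string;
  type: string;
  parameters: MinerUParameters;
}

// Hypothetical helper: builds one node entry, applying the documented defaults.
function buildMinerUNode(
  url: string,
  overrides: Partial<Omit<MinerUParameters, "url">> = {},
): WorkflowNode {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`not an http(s) URL: ${url}`);
  }
  return {
    name: "MinerU",
    type: "n8n-nodes-mineru.mineru",
    parameters: {
      url,
      isOcr: false, // table default
      enableFormula: true, // table default
      enableTable: true, // table default
      language: "ch", // assumed code for the default language, Chinese
      extraFormats: [], // table default
      ...overrides,
    },
  };
}

// Reproduces the parameters used in Example 1.
const workflow = {
  nodes: [
    buildMinerUNode("https://example.com/document.pdf", {
      isOcr: true,
      extraFormats: ["docx", "html"],
    }),
  ],
};
console.log(JSON.stringify(workflow, null, 2));
```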
Example 2: Parse Multiple Format Documents Using Custom Service
{
  "nodes": [
    {
      "name": "MinerU Custom Service",
      "type": "n8n-nodes-mineru.mineruCustom",
      "parameters": {
        "apiVersion": "v2",
        "fileUrl": "https://example.com/presentation.pptx",
        "serverUrl": "http://your-mineru-server:8000",
        "langList": "auto",
        "formulaEnable": true,
        "tableEnable": true,
        "returnMd": true
      }
    }
  ]
}
🚀 Advanced Usage
Batch Document Processing
Combine the MinerU node with other n8n nodes to process documents in batches:
- Use HTTP Request node to get document list
- Use Split In Batches node to process in batches
- Use MinerU node to parse each document
- Use Merge node to combine results
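Conceptually, the batching step in the list above divides the document list into fixed-size groups that are then parsed one group at a time. The sketch below shows that idea in plain TypeScript; it is not how n8n implements its batching node, and `splitIntoBatches` and the example URLs are hypothetical.

```typescript
// Splits a list into consecutive groups of at most batchSize items.
function splitIntoBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical document list, e.g. as returned by an earlier HTTP request.
const documentUrls = [
  "https://example.com/a.pdf",
  "https://example.com/b.docx",
  "https://example.com/c.pptx",
  "https://example.com/d.png",
  "https://example.com/e.pdf",
];

// Each batch would be handed to the parsing step before the next one starts.
for (const batch of splitIntoBatches(documentUrls, 2)) {
  console.log(`batch of ${batch.length}:`, batch);
}
```

Smaller batches keep fewer parsing tasks in flight at once, at the cost of a longer overall run.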
Result Post-processing
Once parsing completes, you can:
- Use Move Binary Data node to process returned files
- Use HTTP Request node to upload results to cloud storage
- Use Email node to send parsing results
- Use Webhook node to trigger subsequent processes
🔍 Troubleshooting
Common Issues
Q: Node execution fails with "API Token verification failed"
A: Please check if your API Token is correct and ensure you have obtained a valid Token from the MinerU official website.
Q: Document parsing times out
A: Increase the "Maximum Wait Time" parameter, or check whether the document is too large.
Q: Custom service connection failed
A: Please ensure your MinerU server is running normally and the network connection is stable.
Q: Some document formats cannot be parsed
A: Please confirm the document format is in the supported list and check if the document is corrupted.
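A quick way to rule out an unsupported format before running the node is to check the file extension against the formats listed under Key Features. The sketch below assumes the extension in the URL reflects the actual file type; since the feature list ends with "and other formats", the array here is not exhaustive, and `fileExtension` and `isSupportedFormat` are hypothetical helper names.

```typescript
// Extensions named in the Key Features list; the list there is open-ended.
const SUPPORTED_EXTENSIONS = [
  "pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg",
];

// Returns the lower-cased extension of the last path segment, or "" if none.
function fileExtension(documentUrl: string): string {
  // Drop any ?query or #fragment before looking at the file name.
  const withoutQuery = documentUrl.split(/[?#]/)[0];
  const fileName = withoutQuery.substring(withoutQuery.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.substring(dot + 1).toLowerCase();
}

function isSupportedFormat(documentUrl: string): boolean {
  return SUPPORTED_EXTENSIONS.includes(fileExtension(documentUrl));
}

console.log(isSupportedFormat("https://example.com/document.pdf"));
console.log(isSupportedFormat("https://example.com/archive.zip"));
```

A passing check does not guarantee the file itself is intact, so a corrupted document can still fail to parse.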
Debugging Tips
- Enable Node Debug: Enable the "Continue On Fail" option in the node settings
- Check Error Logs: Check the n8n error logs for detailed information
- Test Connection: Test the connection with a simple document first
- Check Parameters: Ensure all required parameters are set correctly
🤝 Contributing
We welcome community contributions! If you want to contribute to the project:
- Fork this repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE.md file for details.
👥 Contact Us
- Author: opendatalab
- Email: opendatalab-feedback@pjlab.org.cn
- GitHub: @opendatalab
If this project helps you, please give us a ⭐️!