Package Information
Downloads: 28 weekly / 275 monthly
Latest Version: 1.0.7
Author: Gerd First-Grüttner
n8n-nodes-parquet
An n8n community node for reading and writing Apache Parquet files with multi-cloud storage support.
n8n is a fair-code licensed workflow automation platform.
Features
- ✅ Read, Write, Merge & Schema Scan: Full support for Apache Parquet format plus schema analysis
- ✅ Multi-Cloud Storage: Azure Blob, AWS S3, Google Cloud Storage, S3-compatible (MinIO, Cloudflare R2)
- ✅ Local Files & URLs: Read from filesystem or HTTP(S) endpoints
- ✅ Wildcard Merge: Non-recursive wildcard expansion, sorted deterministically
- ✅ Schema Merge Modes: strict, union, intersection (nested schemas supported)
- ✅ Smart Schema Detection: Automatic schema inference with nested/complex type support
- ✅ Manual Schema Override: Define custom schemas when needed
- ✅ Streaming Support: Memory-efficient processing for large files
- ✅ Multiple Compression: SNAPPY, GZIP, UNCOMPRESSED (BROTLI not supported due to n8n compatibility)
- ✅ Column Filtering: Read specific columns only
- ✅ Progress Logging: Track processing of large datasets
- ✅ Type Promotion: Intelligent handling of mixed types
- ✅ n8n Compatible: Pure JavaScript implementation, no native modules
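The wildcard behavior described above (non-recursive expansion with a deterministic order) can be sketched briefly. An illustrative Python snippet of the assumed semantics, not the node's actual JavaScript implementation:

```python
# Sketch: expand a wildcard pattern against a flat file listing
# (no directory recursion) and return matches in sorted order,
# so merge input order is deterministic across runs.
import fnmatch

def expand_wildcard(pattern, listing):
    return sorted(f for f in listing if fnmatch.fnmatch(f, pattern))

files = ["data-2.parquet", "data-1.parquet", "logs/data-3.parquet"]
print(expand_wildcard("data-*.parquet", files))
# ['data-1.parquet', 'data-2.parquet']  (nested logs/ file not matched)
```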
Installation
Community Node Installation
- Go to Settings > Community Nodes in your n8n instance
- Click Install and enter `n8n-nodes-parquet`
- Agree to the risks and click Install
- Restart n8n if required
Manual Server Installation (systemd)
If you manage n8n directly on a server (without the UI installer), install the package into the default n8n community node path, `~/.n8n/nodes`. Example:

```shell
# On your workstation: build the tarball, then upload it to the server
npm pack

# On the server: install the tarball and restart n8n
cd ~/.n8n/nodes
npm install /tmp/n8n-nodes-parquet-<version>.tgz
sudo systemctl restart n8n
```
Usage
Read
- Operation: Read
- Select storage type (Local File / URL / Azure / S3 / GCS)
- Optional: Batch size, column filter, max rows, metadata
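The Read options above can be pictured with a short sketch. This is an illustrative Python snippet showing assumed semantics (the node itself is JavaScript, and `apply_read_options` is a hypothetical helper, not part of the node): the column filter keeps only the requested columns, and max rows caps the number of returned rows.

```python
# Hypothetical helper illustrating the Read options above:
# a column filter and a max-rows cap applied to decoded rows.
def apply_read_options(rows, columns=None, max_rows=None):
    if max_rows is not None:
        rows = rows[:max_rows]          # cap the number of rows returned
    if columns:
        # keep only the requested columns in each row
        rows = [{c: r[c] for c in columns if c in r} for r in rows]
    return rows

rows = [{"id": 1, "name": "a", "x": 9}, {"id": 2, "name": "b", "x": 8}]
print(apply_read_options(rows, columns=["id", "name"], max_rows=1))
# [{'id': 1, 'name': 'a'}]
```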
Write
- Operation: Write
- Select destination storage
- Schema mode: Auto or Manual
- Nested object handling (auto mode): Struct or JSON
- Compression: SNAPPY (default), GZIP, or UNCOMPRESSED
Merge
- Operation: Merge
- Add one or more sources (supports wildcards like `data-*.parquet`)
- Choose output storage and output path
- Optional: Mismatch handling (`failFast` or `collect`)
- Optional: Schema merge mode (`strict`, `union`, `intersection`)
- Optional: Type conflict handling (`error`, `coerce`, `null`)
- Optional: Include Keys (comma-separated, supports dot paths like `meta.agent.name`)
- Optional: Exclude Keys (comma-separated, supports dot paths; Exclude takes precedence)
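The schema merge modes listed above can be sketched on flat `{column: type}` schemas. This is an illustrative Python sketch of the assumed semantics, not the node's actual (JavaScript, nested-aware) implementation; conflicting types in `union` mode would additionally pass through the type conflict setting.

```python
# Sketch of the three schema merge modes on flat {column: type} dicts.
def merge_schemas(a, b, mode):
    if mode == "strict":
        # strict: schemas must match exactly, otherwise it's a mismatch
        if a != b:
            raise ValueError("schema mismatch in strict mode")
        return dict(a)
    if mode == "union":
        # union: keep every column seen in either file
        return {**a, **b}
    if mode == "intersection":
        # intersection: keep only columns present in both files
        return {k: a[k] for k in a if k in b}
    raise ValueError(f"unknown mode: {mode}")

a = {"id": "INT64", "name": "UTF8"}
b = {"id": "INT64", "score": "DOUBLE"}
print(merge_schemas(a, b, "union"))         # id, name, score
print(merge_schemas(a, b, "intersection"))  # id only
```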
Schema Scan
- Operation: Schema Scan
- Add one or more sources (same source model and wildcard behavior as Merge)
- Reuse merge schema settings (`strict`, `union`, `intersection`) and mismatch handling
- Reuse merge key filters (Include/Exclude Keys with dot-path support)
- Returns merged schema; optional per-file schema details and resolved source file list
- No output parquet file is written (analysis-only operation)
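The Include/Exclude key filters shared by Merge and Schema Scan can be sketched as a dot-path match where Exclude takes precedence. A minimal Python illustration of the assumed semantics (`key_selected` is a hypothetical helper, not the node's code):

```python
# Hypothetical helper: decide whether a dot-path key passes the
# Include/Exclude filters described above. Exclude wins over Include;
# an empty Include list means "include everything not excluded".
def key_selected(path, include, exclude):
    def matches(filters):
        # a filter matches the key itself or any key nested under it
        return any(path == f or path.startswith(f + ".") for f in filters)
    if matches(exclude):
        return False
    return not include or matches(include)

include = ["meta.agent"]
exclude = ["meta.agent.secret"]
print(key_selected("meta.agent.name", include, exclude))    # True
print(key_selected("meta.agent.secret", include, exclude))  # False
print(key_selected("other.field", include, exclude))        # False
```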
Credentials
All cloud credentials are available in the node UI. Select only the ones you need for the chosen storage backends.
Changelog
See CHANGELOG.md for full release notes.
1.0.6
- Added nested object handling mode for write auto-schema: `struct` or `json`
- Added recursive nested schema detection and validation
- Added nested-aware merge support for `union` and `intersection`
- Improved merge transformation/coercion for nested values
- Updated UI text and docs for nested merge support
Made for the n8n community
