parquet

n8n community node for reading and writing Apache Parquet files with multi-cloud storage support

Package Information

Downloads: 28 weekly / 275 monthly
Latest Version: 1.0.7
Author: Gerd First-Grüttner

Documentation

n8n-nodes-parquet


An n8n community node for reading and writing Apache Parquet files with multi-cloud storage support.

n8n is a fair-code licensed workflow automation platform.

Features

  • Read, Write, Merge & Schema Scan: Full support for Apache Parquet format plus schema analysis
  • Multi-Cloud Storage: Azure Blob, AWS S3, Google Cloud Storage, S3-compatible (MinIO, Cloudflare R2)
  • Local Files & URLs: Read from filesystem or HTTP(S) endpoints
  • Wildcard Merge: Non-recursive wildcard expansion, sorted deterministically
  • Schema Merge Modes: strict, union, intersection (nested schemas supported)
  • Smart Schema Detection: Automatic schema inference with nested/complex type support
  • Manual Schema Override: Define custom schemas when needed
  • Streaming Support: Memory-efficient processing for large files
  • Multiple Compression: SNAPPY, GZIP, UNCOMPRESSED (BROTLI not supported due to n8n compatibility)
  • Column Filtering: Read specific columns only
  • Progress Logging: Track processing of large datasets
  • Type Promotion: Intelligent handling of mixed types
  • n8n Compatible: Pure JavaScript implementation, no native modules
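To illustrate what "Type Promotion" for mixed types typically means, here is a minimal sketch in Python (the node itself is pure JavaScript; the helper `infer_column_type` is a hypothetical name used only for illustration):

```python
# Hypothetical sketch of type promotion during schema inference.
# Not the node's actual code; it only illustrates how columns with
# mixed value types can be promoted to a common Parquet logical type.

def infer_column_type(values):
    """Return a Parquet-style logical type for a column of Python values."""
    types = {type(v) for v in values if v is not None}
    if types <= {bool}:
        return "BOOLEAN"
    if types <= {int}:
        return "INT64"
    if types <= {int, float}:
        return "DOUBLE"          # ints promote to floating point
    return "UTF8"                # fall back to string for mixed types

print(infer_column_type([1, 2, 3]))   # INT64
print(infer_column_type([1, 2.5]))    # DOUBLE
print(infer_column_type([1, "a"]))    # UTF8
```

The idea is that a column containing both integers and floats is widened to DOUBLE rather than raising an error, while irreconcilable mixes fall back to string.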

Installation

Community Node Installation

  1. Go to Settings > Community Nodes in your n8n instance
  2. Click Install and enter: n8n-nodes-parquet
  3. Click Install and agree to the risks
  4. Restart n8n if required

Manual Server Installation (systemd)

If you manage n8n directly on a server (without the UI install), install the package into n8n's default community nodes directory:

  • ~/.n8n/nodes

Example:

  1. npm pack (locally)
  2. Upload tarball to server
  3. cd ~/.n8n/nodes && npm install /tmp/n8n-nodes-parquet-<version>.tgz
  4. sudo systemctl restart n8n

Usage

Read

  • Operation: Read
  • Select storage type (Local File / URL / Azure / S3 / GCS)
  • Optional: Batch size, column filter, max rows, metadata

Write

  • Operation: Write
  • Select destination storage
  • Schema mode: Auto or Manual
  • Nested object handling (auto mode): Struct or JSON
  • Compression: SNAPPY (default), GZIP, or UNCOMPRESSED

Merge

  • Operation: Merge
  • Add one or more sources (supports wildcards like data-*.parquet)
  • Choose output storage and output path
  • Optional: Mismatch handling (failFast or collect)
  • Optional: Schema merge mode (strict, union, intersection)
  • Optional: Type conflict handling (error, coerce, null)
  • Optional: Include Keys (comma-separated, supports dot paths like meta.agent.name)
  • Optional: Exclude Keys (comma-separated, supports dot paths; Exclude takes precedence)
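The three schema merge modes above can be sketched with a small example (plain Python, purely illustrative of strict/union/intersection semantics on flat schemas; `merge_schemas` is a hypothetical name, not the node's API, and the node additionally handles nested schemas and type conflicts):

```python
# Illustrative sketch of the strict / union / intersection merge modes
# on flat column->type mappings. Hypothetical helper, not the node's code.

def merge_schemas(a, b, mode="union"):
    if mode == "strict":
        if a != b:
            raise ValueError("schema mismatch in strict mode")
        return dict(a)
    if mode == "union":
        merged = dict(a)
        merged.update(b)          # keep every column seen in any file
        return merged
    if mode == "intersection":
        return {k: a[k] for k in a if k in b}  # keep only shared columns
    raise ValueError(f"unknown mode: {mode}")

a = {"id": "INT64", "name": "UTF8"}
b = {"id": "INT64", "score": "DOUBLE"}
print(merge_schemas(a, b, "union"))         # columns from both files
print(merge_schemas(a, b, "intersection"))  # shared columns only
```

Strict fails fast on any difference, union keeps every column seen in any input, and intersection keeps only columns present in all inputs; type conflicts within a shared column are handled separately (error, coerce, null).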

Schema Scan

  • Operation: Schema Scan
  • Add one or more sources (same source model and wildcard behavior as Merge)
  • Reuse merge schema settings (strict, union, intersection) and mismatch handling
  • Reuse merge key filters (Include/Exclude Keys with dot-path support)
  • Returns merged schema; optional per-file schema details and resolved source file list
  • No output Parquet file is written (analysis-only operation)
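The Include/Exclude key filters with dot-path support can be sketched as follows (illustrative Python only; `filter_keys` is a hypothetical name, and, as noted above, Exclude takes precedence over Include):

```python
# Illustrative sketch of dot-path Include/Exclude key filtering over
# flattened column paths like 'meta.agent.name'. Hypothetical helper,
# not the node's code; Exclude wins over Include.

def filter_keys(paths, include=None, exclude=None):
    def matches(path, patterns):
        # a pattern matches the path itself or any of its children
        return any(path == p or path.startswith(p + ".") for p in patterns)

    kept = paths
    if include:
        kept = [p for p in kept if matches(p, include)]
    if exclude:
        kept = [p for p in kept if not matches(p, exclude)]
    return kept

cols = ["id", "meta.agent.name", "meta.agent.version", "meta.ts"]
print(filter_keys(cols, include=["meta.agent"], exclude=["meta.agent.version"]))
# ['meta.agent.name']
```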

Credentials

All cloud credentials are available in the node UI. Select only the ones you need for the chosen storage backends.

Changelog

See CHANGELOG.md for full release notes.

1.0.6

  • Added nested object handling mode for write auto-schema: struct or json
  • Added recursive nested schema detection and validation
  • Added nested-aware merge support for union and intersection
  • Improved merge transformation/coercion for nested values
  • Updated UI text and docs for nested merge support

Made for the n8n community
