bozonx-news-crawler-microservice

n8n nodes for BozonX News Crawler microservice with flexible authentication (None, Basic Auth, Bearer Token)

Package Information

Downloads: 22 weekly / 99 monthly
Latest Version: 2.7.0
Author: Ivan K

Documentation

n8n nodes for BozonX News Crawler microservice

Quickly orchestrate News Crawler workflows inside n8n. The package ships an all-in-one Request node alongside three focused nodes that cover payload preparation, job dispatch, and data retrieval.

Authentication

The package uses News Crawler API credentials specific to this microservice. Configure authentication in n8n credentials:

  • Base URL (required): The service base URL, optionally including a BASE_PATH, e.g. https://news-crawler.example.com or https://news-crawler.example.com/custom/base. The nodes append /api/v1 automatically.
  • Authentication: Choose from three options:
    • None (default): No authentication required
    • Basic Auth: Requires username and password
    • Bearer Token: Requires API token

All nodes in this package use these credentials to connect to your News Crawler instance.
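To illustrate how the credential fields combine, here is a minimal sketch of deriving the request URL and auth header from a credential object. The function and interface names (buildApiUrl, buildAuthHeaders, NewsCrawlerCredentials) are illustrative, not the package's actual source; only the /api/v1 suffix and the three auth modes come from the description above.

```typescript
// Hypothetical sketch; names are illustrative, not the node's real code.
type AuthMethod = "none" | "basicAuth" | "bearerToken";

interface NewsCrawlerCredentials {
  baseUrl: string;          // e.g. https://news-crawler.example.com or .../custom/base
  authentication: AuthMethod;
  username?: string;        // Basic Auth only
  password?: string;        // Basic Auth only
  token?: string;           // Bearer Token only
}

// The nodes append /api/v1 to the configured base URL.
function buildApiUrl(creds: NewsCrawlerCredentials, endpoint: string): string {
  const base = creds.baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  return `${base}/api/v1${endpoint}`;
}

function buildAuthHeaders(creds: NewsCrawlerCredentials): Record<string, string> {
  switch (creds.authentication) {
    case "basicAuth":
      // Basic scheme: base64("username:password")
      return { Authorization: `Basic ${btoa(`${creds.username}:${creds.password}`)}` };
    case "bearerToken":
      return { Authorization: `Bearer ${creds.token}` };
    default:
      return {}; // "none": no auth header sent
  }
}
```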

Available nodes

News Crawler Request (All-in-One)

  • Consolidated functionality: Handles batch creation and listing, source listing, and data retrieval in a single node.
  • Operations:
    • Batch: Create: Replaces News Crawler Start Batch and Params (supports both UI Builder and raw JSON).
    • Batch: Get All: GET /batches to list batches.
    • Source: Get All: GET /sources to list available sources.
    • Data: Get: GET /data to retrieve parsed items (replaces News Crawler Get Data).
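The four operations above map onto plain REST endpoints. A sketch of that mapping, assuming a resource/operation pair like n8n nodes typically use (the table shape and resolveEndpoint helper are assumptions, not the node's source):

```typescript
// Illustrative operation-to-endpoint table for the all-in-one Request node.
const ENDPOINTS: Record<string, { method: "GET" | "POST"; path: string }> = {
  "batch:create": { method: "POST", path: "/batches" },  // Batch: Create
  "batch:getAll": { method: "GET", path: "/batches" },   // Batch: Get All
  "source:getAll": { method: "GET", path: "/sources" },  // Source: Get All
  "data:get": { method: "GET", path: "/data" },          // Data: Get
};

function resolveEndpoint(resource: string, operation: string) {
  const entry = ENDPOINTS[`${resource}:${operation}`];
  if (!entry) throw new Error(`Unsupported operation: ${resource}:${operation}`);
  return entry;
}
```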

News Crawler Params

  • Collect sources from your registry or configure custom RSS / page crawlers.
  • Validates input and builds a payload compatible with POST /batches.
  • Optional webhook section adds delivery settings without leaving the node.
  • Advanced options (fingerprint, locale, time zone) appear contextually, keeping the form compact.
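To give a feel for the output, here is a hypothetical example of the kind of payload the Params node could emit for POST /batches. Every field name (tasks, kind, url, locale, timezone, webhook, timeoutSecs) is an illustrative guess based on the options described above, not the service's actual schema.

```typescript
// Hypothetical POST /batches payload; field names are illustrative only.
const batchPayload = {
  tasks: [
    // A source taken from your registry
    { kind: "registry", source: "example-news" },
    // A custom RSS crawler with locale/timezone overrides
    {
      kind: "rss",
      url: "https://example.com/feed.xml",
      locale: "en-US",
      timezone: "UTC",
    },
  ],
  // Optional webhook delivery settings
  webhook: {
    url: "https://my-n8n.example.com/webhook/news",
    headers: { "X-Api-Key": "secret" },
    timeoutSecs: 30,
  },
};
```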

News Crawler Start Batch

  • Sends the payload from the previous node to the microservice gateway.
  • Accepts shared credentials (Gateway URL, optional base path, API key headers, etc.).
  • Choose between referencing the previous node output via expression ({{$json}}) or pasting JSON manually.
  • Returns the batch metadata from the API, so you can branch on status or batch ID.

News Crawler Get Data

  • Fetches parsed items through GET /data using the same gateway credentials.
  • Minimal configuration: select sources (CSV), optional sourceLimit, and optional date filters:
    • fromDate – filter by item publication date (from original source)
    • fromSavedAt – filter by when the item was saved to the dataset
    • Note: fromDate and fromSavedAt are mutually exclusive
  • Output JSON follows the API envelope (sources, sourceLimit, optional fromDate / fromSavedAt) and exposes a flat results array of items for downstream automations (e.g., notifier, storage).
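The query parameters above can be assembled with a small helper that also enforces the fromDate / fromSavedAt exclusivity rule. This is a minimal sketch under the assumptions stated in the list (the buildGetDataQuery name and GetDataOptions shape are illustrative):

```typescript
// Illustrative GET /data query builder; names are assumptions, not node code.
interface GetDataOptions {
  sources: string[];       // selected source identifiers (sent as CSV)
  sourceLimit?: number;    // optional per-source item cap
  fromDate?: string;       // filter by original publication date
  fromSavedAt?: string;    // filter by when items were saved to the dataset
}

function buildGetDataQuery(opts: GetDataOptions): string {
  // The two date filters are mutually exclusive.
  if (opts.fromDate && opts.fromSavedAt) {
    throw new Error("fromDate and fromSavedAt are mutually exclusive");
  }
  const params = new URLSearchParams({ sources: opts.sources.join(",") });
  if (opts.sourceLimit !== undefined) params.set("sourceLimit", String(opts.sourceLimit));
  if (opts.fromDate) params.set("fromDate", opts.fromDate);
  if (opts.fromSavedAt) params.set("fromSavedAt", opts.fromSavedAt);
  return params.toString();
}
```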

Build a workflow in minutes

  1. Configure credentials – Create a new "News Crawler API" credential with your microservice URL and authentication method (None, Basic Auth, or Bearer Token).
  2. Prepare tasks – Drop the Params node, pick the task kind, and fill in the required fields. Use the preview to confirm validation.
  3. Send the batch – Connect it to Start Batch or use the all-in-one Request node, select your credentials, and point the payload field to {{$json}} from the Params node.
  4. Process results – Either listen for your webhook or periodically poll the Get Data node and hand the items to the next step.

Pro tip: wrap the Start Batch node with If or Wait nodes to react to failures or schedule retries.

Helpful UI behaviors

  • Task types – Switching the Kind field toggles only the relevant form groups (registry overrides, RSS selectors, page selectors).
  • RSS extraction – For RSS tasks you can override feed fields for link, title, description, date, and tags, matching the backend extract* options.
  • Locale & timezone – Custom kinds expose locale/timezone overrides so Playwright and parsers match target sites.
  • Fingerprinting – Enable fingerprint generation to rotate headers; browser/device lists accept CSV strings.
  • Webhook card – Enter a URL to unlock custom headers (YAML/JSON), retry overrides, and per-webhook timeout (seconds via timeoutSecs); leave blank if you only poll data. Headers support both YAML and JSON formats, and can be passed via expressions.
  • Legacy fields – The service no longer accepts the scraper field. Use the mode selector instead.

Development quick start

npm install
npm run dev

The dev script runs n8n with hot reload so you can iterate on node UX quickly.
