plasmate

n8n community node for Plasmate — fetch web pages and get structured Semantic Object Model (SOM) content instead of raw HTML

Package Information

Downloads: 2 weekly / 26 monthly
Latest Version: 0.1.0
Author: David Hurley

Documentation

Plasmate

n8n-nodes-plasmate

What it does

The Plasmate node fetches any URL using Plasmate — a fast headless browser engine — and returns structured data instead of raw HTML. Plasmate compiles pages into a Semantic Object Model (SOM): organized regions, interactive elements with stable IDs, extracted text, and structured data (JSON-LD, OpenGraph).

Why not just use the HTTP Request node?

The HTTP Request node returns raw HTML — tens of thousands of tokens that downstream AI nodes have to parse. Plasmate returns structured JSON that's 10-800x smaller and immediately usable.

Operations

| Operation | Output |
| --- | --- |
| Fetch Page | Full SOM: title, regions, elements, metadata |
| Extract Text | Plain text joined from all page regions |
| Extract Links | Array of {text, href, region} objects |
| Extract Structured Data | JSON-LD, OpenGraph, and microdata |
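
To make the relationship between these operations concrete, here is a minimal sketch of how "Extract Text" and "Extract Links" style results could be derived from a SOM. The type names, the helper functions, and the newline separator are assumptions inferred from the example output in this README; they are not the node's actual implementation or published typings.

```typescript
// Hypothetical types, inferred from the "Example output" section of this README.
interface SomElement {
  id: string;
  role: string;
  text?: string;
  href?: string;
}

interface SomRegion {
  id: string;
  role: string;
  elements: SomElement[];
}

interface Som {
  regions: SomRegion[];
}

interface ExtractedLink {
  text: string;
  href: string;
  region: string;
}

// Join the non-empty text of every element in every region.
// The newline separator is an assumption; the node may join differently.
function extractText(som: Som): string {
  const parts: string[] = [];
  for (const region of som.regions) {
    for (const el of region.elements) {
      if (el.text !== undefined && el.text.length > 0) {
        parts.push(el.text);
      }
    }
  }
  return parts.join("\n");
}

// Collect every element that carries an href, tagged with its region id.
function extractLinks(som: Som): ExtractedLink[] {
  const links: ExtractedLink[] = [];
  for (const region of som.regions) {
    for (const el of region.elements) {
      if (el.href !== undefined) {
        links.push({ text: el.text ?? "", href: el.href, region: region.id });
      }
    }
  }
  return links;
}
```

In a workflow you would normally pick the matching operation on the node itself; a helper like this is only useful if you fetch the full SOM once and want several derived views from it.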

Prerequisites

  1. A self-hosted n8n instance (community nodes require self-hosted n8n)
  2. Plasmate installed on the same machine as n8n:

     curl -fsSL https://plasmate.app/install.sh | sh

Installation

In your n8n instance, go to Settings → Community Nodes → Install and enter:

n8n-nodes-plasmate

Or install via npm in your n8n directory:

npm install n8n-nodes-plasmate

Usage

Basic — Fetch a page

  1. Add a Plasmate node to your workflow
  2. Set Operation to "Fetch Page"
  3. Set URL to any web address
  4. Connect downstream nodes to work with the SOM output

Extract links from a page

Set Operation to "Extract Links". The output includes links (an array) and link_count. Use the Split Out node to process each link individually in downstream steps.
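
If you would rather handle the whole array in one step than split it into items, the logic is short. The sketch below groups links by the page region they came from. The `links` and `link_count` field names come from the description above and the `{text, href, region}` shape from the Operations table; the interface names and the `linksByRegion` helper are hypothetical.

```typescript
// Hypothetical shape of the "Extract Links" output, based on this README.
interface PageLink {
  text: string;
  href: string;
  region: string;
}

interface ExtractLinksOutput {
  links: PageLink[];
  link_count: number;
}

// Group links by region id so that, for example, navigation links
// can be handled separately from links in the main content.
function linksByRegion(output: ExtractLinksOutput): Record<string, PageLink[]> {
  const grouped: Record<string, PageLink[]> = {};
  for (const link of output.links) {
    const bucket = grouped[link.region] ?? [];
    bucket.push(link);
    grouped[link.region] = bucket;
  }
  return grouped;
}
```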

Authenticated browsing

Set Auth Profile in Options to the domain (e.g. github.com). This requires cookies for that domain to have been stored beforehand via the Plasmate browser extension.

Batch processing

Connect multiple URLs from an upstream node (e.g. a list from a Google Sheet or database). The Plasmate node processes one URL per input item.
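
Because the node handles one URL per input item, the upstream step needs to emit one item per URL. The following sketch shows one way to turn a plain list of URLs into n8n-style items, where each item wraps its data in a `json` key. The `url` field name and the `toUrlItems` helper are assumptions for illustration; use whatever field your workflow maps into the node's URL parameter.

```typescript
// n8n passes data between nodes as items that wrap their payload in `json`.
// The `url` field name is an assumption chosen for this example.
interface UrlItem {
  json: { url: string };
}

// Trim each URL, skip blanks and duplicates, and emit one item per URL.
function toUrlItems(urls: string[]): UrlItem[] {
  const seen: Record<string, boolean> = {};
  const items: UrlItem[] = [];
  for (const raw of urls) {
    const url = raw.trim();
    if (url.length === 0 || seen[url] === true) {
      continue;
    }
    seen[url] = true;
    items.push({ json: { url } });
  }
  return items;
}
```

Removing duplicates before the fetch avoids requesting the same page twice when the source list is not clean.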

Options

| Option | Default | Description |
| --- | --- | --- |
| Auth Profile | (none) | Domain for authenticated browsing (e.g. github.com) |
| Plasmate Binary Path | plasmate | Override if plasmate is not in PATH |
| Timeout (Seconds) | 30 | Max seconds to wait for a page fetch |

Example output — Fetch Page

{
  "url": "https://example.com",
  "title": "Example Domain",
  "lang": "en",
  "element_count": 4,
  "interactive_count": 1,
  "region_count": 1,
  "som": {
    "regions": [
      {
        "id": "main",
        "role": "main",
        "elements": [
          { "id": "e1", "role": "heading", "text": "Example Domain" },
          { "id": "e2", "role": "text", "text": "This domain is for use in illustrative examples." },
          { "id": "e3", "role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example" }
        ]
      }
    ]
  }
}
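
Since elements carry stable IDs, a downstream step can look one up directly instead of searching by text. This is a minimal sketch against the shape shown above; the interface names and the `findElement` helper are hypothetical, and real output may contain fields not listed here.

```typescript
// Hypothetical types covering only the fields visible in the example output.
interface PageElement {
  id: string;
  role: string;
  text?: string;
  href?: string;
}

interface PageRegion {
  id: string;
  role: string;
  elements: PageElement[];
}

interface FetchPageOutput {
  url: string;
  title: string;
  som: { regions: PageRegion[] };
}

// Return the first element whose id matches, or undefined if none does.
function findElement(page: FetchPageOutput, elementId: string): PageElement | undefined {
  for (const region of page.som.regions) {
    for (const el of region.elements) {
      if (el.id === elementId) {
        return el;
      }
    }
  }
  return undefined;
}
```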

Token savings

Real-world benchmark (SOM vs raw HTML):

| Site | Savings |
| --- | --- |
| Vercel docs | 99.6% |
| Stripe API | 95.8% |
| Next.js docs | 92.3% |
| Stack Overflow | 85.6% |
| Wikipedia | 82.8% |
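
A savings percentage can be converted into a "times smaller" factor. Assuming savings means the share of raw-HTML tokens that the SOM removes (the README does not state this definition), the factor is 100 / (100 - savings). On that reading, 99.6% savings is roughly 250x smaller and 82.8% is roughly 5.8x smaller. The `reductionFactor` helper below is a hypothetical illustration of that arithmetic.

```typescript
// Convert a savings percentage into a size-reduction factor.
// Assumes savings = share of the original token count that was removed.
function reductionFactor(savingsPercent: number): number {
  if (!(savingsPercent >= 0 && savingsPercent < 100)) {
    throw new RangeError("savingsPercent must be in the range [0, 100)");
  }
  return 100 / (100 - savingsPercent);
}

// Print the factor for each benchmark row, rounded to one decimal place.
const benchmark: Array<[string, number]> = [
  ["Vercel docs", 99.6],
  ["Stripe API", 95.8],
  ["Next.js docs", 92.3],
  ["Stack Overflow", 85.6],
  ["Wikipedia", 82.8],
];
for (const [site, savings] of benchmark) {
  console.log(site + ": about " + reductionFactor(savings).toFixed(1) + "x smaller");
}
```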

License

MIT