Overview
The "Parse HTML" operation of the AI Scraper node extracts structured data from raw HTML or text content using the Parsera API. Instead of scraping a webpage by URL, it processes provided HTML content directly to parse and extract specified data fields.
This operation is useful when you already have the HTML content (e.g., downloaded or received from another source) and want to extract information from it without fetching the source page again. Extraction attributes can be defined either as individual fields or as a single JSON object, which makes it easier to integrate with AI tools or complex data schemas.
Practical examples:
- Extract product details like name, price, and availability from saved HTML snippets.
- Parse blog post metadata such as title, author, and publish date from raw HTML.
- Use in workflows where HTML content is fetched by other means and then parsed for specific data points.
Properties
| Name | Meaning |
|---|---|
| Content | Raw HTML or text content to extract data from. This is the input HTML string that the node will parse. |
| Attributes Input Mode | How the attributes to extract are defined. Fields: define each attribute individually with a name, type, and description. JSON: define all attributes as a single JSON object, suitable for complex schemas or AI tool integration. |
| Attributes | Used when "Attributes Input Mode" is set to "Fields". Defines one or more data fields to extract. Each attribute requires a Field Name (the key in the output JSON), a Type (Any, Boolean, Integer, List, Number, Object, or String), and a Description (a natural language instruction on what to extract). |
| Attributes (JSON) | Used when "Attributes Input Mode" is set to "JSON". Defines attributes as a JSON object where each key is a field name and its value is an object with "description" and "type". Allowed types are any, string, integer, number, bool, list, and object. |
| Mode | Extraction mode. Standard: balanced speed and accuracy. Precision: better for extracting data hidden deeper inside HTML structures. |
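To make the "Attributes (JSON)" shape concrete, the following is a minimal sketch that builds such an object from a list of field definitions. The helper name `fields_to_attributes_json`, the example fields, and the mapping from the "Fields" type labels to the lowercase JSON type names (in particular `Boolean` to `bool`) are assumptions made for illustration, not part of the node.

```python
import json

# Assumed mapping from the "Fields" mode type labels to the lowercase
# names listed for JSON mode. "Boolean" -> "bool" is an assumption.
FIELD_TYPE_TO_JSON_TYPE = {
    "Any": "any",
    "Boolean": "bool",
    "Integer": "integer",
    "List": "list",
    "Number": "number",
    "Object": "object",
    "String": "string",
}


def fields_to_attributes_json(fields):
    """Convert (name, type, description) tuples into the JSON-mode object.

    Hypothetical helper: each key is a field name and each value is an
    object with "description" and "type", as described in the table.
    """
    attributes = {}
    for name, field_type, description in fields:
        attributes[name] = {
            "description": description,
            "type": FIELD_TYPE_TO_JSON_TYPE[field_type],
        }
    return attributes


# Illustrative field definitions (not taken from the node itself).
example_fields = [
    ("productName", "String", "Name of the product"),
    ("price", "Number", "Price of the product as a number"),
    ("inStock", "Boolean", "Whether the product is in stock"),
]

# The resulting string is what would be pasted into "Attributes (JSON)".
attributes_json = json.dumps(fields_to_attributes_json(example_fields), indent=2)
print(attributes_json)
```

The printed JSON can be used as the value of the "Attributes (JSON)" property when "Attributes Input Mode" is set to "JSON".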
Output
The node outputs an array of items, each containing a json property with the extracted data fields as key-value pairs according to the defined attributes.
- The structure of the json output matches the attribute names defined in the input.
- Each field's value corresponds to the extracted data, typed as per the attribute definition.
- No binary data output is produced by this operation.
Example output item:
{
  "json": {
    "productName": "Example Product",
    "price": 19.99,
    "inStock": true
  }
}
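A downstream step may want to confirm that an output item contains the fields that were requested. The sketch below compares the keys of an item's json property with a list of attribute names. The helper `check_output_item` is hypothetical, and the source does not state whether the node omits fields it cannot find, so the function only reports differences and makes no assumption about which case occurs.

```python
def check_output_item(item, attribute_names):
    """Compare one output item against the requested attribute names.

    Hypothetical helper: returns (missing, unexpected), where `missing`
    lists requested names absent from item["json"] and `unexpected`
    lists keys in item["json"] that were not requested.
    """
    data = item.get("json", {})
    missing = [name for name in attribute_names if name not in data]
    unexpected = [key for key in data if key not in attribute_names]
    return missing, unexpected


# The example output item from this section, written as a Python dict.
item = {
    "json": {
        "productName": "Example Product",
        "price": 19.99,
        "inStock": True,
    }
}

missing, unexpected = check_output_item(item, ["productName", "price", "inStock"])
print("missing:", missing, "unexpected:", unexpected)
```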
Dependencies
- Requires an active connection to the Parsera API service at https://api.parsera.org/v1.
- Needs an API authentication token credential configured in n8n to authorize requests.
- No additional environment variables are required beyond standard API credential setup.
Troubleshooting
- Missing or empty Content: If the "Content" property is empty or whitespace, the node throws the error "Content is required for Parse HTML." Ensure valid HTML/text content is provided.
- Invalid Attributes JSON: When using JSON mode, malformed JSON or an incorrect structure causes errors like "Attributes field contains invalid JSON" or "Attributes must resolve to a JSON object." Validate JSON syntax and structure before running.
- Empty or malformed attribute definitions: Errors such as "Attribute at index X is malformed or missing required properties.", "Empty Field Name at index X.", or "Empty Field Description for ..." indicate incomplete attribute configuration. Make sure all required fields are filled in correctly.
- No attributes defined: The node requires at least one attribute to extract; otherwise, it throws "At least one attribute is required."
- API connectivity issues: Network problems or invalid API credentials will cause request failures. Verify API key validity and network access to api.parsera.org.
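Several of these errors can be caught before the node runs, for example in an earlier workflow step. The following is a minimal sketch of such a pre-check for JSON mode. It is not the node's actual implementation: the function name `validate_parse_html_input` and the wording of the per-attribute messages are assumptions, while the first four messages reuse the error strings quoted in the list.

```python
import json

# Type names listed as allowed for JSON mode.
ALLOWED_TYPES = ("any", "string", "integer", "number", "bool", "list", "object")


def validate_parse_html_input(content, attributes_json):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical pre-check that mirrors the conditions described above.
    """
    problems = []

    # Content must be present and not only whitespace.
    if not content or not content.strip():
        problems.append("Content is required for Parse HTML.")

    # Attributes must be valid JSON that resolves to an object.
    try:
        attributes = json.loads(attributes_json)
    except json.JSONDecodeError:
        problems.append("Attributes field contains invalid JSON")
        return problems
    if not isinstance(attributes, dict):
        problems.append("Attributes must resolve to a JSON object.")
        return problems
    if not attributes:
        problems.append("At least one attribute is required.")

    # Per-attribute checks. These messages are this sketch's own wording.
    for name, spec in attributes.items():
        if not isinstance(spec, dict):
            problems.append(f"Attribute '{name}' must be an object.")
            continue
        description = spec.get("description")
        if not isinstance(description, str) or not description.strip():
            problems.append(f"Attribute '{name}' has an empty description.")
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"Attribute '{name}' has an unsupported type.")

    return problems


# Example: one valid attribute and non-empty content pass all checks.
ok = validate_parse_html_input(
    "<h1>Example Product</h1>",
    '{"productName": {"description": "Name of the product", "type": "string"}}',
)
print(ok)
```

Returning a list of problems instead of raising on the first one lets a workflow report every configuration issue at once. Connectivity and credential failures cannot be detected this way and still need to be checked against the API itself.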
Links and References
- Parsera API Documentation — Official docs for the underlying API used for parsing and extraction.
- n8n Documentation — General guidance on creating and configuring nodes and credentials in n8n.