0CodeKit

A toolbox of no-code utilities

Actions: 108

Overview

This node provides an HTML parsing utility through the "HTML Parser" operation of the "Operator" resource. It extracts content from HTML that is either fetched from a URL or supplied directly as a string. Users can target specific parts of the document with a CSS selector, a tag name, a class name, or an element ID. The node returns either the first matching element or all elements that match the query.

This node is useful when you need to scrape or extract structured data from web pages or raw HTML without writing custom code. Typical uses include extracting product details from an e-commerce page, retrieving article text from a blog post, and gathering metadata from HTML snippets.

Properties

| Name | Meaning |
| --- | --- |
| Url | The URL of the website from which to extract the HTML content. |
| Html | An HTML string to parse directly, instead of fetching it from a URL. |
| All | Boolean flag (default false). Set to true to return all elements matching the selector; otherwise, only the first match is returned. |
| Selector | A CSS selector string that overrides tagSelector, classSelector, and idSelector to select elements. |
| Tag Selector | The tag name of elements to extract (e.g., p, div). |
| Class Selector | The class name of elements to extract. |
| ID Selector | The ID of the element to extract. |
| Code Variables | A collection of user-defined variables (name/value pairs) that can be used in the code editor for dynamic value substitution. |
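
The precedence between the selector properties can be pictured with a small Python sketch. It is an illustration only, not the node's implementation: the function name `resolve_selector` is hypothetical, and the way the tag, class, and ID values are combined into one compound selector is an assumption. The only behavior taken from the table above is that Selector overrides the other three.

```python
from typing import Optional


def resolve_selector(
    selector: Optional[str] = None,
    tag_selector: Optional[str] = None,
    class_selector: Optional[str] = None,
    id_selector: Optional[str] = None,
) -> str:
    """Return the CSS selector implied by the selector properties."""
    # An explicit Selector overrides the three narrower properties.
    if selector and selector.strip():
        return selector.strip()

    # Assumption: the narrower properties combine into one compound
    # selector such as "p.intro#lead". The real service may differ.
    parts = []
    if tag_selector and tag_selector.strip():
        parts.append(tag_selector.strip())
    if class_selector and class_selector.strip():
        parts.append("." + class_selector.strip().lstrip("."))
    if id_selector and id_selector.strip():
        parts.append("#" + id_selector.strip().lstrip("#"))

    if not parts:
        raise ValueError("no selector property was provided")
    return "".join(parts)
```

For example, `resolve_selector(selector="div.price", tag_selector="p")` returns `"div.price"`, while `resolve_selector(tag_selector="p", class_selector="intro")` returns `"p.intro"`.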

Output

The output JSON contains the extracted HTML elements based on the provided selectors and input source. If the "All" property is set to true, the output will include an array of all matching elements; otherwise, it returns only the first matched element. The exact structure depends on the underlying API response but generally includes the HTML content or relevant parsed data.

The node does not explicitly handle binary data output for this operation.
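
The difference between a first-match result and an all-matches result can be tried locally with Python's standard `html.parser` module. This is a rough sketch of the semantics only: it matches by tag name, returns element text, and assumes well-formed markup. The names `extract` and `_TagTextCollector` are hypothetical, and the shape of the real API response may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Union


class _TagTextCollector(HTMLParser):
    """Collect the text of every element with a given tag name."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0  # number of matching tags currently open
        self.buffer: List[str] = []
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            # Nested elements of the same tag are folded into the outermost one.
            if self.depth == 0:
                self.matches.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract(html: str, tag: str, all_matches: bool = False) -> Union[Optional[str], List[str]]:
    """Return the first match (or None), or a list of all matches."""
    collector = _TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    if all_matches:
        return collector.matches
    return collector.matches[0] if collector.matches else None
```

With the input `"<ul><li>One</li><li>Two</li></ul>"`, `extract(html, "li")` returns `"One"`, and `extract(html, "li", all_matches=True)` returns `["One", "Two"]`.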

Dependencies

  • Requires an API key credential for authentication with the external service providing the HTML parsing functionality.
  • The node makes HTTP POST requests to an external API endpoint corresponding to the resource and operation (operator/htmlparser/get).
  • No additional environment variables are indicated as necessary beyond the API key credential.
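
For readers who want to picture the request, the following Python sketch builds (but does not send) a POST request for the endpoint path named above. Several details are assumptions and are not confirmed by this page: the base URL must be supplied by the caller, the `Authorization` header is a placeholder for however the service actually expects the API key, and the payload field names are only guessed from the property names.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint path as given in the Dependencies list.
ENDPOINT_PATH = "operator/htmlparser/get"


def build_request(base_url: str, api_key: str, params: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request object without sending it."""
    # Drop unset properties so only provided values are sent.
    # Boolean False is kept, since "all": False is a meaningful value.
    payload = {key: value for key, value in params.items() if value not in (None, "")}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + ENDPOINT_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: placeholder header; check the service's
            # documentation for the real authentication scheme.
            "Authorization": api_key,
        },
    )
```

The resulting object could then be passed to `urllib.request.urlopen`, once the real base URL and authentication header are known.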

Troubleshooting

  • Common Issues:

    • Leaving both the Url and Html properties empty, or supplying invalid values for them, may cause the node to fail because it has no input to parse.
    • Incorrect or overly broad CSS selectors might return no results or unexpected elements.
    • Network issues or invalid API credentials will prevent successful API calls.
  • Error Messages:

    • Errors related to invalid selectors or missing parameters typically come from the external API and should be resolved by verifying input values.
    • Authentication errors indicate problems with the API key credential setup.
    • JSON parsing errors when using the "Code Variables" feature may occur if variable values are malformed.
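
Several of the issues above can be caught before the node runs. The Python sketch below shows one possible pre-flight check. The function name `check_inputs` and the exact rules are hypothetical, and the `name`/`value` keys are an assumption about how Code Variables pairs are represented.

```python
from typing import Dict, List, Optional


def check_inputs(
    url: Optional[str],
    html: Optional[str],
    code_variables: Optional[List[Dict[str, str]]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    has_url = bool(url and url.strip())
    has_html = bool(html and html.strip())

    # The node needs at least one input source.
    if not has_url and not has_html:
        problems.append("provide either Url or Html")

    # A URL without a scheme is a common cause of failed fetches.
    if has_url and not url.strip().lower().startswith(("http://", "https://")):
        problems.append("Url should start with http:// or https://")

    # Assumption: each code variable is a dict with "name" and "value" keys.
    for index, variable in enumerate(code_variables or []):
        name = variable.get("name")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"code variable {index} has no name")
        if "value" not in variable:
            problems.append(f"code variable {index} has no value")

    return problems
```

For instance, `check_inputs("", "")` returns `["provide either Url or Html"]`, and `check_inputs("https://example.com", None)` returns an empty list.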
