0CodeKit

A toolbox of no-code utilities

Actions: 108

Overview

The node bundles a "toolbox" of no-code utilities, organized by resource and operation. This page covers the HTML Scraping operation of the Generate resource, which fetches the content of a specified URL and returns either the full HTML or only the text content of the page.

Use it to extract data from web pages automatically, without writing a custom scraping script. For example:

  • Extracting article text from news websites.
  • Collecting product descriptions from e-commerce pages.
  • Gathering metadata or textual content for further processing or analysis.

Properties

Name | Meaning
Url | The URL of the webpage to scrape. Required.
Text Only | Boolean flag: true returns only the text content of the HTML, false returns the full HTML.
Code Variables | A collection of code variables defined in the code editor for the selected function. Each variable has a Variable Name or ID (the name or ID of the variable) and a Value (the value assigned to that variable). Useful for passing dynamic values into code functions.

Output

The output is a JSON array containing the results of the scraping operation. Each item corresponds to one input item processed.

  • The json field contains the scraped content from the specified URL.
  • If "Text Only" is true, the output will contain only the textual content extracted from the HTML.
  • Otherwise, the full HTML content of the page is returned.
  • The node does not explicitly output binary data for this operation.
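
To make the difference between the two output modes concrete, the sketch below strips tags from an HTML string using only Python's standard library. It illustrates what "text only" means conceptually; it is not the extraction logic the service actually uses, and the names `text_only` and `_TextCollector` are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_only(html: str) -> str:
    """Return the text content of an HTML string, joined by single spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


full_html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
print(text_only(full_html))
```

With "Text Only" set to false, the node would return something like `full_html` unchanged. With it set to true, the result is closer to what `text_only(full_html)` produces, although the service's whitespace handling may differ.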

Dependencies

  • Requires an API key credential for authentication with the external service powering the node's functionality.
  • The node makes HTTP POST requests to endpoints corresponding to the resource and operation (e.g., generate/html-scrape) to perform the scraping.
  • No additional environment variables are explicitly required beyond the API key credential.
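
As a rough picture of the kind of request described above, the sketch below builds (but does not send) an HTTP POST to a `generate/html-scrape` path. Only that path comes from this page. The base URL, the `Authorization: Bearer` header, and the JSON field names `url` and `textOnly` are assumptions made for illustration, so check the service's own API reference before relying on any of them.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not stated in this document.
BASE_URL = "https://api.example.invalid"


def build_scrape_request(api_key: str, url: str, text_only: bool = False) -> urllib.request.Request:
    """Build a POST request for the HTML Scraping operation (illustrative only)."""
    if not url:
        raise ValueError("url is required")
    # Assumed body shape; the actual field names may differ.
    body = json.dumps({"url": url, "textOnly": text_only}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/generate/html-scrape",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the node's credential may be sent differently.
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_scrape_request("YOUR_API_KEY", "https://example.com", text_only=True)
print(request.get_method(), request.full_url)
```

Inside n8n none of this is written by hand: the node assembles the request from the configured credential and the Url and Text Only properties.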

Troubleshooting

  • Common issues:

    • Invalid or unreachable URL: The node may fail if the URL is malformed or the target website is down.
    • Network connectivity problems: Ensure the n8n instance has internet access.
    • API authentication errors: Verify that the API key credential is correctly configured and valid.
    • Large or complex webpages might cause timeouts or incomplete scraping.
  • Error messages:

    • Errors related to HTTP request failures usually indicate network or URL issues.
    • Authentication errors suggest problems with the provided API key.
    • Parsing errors could occur if the response from the server is unexpected or malformed.

To resolve these:

  • Double-check the URL format and accessibility.
  • Confirm API credentials are correct and have necessary permissions.
  • Retry with simpler URLs or smaller pages to isolate issues.
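
The checks above can be partly automated before an item reaches the node. The following is a minimal sketch using only Python's standard library: one helper rejects obviously malformed URLs, and another sorts a raised exception into the categories listed under "Error messages". The function names and category labels are hypothetical, and the mapping of status codes 401 and 403 to authentication problems is a common convention rather than documented behavior of this service.

```python
import json
import urllib.error
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """True if the string has an http(s) scheme and a host part."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def classify_failure(exc: Exception) -> str:
    """Map an exception to a coarse troubleshooting category."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "authentication" if exc.code in (401, 403) else "http"
    if isinstance(exc, (urllib.error.URLError, TimeoutError)):
        return "network"
    if isinstance(exc, json.JSONDecodeError):
        return "parsing"
    return "unknown"


for candidate in ("https://example.com/article", "example.com", "ftp://example.com"):
    print(candidate, looks_like_http_url(candidate))
```

A URL that fails `looks_like_http_url` can be filtered out or corrected upstream, which leaves network and authentication failures as the cases that still need manual attention.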

Links and References

  • n8n Expressions Documentation — for using expressions in variable definitions.
  • General web scraping best practices and legal considerations should be reviewed when scraping third-party websites.
