Overview
This node integrates with the Horizon Data Wave API to parse and crawl websites. Specifically, the Map operation discovers URLs starting from a given URL, optionally filtering them by a search term or limiting results. It can use sitemap.xml files and/or HTML links for discovery, and supports including subdomains in the results.
Common scenarios include:
- Generating a list of URLs from a website for further processing or scraping.
- Discovering site structure or content pages automatically.
- Filtering URLs based on keywords to focus on relevant pages.
For example, you might start from a homepage URL and map all linked pages containing a certain keyword, limiting the output to 1000 URLs.
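To make that example concrete, the following Python sketch imitates the filter-and-limit step locally on a list of URLs that has already been discovered. The function name `filter_urls`, the case-insensitive substring match, and the de-duplication are assumptions made for illustration; how the API actually matches the search term is not specified here.

```python
from typing import Iterable, List, Optional


def filter_urls(
    urls: Iterable[str],
    search_term: Optional[str] = None,
    limit: int = 1000,
) -> List[str]:
    """Keep unique URLs containing search_term (case-insensitive), up to limit."""
    needle = search_term.lower() if search_term else None
    seen = set()
    result: List[str] = []
    for url in urls:
        if len(result) >= limit:
            break  # stop as soon as the cap is reached
        if url in seen:
            continue  # skip duplicates
        seen.add(url)
        if needle is None or needle in url.lower():
            result.append(url)
    return result
```

Calling `filter_urls(discovered, "blog", limit=1000)` would keep at most 1000 URLs whose text contains "blog".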
Properties
| Name | Meaning |
|---|---|
| Base URL | Custom API base URL; leave empty to use the default Horizon Data Wave API endpoint. |
| URL | Starting URL for URL discovery (required). |
| Search Term | Optional keyword to filter discovered URLs by matching text. |
| Ignore Sitemap | If true, skip sitemap.xml discovery and only use HTML links for URL discovery. |
| Sitemap Only | If true, only use sitemap.xml for discovery, ignoring HTML links. |
| Include Subdomains | If true, include URLs from subdomains in the results. |
| Limit | Maximum number of URLs to return (default 1000). |
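As a rough illustration of how these properties might translate into a request body, the sketch below assembles them into a dictionary. The key names (`url`, `search`, `ignoreSitemap`, `sitemapOnly`, `includeSubdomains`, `limit`) and the `False` defaults for the boolean options are hypothetical; only the default limit of 1000 comes from the table above, so consult the API documentation for the real field names.

```python
from typing import Any, Dict, Optional


def build_map_payload(
    url: str,
    search_term: Optional[str] = None,
    ignore_sitemap: bool = False,
    sitemap_only: bool = False,
    include_subdomains: bool = False,
    limit: int = 1000,
) -> Dict[str, Any]:
    """Assemble a Map request body from the node properties."""
    # All key names below are assumptions, not the documented API schema.
    payload: Dict[str, Any] = {
        "url": url,
        "ignoreSitemap": ignore_sitemap,
        "sitemapOnly": sitemap_only,
        "includeSubdomains": include_subdomains,
        "limit": limit,
    }
    # The search term is optional, so it is only sent when provided.
    if search_term:
        payload["search"] = search_term
    return payload
```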
Output
The node outputs an array of JSON objects, each representing a discovered URL and its associated metadata as returned by the Horizon Data Wave API. The exact fields depend on the API response but typically include URL strings and possibly additional info about each link.
No binary data is output by this operation.
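Because the exact output fields depend on the API response, downstream code may want to read the items defensively. The sketch below is one way to pull URL strings out of the output; the `url` key and the possibility of bare string items are assumptions, not a documented schema.

```python
from typing import Any, Iterable, List


def extract_urls(items: Iterable[Any]) -> List[str]:
    """Collect URL strings from output items, skipping anything unrecognized."""
    urls: List[str] = []
    for item in items:
        if isinstance(item, str):
            # Assumed shape 1: the item is the URL itself.
            urls.append(item)
        elif isinstance(item, dict) and isinstance(item.get("url"), str):
            # Assumed shape 2: the item is an object with a "url" field.
            urls.append(item["url"])
    return urls
```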
Dependencies
- Requires an API key credential for the Horizon Data Wave API.
- The node makes authenticated HTTP POST requests to the API endpoints.
- The API base URL can optionally be overridden if you are not using the default endpoint.
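The dependencies above amount to an authenticated JSON POST request. The following sketch builds such a request with Python's standard library without sending it. The `/map` path, the bearer-token `Authorization` header, and the function name `build_map_request` are placeholders chosen for illustration; the real endpoint and authentication scheme should be taken from the API documentation.

```python
import json
import urllib.request
from typing import Any, Dict


def build_map_request(
    base_url: str, api_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    # "/map" is a hypothetical path; the real endpoint may differ.
    endpoint = base_url.rstrip("/") + "/map"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the API may expect a different header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The resulting object could then be sent with `urllib.request.urlopen(request, timeout=30)`, where the timeout value is also only an example.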
Troubleshooting
Common issues:
- Invalid or missing API credentials will cause authentication failures.
- Providing an invalid or unreachable starting URL may result in errors or empty results.
- Setting conflicting options, such as enabling both Ignore Sitemap and Sitemap Only, may lead to unexpected behavior.
- Exceeding the limit or timeout constraints could truncate results.
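Some of these issues can be caught before any request is made. The sketch below is a hypothetical pre-flight check, not part of the node: it flags a starting URL that is not an absolute http(s) URL, the conflicting combination of Ignore Sitemap and Sitemap Only, and a non-positive limit. The function name and the message wording are assumptions.

```python
from typing import List
from urllib.parse import urlparse


def check_map_options(
    url: str, ignore_sitemap: bool, sitemap_only: bool, limit: int
) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL must be an absolute http(s) URL")
    if ignore_sitemap and sitemap_only:
        # With both set, neither sitemap nor HTML discovery is clearly selected.
        problems.append("Ignore Sitemap and Sitemap Only are both enabled")
    if limit < 1:
        problems.append("Limit must be a positive integer")
    return problems
```

This check only inspects the URL's syntax; it cannot tell whether the site is actually reachable.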
Error messages:
- Errors from the API are propagated; typical messages relate to network issues, invalid parameters, or authentication failures.
- To resolve, verify API credentials, check URL validity, and adjust parameters accordingly.
- Use the "Continue On Fail" option in n8n if you want the workflow to keep running when an item fails.
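The idea behind continuing on failure can be sketched in a few lines: each input item is processed independently, and a failure is either re-raised or recorded as an error entry in the output. This is a simplified illustration of the general pattern, not n8n's actual implementation; the `run_items` helper and the `{"json": {"error": ...}}` shape are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List


def run_items(
    items: Iterable[Dict[str, Any]],
    operation: Callable[[Dict[str, Any]], Any],
    continue_on_fail: bool = False,
) -> List[Dict[str, Any]]:
    """Apply operation to each item, optionally turning failures into error items."""
    results: List[Dict[str, Any]] = []
    for item in items:
        try:
            results.append({"json": operation(item)})
        except Exception as exc:
            if not continue_on_fail:
                raise  # default behavior: the first failure stops the run
            # Record the failure and move on to the next item.
            results.append({"json": {"error": str(exc)}})
    return results
```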
Links and References
- Horizon Data Wave API Documentation (generic reference, actual docs should be consulted)
- n8n HTTP Request Node documentation for understanding request options and authentication setup.
