Overview
The "Get RAG URL" operation of the "Live Web RAG" resource retrieves the content of a publicly accessible URL through the Bedrijfsdata API. It suits workflows that need to fetch live web page content and pass it on for further analysis or integration.
Common scenarios include:
- Extracting cleaned HTML or markdown content from a webpage for content analysis.
- Retrieving raw or formatted snippets of text from URLs for knowledge extraction.
- Using localization options to fetch content as if browsing from a specific country or language setting.
Practical example:
- A user wants to gather clean textual content from a news article URL in Dutch, including markdown formatting but without images or links, to feed into a summarization workflow.
Properties
| Name | Meaning |
|---|---|
| URL | (Required) The URL of the webpage you want to retrieve content from. |
| Localization Options | Optional settings to specify the browsing context: - Country (ISO 3166-1 Alpha-2): Country code to simulate browsing from (e.g., US, NL). - Language (ISO 639-1): Language code for browsing (e.g., en, nl). |
| Output Options | Options to customize the output content: - Add Cleaned HTML: Disable or enable cleaned HTML output. - Add Markdown: No markdown, CommonMark, or cleaned markdown without images and links. - Add Raw Content: Disable or enable raw content output. - Add Raw HTML: Disable or enable raw HTML output. - Max. Snippets Length: Maximum length for snippets; if >0, unique sentences are added to the result. |
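
The properties above can be modelled as a small settings object, which makes the practical example (a Dutch news article, cleaned markdown only) concrete. This is a minimal sketch, not the node's implementation: the class `RagUrlOptions`, every key name in `to_params`, the markdown mode strings, and the default values are assumptions made for illustration. Check the Bedrijfsdata API documentation for the real parameter names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical labels for the three "Add Markdown" choices.
MARKDOWN_MODES = ("none", "commonmark", "cleaned")


@dataclass
class RagUrlOptions:
    """Illustrative container for the node properties (names are assumptions)."""

    url: str
    country: Optional[str] = None    # ISO 3166-1 alpha-2, e.g. "NL"
    language: Optional[str] = None   # ISO 639-1, e.g. "nl"
    add_cleaned_html: bool = False
    markdown: str = "none"           # one of MARKDOWN_MODES
    add_raw_content: bool = False
    add_raw_html: bool = False
    max_snippets_length: int = 0     # > 0 adds unique sentences to the result

    def to_params(self) -> Dict[str, Any]:
        """Return a flat dict of settings; key names are hypothetical."""
        if not self.url:
            raise ValueError("url is required")
        if self.markdown not in MARKDOWN_MODES:
            raise ValueError(f"markdown must be one of {MARKDOWN_MODES}")
        params: Dict[str, Any] = {
            "url": self.url,
            "add_cleaned_html": self.add_cleaned_html,
            "markdown": self.markdown,
            "add_raw_content": self.add_raw_content,
            "add_raw_html": self.add_raw_html,
            "max_snippets_length": self.max_snippets_length,
        }
        # Localization is optional, so include it only when set.
        if self.country:
            params["country"] = self.country.upper()
        if self.language:
            params["language"] = self.language.lower()
        return params


# The Dutch news-article scenario: cleaned markdown, no other outputs.
example = RagUrlOptions(
    url="https://example.com/nieuws/artikel",
    country="NL",
    language="nl",
    markdown="cleaned",
)
print(example.to_params())
```

Keeping the localization keys out of the dict when they are unset mirrors the fact that those settings are optional in the node.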
Output
The node outputs JSON data containing the retrieved content from the specified URL. Depending on the selected output options, the JSON may include:
- Cleaned HTML version of the page content.
- Markdown formatted content (CommonMark or cleaned markdown without images/links).
- Raw content extracted from the page.
- Raw HTML source of the page.
- A list of unique text snippets up to the specified maximum length.
This flexible output structure allows downstream nodes to consume the content in the preferred format for further processing or analysis.
The node does not output binary data.
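Because the fields present in the output depend on the selected output options, a downstream step may want to pick the first format that is actually available. The sketch below shows one way to do that. The field names (`markdown`, `cleaned_html`, `raw_content`, `raw_html`) and the helper `pick_content` are hypothetical; inspect the node's real output to find the actual keys.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Hypothetical field names, listed from most to least preferred.
DEFAULT_PREFERENCE = ("markdown", "cleaned_html", "raw_content", "raw_html")


def pick_content(
    item: Dict[str, Any],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[Tuple[str, str]]:
    """Return (field_name, text) for the first non-empty string field, or None."""
    for key in preference:
        value = item.get(key)
        # Skip missing fields, non-string values, and whitespace-only strings.
        if isinstance(value, str) and value.strip():
            return key, value
    return None


# Usage with a made-up item in which only raw content was returned.
item = {"markdown": "", "raw_content": "Plain text of the page"}
print(pick_content(item))
```

Returning the field name together with the text lets later steps know which format they received, for example to decide whether HTML still needs stripping.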
Dependencies
- Requires an active connection to the Bedrijfsdata API service.
- Needs an API key credential configured in n8n for authentication.
- Internet access to retrieve the target URLs.
Troubleshooting
- Missing or invalid URL: Ensure the URL property is provided and correctly formatted.
- API request failures: Check the API key validity and network connectivity.
- Empty or incomplete content: Verify the URL is publicly accessible and not behind authentication or paywalls.
- Localization options not applied: Confirm correct ISO codes for country and language are used.
- Output options ignored: Make sure the options are set properly; some combinations might not produce expected results depending on the source page.
If errors occur, review the error messages returned by the API and verify all required parameters are correctly set.
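Several of the issues above (a malformed URL, wrong code formats) can be caught before the node runs. The following is a minimal pre-flight sketch using only the Python standard library. The function `preflight` is hypothetical, and it checks only the shape of the values (absolute http(s) URL, two-letter codes), not whether a code is actually assigned in the ISO lists or accepted by the API.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse

_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def preflight(
    url: str,
    country: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []

    # The URL must be absolute and use http or https.
    try:
        parsed = urlparse(url.strip()) if url else None
    except ValueError:
        parsed = None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL must be an absolute http(s) URL")

    # Shape checks only: two ASCII letters for each code.
    if country is not None and not _TWO_LETTERS.fullmatch(country):
        problems.append("country should be a two-letter ISO 3166-1 alpha-2 code")
    if language is not None and not _TWO_LETTERS.fullmatch(language):
        problems.append("language should be a two-letter ISO 639-1 code")
    return problems


print(preflight("example.com/article", country="NLD", language="nl"))
```

A check like this cannot detect pages behind authentication or paywalls; those cases still show up only as empty or incomplete content in the result.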
Links and References
- Bedrijfsdata API Documentation (general reference for API endpoints)
- ISO 3166-1 Alpha-2 Codes for country codes
- ISO 639-1 Codes for language codes