Overview
This node fetches the web page at a given URL and converts its main article content to Markdown using the Postlight Parser library. Optional metadata, such as the title, author, publication date, and lead image, can be included alongside the content. This is useful for workflows that need to process or archive web articles as clean, readable Markdown.
Common scenarios include:
- Archiving web articles in Markdown for note-taking or documentation.
- Feeding cleaned article content into AI agents or other automation tools.
- Extracting article metadata alongside content for further processing or display.
For example, you can input a news article URL and get back a Markdown-formatted version including the headline, author, publish date, lead image, and the article body text.
Properties
| Name | Meaning |
|---|---|
| URL | The web page URL to convert to Markdown. |
| Output Format | The format of the output. One of: Full Article with Metadata, Content Only, or AI-Agent Compatible. |
| Include Title | Whether to include the article's title in the Markdown output (true/false). |
| Include Author | Whether to include the author's name in the Markdown output (true/false). |
| Include Date | Whether to include the publication date in the Markdown output (true/false). |
| Include Lead Image | Whether to include the lead image in the Markdown output (true/false). |
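The four Include options control which metadata lines appear ahead of the article body. The sketch below shows one way that assembly could work, assuming a parse result that uses Postlight Parser's field names (`title`, `author`, `date_published`, `lead_image_url`, `content`). The `buildMarkdown` helper, the `IncludeOptions` type, and the exact layout (H1 title, bold labels, image line) are hypothetical and not necessarily the node's actual output format.

```typescript
// Subset of the fields Postlight Parser returns for an article.
// `content` is assumed to be the article body already converted to Markdown.
interface ParsedArticle {
  title?: string | null;
  author?: string | null;
  date_published?: string | null;
  lead_image_url?: string | null;
  content?: string | null;
}

// Hypothetical type mirroring the four "Include ..." toggles.
interface IncludeOptions {
  includeTitle: boolean;
  includeAuthor: boolean;
  includeDate: boolean;
  includeLeadImage: boolean;
}

// Hypothetical helper: assembles the Markdown document. The layout chosen
// here is an assumption for illustration only.
function buildMarkdown(article: ParsedArticle, opts: IncludeOptions): string {
  const parts: string[] = [];
  if (opts.includeTitle && article.title) {
    parts.push(`# ${article.title}`);
  }
  if (opts.includeAuthor && article.author) {
    parts.push(`**Author:** ${article.author}`);
  }
  if (opts.includeDate && article.date_published) {
    parts.push(`**Published:** ${article.date_published}`);
  }
  if (opts.includeLeadImage && article.lead_image_url) {
    parts.push(`![Lead image](${article.lead_image_url})`);
  }
  if (article.content) {
    parts.push(article.content.trim());
  }
  // A blank line between blocks makes each render as its own paragraph.
  return parts.join("\n\n");
}
```

Fields that are missing from the parse result are skipped even when their toggle is on, so a page without a detected author simply produces no author line.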
Output
The node outputs JSON whose structure depends on the selected Output Format:
- Full Article with Metadata: `markdown` (the full Markdown content, including whichever metadata the Include options enable), plus separate fields for `title`, `author`, `date_published`, `lead_image_url`, `url`, `domain`, `excerpt`, and `word_count`.
- Content Only: only the raw article content as Markdown, in the `markdown` field, without any metadata.
- AI-Agent Compatible: a structured output designed for AI agent consumption, containing:
  - `markdown`: the combined Markdown content with metadata included inline.
  - `sourceUrl`, `sourceTitle`, and `sourceType` (always `"webpage"`).
  - `content`: the same value as `markdown`.
  - `metadata`: an object containing the detailed metadata fields (`title`, `author`, `date_published`, `lead_image_url`, `url`, `domain`, `excerpt`, `word_count`).
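A rough sketch of how the three output shapes relate to each other. Only the field names come from this page; the `OutputFormat` identifiers, the `ArticleMetadata` type, and the `shapeOutput` helper are hypothetical and illustrate the shapes, not the node's source.

```typescript
// Hypothetical identifiers for the three Output Format options.
type OutputFormat = "fullArticle" | "contentOnly" | "aiAgent";

// Metadata fields listed in the Output section.
interface ArticleMetadata {
  title: string | null;
  author: string | null;
  date_published: string | null;
  lead_image_url: string | null;
  url: string;
  domain: string | null;
  excerpt: string | null;
  word_count: number | null;
}

// Builds the JSON for one item. `markdown` is the assembled document with
// metadata inline; `contentMarkdown` is the article body alone.
function shapeOutput(
  format: OutputFormat,
  meta: ArticleMetadata,
  markdown: string,
  contentMarkdown: string,
): Record<string, unknown> {
  switch (format) {
    case "contentOnly":
      // Body only, with no metadata fields.
      return { markdown: contentMarkdown };
    case "aiAgent":
      return {
        markdown,
        sourceUrl: meta.url,
        sourceTitle: meta.title,
        sourceType: "webpage", // documented as always "webpage"
        content: markdown, // documented as the same value as `markdown`
        metadata: { ...meta },
      };
    case "fullArticle":
    default:
      // Markdown plus every metadata field at the top level.
      return { markdown, ...meta };
  }
}
```

The design point the sketch captures is that Full Article with Metadata flattens the metadata onto the item, while AI-Agent Compatible nests it under `metadata` and adds the `source*` fields.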
No binary data output is produced by this node.
Dependencies
- Uses the external library Postlight Parser to parse and extract article content from URLs.
- Optionally uses a user-provided API key credential to set a custom User-Agent header when fetching the URL content.
- Requires internet access to fetch and parse the target web pages.
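Postlight Parser's `Parser.parse(url, options)` accepts a `contentType` option (`'html'`, `'markdown'`, or `'text'`) and a `headers` object that is sent with the outgoing request. The helper below is a hypothetical sketch of building those options, assuming the credential supplies the User-Agent string directly; how the node actually maps its credential to the header is not documented here.

```typescript
// Shape of the options object passed as the second argument to
// Postlight Parser's Parser.parse(url, options).
interface ParserOptions {
  contentType: "html" | "markdown" | "text";
  headers?: Record<string, string>;
}

// Hypothetical helper: always requests Markdown output and, only when a
// non-blank User-Agent string is supplied (assumed to come from the
// optional credential), sends it as a request header.
function buildParserOptions(userAgent?: string): ParserOptions {
  const options: ParserOptions = { contentType: "markdown" };
  const ua = userAgent?.trim();
  if (ua) {
    options.headers = { "User-Agent": ua };
  }
  return options;
}
```

With the library installed, the options would be passed as `await Parser.parse(url, buildParserOptions(userAgent))` after `import Parser from '@postlight/parser'`. Leaving `headers` unset when no credential is configured lets the library fall back to its default request headers.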
Troubleshooting
- Missing or invalid URL: The node throws an error if the URL property is empty or invalid. Ensure a valid URL is provided.
- Network issues or inaccessible URL: If the page cannot be fetched (for example, because of network errors or access restrictions on the target site), the node fails. Check connectivity and confirm that the URL is reachable.
- Parsing failures: Some web pages may not be parsed correctly if their structure is unusual or heavily scripted. In such cases, the output Markdown might be incomplete or missing.
- Credential issues: If the target site requires a custom User-Agent but the API key credential is missing or misconfigured, the request might be blocked or return unexpected results.
- To continue processing multiple items even if some fail, enable the "Continue On Fail" option in the node settings.
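The URL check and the Continue On Fail behavior can be sketched together as a per-item loop. This is a generic illustration, not n8n's node API: `validateUrl`, `processItems`, the `convert` callback, and the `{ error }` result shape are hypothetical, and rejecting non-HTTP(S) schemes is an assumption about what counts as an invalid URL.

```typescript
// Hypothetical check: throws on an empty URL, an unparseable URL, or
// (an assumption) any scheme other than http/https.
function validateUrl(raw: string | undefined): string {
  const value = (raw ?? "").trim();
  if (value === "") {
    throw new Error("URL is required");
  }
  let parsed: URL;
  try {
    parsed = new URL(value); // WHATWG URL parser; throws on malformed input
  } catch {
    throw new Error(`Invalid URL: ${value}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

type ItemResult = { json: Record<string, unknown> };

// Hypothetical loop: processes every item in order. With continueOnFail
// enabled, a failing item yields an { error } entry instead of aborting
// the whole run; otherwise the first error is rethrown.
async function processItems(
  urls: string[],
  convert: (url: string) => Promise<Record<string, unknown>>,
  continueOnFail: boolean,
): Promise<ItemResult[]> {
  const results: ItemResult[] = [];
  for (const raw of urls) {
    try {
      const url = validateUrl(raw);
      results.push({ json: await convert(url) });
    } catch (error) {
      if (!continueOnFail) throw error;
      const message = error instanceof Error ? error.message : String(error);
      results.push({ json: { error: message } });
    }
  }
  return results;
}
```

Recording the error on the failing item, rather than dropping it, keeps the output aligned one-to-one with the input items so downstream nodes can tell which URLs failed.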
Links and References
- Postlight Parser GitHub Repository – The underlying library used for parsing web articles.
- Markdown Syntax Guide – For understanding the Markdown output format.