Markdowned URL

Converts URL content to Markdown using Postlight Parser

Overview

This node converts the content of a given web page URL into Markdown format using the Postlight Parser library. It extracts the main article content along with optional metadata such as title, author, publication date, and lead image. This is useful for workflows that need to process or archive web articles in a clean, readable Markdown format.

Common scenarios include:

  • Archiving web articles in Markdown for note-taking or documentation.
  • Feeding cleaned article content into AI agents or other automation tools.
  • Extracting article metadata alongside content for further processing or display.

For example, you can input a news article URL and get back a Markdown-formatted version including the headline, author, publish date, lead image, and the article body text.

Properties

  • URL: The web page URL to convert to Markdown.
  • Output Format: The format of the Markdown output. One of:
    • Full Article with Metadata
    • Content Only
    • AI-Agent Compatible
  • Include Title (true/false): Whether to include the article's title in the Markdown output.
  • Include Author (true/false): Whether to include the author's name in the Markdown output.
  • Include Date (true/false): Whether to include the publication date in the Markdown output.
  • Include Lead Image (true/false): Whether to include the lead image in the Markdown output.
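
As a rough illustration of how the four "Include" properties could shape the Markdown, the sketch below assembles a document from an already parsed article. The `ParsedArticle` and `IncludeOptions` types, the `buildMarkdown` function, and the exact Markdown layout (heading, bold labels, image syntax) are assumptions made for this example, not the node's actual implementation.

```typescript
// Hypothetical shape of a parsed article. Field names follow the metadata
// fields listed in the Output section of this page.
interface ParsedArticle {
  title?: string;
  author?: string;
  date_published?: string;
  lead_image_url?: string;
  content: string; // article body, assumed to be Markdown already
}

// Hypothetical option names mirroring the four "Include ..." properties.
interface IncludeOptions {
  includeTitle: boolean;
  includeAuthor: boolean;
  includeDate: boolean;
  includeLeadImage: boolean;
}

// Prepend each metadata element only when its flag is on AND the value exists.
function buildMarkdown(article: ParsedArticle, opts: IncludeOptions): string {
  const parts: string[] = [];
  if (opts.includeTitle && article.title) {
    parts.push(`# ${article.title}`);
  }
  if (opts.includeAuthor && article.author) {
    parts.push(`**Author:** ${article.author}`);
  }
  if (opts.includeDate && article.date_published) {
    parts.push(`**Published:** ${article.date_published}`);
  }
  if (opts.includeLeadImage && article.lead_image_url) {
    parts.push(`![Lead image](${article.lead_image_url})`);
  }
  parts.push(article.content.trim());
  // Blank lines between blocks keep the Markdown readable.
  return parts.join("\n\n");
}
```

In this sketch a flag has no effect when the corresponding field is missing from the parsed article, so enabling Include Author on a page without a byline simply yields no author line.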

Output

The node outputs JSON whose structure depends on the selected output format:

  • Full Article with Metadata:
    Includes markdown (full Markdown content with optional metadata), plus separate fields for title, author, date_published, lead_image_url, url, domain, excerpt, and word_count.

  • Content Only:
    Outputs only the raw article content as Markdown in the markdown field without any metadata.

  • AI-Agent Compatible:
    Provides a structured output designed for AI agent consumption, including:

    • markdown: combined Markdown content with metadata included inline.
    • sourceUrl, sourceTitle, sourceType (always "webpage").
    • content: same as markdown.
    • metadata: an object containing detailed metadata fields (title, author, date_published, lead_image_url, url, domain, excerpt, word_count).

No binary data output is produced by this node.
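
The three output shapes described above can be sketched as a single function. This is only an illustration under stated assumptions: the `OutputFormat` values, the `shapeOutput` function, the nullable field types, and the mapping of `sourceUrl` and `sourceTitle` to the article's `url` and `title` are hypothetical. Only the field names come from this page.

```typescript
// Hypothetical internal identifiers for the three Output Format choices.
type OutputFormat = "full" | "contentOnly" | "aiAgent";

// Metadata fields as named in this page; the nullable types are an assumption.
interface ArticleMetadata {
  title: string | null;
  author: string | null;
  date_published: string | null;
  lead_image_url: string | null;
  url: string;
  domain: string;
  excerpt: string | null;
  word_count: number;
}

function shapeOutput(
  format: OutputFormat,
  markdown: string,   // Markdown with the selected metadata included inline
  rawContent: string, // article body only, as Markdown
  meta: ArticleMetadata
): Record<string, unknown> {
  switch (format) {
    case "contentOnly":
      // Only the body, no metadata fields.
      return { markdown: rawContent };
    case "aiAgent":
      return {
        markdown,
        sourceUrl: meta.url,     // assumed to mirror the article URL
        sourceTitle: meta.title, // assumed to mirror the article title
        sourceType: "webpage",
        content: markdown,       // same value as markdown
        metadata: { ...meta },
      };
    default:
      // "full": Markdown plus the metadata as separate top-level fields.
      return { markdown, ...meta };
  }
}
```

A downstream node that only needs the text can read `markdown` in all three cases, since that field is present in every shape.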

Dependencies

  • Uses the external library Postlight Parser to parse and extract article content from URLs.
  • Optionally uses a user-provided API key credential to set a custom User-Agent header when fetching the URL content.
  • Requires internet access to fetch and parse the target web pages.

Troubleshooting

  • Missing or invalid URL: The node throws an error if the URL property is empty or invalid. Ensure a valid URL is provided.
  • Network issues or inaccessible URL: If the URL cannot be fetched (for example, because of network errors or access restrictions), the node fails. Check connectivity and confirm that the URL is reachable.
  • Parsing failures: Some web pages may not be parsed correctly if their structure is unusual or heavily scripted. In such cases, the output Markdown might be incomplete or missing.
  • Credential issues: If a custom User-Agent is required but the API key credential is missing or misconfigured, the request might be blocked or return unexpected results.
  • Failures in multi-item runs: To keep processing the remaining items when some fail, enable the "Continue On Fail" option in the node settings.
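
The URL check and the "Continue On Fail" behavior described above can be sketched as follows. This is a minimal illustration, not the node's code: `validateUrl`, `processAll`, the error messages, and the restriction to `http:` and `https:` are assumptions, and the `continueOnFail` parameter merely stands in for the node setting.

```typescript
// Reject empty, malformed, or non-HTTP(S) URLs before any fetch is attempted.
function validateUrl(input: string): string {
  const trimmed = input.trim();
  if (trimmed === "") {
    throw new Error("URL is required");
  }
  let parsed: URL;
  try {
    parsed = new URL(trimmed); // throws a TypeError on malformed input
  } catch {
    throw new Error(`Invalid URL: ${trimmed}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

type ItemResult = { json: { url: string } } | { json: { error: string } };

// With continueOnFail set, a failing item becomes an error entry instead of
// aborting the whole run; otherwise the first failure is rethrown.
function processAll(urls: string[], continueOnFail: boolean): ItemResult[] {
  const results: ItemResult[] = [];
  for (const raw of urls) {
    try {
      results.push({ json: { url: validateUrl(raw) } });
    } catch (err) {
      if (!continueOnFail) throw err;
      const message = err instanceof Error ? err.message : String(err);
      results.push({ json: { error: message } });
    }
  }
  return results;
}
```

Emitting an error entry per failed item keeps the output aligned with the input, so a later step can filter on the presence of `error` and retry or report only the failed URLs.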
