Google Search Fetcher

Search Google and fetch clean text from top N sites with smart key information extraction

Overview

This node performs a Google Custom Search based on a user-provided query and fetches the top N search results. It then retrieves and cleans the textual content from each result's webpage. The node offers two modes of output:

  • Key Information Extraction: Extracts and returns a set of key facts relevant to the search query, optimizing for concise context.
  • Full Text Retrieval: Returns the full cleaned text content of each search result, truncated to a specified maximum length to avoid overflow.
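The two modes can be sketched roughly as follows. This is an illustration only: the Page shape, the function names, and the term-overlap scoring are assumptions, since the node's actual extraction heuristics are not described here.

```typescript
// Hypothetical shapes and scoring; the node's real heuristics are not documented here.
interface Page {
  title: string;
  url: string;
  text: string;
}

interface KeyFact {
  text: string;
  source: string;
  url: string;
  relevance: number;
}

// Lowercased alphanumeric words in a string.
function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Key-facts mode (assumed scoring): the fraction of distinct query terms a sentence contains.
function extractKeyFacts(query: string, pages: Page[], maxFacts: number): KeyFact[] {
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);
  const facts: KeyFact[] = [];
  for (const page of pages) {
    // Naive sentence split: a run of text followed by ., ! or ?
    const sentences = page.text.match(/[^.!?]+[.!?]*/g) ?? [];
    for (const sentence of sentences) {
      const words = tokenize(sentence);
      const hits = terms.filter((t) => words.indexOf(t) !== -1).length;
      if (hits > 0) {
        facts.push({
          text: sentence.trim(),
          source: page.title,
          url: page.url,
          relevance: hits / terms.length,
        });
      }
    }
  }
  // Highest relevance first, then cap at maxFacts.
  return facts.sort((a, b) => b.relevance - a.relevance).slice(0, maxFacts);
}

// Full-text mode: keep at most maxLength characters of a page's cleaned text.
function truncateText(text: string, maxLength: number): string {
  return text.slice(0, maxLength);
}
```

Key-facts mode trades completeness for a small, ranked payload, while full-text mode keeps everything up to the length cap.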

This node is useful when you want to programmatically gather summarized or detailed information from web search results, for example:

  • Quickly extracting key insights about a topic without reading entire articles.
  • Feeding summarized search data into further processing workflows like natural language understanding or report generation.
  • Collecting raw textual data from multiple sources for analysis or archival.

Properties

Name                              | Meaning
Search Query                      | The search term or phrase to query in Google.
API Key                           | The API key credential required to authenticate requests to the Google Custom Search API.
Search Engine ID (CX)             | The identifier of the custom search engine to use for querying Google.
Number of Results                 | Number of top search results to retrieve, between 1 and 10.
Enable Key Information Extraction | Whether to extract only key facts from the search results to optimize context length (true/false).
Max Key Facts                     | Maximum number of key facts to extract when key extraction is enabled (1–100).
Max Text Length per Result        | Maximum length of text to keep per search result when key extraction is disabled (100–5000 characters).
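As a rough sketch, the documented ranges could be enforced before a search runs. The option names below are hypothetical stand-ins for the properties above, and clamping out-of-range values (rather than rejecting them) is an assumption, not the node's confirmed behavior.

```typescript
// Hypothetical option names mirroring the properties above;
// these are not the node's internal identifiers.
interface FetcherOptions {
  query: string;
  numResults: number; // documented range: 1-10
  extractKeyInfo: boolean;
  maxKeyFacts: number; // documented range: 1-100
  maxTextLength: number; // documented range: 100-5000 characters
}

// Round to an integer and force the value into [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, Math.round(value)));
}

// Assumption: out-of-range numbers are clamped and an empty query is an error.
function normalizeOptions(raw: FetcherOptions): FetcherOptions {
  const query = raw.query.trim();
  if (query === "") {
    throw new Error("Search Query must not be empty");
  }
  return {
    query,
    numResults: clamp(raw.numResults, 1, 10),
    extractKeyInfo: raw.extractKeyInfo,
    maxKeyFacts: clamp(raw.maxKeyFacts, 1, 100),
    maxTextLength: clamp(raw.maxTextLength, 100, 5000),
  };
}
```

Note that Max Key Facts only matters when key extraction is enabled, and Max Text Length per Result only when it is disabled.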

Output

The node outputs an array of JSON objects, each corresponding to one input item. Each output object contains a json field with the following structure:

  • query: The original search query string.
  • links: An array of URLs retrieved from the search results.
  • totalResults: Number of successfully fetched and parsed search results.
  • timestamp: ISO timestamp of when the search was performed.
  • Depending on extraction mode:
    • If key information extraction is enabled:
      • keyFacts: Array of extracted key fact objects, each containing:
        • text: A short sentence or fact relevant to the query.
        • source: Title of the source page.
        • url: URL of the source page.
        • relevance: Numeric score indicating relevance to the query.
      • extractionMode: "key_facts"
      • factsCount: Number of key facts extracted.
      • summary: Concatenated text of the top 3 key facts.
    • If key information extraction is disabled (full text retrieval):
      • texts: Array of cleaned text strings from each search result, truncated to the max text length.
      • extractionMode: "full_text"
      • sources: Array of objects describing each source with:
        • title: Title of the page.
        • url: URL of the page.
        • textLength: Length of the full cleaned text before truncation.
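For consumers of this output, the structure above maps naturally onto a discriminated union keyed on extractionMode. The type names, the describeOutput helper, and the single-space separator in buildSummary are assumptions for illustration; only the field names come from the list above.

```typescript
interface KeyFact {
  text: string;
  source: string;
  url: string;
  relevance: number;
}

interface BaseOutput {
  query: string;
  links: string[];
  totalResults: number;
  timestamp: string; // ISO timestamp
}

interface KeyFactsOutput extends BaseOutput {
  extractionMode: "key_facts";
  keyFacts: KeyFact[];
  factsCount: number;
  summary: string;
}

interface FullTextOutput extends BaseOutput {
  extractionMode: "full_text";
  texts: string[];
  sources: Array<{ title: string; url: string; textLength: number }>;
}

type FetcherOutput = KeyFactsOutput | FullTextOutput;

// Assumption: facts are already sorted by relevance and joined with a single space.
function buildSummary(facts: KeyFact[]): string {
  return facts
    .slice(0, 3)
    .map((fact) => fact.text)
    .join(" ");
}

// Checking extractionMode narrows the type to the matching mode-specific fields.
function describeOutput(output: FetcherOutput): string {
  if (output.extractionMode === "key_facts") {
    return `${output.factsCount} facts: ${output.summary}`;
  }
  const chars = output.texts.reduce((total, text) => total + text.length, 0);
  return `${output.texts.length} texts, ${chars} characters`;
}
```

Branching on extractionMode first avoids reading keyFacts from a full-text result or texts from a key-facts result.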

If any error occurs during fetching or parsing a particular result, the error message is included in place of that result's content.

The node does not output binary data.

Dependencies

  • Requires a valid Google Custom Search API key and a configured Custom Search Engine ID.
  • Uses the Google Custom Search REST API endpoint.
  • Fetches webpage content using HTTP GET requests with a custom User-Agent header.
  • Uses the axios library for HTTP requests.
  • Uses the cheerio library to parse and clean HTML content by removing scripts, styles, ads, navigation, and other non-content elements.
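For orientation, a request to the public Custom Search JSON API can be assembled as in the sketch below. The endpoint and the key, cx, q, and num parameters are those of Google's documented API; that the node builds its request exactly this way is an assumption, and the function names are hypothetical.

```typescript
// Public Custom Search JSON API endpoint.
const CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Build the GET URL for one search. Parameter values are URL-encoded by searchParams.
function buildSearchUrl(apiKey: string, cx: string, query: string, num: number): string {
  const url = new URL(CUSTOM_SEARCH_ENDPOINT);
  url.searchParams.set("key", apiKey); // API Key property
  url.searchParams.set("cx", cx); // Search Engine ID (CX) property
  url.searchParams.set("q", query); // Search Query property
  url.searchParams.set("num", String(num)); // Number of Results property (1-10)
  return url.toString();
}

// Minimal slice of the API response: each item carries a title and a link.
interface SearchResponse {
  items?: Array<{ title: string; link: string }>;
}

// A response with no matches has no items array, so default to an empty list.
function extractLinks(response: SearchResponse): string[] {
  return (response.items ?? []).map((item) => item.link);
}
```

The resulting URL would then be requested with an HTTP client such as axios, and each returned link fetched and cleaned in turn.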

Troubleshooting

  • Common issues:

    • Invalid or missing API key or Search Engine ID will cause the Google API request to fail.
    • Network timeouts or unreachable URLs may cause individual result fetches to fail.
    • Some webpages may block automated scraping or require more sophisticated parsing.
    • Exceeding Google Custom Search API quota limits will result in errors.
  • Error messages:

    • Errors during the Google API call or individual page fetches are caught and included in the output as error messages.
    • If the node is configured to continue on failure, it will output error details per failed item; otherwise, it throws an operation error stopping execution.
  • Resolutions:

    • Verify API key and Search Engine ID correctness.
    • Ensure network connectivity and that target pages allow scraping.
    • Adjust timeout settings, or increase Max Text Length per Result if returned text is cut off too early.
    • Monitor and respect Google API usage quotas.
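The continue-on-failure behavior described above can be pictured with the following sketch. The runItems function, its handler parameter, and the error message format are hypothetical; they only illustrate the difference between recording an error per item and stopping execution.

```typescript
// Each output item wraps either a successful result or an error description.
type ItemOutput<O> = { json: O | { error: string } };

// Run a handler over every input item.
// continueOnFail = true: a failing item yields { error } and processing continues.
// continueOnFail = false: the first failure is rethrown and stops execution.
function runItems<I, O>(
  items: I[],
  handler: (item: I) => O,
  continueOnFail: boolean,
): Array<ItemOutput<O>> {
  const outputs: Array<ItemOutput<O>> = [];
  for (const item of items) {
    try {
      outputs.push({ json: handler(item) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (!continueOnFail) {
        throw new Error(`Operation failed: ${message}`);
      }
      outputs.push({ json: { error: message } });
    }
  }
  return outputs;
}
```

With continue-on-failure enabled, downstream steps should check each item for an error field before using its content.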
