Google Search Fetcher
Search Google and fetch clean text from top N sites with smart key information extraction
Overview
This node performs a Google Custom Search based on a user-provided query and fetches the top N search results. It then retrieves and cleans the textual content from each result's webpage. The node offers two modes of output:
- Key Information Extraction: Extracts and returns a set of key facts relevant to the search query, optimizing for concise context.
- Full Text Retrieval: Returns the full cleaned text content of each search result, truncated to a specified maximum length to avoid overflow.
This node is useful when you want to programmatically gather summarized or detailed information from web search results, for example:
- Quickly extracting key insights about a topic without reading entire articles.
- Feeding summarized search data into further processing workflows like natural language understanding or report generation.
- Collecting raw textual data from multiple sources for analysis or archival.
Properties
| Name | Meaning |
|---|---|
| Search Query | The search term or phrase to query in Google. |
| API Key | The API key credential required to authenticate requests to the Google Custom Search API. |
| Search Engine ID (CX) | The identifier of the custom search engine to use for querying Google. |
| Number of Results | Number of top search results to retrieve, between 1 and 10. |
| Enable Key Information Extraction | Whether to extract only key facts from the search results to optimize context length (true/false). |
| Max Key Facts | Maximum number of key facts to extract when key extraction is enabled (1–100). |
| Max Text Length per Result | Maximum length of text to keep per search result when key extraction is disabled (100–5000 characters). |
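
The numeric ranges in the table can be enforced before any request is made. The following is a minimal sketch, assuming hypothetical field names (`numResults`, `maxKeyFacts`, `maxTextLength`) that stand in for the node's real parameter identifiers, which this document does not list:

```typescript
// Hypothetical parameter shape; the field names are illustrative only.
interface FetcherParams {
  numResults: number;
  maxKeyFacts: number;
  maxTextLength: number;
}

// Round to an integer and clamp into [min, max]; NaN falls back to min.
function clampInt(value: number, min: number, max: number): number {
  if (isNaN(value)) {
    return min;
  }
  return Math.min(max, Math.max(min, Math.round(value)));
}

// Apply the documented ranges: 1-10 results, 1-100 key facts, 100-5000 characters.
function normalizeParams(raw: FetcherParams): FetcherParams {
  return {
    numResults: clampInt(raw.numResults, 1, 10),
    maxKeyFacts: clampInt(raw.maxKeyFacts, 1, 100),
    maxTextLength: clampInt(raw.maxTextLength, 100, 5000),
  };
}
```

Clamping silently corrects out-of-range values; rejecting them with an error is an equally reasonable design if stricter validation is preferred.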
Output
The node outputs an array of JSON objects, each corresponding to one input item. Each output object contains a json field with the following structure:
- `query`: The original search query string.
- `links`: An array of URLs retrieved from the search results.
- `totalResults`: Number of successfully fetched and parsed search results.
- `timestamp`: ISO timestamp of when the search was performed.
- Depending on extraction mode:
  - If key information extraction is enabled:
    - `keyFacts`: Array of extracted key fact objects, each containing:
      - `text`: A short sentence or fact relevant to the query.
      - `source`: Title of the source page.
      - `url`: URL of the source page.
      - `relevance`: Numeric score indicating relevance to the query.
    - `extractionMode`: `"key_facts"`
    - `factsCount`: Number of key facts extracted.
    - `summary`: Concatenated text of the top 3 key facts.
  - If full text retrieval is enabled:
    - `texts`: Array of cleaned text strings from each search result, truncated to the max text length.
    - `extractionMode`: `"full_text"`
    - `sources`: Array of objects describing each source with:
      - `title`: Title of the page.
      - `url`: URL of the page.
      - `textLength`: Length of the full cleaned text before truncation.
If any error occurs during fetching or parsing a particular result, the error message is included in place of that result's content.
The node does not output binary data.
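
For consumers of this output, the two shapes can be modeled as a discriminated union keyed on `extractionMode`. This is a sketch only: the field names come from the list above, while the helper functions (`buildSummary`, `buildFullText`, `countItems`) are hypothetical. It also assumes that "top 3" means the three highest `relevance` scores joined by a space, which the node may implement differently.

```typescript
interface KeyFact {
  text: string;
  source: string;
  url: string;
  relevance: number;
}

interface SourceInfo {
  title: string;
  url: string;
  textLength: number;
}

interface BaseOutput {
  query: string;
  links: string[];
  totalResults: number;
  timestamp: string;
}

interface KeyFactsOutput extends BaseOutput {
  extractionMode: "key_facts";
  keyFacts: KeyFact[];
  factsCount: number;
  summary: string;
}

interface FullTextOutput extends BaseOutput {
  extractionMode: "full_text";
  texts: string[];
  sources: SourceInfo[];
}

type FetcherOutput = KeyFactsOutput | FullTextOutput;

// Assumption: the summary joins the three most relevant facts with a space.
function buildSummary(facts: KeyFact[]): string {
  return facts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, 3)
    .map((fact) => fact.text)
    .join(" ");
}

// Truncate each page's text while recording its length before truncation.
function buildFullText(
  pages: Array<{ title: string; url: string; text: string }>,
  maxTextLength: number,
): { texts: string[]; sources: SourceInfo[] } {
  return {
    texts: pages.map((page) => page.text.slice(0, maxTextLength)),
    sources: pages.map((page) => ({
      title: page.title,
      url: page.url,
      textLength: page.text.length,
    })),
  };
}

// Checking extractionMode narrows the union, so each branch is type-safe.
function countItems(output: FetcherOutput): number {
  return output.extractionMode === "key_facts"
    ? output.factsCount
    : output.texts.length;
}
```

Downstream code can branch on `extractionMode` in the same way as `countItems` to decide whether to read `keyFacts` or `texts`.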
Dependencies
- Requires a valid Google Custom Search API key and a configured Custom Search Engine ID.
- Uses the Google Custom Search REST API endpoint.
- Fetches webpage content using HTTP GET requests with a custom User-Agent header.
- Uses the `axios` library for HTTP requests.
- Uses the `cheerio` library to parse and clean HTML content by removing scripts, styles, ads, navigation, and other non-content elements.
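
To make the search request concrete, the sketch below builds a Custom Search JSON API URL and pulls result links out of a response. The endpoint and the `key`, `cx`, `q`, and `num` query parameters belong to Google's public API; the helper names `buildSearchUrl` and `extractLinks` are hypothetical, and the node's actual implementation (which sends the request via `axios`) may be organized differently.

```typescript
// Public endpoint of the Google Custom Search JSON API.
const CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Only the response fields used here are declared; real responses carry more.
interface SearchResponse {
  items?: Array<{ title: string; link: string }>;
}

// Build the request URL with properly encoded query parameters.
function buildSearchUrl(apiKey: string, cx: string, query: string, num: number): string {
  const params: Array<[string, string]> = [
    ["key", apiKey],
    ["cx", cx],
    ["q", query],
    ["num", String(num)],
  ];
  const queryString = params
    .map((pair) => encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]))
    .join("&");
  return CSE_ENDPOINT + "?" + queryString;
}

// `items` is absent when a search has no results, so default to an empty list.
function extractLinks(response: SearchResponse): string[] {
  return (response.items || []).map((item) => item.link);
}
```

The resulting URL can be passed to any HTTP client; each extracted link is then fetched and cleaned separately.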
Troubleshooting
Common issues:
- An invalid or missing API key or Search Engine ID will cause the Google API request to fail.
- Network timeouts or unreachable URLs may cause individual result fetches to fail.
- Some webpages may block automated scraping or require more sophisticated parsing.
- Exceeding Google Custom Search API quota limits will result in errors.
Error messages:
- Errors during the Google API call or individual page fetches are caught and included in the output as error messages.
- If the node is configured to continue on failure, it outputs error details for each failed item; otherwise, it throws an operation error and stops execution.
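
The continue-on-failure behavior can be illustrated with a small, synchronous sketch. The real node runs asynchronously and uses its host platform's error types; `processItems` and its `continueOnFail` flag here are hypothetical stand-ins, and the error wording is invented for illustration.

```typescript
type ItemResult = { json: Record<string, unknown> };

// Run `search` for each query; on failure either record the error or rethrow.
function processItems(
  queries: string[],
  search: (query: string) => Record<string, unknown>,
  continueOnFail: boolean,
): ItemResult[] {
  const results: ItemResult[] = [];
  for (const query of queries) {
    try {
      results.push({ json: search(query) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (!continueOnFail) {
        // Stop the whole run on the first failing item.
        throw new Error('Search failed for "' + query + '": ' + message);
      }
      // Keep going and report the failure in place of the item's output.
      results.push({ json: { query: query, error: message } });
    }
  }
  return results;
}
```

With `continueOnFail` set to `true`, a failing query yields an item whose `json` holds an `error` field while the remaining queries still run; with `false`, the first failure aborts the loop.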
Resolutions:
- Verify API key and Search Engine ID correctness.
- Ensure network connectivity and that target pages allow scraping.
- Adjust timeout settings if page fetches time out, or increase Max Text Length per Result (up to 5000 characters) if returned text is cut off too early.
- Monitor and respect Google API usage quotas.