n8n-nodes-google-vertex-cached
This is an n8n community node that provides Google Vertex AI Chat with Context Caching and Google Search Grounding support for n8n workflows.
Google Vertex AI is Google Cloud's unified AI platform that provides access to powerful language models like Gemini. This node implements native support for:
- Context Caching: Cache large contexts (documents, codebases, conversation history) and reuse them across multiple API calls, significantly reducing costs and latency
- Google Search Grounding: Ground model responses with real-time Google Search results for accurate, up-to-date information
n8n is a fair-code licensed workflow automation platform.
Installation | Operations | Credentials | Usage | Resources
Installation
Follow the installation guide in the n8n community nodes documentation.
Manual Installation
npm install n8n-nodes-google-vertex-cached
Development
Prerequisites
- Node.js (>= 18)
- npm
- n8n installed globally or locally for testing
Setup
- Clone the repository:
git clone https://github.com/wtyeung/n8n-nodes-google-vertex-cached.git
cd n8n-nodes-google-vertex-cached
- Install dependencies:
npm install
- Build the node:
npm run build
Testing Locally
To test this node in your local n8n instance:
- Link the package:
npm link
- Go to your n8n nodes directory (usually ~/.n8n/custom) and link it:
npm link n8n-nodes-google-vertex-cached

Alternatively, use the built-in dev server:
npm run dev
Operations
This node provides a Chat Model that can be used as a sub-node in n8n's AI Agent workflows:
- Chat Completion: Generate responses using Google's Gemini models
- Context Caching: Reuse cached contexts to reduce API costs and improve response times
- Google Search Grounding: Ground responses with real-time Google Search results for accurate, up-to-date information
- Tool Binding: Full support for n8n AI Agent tool calling
Credentials
This node uses n8n's built-in Google Service Account credential (googleApi). This is the same credential used by the standard Google Vertex AI Chat Model node.
Prerequisites
- A Google Cloud Platform account
- Vertex AI API enabled in your GCP project
- Service account with Vertex AI permissions (e.g. Vertex AI User)
Setup
- In n8n, go to Credentials → New
- Search for "Google Service Account"
- Provide:
- Service Account Email: Your service account email
- Private Key: Your service account private key (from JSON key file)
- Project ID: Your GCP project ID (optional, can be auto-detected)
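The three credential fields map directly to keys in the service-account JSON key file you download from GCP. A minimal sketch of that mapping (the key-file field names follow the standard GCP service-account key format; the inline key material is a placeholder):

```javascript
// Extract the fields n8n's Google Service Account credential asks for
// from a downloaded GCP service-account key file.
function credentialFieldsFromKeyFile(json) {
  const key = JSON.parse(json);
  return {
    serviceAccountEmail: key.client_email, // "Service Account Email" field
    privateKey: key.private_key,           // "Private Key" field
    projectId: key.project_id,             // "Project ID" field (optional in n8n)
  };
}

// Example with a placeholder key file:
const sample = JSON.stringify({
  type: 'service_account',
  project_id: 'my-gcp-project',
  client_email: 'vertex-bot@my-gcp-project.iam.gserviceaccount.com',
  private_key: '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n',
});
console.log(credentialFieldsFromKeyFile(sample).projectId); // my-gcp-project
```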
Usage
Basic Chat (No Cache)
- Add an AI Agent node to your workflow
- Under Language Model, select Google Vertex AI Chat (Cached)
- Configure:
- Model: gemini-1.5-flash-001 (or another Gemini model)
- Temperature: Control randomness (0-2)
- Leave Cached Content Name empty
Using Context Caching
- First, create a cached content resource in Vertex AI (via API or Console)
- Copy the full resource name (format: projects/{project}/locations/{location}/cachedContents/{cache-id})
- In the node configuration:
- Cached Content Name: Paste the full resource name
- Temperature/TopK: These are ignored when using cache (settings come from the cache itself)
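The node only consumes the cache's resource name; the cache itself is created with a separate Vertex AI `cachedContents` call. A hedged sketch of the request body, plus a helper to sanity-check the resource name before pasting it into the node (the endpoint path and field names follow Vertex AI's cachedContents REST API and may differ by API version):

```javascript
// Build a cachedContents creation request body. It is POSTed to (roughly):
// https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{location}/cachedContents
function buildCacheRequest({ project, location, model, systemText, ttlSeconds }) {
  return {
    model: `projects/${project}/locations/${location}/publishers/google/models/${model}`,
    systemInstruction: { role: 'system', parts: [{ text: systemText }] },
    ttl: `${ttlSeconds}s`, // cache expiry; check status later in the GCP Console
  };
}

// The node expects the FULL resource name, not a bare cache ID.
const CACHE_NAME_RE = /^projects\/[^/]+\/locations\/[^/]+\/cachedContents\/[^/]+$/;
function isValidCacheName(name) {
  return CACHE_NAME_RE.test(name);
}

console.log(isValidCacheName('projects/my-proj/locations/us-central1/cachedContents/12345')); // true
console.log(isValidCacheName('12345')); // false — bare IDs are rejected
```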
Using Google Search Grounding
Ground your model's responses with real-time Google Search results for more accurate, up-to-date information:
- In the node configuration, expand Options
- Enable Google Search Grounding
Use cases:
- Questions requiring current information (news, weather, stock prices)
- Fact-checking and verification
- Questions about recent events or developments
Important: Google Search grounding cannot be used with cached content, as tools are not supported when using context caching. If both are enabled, grounding will be automatically disabled and a warning will be logged.
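The mutual exclusion above can be expressed as a small option-resolution step. This is a simplified sketch of the precedence rule, not the node's actual internals:

```javascript
// Resolve node options: cached content wins over grounding, since the
// Vertex AI API rejects tools (including Google Search) alongside a cache.
function resolveOptions({ cachedContent, googleSearchGrounding }, log = console) {
  if (cachedContent && googleSearchGrounding) {
    log.warn('Google Search grounding disabled: tools are not supported with cached content');
    return { cachedContent, googleSearchGrounding: false };
  }
  return { cachedContent, googleSearchGrounding: !!googleSearchGrounding };
}

const opts = resolveOptions({
  cachedContent: 'projects/p/locations/l/cachedContents/c',
  googleSearchGrounding: true,
});
console.log(opts.googleSearchGrounding); // false
```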
Token Usage Tracking
This node provides comprehensive token usage tracking:
Standard Usage:
- Prompt tokens: Input tokens consumed
- Completion tokens: Response tokens generated
- Total tokens: Sum of prompt and completion
With Context Caching:
- Cached tokens: Tokens served from cache (90% discount)
- Logged to console: 💰 Token Usage - Prompt: X, Completion: Y, Cached: Z (90% discount)
- Available in workflow output data under tokenUsage.cachedTokens
Note: n8n's UI tooltip only displays Prompt and Completion tokens. To see cached token counts, check your console logs or access the data programmatically in subsequent nodes.
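As a rough illustration of the savings, effective billable prompt tokens can be derived from the usage data in a downstream node. The 90% discount figure comes from the note above; exact billing depends on Google's current pricing:

```javascript
// Compute effective billable prompt tokens, assuming cached tokens
// are billed at 10% of the normal rate (i.e. a 90% discount).
function effectivePromptTokens({ promptTokens, cachedTokens = 0 }) {
  const uncached = promptTokens - cachedTokens; // tokens processed at full price
  return uncached + cachedTokens * 0.1;         // cached tokens cost 10%
}

// e.g. a 100k-token prompt where 90k tokens came from cache:
console.log(effectivePromptTokens({ promptTokens: 100000, cachedTokens: 90000 })); // 19000
```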
Important Notes
- When using a cache, generation parameters (temperature, topK, topP, maxOutputTokens) are inherited from the cached content and cannot be overridden
- The node automatically handles the "bind fix" to ensure compatibility with n8n's AI Agent tool calling
- Cached content has an expiration time - check your cache status in GCP Console
- Google Search grounding requires appropriate Vertex AI API permissions
Technical Deep Dive: How Cache + Tools Work Together
This node solves a tricky technical challenge: using Context Caching while maintaining full n8n AI Agent tool support. Here's how it works:
The Problem
In LangChain, non-standard parameters like cachedContent must be passed via .bind():
model.bind({ cachedContent: 'projects/.../cachedContents/...' })
However, .bind() creates a generic RunnableBinding wrapper that strips away specialized methods:
- ❌ Lost: bindTools (required by n8n AI Agent)
- ❌ Lost: withStructuredOutput
- ❌ Lost: model-specific identity
When n8n's AI Agent checks for tool support, it fails because the wrapper doesn't have bindTools.
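The method loss is easy to reproduce in plain JavaScript: a generic wrapper forwards invocation but does not inherit the wrapped object's specialized methods. The classes below are simplified stand-ins for a chat model and LangChain's RunnableBinding, not the real types:

```javascript
// A toy model with the specialized method n8n's AI Agent checks for.
class ToyChatModel {
  bindTools(tools) { return { bound: this, kwargs: { tools } }; }
  invoke(input) { return `response to ${input}`; }
}

// A generic binding wrapper, like the one .bind() returns: it only
// knows how to forward invoke() along with extra kwargs.
class GenericBinding {
  constructor(bound, kwargs) { this.bound = bound; this.kwargs = kwargs; }
  invoke(input) { return this.bound.invoke(input); }
}

const wrapped = new GenericBinding(
  new ToyChatModel(),
  { cachedContent: 'projects/p/locations/l/cachedContents/c' },
);
console.log(typeof wrapped.invoke);    // "function" — invocation still works
console.log(typeof wrapped.bindTools); // "undefined" — the tool-support check fails
```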
The Solution: Smart Wrapper Pattern
This node uses a hybrid approach that satisfies both LangChain's API and n8n's requirements:
Step 1: Create the Cache Wrapper using RunnableBinding
We use RunnableBinding directly instead of .bind() to avoid method existence issues and version compatibility problems.
const boundModel = new RunnableBinding({
bound: model,
kwargs: { cachedContent: cacheId },
config: {}
});
✅ Cache works
❌ Tools broken
Step 2: Restore bindTools Method
We manually restore bindTools and use a Flattened Binding strategy to merge tool arguments with cache arguments. This prevents nesting issues that cause message propagation failures.
boundModel.bindTools = function(tools, options) {
// 1. Delegate tool formatting to the original model
const modelWithTools = model.bindTools(tools, options);
// 2. Extract the underlying model and kwargs
const bound = modelWithTools.bound || modelWithTools;
const toolKwargs = modelWithTools.kwargs || {};
// 3. Merge tool arguments with cache ID
const combinedKwargs = {
...toolKwargs,
cachedContent: cacheId
};
// 4. Create a single, flat RunnableBinding
return new RunnableBinding({
bound: bound,
kwargs: combinedKwargs,
config: modelWithTools.config || {}
});
};
✅ Cache works
✅ Tools work
Step 3: Restore Other Methods
boundModel.lc_namespace = model.lc_namespace;
boundModel.withStructuredOutput = model.withStructuredOutput?.bind(model);
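Putting the three steps together, the flattened-binding merge can be simulated with plain objects. This is a sketch of the strategy using a stand-in for RunnableBinding, not the node's actual code:

```javascript
// Stand-in for RunnableBinding: a bound model plus one flat kwargs object.
function makeBinding(bound, kwargs) { return { bound, kwargs }; }

function wrapWithCache(model, cacheId) {
  const boundModel = makeBinding(model, { cachedContent: cacheId });
  // Step 2: restore bindTools with the flattened-binding strategy.
  boundModel.bindTools = (tools, options) => {
    const modelWithTools = model.bindTools(tools, options); // delegate formatting
    const bound = modelWithTools.bound || modelWithTools;
    const toolKwargs = modelWithTools.kwargs || {};
    // One flat binding: tool kwargs merged with the cache ID, no nesting.
    return makeBinding(bound, { ...toolKwargs, cachedContent: cacheId });
  };
  return boundModel;
}

const model = { bindTools: (tools) => makeBinding(model, { tools }) };
const final = wrapWithCache(model, 'projects/p/locations/l/cachedContents/c')
  .bindTools(['calculator']);
console.log(final.kwargs.cachedContent); // the cache ID survives tool binding
console.log(final.kwargs.tools);         // and so do the tools
```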
The Object Chain
When the AI Agent runs, the request flows through multiple layers:
┌──────────────────────────────────────┐
│ 1. n8n AI Agent                      │
│    Calls: boundModel.bindTools()     │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│ 2. Restored bindTools Method         │
│    Formats tools for Gemini          │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│ 3. Cache Binding Layer               │
│    Injects cachedContent parameter   │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│ 4. ChatVertexAI Core                 │
│    Sends request to Vertex AI        │
└──────────────────────────────────────┘
Why This Works
Previous approaches failed:
- Monkey Patching: Tried to hack internal methods (_generate, invoke), but n8n calls different methods at different times
- Subclassing: Created inheritance issues with LangChain's type system
This solution succeeds:
- ✅ Uses LangChain's official RunnableBinding API for cache injection
- ✅ Manually restores required methods to satisfy n8n's checks
- ✅ Maintains full compatibility with both systems
- ✅ No internal method hacking required
Benefits
- Cost Reduction: Cache large contexts once, reuse across multiple calls
- Lower Latency: Cached content doesn't need to be reprocessed
- Full Agent Support: Works seamlessly with n8n's AI Agent tools
- Future-Proof: Uses official APIs, not internal hacks
Compatibility
- Minimum n8n version: 1.0.0
- Tested with: n8n 1.x
- LangChain version: ^0.3.0
- @langchain/google-vertexai: ^0.2.18
- @langchain/core: ^0.3.68
Known Limitations
Tools + Context Caching
Due to current Vertex AI API restrictions, you cannot use dynamic tools (n8n AI Agent Tools) alongside a Context Cache.
The API returns an error: "Tool config, tools and system instruction should not be set in the request when using cached content."
If you need to use tools with cached content, the tools must be defined at the time of cache creation, which is not currently supported by this node's dynamic tool binding.
Workaround: Use the node for Chat (RAG/QA) with Cache, but handle Tool Calling in a separate non-cached step or agent if needed.
Recommended Architecture: Agent as a Tool
To use both Context Caching and Dynamic Tools (like Calculator, Web Search, etc.) in the same workflow, use the "Agent as a Tool" pattern:
- Main AI Agent:
- Connect your dynamic tools here (e.g., "Date & Time").
- Use a standard Chat Model (non-cached).
- Sub-Agent (The "Secretary"):
- Create a second AI Agent node.
- Connect the Google Vertex AI Chat (Cached) model to this agent.
- Do not connect any tools to this sub-agent.
- Configure this agent's system prompt to answer questions based on the cache.
- Connect:
- Connect the Sub-Agent to the Main Agent's "Tool" input.
- Describe the tool as "A meeting secretary that has access to the full meeting transcripts and notes."
Result: The Main Agent can look up the time using its own tool, then ask the "Secretary" tool to retrieve information from the large cached context. This bypasses the API limitation because the Cached Model never sees the dynamic tools.
Resources
- n8n community nodes documentation
- Google Vertex AI Context Caching
- Vertex AI Documentation
- LangChain Google Vertex AI
Version History
0.2.6 (Latest)
- Developer Experience: Enhanced the Tools + Cache conflict error. It now automatically converts your n8n tools into Vertex AI Function Declaration JSON, which you can directly copy and paste into your cache creation request to enable tools with cached content.
0.2.4
- Developer Experience: Added a helpful error message when attempting to use Tools + Cache. The error now includes the JSON Tool Schema, allowing users to copy it and use it for creating a cache with baked-in tools.
- Documentation: Added the "Agent as a Tool" architecture guide to help users implement RAG + Dynamic Tools workflows.
0.2.1
- Documentation: Updates to reflect correct technical implementation and dependency versions.
- Major Stability Fix: Downgraded internal dependencies to align with n8n's core (@langchain/core 0.3.x), resolving "Unsupported message type" and "SystemMessage.isInstance" errors.
- Critical Fix: Fixed parameter passing (Temperature, TopP, etc.) so the model respects user settings and doesn't get stuck in cached personas.
- Improved Defaults: Default temperature increased to 0.9 for better responsiveness.
- Robust Binding: Implemented "Flattened Binding" strategy for tools + cache to ensure reliability.
0.1.0 - 0.1.3 (Beta)
- Initial development and binding fixes.