
LiteLLM Chat Model

For advanced usage with an AI chain

Overview

This node provides an interface to a language model API for use inside an AI chain. It generates text completions or structured JSON responses from a chosen model and a set of generation options. It is useful wherever you want to integrate AI-generated content into a workflow: chatbots, content creation, summarization, or any task requiring natural language understanding and generation.

Practical examples include:

  • Generating conversational replies in a chatbot.
  • Creating summaries or explanations based on input prompts.
  • Producing structured JSON data from natural language prompts when using the JSON response format.
  • Experimenting with different models and tuning parameters like temperature and penalties to control output randomness and diversity.
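These use cases all reduce to a single chat completion request against an OpenAI-compatible endpoint. A minimal sketch of the request body such a call might use (the field names follow the OpenAI chat API convention; the prompt text is an invented example, and `build_chat_request` is a hypothetical helper, not part of the node):

```python
import json

def build_chat_request(prompt: str, model: str = "gemini/2.0-flash",
                       temperature: float = 0.7) -> str:
    """Build the JSON body for an OpenAI-compatible chat completions call."""
    payload = {
        "model": model,                # node default model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,    # node default sampling temperature
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the meeting notes in two sentences.")
print(json.loads(body)["model"])  # → gemini/2.0-flash
```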

Properties

Name and meaning:

  • Notice: This node must be connected to an AI chain or agent node to function.
  • JSON notice: Shown only when the JSON response format is selected. It reminds you to include the word "json" in the prompt of your chain or agent, and to select a recent model (released after November 2023) for reliable JSON output.
  • Model: The AI model used to generate completions. Options are loaded dynamically from the API and sorted by name. Default is "gemini/2.0-flash".
  • Options: A collection of additional parameters that customize completion generation:
    • Frequency Penalty: Penalizes new tokens based on their existing frequency in the generated text, reducing repetition. Range: -2 to 2.
    • Maximum Number of Tokens: Maximum number of tokens to generate. Most models support up to 2,048 tokens; newer ones up to 32,768. Default is -1 (no limit).
    • Response Format: Output format of the response: "Text" (regular text) or "JSON" (ensures valid JSON output). Default is "text".
    • Presence Penalty: Penalizes tokens that have already appeared in the text so far, encouraging the model to move on to new topics. Range: -2 to 2.
    • Sampling Temperature: Controls randomness of the output; lower values produce more deterministic results. Range: 0 to 1. Default is 0.7.
    • Timeout: Maximum time allowed for the request, in milliseconds. Default is 60000 (60 seconds).
    • Max Retries: Maximum number of retry attempts if the request fails. Default is 2.
    • Top P: Controls diversity via nucleus sampling. Range: 0 to 1. Default is 1.
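As a rough sketch, these options map onto OpenAI-style request parameters along the lines below. The option keys and the rule that a max-tokens value of -1 means "omit the field" are assumptions for illustration, inferred from the "no limit" default, not the node's actual source:

```python
def options_to_params(options: dict) -> dict:
    """Translate node-style options into OpenAI-style request parameters.

    Hypothetical mapping: -1 max tokens means "no limit" and is omitted;
    the JSON response format becomes response_format.type.
    """
    params = {}
    if "frequencyPenalty" in options:
        params["frequency_penalty"] = options["frequencyPenalty"]  # -2..2
    if "presencePenalty" in options:
        params["presence_penalty"] = options["presencePenalty"]    # -2..2
    if "temperature" in options:
        params["temperature"] = options["temperature"]             # 0..1
    if "topP" in options:
        params["top_p"] = options["topP"]                          # 0..1
    max_tokens = options.get("maxTokens", -1)
    if max_tokens != -1:                       # -1 = no limit, omit the field
        params["max_tokens"] = max_tokens
    if options.get("responseFormat") == "json_object":
        params["response_format"] = {"type": "json_object"}
    return params

print(options_to_params({"maxTokens": -1, "responseFormat": "json_object",
                         "temperature": 0.7}))
```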

Output

The node outputs data under the json field containing the AI model's generated completion. Depending on the selected response format:

  • Text format: The output is a plain text string representing the model's completion.
  • JSON format: The output is guaranteed to be valid JSON generated by the model, useful for workflows requiring structured data.
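In a downstream node, the two formats differ only in whether the completion string needs parsing. A minimal sketch, assuming the standard OpenAI chat completion response shape (the sample content is invented):

```python
import json

# Hypothetical completion in the standard OpenAI response shape.
response = {
    "choices": [
        {"message": {"content": '{"title": "Weekly report", "items": 3}'}}
    ]
}

content = response["choices"][0]["message"]["content"]

# Text format: use the string as-is.
text_output = content

# JSON format: the output is guaranteed to be valid JSON, so parse it.
structured = json.loads(content)
print(structured["title"])  # → Weekly report
```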

This node produces textual completions only; it does not output binary data such as files or media.

Dependencies

  • Requires an API key credential for the AI service endpoint.
  • The base URL for the API must be configured in the credentials.
  • Uses the LangChain OpenAI client internally to communicate with the AI model.
  • Supports dynamic loading of available models from the API endpoint /v1/models.
  • Includes tracing callbacks for monitoring requests.
  • Handles rate limiting errors specifically, providing custom error messages.
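For instance, model discovery could look like the sketch below: fetch /v1/models, read the standard {"data": [{"id": ...}]} list shape, and sort by name. The response shape follows the OpenAI models API convention, and the HTTP call itself is stubbed out here:

```python
# Hypothetical /v1/models response in the OpenAI list shape; a real
# implementation would GET it using the configured base URL and API key.
models_response = {
    "data": [
        {"id": "gemini/2.0-flash"},
        {"id": "claude-3-haiku"},
        {"id": "gpt-4o-mini"},
    ]
}

def list_models(response: dict) -> list:
    """Extract model ids and sort them by name, as the node's dropdown does."""
    return sorted(m["id"] for m in response.get("data", []))

print(list_models(models_response))
# → ['claude-3-haiku', 'gemini/2.0-flash', 'gpt-4o-mini']
```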

Troubleshooting

  • Common issues:

    • Not connecting this node to an AI chain or agent node will prevent it from functioning correctly.
    • Using JSON response format without including the word "json" in the prompt or selecting outdated models may cause invalid JSON output.
    • Requests may time out if the timeout value is too low or there are network issues.
    • Rate limit errors from the API if too many requests are made in a short period.
  • Error messages:

    • Rate limit errors trigger a custom message explaining the issue. Users should reduce request frequency or increase retry delays.
    • General API errors are surfaced as node execution errors with details for debugging.
  • Resolutions:

    • Ensure the node is part of a properly configured AI chain or agent.
    • Include "json" in prompts when expecting JSON output and use recent models.
    • Adjust timeout and retry settings according to network conditions.
    • Monitor API usage to avoid hitting rate limits.
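The retry behaviour described above (Max Retries, default 2) can be sketched as a simple exponential-backoff loop. The exception class and delay values here are assumptions for illustration, not the node's actual implementation:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit exception."""

def call_with_retries(request, max_retries: int = 2, base_delay: float = 0.01):
    """Retry a request up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # surfaced as a node execution error
            time.sleep(base_delay * (2 ** attempt))  # back off, then retry

# Simulated endpoint that is rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429: too many requests")
    return "ok"

print(call_with_retries(flaky_request))  # → ok
```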
