GitLab Code Splitter

Split GitLab repository code into manageable chunks

Actions (9)

Code Actions
- Split
Repository Actions
Stem4 Integration Actions
System Actions
- Health Check
- Get Languages

Overview

The GitLab Code Splitter node is designed to split code content or entire GitLab repositories into smaller, manageable chunks. This is particularly useful for processing large codebases in tasks such as code analysis, indexing, or feeding code snippets into language models that have token limits.

For the Code - Split operation, the node takes raw code content and splits it based on configurable token limits and chunk sizes. This helps break down large source files into smaller pieces while optionally preserving newlines and overlapping tokens between chunks for context continuity.
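
The windowing idea behind Max Tokens and Overlap can be sketched as follows. This is a minimal illustration, not the external service's algorithm: the `chunkTokens` helper is hypothetical, and whitespace splitting stands in for whatever tokenizer the service actually uses.

```typescript
// Hypothetical helper: slides a window of `maxTokens` over the token list,
// stepping forward by `maxTokens - overlap` so consecutive chunks share
// `overlap` tokens. The real service's tokenizer and rules may differ.
function chunkTokens(tokens: string[], maxTokens: number, overlap: number): string[][] {
  if (maxTokens <= 0) {
    throw new Error("maxTokens must be positive");
  }
  if (overlap < 0 || overlap >= maxTokens) {
    throw new Error("overlap must be at least 0 and smaller than maxTokens");
  }
  const step = maxTokens - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    // Stop once a window has reached the end of the input.
    if (start + maxTokens >= tokens.length) {
      break;
    }
  }
  return chunks;
}

// Assumption: whitespace-separated words approximate tokens for this demo.
const sampleCode = "package main\nfunc main() { println(\"hello\") }";
const sampleTokens = sampleCode.split(/\s+/).filter(Boolean);
const sampleChunks = chunkTokens(sampleTokens, 4, 1);
console.log(sampleChunks.length);
```

With the documented defaults (Max Tokens 800, Overlap 50), each window in this sketch would start 750 tokens after the previous one. Min Chunk Size and Preserve Newlines are not modeled here.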

Practical examples include:

- Splitting a large Go source file into smaller chunks for incremental analysis.
- Preparing code snippets for AI-powered code review tools.
- Breaking down repository files by architectural layer or path for targeted processing.

Properties

Name	Meaning
Code Content	The actual source code text to be split into chunks.
Language	Programming language of the provided code (e.g., "go"). This can influence how splitting or tokenization is handled.
File Path	The path or filename associated with the code content (e.g., "example.go"). Useful metadata for output or downstream processing.
Target Path	Optional target directory path where the split files should be written or logically placed.
Service	Optional identifier string to tag the output metadata with a service name or context.
Split Options	Collection of options controlling the splitting behavior: • Max Tokens: Maximum number of tokens per chunk (default 800). • Overlap: Number of tokens overlapping between chunks (default 50). • Min Chunk Size: Minimum tokens per chunk (default 100). • Preserve Newlines: Whether to keep newline characters in chunks (default true).
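
For reference, the Split Options and their documented defaults could be modeled as a typed object. This is a sketch only: the field names (`maxTokens`, `overlap`, `minChunkSize`, `preserveNewlines`) and the `resolveSplitOptions` helper are assumptions chosen for illustration, and the node's internal parameter names may differ.

```typescript
// Hypothetical field names; only the default values come from the table above.
interface SplitOptions {
  maxTokens: number;         // Max Tokens
  overlap: number;           // Overlap
  minChunkSize: number;      // Min Chunk Size
  preserveNewlines: boolean; // Preserve Newlines
}

const DEFAULT_SPLIT_OPTIONS: SplitOptions = {
  maxTokens: 800,
  overlap: 50,
  minChunkSize: 100,
  preserveNewlines: true,
};

// Fills in documented defaults for any option the caller leaves out.
function resolveSplitOptions(overrides: Partial<SplitOptions> = {}): SplitOptions {
  return { ...DEFAULT_SPLIT_OPTIONS, ...overrides };
}

const smallerChunks = resolveSplitOptions({ maxTokens: 400 });
console.log(smallerChunks);
```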

Output

The node outputs JSON data representing the split chunks of code. Each chunk contains a portion of the original code content, respecting the configured token limits and overlap settings.

The exact structure depends on the API response from the external splitting service but generally includes:

- The chunked code segments.
- Metadata such as language, file path, target path, and optional service identifier.

No binary data output is produced by this operation.

Dependencies

- Requires an external API service accessible via a URL and authenticated with an API key credential.
- The API endpoint /api/split is called with POST requests containing the code and splitting parameters.
- The user must configure credentials providing a valid API URL and API key.
- The API key must be a non-placeholder alphanumeric string between 10 and 200 characters.
- The node uses n8n's HTTP request helper to communicate with the external service.
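
The key check and the POST to /api/split described above might look roughly like this. Everything beyond the endpoint path, the POST method, and the 10 to 200 character alphanumeric rule is an assumption: the `X-API-Key` header, the body field names, the placeholder list, and the error message are hypothetical, and the sketch only builds a request description rather than calling n8n's HTTP request helper.

```typescript
// Hypothetical request description; header and body field names are assumptions.
interface SplitRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumed examples of placeholder values; the node's actual list is not documented.
const PLACEHOLDER_KEYS = ["yourapikey", "placeholder", "changemeplease"];

// Mirrors the documented rule: alphanumeric, 10 to 200 characters, not a placeholder.
function isPlausibleApiKey(key: string): boolean {
  const wellFormed = /^[A-Za-z0-9]{10,200}$/.test(key);
  return wellFormed && !PLACEHOLDER_KEYS.includes(key.toLowerCase());
}

function buildSplitRequest(
  apiUrl: string,
  apiKey: string,
  code: string,
  language: string,
  filePath: string,
): SplitRequest {
  if (!isPlausibleApiKey(apiKey)) {
    throw new Error("Invalid API key"); // hypothetical message
  }
  const base = apiUrl.replace(/\/+$/, ""); // drop trailing slashes before joining
  return {
    method: "POST",
    url: `${base}/api/split`,
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify({ content: code, language, filePath }),
  };
}

const exampleRequest = buildSplitRequest(
  "https://splitter.example.com/",
  "abc123DEF456",
  "package main",
  "go",
  "example.go",
);
console.log(exampleRequest.url);
```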

Troubleshooting

- Invalid API URL: If the API URL in credentials is malformed or missing, the node throws an error "Invalid API URL provided in credentials". Ensure the URL is correct and reachable.
- Invalid API Key: The node validates the API key format and rejects placeholders or keys that do not meet length and character requirements. Use a valid API key.
- Empty or Missing Code Content: Since the code content is required, ensure it is provided; otherwise, the API may return errors or empty results.
- API Request Failures: Network issues or incorrect API endpoint configuration can cause request failures. Verify connectivity and API availability.
- Token Limits Misconfiguration: Setting very low max tokens or min chunk size might result in many small chunks or errors. Adjust these values appropriately.
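
Several of these problems can be caught before a request is sent. The following pre-flight check is a sketch under stated assumptions: the `preflight` function and its input shape are hypothetical, only the "Invalid API URL provided in credentials" message comes from the node, and the numeric rules (overlap smaller than max tokens, min chunk size not above max tokens) are plausible sanity checks rather than documented constraints.

```typescript
// Hypothetical input shape for a local sanity check before calling the API.
interface PreflightInput {
  apiUrl: string;
  codeContent: string;
  maxTokens: number;
  overlap: number;
  minChunkSize: number;
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function preflight(input: PreflightInput): string[] {
  const problems: string[] = [];
  try {
    const parsed = new URL(input.apiUrl); // throws on malformed or empty URLs
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      problems.push("Invalid API URL provided in credentials");
    }
  } catch {
    problems.push("Invalid API URL provided in credentials");
  }
  if (input.codeContent.trim() === "") {
    problems.push("Code Content is empty");
  }
  if (input.maxTokens <= 0) {
    problems.push("Max Tokens must be positive");
  }
  // Assumed rule: an overlap as large as the chunk would never advance.
  if (input.overlap < 0 || input.overlap >= input.maxTokens) {
    problems.push("Overlap must be at least 0 and smaller than Max Tokens");
  }
  // Assumed rule: a minimum above the maximum cannot be satisfied.
  if (input.minChunkSize > input.maxTokens) {
    problems.push("Min Chunk Size must not exceed Max Tokens");
  }
  return problems;
}

const issues = preflight({
  apiUrl: "https://splitter.example.com",
  codeContent: "package main",
  maxTokens: 800,
  overlap: 50,
  minChunkSize: 100,
});
console.log(issues.length === 0 ? "ready to send" : issues.join("; "));
```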

Links and References

- No direct links are embedded in the source code. Users should refer to their external API documentation for the code splitting service.
- For general information on tokenization and chunking strategies, see resources on natural language processing and code analysis best practices.
