GitLab Code Splitter

Split GitLab repository code into manageable chunks

Actions (9)

Code Actions
- Split
Repository Actions
Stem4 Integration Actions
System Actions
- Health Check
- Get Languages

Overview

The "GitLab Code Splitter" node is designed to split code files from a GitLab repository into manageable chunks. This is particularly useful for processing large codebases, enabling easier analysis, indexing, or integration with other tools that require smaller pieces of code rather than entire files.

For the Repository resource and Split by Path operation, the node fetches files from specified directory paths within a GitLab project, filters them by file extensions and size, and splits their content into chunks based on token limits and overlap settings. This allows users to focus on specific parts of a repository, such as certain folders or modules, rather than processing the whole repository.
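
The file selection described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: `RepoFile`, `select_files`, and the prefix-based matching of include and exclude paths are assumptions introduced here.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class RepoFile:
    """Hypothetical record for one repository file."""
    path: str  # repository-relative path, e.g. "src/main.go"
    size: int  # size in bytes


def _under(path: str, directory: str) -> bool:
    """Return True if `path` lies inside `directory`, compared component by component."""
    dir_parts = PurePosixPath(directory.strip("/")).parts
    return PurePosixPath(path).parts[: len(dir_parts)] == dir_parts


def select_files(files, include_paths, extensions, exclude_paths, max_file_size):
    """Keep files under an include path, outside every exclude path,
    with a matching extension and a size within the limit."""
    selected = []
    for f in files:
        if not any(_under(f.path, p) for p in include_paths):
            continue  # not under any requested directory
        if any(_under(f.path, p) for p in exclude_paths):
            continue  # assumption: exclude paths are matched as path prefixes
        if PurePosixPath(f.path).suffix not in extensions:
            continue  # extension not in the allowed list
        if f.size > max_file_size:
            continue  # oversized files are skipped
        selected.append(f)
    return selected
```

Comparing path components rather than raw string prefixes keeps a directory such as `srcx/` from matching an include path of `src/`.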

Practical examples:

- Splitting all `.go` source files under the `src/` and `lib/` directories of a GitLab project to prepare them for code search indexing.
- Processing only the files in a `tests/` folder to analyze test coverage or quality.
- Extracting chunks of code from specific paths to feed into AI models for code review or documentation generation.

Properties

Name	Meaning
GitLab URL	The URL of the GitLab instance hosting the repository (e.g., https://gitlab.com).
Project ID	Identifier of the GitLab project in the format `group/project-name`.
GitLab Token	Personal access token for authenticating with the GitLab API. Must start with `glpat-`, `gldt-`, or `gloas-`.
Branch	Git branch to process (default: `main`).
Target Path	Optional prefix path to prepend to processed files in the output metadata.
Service	Optional service identifier to include in the output metadata.
Include Paths	One or more directory paths within the repository to include for processing. Only files under these paths will be considered.
File Extensions	List of file extensions to process (e.g., `.go`, `.js`). Only files matching these extensions will be included.
Exclude Paths	Directory paths to exclude from processing (e.g., `.git`, `vendor`, `node_modules`). Files under these paths will be ignored.
Split Options	Collection of options controlling how files are split into chunks: Max Tokens (maximum tokens per chunk, default 800); Overlap (token overlap between consecutive chunks, default 50); Min Chunk Size (minimum tokens per chunk, default 100); Preserve Newlines (whether to keep newline characters, default true).
Max File Size	Maximum file size in bytes to process (default 2MB). Files larger than this size will be skipped.
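
To make the interaction of Max Tokens, Overlap, and Min Chunk Size concrete, here is a minimal sliding-window sketch using the documented defaults. It operates on an already tokenized list. How the splitting service tokenizes text and how it treats undersized chunks is not documented here, so the function name `chunk_tokens` and the choice to merge a short trailing chunk into the previous one are assumptions.

```python
def chunk_tokens(tokens, max_tokens=800, overlap=50, min_chunk_size=100):
    """Split a token list into overlapping windows of at most `max_tokens`."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")

    step = max_tokens - overlap  # each window starts this many tokens after the last
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        if chunks and len(window) < min_chunk_size:
            # Assumption: an undersized trailing window is merged into the
            # previous chunk. Its first `overlap` tokens are already there.
            chunks[-1].extend(window[overlap:])
            break
        chunks.append(window)
        if start + max_tokens >= len(tokens):
            break  # this window reached the end of the file
    return chunks
```

With this merging rule the final chunk can exceed `max_tokens` by fewer than `min_chunk_size - overlap` tokens. A file shorter than `min_chunk_size` still yields a single chunk rather than being dropped.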

Output

The node outputs an array of JSON objects representing the split chunks of code extracted from the specified repository paths. Each chunk typically contains:

- The chunked code content.
- Metadata including the original file path, the target path prefix (if any), the service identifier (if provided), and possibly other relevant details about the chunk.

No binary data output is produced by this node; all output is structured JSON suitable for further processing or storage.
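
As an illustration of the output shape described above, the sketch below assembles one chunk record. The exact field names are not documented here, so `content`, `metadata`, `filePath`, `targetPath`, and `service` are hypothetical, as is the choice to store the target path already joined with the file path.

```python
import json
import posixpath


def build_chunk_record(content, file_path, target_path=None, service=None):
    """Assemble one output item. All key names are hypothetical."""
    metadata = {"filePath": file_path}
    if target_path:
        # Assumption: the optional prefix is prepended to the repository path.
        metadata["targetPath"] = posixpath.join(target_path, file_path)
    if service:
        metadata["service"] = service
    return {"content": content, "metadata": metadata}


record = build_chunk_record(
    "package main",
    "src/main.go",
    target_path="services/api",
    service="billing",
)
print(json.dumps(record, indent=2))
```

Optional metadata keys are omitted when the corresponding property is left empty, which mirrors the "if any" and "if provided" wording above.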

Dependencies

- Requires a valid personal access token for GitLab with appropriate permissions to read the repository.
- Needs network access to the specified GitLab instance.
- Requires configuration of credentials in n8n containing:
  - An API URL for the external splitting service.
  - An API key credential for authentication with that service.
- The node makes HTTP POST requests to an external API endpoint (`/api/split-gitlab-repo`) to perform the actual splitting logic.
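
The request to the splitting service might be built roughly as follows. Only the endpoint path `/api/split-gitlab-repo` and the POST method come from this page. The payload keys and the `X-API-Key` header name are assumptions for illustration, and the request is constructed but not sent.

```python
import json
import urllib.request

# The node is documented to use a 1-hour request timeout.
REQUEST_TIMEOUT_SECONDS = 60 * 60


def build_split_request(api_url, api_key, payload):
    """Build (but do not send) a POST request to the splitting endpoint."""
    url = api_url.rstrip("/") + "/api/split-gitlab-repo"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
    )


request = build_split_request(
    "https://splitter.example.com",  # placeholder API URL
    "example-key-123",               # placeholder API key
    {"projectId": "group/project-name", "branch": "main"},  # hypothetical keys
)
```

Sending it would then be a call such as `urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_SECONDS)`.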

Troubleshooting

- Invalid GitLab URL: If the GitLab URL is malformed or unreachable, the node will throw an error. Ensure the URL is correct and accessible.
- Invalid GitLab Token: The token must start with `glpat-`, `gldt-`, or `gloas-`. Using an incorrect or expired token will cause authentication failures.
- API Key Issues: The node requires a valid API key credential for the external splitting service. The key must be 10-200 characters long, alphanumeric with hyphens or underscores, and not a placeholder value.
- File Size Limits: Files exceeding the configured max file size (default 2MB) will be skipped silently. Adjust the limit if needed.
- Empty Include Paths: If no include paths are specified for the "Split by Path" operation, no files will be processed. Make sure to provide at least one path.
- Timeouts: Large repositories or many files may cause long processing times. The node sets a timeout of 1 hour for the request; ensure your environment can handle this duration.
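
Several of the checks above can be reproduced locally before running the node. The sketch below follows the stated rules (token prefixes, API key length and character set). The function names and the list of placeholder values are hypothetical, and the node's own validation may differ in detail.

```python
import re
from urllib.parse import urlparse

GITLAB_TOKEN_PREFIXES = ("glpat-", "gldt-", "gloas-")
# 10-200 characters: letters, digits, hyphens, underscores.
API_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{10,200}")
# Hypothetical examples; the values the node treats as placeholders are not listed.
PLACEHOLDER_KEYS = ("your-api-key", "changeme-changeme")


def is_valid_gitlab_url(url: str) -> bool:
    """Check that the URL is syntactically an http(s) URL with a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_valid_gitlab_token(token: str) -> bool:
    """Check only the documented prefix, not whether the token is live."""
    return token.startswith(GITLAB_TOKEN_PREFIXES)


def is_valid_api_key(key: str) -> bool:
    """Check length, character set, and that the key is not a known placeholder."""
    if API_KEY_PATTERN.fullmatch(key) is None:
        return False
    return key.lower() not in PLACEHOLDER_KEYS
```

These checks are purely syntactic: an expired token or an unreachable host still passes them and only fails when the request is made.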

Links and References

GitLab Personal Access Tokens
GitLab API Documentation
n8n Documentation - Creating Custom Nodes
Tokenization Concepts (general reference for token-based chunking)
