Overview
The "GitLab Code Splitter" node is designed to split code files from a GitLab repository into smaller, manageable chunks. This is particularly useful for processing large codebases where handling entire files at once is impractical, such as for code analysis, indexing, or feeding into language models that have token limits.
For the Repository resource with the Split All Files operation, the node fetches all files from a specified GitLab project and branch, filters them by file extensions and paths, and splits their content into chunks based on configurable token limits and overlap settings. This enables workflows that require granular processing of source code across an entire repository.
Practical examples:
- Preparing a large Go project’s source code for semantic search by splitting it into chunks.
- Feeding repository code into AI models for automated code review or documentation generation.
- Incrementally processing code files while excluding vendor or dependency directories.
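The extension and path filtering described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name `should_process` is hypothetical, and it assumes an excluded path matches whenever it equals any directory component of the file path. The node's real matching rule is not documented here.

```python
from pathlib import PurePosixPath


def should_process(path, extensions, exclude_paths):
    """Return True if a repository file path passes both filters.

    Assumption: an exclude entry matches any directory component of the path.
    """
    p = PurePosixPath(path)
    # Skip the file if any parent directory name is on the exclude list.
    if any(part in exclude_paths for part in p.parts[:-1]):
        return False
    # Keep only files whose extension is on the include list.
    return p.suffix in extensions


files = ["cmd/main.go", "vendor/lib/util.go", "web/app.js", "README.md"]
selected = [
    f for f in files
    if should_process(f, {".go", ".js"}, {".git", "vendor", "node_modules"})
]
print(selected)  # ['cmd/main.go', 'web/app.js']
```

Filtering before splitting keeps dependency directories out of the output and reduces the amount of content sent for chunking.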
Properties
| Name | Meaning |
|---|---|
| GitLab URL | The URL of the GitLab instance hosting the repository (e.g., https://gitlab.com). |
| Project ID | Identifier of the GitLab project in group/project-name format. |
| GitLab Token | Personal access token for authenticating with the GitLab API. Must start with glpat-, gldt-, or gloas-. |
| Branch | Git branch name to process (default is main). |
| Target Path | Optional prefix path to prepend to processed files in output metadata. |
| Service | Optional service identifier to include in output metadata. |
| File Extensions | List of file extensions to include when processing files (e.g., .go, .js). Only files matching these extensions will be processed. |
| Exclude Paths | List of directory paths to exclude from processing (e.g., .git, vendor, node_modules). |
| Split Options | Collection of options controlling how files are split into chunks: • Max Tokens: Maximum tokens per chunk (default 800) • Overlap: Number of overlapping tokens between chunks (default 50) • Min Chunk Size: Minimum tokens per chunk (default 100) • Preserve Newlines: Whether to keep newline characters (default true) |
| Max File Size (Bytes) | Maximum size of files to process in bytes (default 2MB). Files larger than this size will be skipped. |
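To show how Max Tokens, Overlap, and Min Chunk Size might interact, here is a rough sliding-window sketch. It rests on several assumptions: `split_tokens` is a hypothetical name, tokens are taken to be whitespace-separated words (real tokenizers count differently), an undersized final chunk is folded into the previous one, and Preserve Newlines is not modeled. The external API may apply different rules.

```python
def split_tokens(tokens, max_tokens=800, overlap=50, min_chunk_size=100):
    """Split a token list into overlapping windows of at most max_tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")
    step = max_tokens - overlap  # how far each window advances
    bounds = []  # (start, end) index pairs, end exclusive
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        bounds.append((start, end))
        if end == len(tokens):
            break
        start += step
    # Assumed policy: fold an undersized final chunk into the previous one,
    # which may push that chunk slightly above max_tokens.
    if len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < min_chunk_size:
        _, last_end = bounds.pop()
        prev_start, _ = bounds.pop()
        bounds.append((prev_start, last_end))
    return [tokens[s:e] for s, e in bounds]


# Whitespace splitting is only a stand-in for a real tokenizer.
words = "package main\nfunc main() { println(1) }".split()
chunks = split_tokens(words, max_tokens=4, overlap=1, min_chunk_size=2)
print(len(chunks))
```

With the defaults (800 tokens, 50 overlap), each window advances by 750 tokens, so consecutive chunks share their boundary 50 tokens.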
Output
The node outputs JSON objects representing the split chunks of code from the repository files. Each output item typically contains:
- Metadata about the original file (such as file path, target path, and optionally service identifier).
- The chunked content of the file, split according to the specified token limits and overlap.
- Additional information related to the splitting process.
No binary data output is produced by this operation.
Dependencies
- Requires a valid personal access token for GitLab with appropriate permissions to read the repository.
- Needs network access to the configured GitLab instance.
- Requires an external API endpoint (configured via credentials) that performs the actual splitting logic. This API must support endpoints like /api/split-gitlab-repo.
- The node expects the API key credential to provide an API URL and key for authentication.
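For orientation, a request to the splitting endpoint might be assembled roughly as below. Only the /api/split-gitlab-repo path comes from this page. The function name `build_split_request`, the payload field names, the HTTP method, and the Bearer authorization scheme are all assumptions made for illustration, so check them against the actual API before relying on them.

```python
import json
import urllib.request


def build_split_request(api_url, api_key, gitlab_url, project_id,
                        gitlab_token, branch="main"):
    """Build (but do not send) a request to the splitting API."""
    # Hypothetical payload: these field names are not documented here.
    payload = {
        "gitlabUrl": gitlab_url,
        "projectId": project_id,
        "gitlabToken": gitlab_token,
        "branch": branch,
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/api/split-gitlab-repo",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the real API may expect another header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_split_request(
    "https://splitter.example.com/",  # placeholder API URL
    "example-api-key",                # placeholder API key
    "https://gitlab.com",
    "group/project-name",
    "glpat-example",                  # placeholder token
)
print(req.full_url)
```

Sending it would look like `urllib.request.urlopen(req, timeout=3600)`, where 3600 seconds mirrors the node's 1 hour request timeout.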
Troubleshooting
- Invalid API URL provided in credentials: Ensure the API URL is a valid URL string.
- Valid API key is required and must be provided in credentials: Check that the API key is present, correctly formatted, and not a placeholder.
- Invalid GitLab URL provided: Verify the GitLab URL is correct and accessible.
- Invalid GitLab token format: The token must start with glpat-, gldt-, or gloas-. Confirm you are using a proper personal access token.
- Timeouts or slow responses: Processing large repositories can take time; the node sets a timeout of 1 hour for requests. Consider increasing resources or filtering files more narrowly.
- Files skipped due to size: Files larger than the configured max file size (default 2MB) will not be processed. Adjust the limit if needed.
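Several of the errors above can be caught before running the node. The sketch below mirrors those checks under stated assumptions: `preflight_errors` and `is_valid_url` are hypothetical helpers, and "valid URL" is taken to mean an http or https URL with a host, which may be stricter or looser than the node's own validation.

```python
from urllib.parse import urlparse

# Token prefixes listed in the node's documentation.
TOKEN_PREFIXES = ("glpat-", "gldt-", "gloas-")


def is_valid_url(value):
    """Assumed rule: an http(s) URL with a non-empty host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def preflight_errors(api_url, api_key, gitlab_url, gitlab_token):
    """Return a list of problems found in the node's inputs."""
    errors = []
    if not is_valid_url(api_url):
        errors.append("Invalid API URL provided in credentials")
    if not api_key or not api_key.strip():
        errors.append("Valid API key is required and must be provided in credentials")
    if not is_valid_url(gitlab_url):
        errors.append("Invalid GitLab URL provided")
    if not gitlab_token.startswith(TOKEN_PREFIXES):
        errors.append("Invalid GitLab token format")
    return errors


print(preflight_errors("not a url", "", "https://gitlab.com", "abc123"))
```

An empty list means none of these four checks failed. It does not confirm that the token has sufficient permissions or that the GitLab instance is reachable.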
Links and References
- GitLab Personal Access Tokens
- GitLab API Documentation
- General concepts of code chunking for NLP or AI model input preparation