GitLab Code Splitter

Split GitLab repository code into manageable chunks

Overview

The "GitLab Code Splitter" node splits code files from a GitLab repository into manageable chunks. This is useful for processing large codebases, since analysis, indexing, and integration tools often need smaller pieces of code rather than entire files.

For the Repository resource and Split by Path operation, the node fetches files from specified directory paths within a GitLab project, filters them by file extensions and size, and splits their content into chunks based on token limits and overlap settings. This allows users to focus on specific parts of a repository, such as certain folders or modules, rather than processing the whole repository.
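The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the node's actual implementation: `RepoFile`, `select_files`, and `_under` are hypothetical names, and the sketch assumes that include and exclude paths are matched as directory prefixes from the repository root.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class RepoFile:
    """Hypothetical record for one repository file."""
    path: str  # repository-relative path, e.g. "src/app/main.go"
    size: int  # size in bytes


def _under(path: str, directory: str) -> bool:
    """True if `path` lies inside `directory`, compared by whole path segments."""
    dir_parts = PurePosixPath(directory.strip("/")).parts
    return PurePosixPath(path).parts[: len(dir_parts)] == dir_parts


def select_files(files, include_paths, extensions, exclude_paths, max_file_size):
    """Keep files that are included, not excluded, have a wanted extension, and fit the size limit."""
    selected = []
    for f in files:
        if not any(_under(f.path, p) for p in include_paths):
            continue  # outside every include path
        if any(_under(f.path, p) for p in exclude_paths):
            continue  # inside an excluded directory (assumed prefix match)
        if PurePosixPath(f.path).suffix not in extensions:
            continue  # extension not requested
        if f.size > max_file_size:
            continue  # larger than the limit, skipped
        selected.append(f)
    return selected
```

Comparing whole path segments rather than raw string prefixes keeps a directory such as srcx/ from being matched by the include path src.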

Practical examples:

  • Splitting all .go source files under src/ and lib/ directories in a GitLab project to prepare for code search indexing.
  • Processing only files in a tests/ folder to analyze test coverage or quality.
  • Extracting chunks of code from specific paths to feed into AI models for code review or documentation generation.

Properties

Name | Meaning
GitLab URL | The URL of the GitLab instance hosting the repository (e.g., https://gitlab.com).
Project ID | Identifier of the GitLab project in the format group/project-name.
GitLab Token | Personal access token for authenticating with GitLab API. Must start with glpat-, gldt-, or gloas-.
Branch | Git branch to process (default: main).
Target Path | Optional prefix path to prepend to processed files in the output metadata.
Service | Optional service identifier to include in the output metadata.
Include Paths | One or more directory paths within the repository to include for processing. Only files under these paths will be considered.
File Extensions | List of file extensions to process (e.g., .go, .js). Only files matching these extensions will be included.
Exclude Paths | Directory paths to exclude from processing (e.g., .git, vendor, node_modules). Files under these paths will be ignored.
Split Options | Collection of options controlling how files are split into chunks:
  - Max Tokens: Maximum tokens per chunk (default 800)
  - Overlap: Token overlap between chunks (default 50)
  - Min Chunk Size: Minimum tokens per chunk (default 100)
  - Preserve Newlines: Whether to keep newline characters (default true)
Max File Size | Maximum file size in bytes to process (default 2MB). Files larger than this size will be skipped.
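
To show how Max Tokens and Overlap interact, here is a minimal sliding-window sketch in Python. It is an assumption about how such a splitter might behave, not the external service's actual algorithm: `split_tokens` is a hypothetical name, the tokenizer is left to the caller, and Min Chunk Size and Preserve Newlines are not modeled because the source does not say how they are applied.

```python
def split_tokens(tokens, max_tokens=800, overlap=50):
    """Split a token list into windows of at most `max_tokens`,
    where each window repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the file
        start += step
    return chunks
```

With the defaults, each new chunk starts 750 tokens after the previous one, so a 1,000-token file yields two chunks: tokens 0 to 799 and tokens 750 to 999.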

Output

The node outputs an array of JSON objects representing the split chunks of code extracted from the specified repository paths. Each chunk typically contains:

  • The chunked code content.
  • Metadata including original file path, target path prefix (if any), service identifier (if provided), and possibly other relevant details about the chunk.

No binary data output is produced by this node; all output is structured JSON suitable for further processing or storage.

Dependencies

  • Requires a valid personal access token for GitLab with appropriate permissions to read the repository.
  • Needs network access to the specified GitLab instance.
  • Requires configuration of credentials in n8n containing:
    • An API URL for the external splitting service.
    • An API key credential for authentication with that service.
  • The node makes HTTP POST requests to an external API endpoint (/api/split-gitlab-repo) to perform the actual splitting logic.
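
As a rough illustration of the last point, the sketch below builds (without sending) a POST request to the /api/split-gitlab-repo endpoint using Python's standard library. Only the endpoint path comes from this page. The `X-API-Key` header name, the payload field names, and the `build_split_request` helper are assumptions made for the example, and the real service may expect something different.

```python
import json
import urllib.request


def build_split_request(api_url, api_key, payload):
    """Build, but do not send, a POST request to the splitting endpoint."""
    url = api_url.rstrip("/") + "/api/split-gitlab-repo"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
        method="POST",
    )


# Hypothetical payload: the field names are illustrative, not the service's schema.
example_payload = {
    "gitlabUrl": "https://gitlab.com",
    "projectId": "group/project-name",
    "branch": "main",
    "includePaths": ["src", "lib"],
    "fileExtensions": [".go"],
}
```

A caller could then send the request with `urllib.request.urlopen(req, timeout=3600)`, which mirrors the one-hour timeout mentioned under Troubleshooting.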

Troubleshooting

  • Invalid GitLab URL: If the GitLab URL is malformed or unreachable, the node will throw an error. Ensure the URL is correct and accessible.
  • Invalid GitLab Token: The token must start with glpat-, gldt-, or gloas-. Using an incorrect or expired token will cause authentication failures.
  • API Key Issues: The node requires a valid API key credential for the external splitting service. The key must be 10-200 characters long, alphanumeric with hyphens or underscores, and not a placeholder value.
  • File Size Limits: Files exceeding the configured max file size (default 2MB) are skipped silently, so a file missing from the output may simply be over the limit. Raise Max File Size if such files need to be processed.
  • Empty Include Paths: If no include paths are specified for the "Split by Path" operation, no files will be processed. Make sure to provide at least one path.
  • Timeouts: Large repositories or many files may cause long processing times. The node sets a timeout of 1 hour for the request; ensure your environment can handle this duration.
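
The token and API key rules listed above can be checked before running the node. The following Python sketch encodes them as stated on this page. The function names are hypothetical, and the set of placeholder values is an assumption, since the page does not list which values the node treats as placeholders.

```python
import re

# Prefixes the page says a GitLab token must start with.
GITLAB_TOKEN_PREFIXES = ("glpat-", "gldt-", "gloas-")

# 10-200 characters: letters, digits, hyphens, or underscores.
API_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{10,200}")

# Assumed examples of placeholder values; the real list is not documented here.
PLACEHOLDER_KEYS = {"your-api-key", "changeme-please"}


def is_valid_gitlab_token(token: str) -> bool:
    """Check only the prefix; an expired or revoked token still passes this test."""
    return token.startswith(GITLAB_TOKEN_PREFIXES)


def is_valid_api_key(key: str) -> bool:
    """Check length, character set, and that the key is not an obvious placeholder."""
    if API_KEY_PATTERN.fullmatch(key) is None:
        return False
    return key.lower() not in PLACEHOLDER_KEYS
```

A prefix check cannot detect an expired token, so an authentication failure is still possible after both checks pass.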
