GitLab Code Splitter

Split GitLab repository code into manageable chunks

Overview

The GitLab Code Splitter node splits code files from a GitLab repository into smaller chunks. This is useful for large codebases where processing entire files at once is impractical, for example in code analysis, indexing, or feeding code into language models with token limits.

The Split by Layer operation focuses on splitting files belonging to a specific architectural layer within the repository (e.g., domain, application, adapters). It fetches files from the specified GitLab project and branch, filters them by the chosen layer paths and file extensions, and splits their content according to configurable chunking options.
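The filtering step described above can be sketched as follows. This is a minimal illustration, not the node's implementation: the `LAYER_PATHS` mapping, the `RepoFile` type, and the `select_files` function are hypothetical names, and the folder paths are placeholders because the node's predefined layer paths are not listed in this document.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical layer-to-folder mapping (assumption: the real predefined
# paths used by the node are not documented here).
LAYER_PATHS = {
    "domain": ["internal/domain"],
    "application": ["internal/application"],
    "tests": ["tests"],
}


@dataclass
class RepoFile:
    path: str  # repository-relative path using forward slashes
    size: int  # file size in bytes


def select_files(files, layer, extensions, exclude_paths, max_file_size):
    """Keep files that sit under the layer's folders, have an allowed
    extension, avoid excluded directories, and fit the size limit."""
    prefixes = LAYER_PATHS.get(layer, [])  # unknown layer -> nothing selected
    selected = []
    for f in files:
        path = PurePosixPath(f.path)
        in_layer = any(f.path.startswith(p.rstrip("/") + "/") for p in prefixes)
        has_ext = path.suffix in extensions
        # Simplification: excludes are matched as single directory names.
        excluded = any(ex in path.parts for ex in exclude_paths)
        if in_layer and has_ext and not excluded and f.size <= max_file_size:
            selected.append(f)
    return selected
```

With this sketch, a layer that has no entry in the mapping yields an empty list, which mirrors the "Empty Layer Paths" behavior noted under Troubleshooting.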

Practical Examples

  • Splitting the domain layer of a microservices repository to analyze business logic separately.
  • Processing only test files in a repository to generate test coverage reports or summaries.
  • Extracting command files for deployment automation scripts to feed into a documentation generator.

Properties

Name             Meaning
GitLab URL       The URL of the GitLab instance hosting the repository (e.g., https://gitlab.com).
Project ID       Identifier of the GitLab project in the format group/project-name.
GitLab Token     Personal access token for authenticating with GitLab API. Must start with glpat-, gldt-, or gloas-.
Branch           Git branch to process (default: main).
Target Path      Optional prefix path for the processed files in the output metadata.
Service          Optional service identifier to include in the output metadata.
Layer Type       Architectural layer to process. Options are: domain, application, adapters, ports, tests, command. Each corresponds to predefined folder paths within the repository.
File Extensions  List of file extensions to include when processing files (e.g., .go). Multiple extensions can be specified.
Exclude Paths    List of directory paths to exclude from processing (e.g., .git, vendor, node_modules).
Split Options    Collection of options controlling how files are split into chunks:
                 • Max Tokens: Maximum tokens per chunk (default: 800)
                 • Overlap: Token overlap between chunks (default: 50)
                 • Min Chunk Size: Minimum tokens per chunk (default: 100)
                 • Preserve Newlines: Whether to keep newline characters (default: true)
Max File Size    Maximum size in bytes of files to process (default: 2MB). Files larger than this are skipped.
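To make the Split Options concrete, here is a minimal sketch of overlapping, token-based chunking. It is an assumption about how such options are commonly applied, not the splitting service's actual algorithm: `split_tokens` is a hypothetical function, tokenization is left to the caller, and merging an undersized final chunk into the previous one is just one possible way to honor Min Chunk Size.

```python
def split_tokens(tokens, max_tokens=800, overlap=50, min_chunk_size=100):
    """Split a token list into windows of up to max_tokens tokens, where
    each window repeats the last `overlap` tokens of the previous one."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("require max_tokens > 0 and 0 <= overlap < max_tokens")

    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
        start += step

    # Assumed policy: fold a too-small final chunk into the previous one,
    # skipping the overlapping tokens the previous chunk already contains.
    # The merged chunk can then exceed max_tokens.
    if len(chunks) > 1 and len(chunks[-1]) < min_chunk_size:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail[overlap:]
    return chunks
```

With the defaults, a 2000-token input produces three chunks starting at tokens 0, 750, and 1500, so consecutive chunks share 50 tokens.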

Output

The node outputs an array of JSON objects representing the split chunks of code from the selected architectural layer. Each object typically contains:

  • The chunked code content.
  • Metadata, including the original file path, the target path prefix (if any), and the service identifier (if provided).
  • Possibly other information related to the chunking process.

No binary data output is produced by this node.
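As an illustration of what one output object might look like, the sketch below assembles a chunk with its metadata. The exact schema is not specified in this document, so every field name here (`content`, `metadata`, `file_path`, `chunk_index`, `target_path`, `service`) and the `build_chunk_item` helper are hypothetical.

```python
import posixpath


def build_chunk_item(content, file_path, chunk_index, target_path="", service=""):
    """Assemble one chunk object. Field names are illustrative only and
    are not the node's documented output schema."""
    metadata = {"file_path": file_path, "chunk_index": chunk_index}
    if target_path:
        # Assumption: the target path acts as a prefix to the file path.
        metadata["target_path"] = posixpath.join(target_path, file_path)
    if service:
        metadata["service"] = service
    return {"content": content, "metadata": metadata}


item = build_chunk_item(
    "package domain\n",
    "internal/domain/user.go",
    chunk_index=0,
    target_path="services/billing",
    service="billing",
)
```

Optional fields are omitted rather than set to empty strings, which keeps the metadata compact when Target Path or Service is not configured.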

Dependencies

  • Requires a valid personal access token for GitLab with appropriate permissions to read the repository.
  • Needs network access to the configured GitLab instance.
  • Requires an external splitting service whose API URL is configured via credentials. The node calls its /api/split-gitlab-repo endpoint, which performs the actual splitting.
  • Requires an API key credential for authenticating with the splitting service.

Troubleshooting

  • Invalid API URL: If the API URL in credentials is malformed, the node will throw an error. Ensure the URL is correct and accessible.
  • Invalid API Key: The API key must be 10-200 characters long, alphanumeric with hyphens or underscores, and not a placeholder value. Check the key format if errors occur.
  • Invalid GitLab URL: The GitLab URL must be a valid URL string.
  • Invalid GitLab Token: The token must start with glpat-, gldt-, or gloas-. Using an incorrect token format will cause errors.
  • Timeouts: Processing large repositories or many files can take a long time. The request timeout is set to 1 hour, so very large repositories or network issues can still cause failures.
  • File Size Limits: Files exceeding the max file size setting are skipped silently; adjust the limit if needed.
  • Empty Layer Paths: If the selected layer does not map to any known paths, no files will be processed.
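The format rules listed above can be checked before running the node. The following is a minimal sketch based only on the rules stated in this section; the function names are hypothetical, the node's own validation may differ, and the placeholder-value check is omitted because the rejected placeholder values are not documented here.

```python
import re
from urllib.parse import urlparse

# Prefixes and key rules as stated in the Troubleshooting list.
TOKEN_PREFIXES = ("glpat-", "gldt-", "gloas-")
API_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{10,200}")


def is_valid_url(value):
    """Accept only http(s) URLs that include a host."""
    try:
        parsed = urlparse(value)
    except ValueError:  # raised for some malformed inputs
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_valid_gitlab_token(token):
    """Check that the token starts with one of the accepted prefixes."""
    return token.startswith(TOKEN_PREFIXES)


def is_valid_api_key(key):
    """10-200 characters: letters, digits, hyphens, or underscores."""
    return API_KEY_PATTERN.fullmatch(key) is not None
```

Running these checks on the credential values first can help tell a formatting problem apart from a network or permission problem.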
