Overview
The "Stem4 Integration" node with the "Split by Layer with GCP Upload" operation is designed to process source code stored in a GitLab repository by splitting it into manageable chunks based on architectural layers. After splitting, the processed code chunks are uploaded to Google Cloud Platform (GCP) Storage. This node is particularly useful for teams or systems that need to analyze, index, or transform large codebases by architectural concerns and want to leverage cloud storage for further processing or archival.
Typical use cases include:
- Preparing code for machine learning models or code analysis tools that require chunked inputs.
- Organizing code snippets by architectural layers (e.g., domain, application, adapters) for modular processing.
- Automating code ingestion pipelines where processed code is stored in GCP for downstream workflows.
Properties
| Name | Meaning |
|---|---|
| GitLab URL | The URL of the GitLab instance hosting the repository (e.g., https://gitlab.com). |
| Project ID | Identifier of the GitLab project in the format group/project-name. |
| GitLab Token | Personal access token for authenticating with GitLab. Must start with glpat-, gldt-, or gloas-. |
| Branch | The Git branch to process, defaulting to main. |
| Target Path | Optional prefix path for the processed files when uploaded to GCP Storage. |
| Service | Optional service identifier to include in the output metadata. |
| File Extensions | List of file extensions to include in processing (e.g., .go). Multiple extensions can be specified. |
| Exclude Paths | List of directory paths to exclude from processing (e.g., .git, vendor, node_modules). |
| Split Options | Collection of options controlling how code is split into chunks: • Max Tokens: Maximum tokens per chunk (default 800). • Overlap: Token overlap between chunks (default 50). • Min Chunk Size: Minimum tokens per chunk (default 100). • Preserve Newlines: Whether to keep newline characters (default true). |
| Max File Size | Maximum size in bytes of files to process (default 2MB). Files larger than this will be skipped. |
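To make the Split Options concrete, here is a minimal sketch of sliding-window chunking with overlap. It is not the backend's actual algorithm, which this page does not document. The function name `split_tokens`, the whitespace tokenization in the usage note, and the choice to merge an undersized final chunk into the previous one are all assumptions for illustration. Preserve Newlines is not modeled.

```python
from typing import List


def split_tokens(
    tokens: List[str],
    max_tokens: int = 800,      # "Max Tokens" default from the table above
    overlap: int = 50,          # "Overlap" default
    min_chunk_size: int = 100,  # "Min Chunk Size" default
) -> List[List[str]]:
    """Split a token list into overlapping windows (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    step = max_tokens - overlap  # how far each window advances
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
        start += step

    # Assumption: a final chunk below min_chunk_size is merged into the
    # previous chunk (minus the overlapping tokens) rather than emitted alone.
    # The merged chunk can therefore exceed max_tokens slightly.
    if len(chunks) > 1 and len(chunks[-1]) < min_chunk_size:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail[overlap:]
    return chunks
```

For a quick experiment, `split_tokens(source_text.split())` treats whitespace-separated words as tokens; a real tokenizer would count tokens differently, so chunk boundaries will not match what the node produces.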
Output
The node outputs JSON data representing the result of the split and upload operation. The structure typically includes metadata about the processed files, their chunks, and possibly references or URLs to the uploaded content in GCP Storage.
- The `json` output field contains details such as:
  - Project and branch information.
  - Metadata about each processed file, including its path and layer.
  - Chunked code segments adhering to the specified split options.
  - Information related to the upload status or location in GCP Storage.
No binary data output is produced by this node; all results are returned as structured JSON.
Dependencies
- Requires an API key credential for accessing a backend service that performs the splitting and uploading operations.
- Requires a valid GitLab personal access token with appropriate permissions to read the repository.
- The node interacts with GitLab repositories via the provided GitLab URL and token.
- Uploads processed code chunks to Google Cloud Platform (GCP) Storage, so proper GCP configuration and permissions are assumed on the backend service side.
- The node expects the backend API URL and API key to be configured in n8n credentials (generic API key credential).
- Network connectivity to GitLab and the backend API service is required.
Troubleshooting
- Invalid API URL: If the API URL configured in credentials is malformed or unreachable, the node will throw an error indicating an invalid API URL. Verify the URL format and network accessibility.
- Invalid API Key: The node validates the API key format strictly. Ensure the API key is 10-200 characters long, contains only alphanumeric characters, hyphens, and underscores, and is not a placeholder value.
- Invalid GitLab Token: The GitLab token must start with one of the accepted prefixes (`glpat-`, `gldt-`, or `gloas-`). Using an incorrect token format will cause an error.
- File Size Limits: Files exceeding the maximum file size (default 2MB) are skipped. Adjust the "Max File Size" property if needed.
- Timeouts: Processing large repositories or many files may take significant time (the timeout is up to 1 hour). Network issues or slow responses might cause failures.
- Permission Issues: Ensure the GitLab token has sufficient permissions to read the repository contents.
- Empty or Incorrect Layer Paths: Selecting an unsupported architectural layer or misconfiguring include/exclude paths may result in no files being processed.
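Several of the checks above can be reproduced locally before running the node. The sketch below mirrors the rules as stated on this page; it is not the node's own validation code. The function names, the `PLACEHOLDER_KEYS` set, the reading of 2MB as 2 × 1024 × 1024 bytes, and matching Exclude Paths against single directory names are all assumptions.

```python
import re
from pathlib import PurePosixPath
from typing import Sequence

GITLAB_TOKEN_PREFIXES = ("glpat-", "gldt-", "gloas-")
# 10-200 characters: alphanumerics, hyphens, and underscores only.
API_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{10,200}")
# Hypothetical placeholder values; the node's real list is not documented.
PLACEHOLDER_KEYS = {"your-api-key", "changeme-please"}


def is_valid_api_key(key: str) -> bool:
    """Check length, character set, and reject obvious placeholders."""
    return bool(API_KEY_PATTERN.fullmatch(key)) and key.lower() not in PLACEHOLDER_KEYS


def is_valid_gitlab_token(token: str) -> bool:
    """Check only the prefix rule described in Troubleshooting."""
    return token.startswith(GITLAB_TOKEN_PREFIXES)


def should_process(
    path: str,
    size_bytes: int,
    extensions: Sequence[str] = (".go",),
    exclude_paths: Sequence[str] = (".git", "vendor", "node_modules"),
    max_file_size: int = 2 * 1024 * 1024,  # assumed meaning of "2MB"
) -> bool:
    """Return True if a file would pass the extension, path, and size filters."""
    directories = PurePosixPath(path).parts[:-1]
    if any(part in exclude_paths for part in directories):
        return False  # file lives under an excluded directory
    if size_bytes > max_file_size:
        return False  # larger files are skipped
    return path.endswith(tuple(extensions))
```

If `should_process` returns `False` for every file in a repository, that is a likely explanation for a run that processes no files.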