Overview
This node performs data cleaning using Recursive Feature Elimination (RFE), a technique commonly used in machine learning to select the most relevant features for predictive modeling. It takes input data, applies RFE based on a specified target column and number of features to keep, and outputs the cleaned dataset with reduced features.
Typical use cases include:
- Preparing datasets by removing irrelevant or less important features before training models.
- Improving model performance by focusing on key predictors.
- Reducing dimensionality for easier data analysis.
For example, if you have a dataset with many columns but want to keep only the top 5 features that best predict a target variable, this node will help automate that selection.
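To make the top-N selection concrete, here is a minimal, dependency-free sketch of recursive feature elimination. It is not the node's actual script. It assumes numeric columns, rows given as a list of dicts, and a plain least-squares linear model whose absolute coefficients on standardized features act as the importance score. The names `rfe_select`, `_standardize`, `_solve`, and `_fit_coefficients` are hypothetical.

```python
from statistics import mean, pstdev


def _standardize(column):
    """Scale a column to zero mean and unit variance (all zeros if constant)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def _fit_coefficients(columns, y, ridge=1e-6):
    """Least-squares coefficients via the normal equations (tiny ridge for stability)."""
    k = len(columns)
    gram = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return _solve(gram, rhs)


def rfe_select(rows, target_column, n_features):
    """Refit repeatedly, dropping the weakest feature until n_features remain."""
    if n_features < 1:
        raise ValueError("n_features must be a positive integer")
    names = [name for name in rows[0] if name != target_column]
    y_mean = mean(row[target_column] for row in rows)
    y = [row[target_column] - y_mean for row in rows]
    cols = {name: _standardize([row[name] for row in rows]) for name in names}
    while len(names) > n_features:
        coefs = _fit_coefficients([cols[name] for name in names], y)
        weakest = min(range(len(names)), key=lambda i: abs(coefs[i]))
        names.pop(weakest)  # eliminate the least important feature, then refit
    return names


# Illustrative data: y depends strongly on "a", weakly on "b", and not on "c".
a = [1, 2, 3, 4, 5, 6]
b = [2, 1, 4, 3, 6, 5]
c = [5, 3, 6, 1, 2, 4]
rows = [{"a": x, "b": z, "c": w, "y": 3 * x + 0.5 * z} for x, z, w in zip(a, b, c)]
print(rfe_select(rows, "y", 2))  # -> ['a', 'b']
```

A production script would more likely rely on an established implementation such as scikit-learn's `RFE` with an estimator suited to the data. The loop above only illustrates the fit, rank, drop, repeat cycle.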
Properties
| Name | Meaning |
|---|---|
| Target Column | The name of the target column for RFE; this is the dependent variable to predict. |
| Number of Features to Keep | Number of top-ranked features to retain after applying RFE; must be a positive integer. |
| Output Format | Choose the output format of the cleaned data: either JSON (default) or Table format. |
Output
The node outputs a single JSON object under the json field containing the cleaned dataset after feature elimination:
- When Output Format is set to `json`, the output is: `{ "cleanedData": { /* cleaned dataset as JSON */ } }`
- If an error occurs during execution, the output contains an error message and stack trace: `{ "error": "Error message", "stack": "Stack trace or 'No stack available'" }`
The node does not output binary data.
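Downstream code can tell the two output shapes apart by checking for the `error` key. The helper below is a small sketch of that check, not part of the node. The function name `read_rfe_result` is hypothetical, and it assumes each output item is a dict that carries the payload under `json`.

```python
def read_rfe_result(item):
    """Return the cleaned data from one output item, or raise if it reports an error."""
    # Assumption: the payload sits under "json"; fall back to the item itself.
    payload = item.get("json", item)
    if "error" in payload:
        stack = payload.get("stack", "No stack available")
        raise RuntimeError(f"{payload['error']}\n{stack}")
    return payload["cleanedData"]
```

Raising on the error shape keeps a failed run from being mistaken for an empty dataset further down the workflow.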
Dependencies
- Requires Python to be installed and accessible via the command line.
- Depends on an external Python script (`rfe_script.py`) located relative to the node's directory (`../../model/rfe_script.py`).
- The Python script is expected to accept JSON data, target column name, and number of features as arguments, and return cleaned data in JSON format.
- No direct API keys or external web services are required.
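The script contract described above could look roughly like the skeleton below. This is a sketch under assumptions, not the contents of `rfe_script.py`: the argument order (JSON data, target column, feature count), the list-of-objects data shape, and the choice to keep the target column in the output are all guesses. `keep_first_n` is a stand-in selector that performs no real RFE and only makes the example runnable.

```python
import json


def keep_first_n(rows, target_column, n_features):
    """Stand-in selector, NOT real RFE: keeps the first n non-target columns."""
    names = [name for name in rows[0] if name != target_column]
    return names[:n_features]


def main(argv, select_features=keep_first_n):
    """Assumed contract: argv = [json_data, target_column, n_features]."""
    data_json, target_column, n_features = argv
    rows = json.loads(data_json)
    kept = set(select_features(rows, target_column, int(n_features)))
    kept.add(target_column)  # assumption: the target column stays in the output
    cleaned = [{k: v for k, v in row.items() if k in kept} for row in rows]
    return json.dumps(cleaned)


# In a real script the entry point would be something like:
#     print(main(sys.argv[1:]))
```

Passing the selector as a parameter keeps the argument parsing and JSON handling testable without the feature-selection dependency installed.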
Troubleshooting
Common issues:
- Python not installed or not in system PATH, causing the script execution to fail.
- The Python script file missing or path incorrect.
- Input data not properly formatted as JSON or missing the target column.
- Errors thrown by the Python script due to invalid parameters or data.
Error messages:
- "Error executing RFE script": Indicates failure running the Python script. Check Python installation and script path.
- Stack traces are provided when available to aid debugging.
Resolutions:
- Ensure Python is installed and accessible.
- Verify the presence and correctness of the `rfe_script.py` file.
- Confirm input data includes the specified target column.
- Validate the number of features parameter is a positive integer.
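The resolutions above can be combined into a single pre-flight check that runs before the script is invoked. The following is a sketch only: the function name `preflight_problems`, the default command name `python`, and the assumption that input data arrives as a list of JSON objects are illustrative and not taken from the node's source.

```python
import shutil
from pathlib import Path


def preflight_problems(rows, target_column, n_features, script_path, python_cmd="python"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Python must be reachable from the command line.
    if shutil.which(python_cmd) is None:
        problems.append(f"'{python_cmd}' was not found on the system PATH")
    # The RFE script must exist at the expected location.
    if not Path(script_path).is_file():
        problems.append(f"script not found: {script_path}")
    # Assumption: input data is a non-empty list of JSON objects (dicts).
    if not isinstance(rows, list) or not rows or not all(isinstance(r, dict) for r in rows):
        problems.append("input data must be a non-empty list of JSON objects")
    elif any(target_column not in r for r in rows):
        problems.append(f"target column '{target_column}' is missing from the input data")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_features, bool) or not isinstance(n_features, int) or n_features < 1:
        problems.append("number of features must be a positive integer")
    return problems
```

Collecting every problem instead of stopping at the first one lets a user fix the environment and the parameters in one pass.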