Overview
The Fuzzy Debugger node analyzes and compares the results of multiple fuzzy matching algorithms. It helps debug and evaluate how different algorithms perform on the same dataset by showing where they agree, where they conflict, how their scores differ, and how they compare on quality metrics. A typical application is comparing the results of algorithms such as Levenshtein, Jaro-Winkler, or token-based methods to choose the best approach for data deduplication or record linkage.
Use Case Examples
- Comparing match results from three fuzzy matching algorithms to identify where they agree or conflict.
- Generating detailed comparison reports showing match scores and matched records for quality assessment.
- Calculating statistics and quality metrics to recommend the best algorithm for a given dataset.
Properties
| Name | Meaning |
|---|---|
| Analysis Mode | Determines the type of analysis to perform on the fuzzy matching results, such as showing statistics, detailed comparisons, pairwise comparisons, quality metrics, agreements, conflicts, or differences. |
| Algorithm Names | Comma-separated names for each input algorithm in order. If empty, defaults to 'Algorithm 1', 'Algorithm 2', etc. |
| Identifier Configuration | Configuration for identifying source and target records, including field names for unique IDs and prefixes for source and target fields in output. |
| Filter Options | Options to filter the output records based on score variance, score difference, and whether to show only matched or unmatched records. |
| Output Options | Options controlling the output content: whether to include field scores, source data, and target data, plus a limit on the number of results and the sort order. |
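The Algorithm Names rule can be sketched as follows. This is a minimal Python sketch, not the node's code. It assumes that names are paired with inputs by position, that surrounding whitespace is trimmed, that extra names are ignored, and that a blank or missing entry falls back to its positional default. The function name resolve_algorithm_names is hypothetical.

```python
from __future__ import annotations


def resolve_algorithm_names(raw: str, input_count: int) -> list[str]:
    """Pair the comma-separated Algorithm Names value with the connected inputs.

    Assumptions: names map to inputs by position, whitespace is trimmed,
    extra names are ignored, and blank or missing entries get the
    'Algorithm N' default (1-based).
    """
    parts = [part.strip() for part in raw.split(",")]
    names = []
    for i in range(input_count):
        supplied = parts[i] if i < len(parts) else ""
        names.append(supplied or f"Algorithm {i + 1}")
    return names


print(resolve_algorithm_names("Levenshtein, Jaro-Winkler", 3))
# ['Levenshtein', 'Jaro-Winkler', 'Algorithm 3']
```

Keeping blank entries in place (for example "A,,C") preserves the positional pairing, so the third input is still named C rather than shifting into the second slot.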
Output
JSON
- algorithm_name: Name of the algorithm (for statistics and quality metrics outputs)
- total_records: Total records compared in pairwise comparison
- matched_records: Number of records matched by the algorithm
- unmatched_records: Number of records unmatched by the algorithm
- match_rate_percent: Match rate percentage for the algorithm
- average_score: Average match score among algorithms for a record
- median_score: Median match score for the algorithm
- min_score: Minimum match score found
- max_score: Maximum match score found
- standard_deviation: Standard deviation of match scores
- source_id: Unique identifier of the source record (for detailed comparisons)
- agreement_level: Level of agreement among algorithms for a record (e.g., full, partial, conflict)
- best_score: Best match score among algorithms for a record
- worst_score: Worst match score among algorithms for a record
- score_variance: Variance of match scores among algorithms for a record
- score_difference: Difference between the best and worst scores for a record
- algoX_name: Name of algorithm X
- algoX_matched: Whether algorithm X matched the record
- algoX_score: Match score from algorithm X
- algoX_target_id: ID of the target record matched by algorithm X
- algoX_target_field: Target record fields, if included in the output
- algoX_field_fieldName: Individual field match scores, if included in the output
- algorithm_1: Name of the first algorithm in a pairwise comparison
- algorithm_2: Name of the second algorithm in a pairwise comparison
- same_matches: Number of records where both algorithms matched the same target
- different_matches: Number of records where the algorithms matched different targets
- only_algo1_matched: Number of records matched only by the first algorithm
- only_algo2_matched: Number of records matched only by the second algorithm
- average_score_difference: Average difference in scores between the two algorithms
- agreement_rate_percent: Percentage of agreement between the two algorithms
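The per-algorithm statistics, per-record agreement fields, and pairwise counts can be sketched in Python. This is a minimal illustration, not the node's implementation. It assumes each algorithm's output has already been reduced to a dict from source ID to a hypothetical Match(target_id, score) record, with target_id set to None and score to 0.0 for unmatched records. The agreement-level rules, the use of population variance and standard deviation, restricting score statistics to matched records, and defining agreement_rate_percent as same_matches divided by the records compared are all assumptions.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Match:
    """Hypothetical normalized view of one FuzzyRecordLinking result row."""
    target_id: Optional[str]  # None when the algorithm found no match
    score: float              # assumed 0.0 for unmatched records


def algorithm_statistics(name: str, results: Dict[str, Match]) -> dict:
    """Per-algorithm summary; score stats cover matched records only (assumption)."""
    scores = [m.score for m in results.values() if m.target_id is not None]
    total, matched = len(results), len(scores)
    return {
        "algorithm_name": name,
        "total_records": total,
        "matched_records": matched,
        "unmatched_records": total - matched,
        "match_rate_percent": round(100 * matched / total, 2) if total else 0.0,
        "average_score": statistics.mean(scores) if scores else 0.0,
        "median_score": statistics.median(scores) if scores else 0.0,
        "min_score": min(scores, default=0.0),
        "max_score": max(scores, default=0.0),
        # Population standard deviation; the node may use the sample version.
        "standard_deviation": statistics.pstdev(scores) if scores else 0.0,
    }


def record_agreement(source_id: str, matches: List[Match]) -> dict:
    """Compare every algorithm's result for one source record (assumed non-empty)."""
    scores = [m.score for m in matches]
    targets = {m.target_id for m in matches if m.target_id is not None}
    if len(targets) > 1:
        level = "conflict"  # algorithms picked different targets
    elif len({m.target_id for m in matches}) == 1:
        level = "full"      # same target everywhere, or no match everywhere
    else:
        level = "partial"   # some algorithms matched, others did not
    best, worst = max(scores), min(scores)
    return {
        "source_id": source_id,
        "agreement_level": level,
        "best_score": best,
        "worst_score": worst,
        "score_variance": statistics.pvariance(scores),
        "score_difference": best - worst,
    }


def pairwise_comparison(name_1: str, results_1: Dict[str, Match],
                        name_2: str, results_2: Dict[str, Match]) -> dict:
    """Compare two algorithms over the source records both of them reported."""
    shared = sorted(results_1.keys() & results_2.keys())
    same = different = only_1 = only_2 = 0
    diffs = []
    for sid in shared:
        a, b = results_1[sid], results_2[sid]
        diffs.append(abs(a.score - b.score))
        if a.target_id is not None and b.target_id is not None:
            if a.target_id == b.target_id:
                same += 1
            else:
                different += 1
        elif a.target_id is not None:
            only_1 += 1
        elif b.target_id is not None:
            only_2 += 1
    total = len(shared)
    return {
        "algorithm_1": name_1,
        "algorithm_2": name_2,
        "total_records": total,
        "same_matches": same,
        "different_matches": different,
        "only_algo1_matched": only_1,
        "only_algo2_matched": only_2,
        "average_score_difference": statistics.mean(diffs) if diffs else 0.0,
        # Assumption: agreement = identical matched targets / records compared.
        "agreement_rate_percent": round(100 * same / total, 2) if total else 0.0,
    }
```

For example, with three source records where two algorithms pick the same target for one record, different targets for another, and only the second algorithm matches the third, pairwise_comparison reports one same match, one different match, one only_algo2 match, and an agreement rate of 33.33 percent under these assumptions. The node's own rounding and edge-case handling may differ.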
Dependencies
- Requires input from at least two FuzzyRecordLinking node outputs for comparison.
Troubleshooting
- Error if no input data is provided: Ensure at least two FuzzyRecordLinking node outputs are connected.
- Error if only one input is provided: Connect at least two algorithm outputs for comparison.
- Error if no valid records are found: Check the identifier configuration for the source and target ID fields.
- Error if all match scores are zero: Enable 'Include Match Score' in the FuzzyRecordLinking node's output options.
- Error for an unknown analysis mode: Verify that the selected Analysis Mode is one of the supported modes.
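The troubleshooting checks can be read as a validation pass over the inputs. The sketch below is hypothetical: the mode identifiers in SUPPORTED_MODES, the default source_id and score field names, the order of the checks, and the message text are assumptions, and the node's actual errors may be worded differently. It returns the first problem found instead of raising, which keeps the example easy to test.

```python
from __future__ import annotations

from typing import Any, Dict, List, Optional

# Hypothetical identifiers for the seven analysis modes listed under Properties.
SUPPORTED_MODES = {
    "statistics", "detailed", "pairwise", "quality",
    "agreements", "conflicts", "differences",
}


def find_input_error(inputs: List[List[Dict[str, Any]]], mode: str,
                     source_id_field: str = "source_id",
                     score_field: str = "score") -> Optional[str]:
    """Return the first problem found, mirroring the Troubleshooting list, or None."""
    # Every connected input is empty, or nothing is connected at all.
    if not inputs or all(not items for items in inputs):
        return "No input data: connect at least two FuzzyRecordLinking outputs"
    if len(inputs) < 2:
        return "Only one input: connect at least two algorithm outputs"
    # Keep only records that carry the configured source ID field.
    records = [r for items in inputs for r in items
               if r.get(source_id_field) is not None]
    if not records:
        return "No valid records: check the source and target ID fields"
    # A missing score is treated as 0, which is what disabling
    # 'Include Match Score' upstream would look like here.
    if all(float(r.get(score_field) or 0) == 0 for r in records):
        return "All match scores are zero: enable 'Include Match Score'"
    if mode not in SUPPORTED_MODES:
        return f"Unknown analysis mode: {mode!r}"
    return None
```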