
Fuzzy Debugger

Debug and compare fuzzy matching results from different algorithms

Overview

The Fuzzy Debugger node analyzes and compares the results that multiple fuzzy matching algorithms produce for the same dataset. It reports where the algorithms agree, where they conflict, how their scores differ, and how each one performs on quality metrics. A typical application is comparing match results from Levenshtein, Jaro-Winkler, and token-based methods to select the best approach for data deduplication or record linkage.

Use Case Examples

  1. Comparing match results from three fuzzy matching algorithms to identify where they agree or conflict.
  2. Generating detailed comparison reports showing match scores and matched records for quality assessment.
  3. Calculating statistics and quality metrics to recommend the best algorithm for a given dataset.
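
To illustrate the first two use cases, the following is a minimal sketch of how the per-record comparison fields listed under Output (agreement_level, best_score, worst_score, average_score, score_variance, score_difference) could be derived. It is not the node's actual implementation. The function name `compare_record`, the input shape, the exact rules for "full", "partial", and "conflict", and the use of population variance are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def compare_record(results):
    """Summarize how several algorithms matched one source record.

    `results` is a hypothetical list with one dict per algorithm:
    {"matched": bool, "score": float, "target_id": str or None}.
    """
    scores = [r["score"] for r in results]
    matched_targets = [r["target_id"] for r in results if r["matched"]]

    # Assumed rules: different targets -> conflict; everyone agrees
    # (same target, or nobody matched) -> full; otherwise partial.
    if len(set(matched_targets)) > 1:
        level = "conflict"
    elif len(matched_targets) in (0, len(results)):
        level = "full"
    else:
        level = "partial"

    return {
        "agreement_level": level,
        "best_score": max(scores),
        "worst_score": min(scores),
        "average_score": mean(scores),
        # Population variance is an assumption; the node may use another form.
        "score_variance": pvariance(scores),
        "score_difference": max(scores) - min(scores),
    }


example = compare_record([
    {"matched": True, "score": 0.9, "target_id": "T1"},
    {"matched": True, "score": 0.7, "target_id": "T1"},
    {"matched": False, "score": 0.0, "target_id": None},
])
print(example["agreement_level"])  # partial
```

In this example two algorithms pick the same target and the third finds nothing, so the sketch classifies the record as a partial agreement rather than a conflict.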

Properties

  • Analysis Mode - Determines the type of analysis to perform on the fuzzy matching results, such as showing statistics, detailed comparisons, pairwise comparisons, quality metrics, agreements, conflicts, or differences.
  • Algorithm Names - Comma-separated names for each input algorithm in order. If empty, defaults to 'Algorithm 1', 'Algorithm 2', etc.
  • Identifier Configuration - Configuration for identifying source and target records, including field names for unique IDs and prefixes for source and target fields in output.
  • Filter Options - Options to filter the output records based on score variance, score difference, and whether to show only matched or unmatched records.
  • Output Options - Options controlling the output content, including whether to include field scores, source data, target data, limit results, and sorting order.
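
The defaulting rule for Algorithm Names can be sketched as follows. This is an illustration only: the helper name `resolve_algorithm_names` is hypothetical, and filling in a default for an individual missing or blank name (rather than only when the whole property is empty) is an assumption about the node's behavior.

```python
def resolve_algorithm_names(raw, input_count):
    """Turn the comma-separated property value into one name per input."""
    names = [n.strip() for n in raw.split(",")] if raw.strip() else []
    return [
        # Assumed behavior: fall back to 'Algorithm N' for any missing name.
        names[i] if i < len(names) and names[i] else f"Algorithm {i + 1}"
        for i in range(input_count)
    ]


print(resolve_algorithm_names("", 2))
# ['Algorithm 1', 'Algorithm 2']
print(resolve_algorithm_names("Levenshtein, Jaro-Winkler", 3))
# ['Levenshtein', 'Jaro-Winkler', 'Algorithm 3']
```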

Output

JSON

  • algorithm_name - Name of the algorithm (for statistics and quality metrics outputs)
  • total_records - Total records compared in pairwise comparison
  • matched_records - Number of records matched by the algorithm
  • unmatched_records - Number of records unmatched by the algorithm
  • match_rate_percent - Match rate percentage for the algorithm
  • average_score - Average match score among algorithms for a record
  • median_score - Median match score for the algorithm
  • min_score - Minimum match score found
  • max_score - Maximum match score found
  • standard_deviation - Standard deviation of match scores
  • source_id - Unique identifier of the source record (for detailed comparisons)
  • agreement_level - Level of agreement among algorithms for a record (e.g., full, partial, conflict)
  • best_score - Best match score among algorithms for a record
  • worst_score - Worst match score among algorithms for a record
  • score_variance - Variance of match scores among algorithms for a record
  • score_difference - Difference between best and worst scores for a record
  • algoX_name - Name of algorithm X
  • algoX_matched - Whether algorithm X matched the record
  • algoX_score - Match score from algorithm X
  • algoX_target_id - Matched target record ID from algorithm X
  • algoX_target_field - Target record fields if included in output
  • algoX_field_fieldName - Individual field match scores if included in output
  • algorithm_1 - Name of first algorithm in pairwise comparison
  • algorithm_2 - Name of second algorithm in pairwise comparison
  • same_matches - Number of records where both algorithms matched the same target
  • different_matches - Number of records where algorithms matched different targets
  • only_algo1_matched - Number of records matched only by first algorithm
  • only_algo2_matched - Number of records matched only by second algorithm
  • average_score_difference - Average difference in scores between the two algorithms
  • agreement_rate_percent - Percentage of agreement between the two algorithms
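
As a rough illustration of the pairwise comparison fields at the end of this list, the sketch below counts the four match outcomes for two algorithms. It is written under stated assumptions and does not claim to reproduce the node's logic: the function name `pairwise_comparison`, the input shape (dicts keyed by source ID), restricting the comparison to source IDs present in both inputs, and defining the agreement rate as same_matches divided by total_records are all hypothetical choices.

```python
def pairwise_comparison(name_1, records_1, name_2, records_2):
    """Compare two algorithms' results for the same source records.

    Each `records_*` argument is a hypothetical dict:
    source_id -> {"matched": bool, "score": float, "target_id": str or None}.
    """
    shared_ids = records_1.keys() & records_2.keys()
    same = different = only_1 = only_2 = 0
    score_diffs = []

    for source_id in shared_ids:
        a, b = records_1[source_id], records_2[source_id]
        score_diffs.append(abs(a["score"] - b["score"]))
        if a["matched"] and b["matched"]:
            if a["target_id"] == b["target_id"]:
                same += 1
            else:
                different += 1
        elif a["matched"]:
            only_1 += 1
        elif b["matched"]:
            only_2 += 1
        # Records that neither algorithm matched fall into no counter here.

    total = len(shared_ids)
    return {
        "algorithm_1": name_1,
        "algorithm_2": name_2,
        "total_records": total,
        "same_matches": same,
        "different_matches": different,
        "only_algo1_matched": only_1,
        "only_algo2_matched": only_2,
        "average_score_difference": sum(score_diffs) / total if total else 0.0,
        # Assumed definition: share of records where both chose the same target.
        "agreement_rate_percent": 100.0 * same / total if total else 0.0,
    }
```

Whether records that neither algorithm matched should count toward agreement is a design choice; this sketch leaves them out of the agreement rate, which makes the figure a conservative estimate.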

Dependencies

  • Requires the outputs of at least two FuzzyRecordLinking nodes as inputs for comparison.

Troubleshooting

  • Error if no input data is provided: Ensure at least two FuzzyRecordLinking node outputs are connected.
  • Error if only one input is provided: Connect multiple algorithm outputs for comparison.
  • Error if no valid records found: Check identifier configuration for source and target ID fields.
  • Error if all match scores are zero: Enable 'Include Match Score' in FuzzyRecordLinking node output options.
  • Unknown analysis mode error: Verify the selected analysis mode is valid.
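
The input-related conditions above can be pictured as a sequence of checks. The sketch below is hypothetical throughout: the error class, the function name `validate_inputs`, the field names `source_id` and `match_score`, and the message texts are placeholders for illustration, not the node's real identifiers or messages.

```python
class FuzzyDebuggerInputError(ValueError):
    """Hypothetical error type used only in this sketch."""


def validate_inputs(inputs, source_id_field="source_id", score_field="match_score"):
    """Check a list of per-algorithm record lists before comparing them."""
    if not inputs:
        raise FuzzyDebuggerInputError("No input data provided")
    if len(inputs) < 2:
        raise FuzzyDebuggerInputError("At least two algorithm outputs are required")

    # Keep only records that carry the configured source ID field.
    valid = [
        [rec for rec in records if rec.get(source_id_field) is not None]
        for records in inputs
    ]
    if not any(valid):
        raise FuzzyDebuggerInputError("No valid records found; check the ID fields")

    # All-zero scores suggest the upstream node did not emit match scores.
    if all(rec.get(score_field, 0) == 0 for records in valid for rec in records):
        raise FuzzyDebuggerInputError("All match scores are zero")

    return valid
```

Running the checks in this order means the most basic wiring problems (missing or single inputs) are reported before configuration problems such as a wrong ID field.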
