ReLinker
ReLinker is a fast and flexible record linkage library for .NET that helps you find matching records across different datasets. Think of it as a smart way to connect customer records from different databases, even when the data isn't perfectly clean or consistent.
Built on the proven Fellegi-Sunter methodology, ReLinker handles the heavy lifting of comparing records at scale while giving you fine-grained control over how matches are found and scored.
Why ReLinker?
Record linkage is tricky. You need to balance accuracy with performance, handle messy real-world data, and often work with millions of records. ReLinker was designed to solve these challenges:
- Smart blocking cuts the number of comparisons from quadratic growth down to a manageable candidate set
- Multi-level comparisons let you define nuanced similarity rules
- EM training automatically learns the best parameters from your data
- Hybrid memory caching keeps things fast even with large datasets
- Parallel processing takes advantage of modern multi-core machines
Getting Started
Installation
dotnet add package ReLinker
dotnet add package Microsoft.Extensions.Logging.Console # optional, for logging
Your First Linkage Job
Let's say you have two CSV files with customer data that you want to link together. Here's how you'd set that up:
1. Create Your Data Mapper
First, you need to tell ReLinker how to read and clean your data by implementing IRecordMapper:
using System;
using System.Collections.Generic;
using ReLinker;

public sealed class CustomerMapper : IRecordMapper
{
    public Record Map(Dictionary<string, string> row)
    {
        // Every record needs a unique ID - return null to skip this row
        if (!row.TryGetValue("Id", out var id) || string.IsNullOrWhiteSpace(id))
            return null;

        // Helper to safely get values
        string Get(string field) => row.TryGetValue(field, out var value)
            ? value?.Trim() ?? string.Empty
            : string.Empty;

        // Normalize text for better matching
        string Normalize(string text) => text.Trim().ToLowerInvariant();

        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["FirstName"] = Get("FirstName"),
            ["LastName"] = Get("LastName"),
            ["Email"] = Get("Email"),
            ["Phone"] = Get("Phone"),
            // Pre-compute normalized versions for blocking and comparison
            ["FullNameNorm"] = Normalize($"{Get("FirstName")} {Get("LastName")}"),
            ["EmailNorm"] = Normalize(Get("Email")),
        };

        return new Record(id, fields);
    }
}
2. Set Up Data Sources
Create sources from your CSV files using CsvSource:
var leftFile = CsvSource.From("customers_db1.csv", new CustomerMapper());
var rightFile = CsvSource.From("customers_db2.csv", new CustomerMapper());
3. Configure Your Output Sink
Implement IScoredPairSink to handle the results:
public sealed class MatchSink : IScoredPairSink
{
    private readonly StreamWriter _writer;

    public MatchSink(string outputPath)
    {
        _writer = new StreamWriter(File.Create(outputPath));
        _writer.WriteLine("LeftId,RightId,MatchScore,LeftName,RightName");
    }

    public ValueTask WriteAsync(string leftId, string rightId, double score,
        Record left, Record right)
    {
        var leftName = left.Fields.GetValueOrDefault("FullNameNorm", "");
        var rightName = right.Fields.GetValueOrDefault("FullNameNorm", "");
        _writer.WriteLine($"{leftId},{rightId},{score:F4},{leftName},{rightName}");
        return ValueTask.CompletedTask;
    }

    public async ValueTask DisposeAsync()
    {
        await _writer.FlushAsync();
        _writer.Dispose();
    }
}
4. Configure Linkage Settings
var settings = new LinkSettings
{
    // Use probability-based scoring (more intuitive than raw scores)
    UseProbabilityThreshold = true,
    OutputScoreAsProbability = true,
    MatchThreshold = 0.90,          // Only accept matches with 90%+ confidence

    // Performance optimizations
    UseValueSpecificU = true,       // Better accuracy for exact matches
    EnableBucketMemoryCache = true, // Keep frequently used data in memory
    BucketCacheMaxBuckets = 32,
    UseParallelism = true,
    MaxDegreeOfParallelism = 0      // Use all available CPU cores
};
5. Set Up Blocking Rules
// Only compare records that share a name prefix OR an email prefix
settings.Blocking.Add(Block.OnPrefix("FullNameNorm", 4)); // First 4 characters of name
settings.Blocking.Add(Block.OnPrefix("EmailNorm", 10));   // First 10 characters of email
6. Define Comparison Logic
// Multi-level name comparison (most sophisticated approach)
settings.Comparisons.Add(
    Compare.Levels(
        "FullNameNorm",
        CompareLevels.Exact("exact_match"),                   // Perfect match
        CompareLevels.JaroWinklerAtLeast(0.95, "very_close"), // Almost perfect
        CompareLevels.JaroWinklerAtLeast(0.85, "close"),      // Pretty similar
        CompareLevels.NullOrEmpty("missing_name"),            // Handle missing data
        CompareLevels.Else("different")                       // Everything else
    )
);

// Provide probabilities for each level (these will be refined during training)
var nameMatchProbs = new double[] { 0.95, 0.80, 0.50, 0.10, 0.05 };   // m probabilities
var nameNoMatchProbs = new double[] { 0.01, 0.05, 0.20, 0.30, 0.95 }; // u probabilities
settings.LevelMProbsPerComparison = new() { nameMatchProbs };
settings.LevelUProbsPerComparison = new() { nameNoMatchProbs };
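These m/u pairs drive the scoring. Under the Fellegi-Sunter model ReLinker is built on, each observed level contributes a weight of log(m/u) (commonly base 2) to the match score, which is why the "exact_match" level counts so heavily. A quick sanity check of the values above, written as plain C# arithmetic rather than any ReLinker API:

// Standard Fellegi-Sunter arithmetic, not a ReLinker call:
// each observed level adds log2(m/u) to the log-likelihood ratio.
double Weight(double m, double u) => Math.Log2(m / u);
Console.WriteLine(Weight(0.95, 0.01)); // exact_match: ≈ +6.57 bits of evidence
Console.WriteLine(Weight(0.05, 0.95)); // different:   ≈ -4.25 bits of evidence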
7. Run the Linkage
// Set up logging (optional but helpful)
using Microsoft.Extensions.Logging;

using var loggerFactory = LoggerFactory.Create(b =>
    b.AddSimpleConsole().SetMinimumLevel(LogLevel.Information));
var logger = loggerFactory.CreateLogger<Linker>();

// Configure EM training options
var trainingOptions = new EMOptions
{
    MaxIterations = 20,
    Tolerance = 1e-6,
    UseParallelism = true,
    UseBucketMemoryCache = true,
    BucketCacheMaxBuckets = 64
};

// Create output sink
await using var outputSink = new MatchSink("customer_matches.csv");

// Run the entire process: train parameters, then find matches
await Linker.RunAsync(
    settings,
    leftFile,
    rightFile,
    outputSink,
    logger,
    train: true,              // Let EM training improve the initial parameters
    emOptions: trainingOptions
);
Console.WriteLine("Linkage complete! Check customer_matches.csv for results.");
Complete API Reference
Core Classes
Record
Represents a single record with an ID and field dictionary.
Constructor:
Record(string id, Dictionary<string, string> fields) - Creates a new record
Properties:
string Id { get; } - Unique identifier for this record
Dictionary<string, string> Fields { get; } - Case-insensitive field dictionary
Methods:
bool TryGet(string fieldName, out string value) - Safely retrieve a field value
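A quick illustration of the Record API (the values are made up):

// Constructing and querying a Record directly; field lookup is case-insensitive.
var record = new Record("42", new Dictionary<string, string>
{
    ["FirstName"] = "Ada",
    ["LastName"] = "Lovelace"
});
if (record.TryGet("lastname", out var surname))
    Console.WriteLine(surname); // prints "Lovelace"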
LinkSettings
Main configuration class for the linkage process.
Linkage Behavior Properties:
LinkType LinkType { get; set; } - LinkOnly (default), DedupeOnly, or LinkAndDedupe
double MatchThreshold { get; set; } - Minimum score to accept a match (default: 0.0)
bool UseProbabilityThreshold { get; set; } - Threshold on probability vs raw LLR (default: true)
bool OutputScoreAsProbability { get; set; } - Output probability vs LLR (default: true)
double ProbabilityTwoRandomRecordsMatch { get; set; } - Prior probability (default: 1e-6)
Performance Properties:
bool UseParallelism { get; set; } - Enable parallel processing (default: false)
int MaxDegreeOfParallelism { get; set; } - CPU cores to use, 0 = all (default: 0)
int BucketCount { get; set; } - Number of disk buckets (default: 4096)
bool EnableBucketMemoryCache { get; set; } - Hybrid memory+disk cache (default: true)
int BucketCacheMaxBuckets { get; set; } - Max buckets in memory (default: 32)
int OutputBatchSize { get; set; } - Batch size for sink writes (default: 1024)
Advanced Properties:
bool UseValueSpecificU { get; set; } - Use term frequency for exact matches (default: false)
long TargetBucketFileSizeBytes { get; set; } - Target bucket file size (default: 64MB)
int RightSampleForSizing { get; set; } - Records to sample for auto-sizing (default: 10000)
Configuration Lists:
List<IBlockRule> Blocking { get; } - Blocking rules to reduce candidate pairs
List<IComparison> Comparisons { get; } - Field comparison rules
double[] MProbs { get; set; } - Match probabilities for non-levelled comparisons
double[] UProbs { get; set; } - Non-match probabilities for non-levelled comparisons
List<double[]> LevelMProbsPerComparison { get; set; } - Match probabilities per level
List<double[]> LevelUProbsPerComparison { get; set; } - Non-match probabilities per level
Linker
Main class for running record linkage.
Static Methods:
static Linker Create(LinkSettings settings, ILogger<Linker> logger) - Create a new linker instance
static Task RunAsync(LinkSettings settings, IRecordSource left, IRecordSource right, IScoredPairSink sink, ILogger<Linker> logger, bool train = false, EMOptions emOptions = null, CancellationToken cancellationToken = default) - One-shot convenience method to train and predict
Instance Methods:
Linker InputLeft(IRecordSource left) - Set the left dataset source (fluent)
Linker InputRight(IRecordSource right) - Set the right dataset source (fluent)
Task TrainAsync(EMOptions options = null, CancellationToken cancellationToken = default) - Run EM training to learn parameters
Task PredictAsync(IScoredPairSink sink, CancellationToken cancellationToken = default) - Find and output matches
EMOptions
Configuration for Expectation-Maximization training.
Convergence Properties:
int MaxIterations { get; set; } - Maximum EM iterations (default: 20)
double Tolerance { get; set; } - Convergence tolerance on log-likelihood (default: 1e-5)
double Smoothing { get; set; } - Laplace smoothing to avoid zeros (default: 1e-6)
Estimation Properties:
bool EstimateLambda { get; set; } - Learn the match prior probability (default: false)
Performance Properties:
bool UseParallelism { get; set; } - Parallel E-step processing (default: true)
bool DeduplicateCandidatesPerLeft { get; set; } - Remove duplicate right candidates (default: true)
bool UseBucketMemoryCache { get; set; } - Use hybrid memory cache (default: true)
int BucketCacheMaxBuckets { get; set; } - Max buckets in memory (default: 16)
Sampling Properties:
int? SampleLeftEveryN { get; set; } - Subsample left records (default: null)
int? MaxCandidatePairsPerIteration { get; set; } - Cap pairs per iteration (default: null)
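For very large left datasets, the sampling properties trade training fidelity for speed. A hedged example of what that configuration might look like:

// Faster training on big data: score every 20th left record and cap pairs.
var fastTraining = new EMOptions
{
    SampleLeftEveryN = 20,                   // subsample the left source
    MaxCandidatePairsPerIteration = 500_000, // hard cap per EM iteration
    EstimateLambda = true                    // also learn the match prior
};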
Data Sources and Sinks
IRecordSource
Interface for reading records asynchronously.
Methods:
IAsyncEnumerable<Record> ReadAsync(CancellationToken cancellationToken = default) - Stream records
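Because the interface is a single method, a custom source is easy to write. Here is a minimal in-memory implementation, useful for tests (InMemorySource is our own name, not part of ReLinker):

using System.Runtime.CompilerServices;

public sealed class InMemorySource : IRecordSource
{
    private readonly IReadOnlyList<Record> _records;

    public InMemorySource(IReadOnlyList<Record> records) => _records = records;

    public async IAsyncEnumerable<Record> ReadAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await Task.Yield(); // satisfy the async iterator without blocking
        foreach (var record in _records)
        {
            cancellationToken.ThrowIfCancellationRequested();
            yield return record;
        }
    }
}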
CsvSource : IRecordSource
Built-in CSV file reader.
Static Methods:
static CsvSource From(string path, IRecordMapper mapper) - Create from file path and mapper
IRecordMapper
Interface for converting raw CSV rows to Records.
Methods:
Record Map(Dictionary<string, string> row) - Convert a CSV row, return null to skip
IScoredPairSink : IAsyncDisposable
Interface for handling match results.
Methods:
ValueTask WriteAsync(string id1, string id2, double score, Record record1, Record record2) - Handle a match
ValueTask DisposeAsync() - Clean up resources
Blocking Rules
Blocking rules reduce the number of candidate pairs by only comparing records that share certain characteristics.
Block
Static Factory Class
Methods:
static IBlockRule OnPrefix(string fieldName, int prefixLength, bool toLower = true) - Match on first N characters
static IBlockRule OnExact(string fieldName, bool toLower = true) - Exact field match
static IBlockRule OnConcatExact(char separator = '|', bool toLower = true, params string[] fields) - Exact match on concatenated fields
static IBlockRule OnInitialAndSurnamePrefix(string firstNameField, string surnameField, int surnamePrefix, bool toLower = true) - First initial + surname prefix
static IBlockRule OnSoundex(string fieldName) - Phonetic matching using Soundex
Examples:
settings.Blocking.Add(Block.OnPrefix("LastName", 3)); // First 3 chars of surname
settings.Blocking.Add(Block.OnExact("ZipCode")); // Exact zip code match
settings.Blocking.Add(Block.OnConcatExact('_', true, "City", "State")); // City_State key
settings.Blocking.Add(Block.OnSoundex("LastName")); // Phonetically similar surnames
Comparison Methods
Compare
Static Factory Class
Single-Field Continuous Comparisons:
static IComparison Jaro(string fieldName) - Jaro string similarity
static IComparison JaroWinkler(string fieldName) - Jaro-Winkler similarity (emphasizes common prefixes)
static IComparison Levenshtein(string fieldName) - Normalized Levenshtein edit distance
static IComparison TfIdf(string fieldName, Dictionary<string, double> idf) - TF-IDF cosine similarity
Multi-Level Comparisons:
static IComparison Levels(string fieldName, params IComparisonLevel[] levels) - Create a multi-level comparison
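The continuous comparisons pair with the flat MProbs/UProbs arrays on LinkSettings rather than the per-level lists; presumably one entry per comparison, in the order added, as in this sketch:

// Non-levelled comparisons with the flat m/u arrays (assumed: one entry each,
// in registration order; starting guesses that EM training can refine).
settings.Comparisons.Add(Compare.JaroWinkler("FullNameNorm"));
settings.Comparisons.Add(Compare.Levenshtein("EmailNorm"));
settings.MProbs = new[] { 0.90, 0.85 };
settings.UProbs = new[] { 0.05, 0.02 };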
CompareLevels
Static Factory Class
Factory for creating individual comparison levels.
String Matching Levels:
static IComparisonLevel Exact(bool ignoreCase = true, string label = "exact")
- Perfect matchstatic IComparisonLevel JaroAtLeast(double threshold, string label = null)
- Jaro ≥ thresholdstatic IComparisonLevel JaroWinklerAtLeast(double threshold, string label = null)
- Jaro-Winkler ≥ thresholdstatic IComparisonLevel LevenshteinSimilarityAtLeast(double threshold, string label = null)
- Levenshtein similarity ≥ thresholdstatic IComparisonLevel JaccardTokensAtLeast(double threshold, string label = null)
- Jaccard token similarity ≥ threshold
Specialized Levels:
static IComparisonLevel NullOrEmpty(string label = "null_or_empty")
- Either field is null/emptystatic IComparisonLevel SoundexEqual(string label = "soundex_equal")
- Phonetically equalstatic IComparisonLevel NumericWithin(double tolerance, string label = null)
- Numeric values within tolerancestatic IComparisonLevel DateWithinDays(int days, string label = null)
- Dates within N daysstatic IComparisonLevel Else(string label = "else")
- Catch-all level (always matches)
Example Multi-Level Comparison:
settings.Comparisons.Add(
    Compare.Levels(
        "PersonName",
        CompareLevels.Exact("exact"),                         // Perfect match
        CompareLevels.JaroWinklerAtLeast(0.95, "very_close"), // Almost identical
        CompareLevels.JaroWinklerAtLeast(0.85, "close"),      // Pretty similar
        CompareLevels.SoundexEqual("sounds_alike"),           // Phonetically similar
        CompareLevels.NullOrEmpty("missing"),                 // Handle missing data
        CompareLevels.Else("different")                       // Everything else
    )
);
String Similarity Classes
All similarity classes implement IStringSimilarity with a single method:
double Compute(string inputString1, string inputString2) - Returns similarity in [0,1]
JaroSimilarity
Classic Jaro string similarity algorithm. Good for names and short strings.
JaroWinklerSimilarity
Jaro-Winkler algorithm that gives extra weight to common prefixes.
Constructor:
JaroWinklerSimilarity(double prefixScale = 0.1, int maxPrefix = 4) - Customize prefix weighting
LevenshteinSimilarity
Optimized Levenshtein edit distance, converted to similarity (1 - distance/maxLength).
JaccardTokenSimilarity
Jaccard similarity on word tokens. Good for addresses and multi-word fields.
TfIdfSimilarity
TF-IDF cosine similarity using pre-computed IDF weights.
Constructor:
TfIdfSimilarity(Dictionary<string, double> idf) - Provide IDF dictionary
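The similarity classes also work standalone, which is handy for tuning thresholds. The sketch below builds a toy IDF dictionary by hand using the usual idf(t) = ln(N / df(t)) formula; in practice you would compute it over your right dataset. Printed scores are illustrative, not exact outputs:

// Standalone use of the documented constructors and Compute method.
IStringSimilarity jw = new JaroWinklerSimilarity(prefixScale: 0.1, maxPrefix: 4);
Console.WriteLine(jw.Compute("jonathan smith", "johnathan smith")); // close to 1.0

// Toy IDF over a 3-"document" corpus: idf(t) = ln(N / df(t)).
var idf = new Dictionary<string, double>
{
    ["main"] = Math.Log(3.0 / 2.0), // appears in 2 of 3 records
    ["st"]   = Math.Log(3.0 / 3.0), // appears everywhere, weight 0
    ["12"]   = Math.Log(3.0 / 1.0)  // rare token, highest weight
};
var tfidf = new TfIdfSimilarity(idf);
Console.WriteLine(tfidf.Compute("12 main st", "main st 12")); // token order ignored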
Advanced Features
Value-Specific U
When UseValueSpecificU = true, exact matches use the actual frequency of the matched value in the right dataset instead of the learned u parameter. This dramatically improves accuracy for rare exact matches.
Hybrid Memory Caching
ReLinker can keep frequently-accessed bucket files parsed in memory to avoid repeated disk reads:
EnableBucketMemoryCache = true - Enable the feature
BucketCacheMaxBuckets = N - Keep up to N buckets in memory (LRU eviction)
Parallel Processing
Enable parallel processing for both training and prediction:
UseParallelism = true - Enable parallel candidate scoring
MaxDegreeOfParallelism = 0 - Use all CPU cores (or specify a number)
Common Patterns and Recipes
High-Precision Linkage (Few False Positives)
settings.MatchThreshold = 0.95;    // Very strict threshold
settings.UseValueSpecificU = true; // Better handling of rare exact matches

// Use restrictive blocking
settings.Blocking.Add(Block.OnExact("Email"));     // Only compare identical emails
settings.Blocking.Add(Block.OnPrefix("Phone", 6)); // Similar phone prefixes

// Multi-level comparison with exact match having high weight
settings.Comparisons.Add(
    Compare.Levels("FullName",
        CompareLevels.Exact("exact"),                    // Very high m, very low u
        CompareLevels.JaroWinklerAtLeast(0.95, "close"), // Still high confidence
        CompareLevels.Else("different")                  // Low confidence
    )
);
High-Recall Linkage (Find More Matches)
settings.MatchThreshold = 0.75; // More lenient threshold

// Multiple blocking strategies for broader coverage
settings.Blocking.Add(Block.OnPrefix("LastName", 3));
settings.Blocking.Add(Block.OnPrefix("FirstName", 2));
settings.Blocking.Add(Block.OnSoundex("LastName")); // Phonetic matching
settings.Blocking.Add(Block.OnInitialAndSurnamePrefix("FirstName", "LastName", 4));

// More granular comparison levels
settings.Comparisons.Add(
    Compare.Levels("FullName",
        CompareLevels.Exact("exact"),
        CompareLevels.JaroWinklerAtLeast(0.95, "very_close"),
        CompareLevels.JaroWinklerAtLeast(0.90, "close"),
        CompareLevels.JaroWinklerAtLeast(0.80, "somewhat_close"),
        CompareLevels.SoundexEqual("sounds_alike"),
        CompareLevels.Else("different")
    )
);
Large Dataset Processing
settings.BucketCount = 8192;             // More, smaller bucket files
settings.EnableBucketMemoryCache = true;
settings.BucketCacheMaxBuckets = 128;    // Use more memory for caching
settings.OutputBatchSize = 4096;         // Larger output batches
settings.UseParallelism = true;
settings.MaxDegreeOfParallelism = 0;     // Use all cores

var emOptions = new EMOptions
{
    UseBucketMemoryCache = true,
    BucketCacheMaxBuckets = 256,              // Even more cache for training
    UseParallelism = true,
    MaxCandidatePairsPerIteration = 1_000_000 // Cap training pairs per iteration
};
Multi-Field Comparison
// Name comparison with multiple levels
settings.Comparisons.Add(
    Compare.Levels("FullName",
        CompareLevels.Exact(),
        CompareLevels.JaroWinklerAtLeast(0.90),
        CompareLevels.SoundexEqual(),
        CompareLevels.Else()
    )
);

// Address comparison
settings.Comparisons.Add(
    Compare.Levels("Address",
        CompareLevels.Exact(),
        CompareLevels.JaroWinklerAtLeast(0.85),
        CompareLevels.JaccardTokensAtLeast(0.75), // Good for addresses
        CompareLevels.Else()
    )
);

// Phone number comparison
settings.Comparisons.Add(
    Compare.Levels("Phone",
        CompareLevels.Exact(),
        CompareLevels.NumericWithin(0), // Treat as numbers if possible
        CompareLevels.Else()
    )
);

// Provide separate m/u arrays for each comparison
settings.LevelMProbsPerComparison = new()
{
    new double[] { 0.95, 0.80, 0.60, 0.05 }, // Name
    new double[] { 0.90, 0.70, 0.50, 0.05 }, // Address
    new double[] { 0.98, 0.85, 0.02 }        // Phone
};
settings.LevelUProbsPerComparison = new()
{
    new double[] { 0.01, 0.05, 0.15, 0.95 }, // Name
    new double[] { 0.02, 0.10, 0.20, 0.95 }, // Address
    new double[] { 0.001, 0.01, 0.99 }       // Phone
};
Troubleshooting
"Not enough matches found"
- Lower your
MatchThreshold
- Add more blocking rules for better coverage (
Block.OnPrefix
,Block.OnSoundex
) - Check if your field names are consistent between left/right datasets
- Verify your data mapper is working correctly with sample data
- Use
train: true
to let EM improve your initial parameter guesses
"Too many false positives"
- Increase your
MatchThreshold
- Add more discriminative comparison levels
- Improve data normalization in your
IRecordMapper
- Set
UseValueSpecificU = true
for better exact match handling - Use more restrictive blocking rules
"Process is too slow"
- Set
UseParallelism = true
and tuneMaxDegreeOfParallelism
- Increase
BucketCacheMaxBuckets
if you have available memory - Make your blocking rules more restrictive (fewer candidates per record)
- Increase
OutputBatchSize
for slow output sinks - Consider using
EMOptions.SampleLeftEveryN
for faster training on large datasets
"Out of memory errors"
- Reduce
BucketCacheMaxBuckets
- Increase
BucketCount
to create smaller bucket files - Use more restrictive blocking to reduce candidate set size
- Process data in smaller chunks
"Training not converging"
- Increase
EMOptions.MaxIterations
(try 30-50) - Decrease
EMOptions.Tolerance
(try 1e-7) - Check that your initial m/u parameters make sense
- Ensure you have enough training data
- Verify your comparison levels are well-designed
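For the convergence tips above, the corresponding EMOptions tweaks look like this (the values are suggestions, not requirements):

// Looser convergence budget for stubborn training runs.
var emOptions = new EMOptions
{
    MaxIterations = 40, // default: 20
    Tolerance = 1e-7,   // default: 1e-5
    Smoothing = 1e-6    // Laplace smoothing keeps m/u away from exact zeros
};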
Performance Characteristics
ReLinker is designed to handle large datasets efficiently:
- Memory usage: Configurable via bucket cache settings. Can run in low-memory mode (cache disabled) or high-memory mode (large cache).
- Disk I/O: Minimized through hybrid caching and sequential file access patterns.
- CPU scaling: Near-linear scaling with core count for candidate scoring (I/O remains single-threaded).
- Typical throughput: 10K-100K candidate pairs per second on modern hardware, depending on comparison complexity.
Compatible frameworks
- .NET: net8.0 is compatible; net9.0, net10.0, and the platform-specific TFMs of each (android, browser, ios, maccatalyst, macos, tvos, windows) were computed as compatible.
Dependencies (net8.0)
- Microsoft.Extensions.Logging.Console (>= 9.0.6)