CsvReaderAdvanced 1.2.0

CsvReaderAdvanced

A fast, modern CSV reader built around dependency injection (DI) principles.

Combine JSON configuration files with customized CSV reading.

How to install

Via the Package Manager:

Install-Package CsvReaderAdvanced

Via the .NET CLI:

dotnet add package CsvReaderAdvanced

How to use

First add the service to the ServiceCollection.

builder.ConfigureServices((context, services) =>
{
    services.AddCsvReader(context.Configuration);
    ...

Csv schemas via appsettings.json

To understand exactly what the AddCsvReader method does, its implementation is listed here. The method assumes that the current configuration contains a csvSchemas section, typically in the appsettings.json file:

public static IServiceCollection AddCsvReader(this IServiceCollection services, IConfiguration configuration)
{
    services.AddSingleton<ICsvReader,CsvReader>();
    services.AddTransient<ICsvFile,CsvFile>();

    //Microsoft.Extensions.Hosting must be referenced
    services.Configure<CsvSchemaOptions>(configuration.GetSection(CsvSchemaOptions.CsvSchemasSection));
    return services;
}

The appsettings.json file typically contains a property named csvSchemas, which holds the list of schemas:

"csvSchemas": {
    "schemas": [
      {
        "name": "products",
        "fields": [
          {
            "name": "ProductID",
            "alternatives": [ "Product ID" ],
            "required": true
          },
          {
            "name": "Weight",
            "unit": "t",
            "alternativeFields": [ "Volume", "TEU" ],
            "required": true
          },
          {
            "name": "Volume",
            "unit": "m^3",
            "alternativeUnits": [ "m3", "m^3" ]
...
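To make the shape of this section more concrete, here is a minimal sketch that reads a small, complete schema document with System.Text.Json. The types FieldSketch, SchemaSketch and SchemasSketch are hypothetical and only mirror the property names visible in the JSON above; they are not the library's own option classes, and the library itself binds the section through IConfiguration rather than through this code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical types that mirror the JSON above; the library's own classes may differ.
public class FieldSketch
{
    public string Name { get; set; } = "";
    public string? Unit { get; set; }
    public bool Required { get; set; }
    public List<string> Alternatives { get; set; } = new();
}

public class SchemaSketch
{
    public string Name { get; set; } = "";
    public List<FieldSketch> Fields { get; set; } = new();
}

public class SchemasSketch
{
    public List<SchemaSketch> Schemas { get; set; } = new();
}

public static class SchemaJsonSketch
{
    // Parses the content of a csvSchemas-like object; JSON property names are matched ignoring case.
    // Properties that have no counterpart in the sketch types are silently skipped.
    public static SchemasSketch Parse(string json) =>
        JsonSerializer.Deserialize<SchemasSketch>(
            json, new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
        ?? new SchemasSketch();

    // Returns the schema with the given name, or null when there is none.
    public static SchemaSketch? Find(SchemasSketch options, string name) =>
        options.Schemas.FirstOrDefault(s => s.Name == name);
}
```

Calling SchemaJsonSketch.Parse with the text { "schemas": [ { "name": "products", "fields": [ { "name": "ProductID", "alternatives": [ "Product ID" ], "required": true } ] } ] } and then Find(options, "products") returns a schema with a single required field named ProductID.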

We assume that the options are obtained via DI, as in the following example:

public Importer(
    IUnitOfWork context,
    IMapper mapper,
    IServiceProvider provider,
    ILogger logger,
    IOptions<CsvSchemaOptions> options)
{
    _context = context;
    _mapper = mapper;
    _provider = provider;
    _logger = logger;
    _options = options.Value;
}

protected readonly IUnitOfWork _context;
protected readonly IMapper _mapper;
protected readonly IServiceProvider _provider;
protected readonly ILogger _logger;
protected readonly CsvSchemaOptions _options;

public CsvSchema? GetSchema(string name) =>
    _options?.Schemas?.FirstOrDefault(s => s.Name == name);

public ValidationResult CheckForSchema(string name)
{
    if (_options?.Schemas is null || !_options.Schemas.Any())
    {
        _logger.LogError("Could not retrieve csv schemas from settings");
        return new ValidationResult(
            new ValidationFailure[] { new ValidationFailure("CsvSchemas", "Cannot retrieve csv schemas from settings") });
    }

    var schema = GetSchema(name);

    if (schema is null)
    {
        _logger.LogError("Could not retrieve '{schemaName}' schema from settings",name);
        return new ValidationResult(
            new ValidationFailure[] { new ValidationFailure(name, $"Cannot retrieve '{name}' schema from settings") });
    }
    return new ValidationResult();
}

Read the file

We instantiate a CsvFile in order to read the file. Note that the aforementioned CsvSchema is not needed if we do not have a header and/or do not want to validate the existence of fields. For the example below, we assume that a CsvSchema is checked.

//We assume that _provider is an IServiceProvider which is injected via DI
var file = _provider.GetCsvFile();
file.ReadFromFile(path, Encoding.UTF8, withHeader:true);

//the line above is equivalent to the 2 commands:
file.ReadFromFile(path, Encoding.UTF8);
file.PopulateColumns();

The PopulateColumns() method updates the internal ExistingColumns dictionary. The ExistingColumns dictionary is case insensitive and stores the index location for each column. The index location is zero-based. To check the existence of fields against a schema we should call the CheckAgainstSchema() method as shown below:

CsvSchema schema = _options.Schemas.FirstOrDefault(s => s.Name == "products");
file.CheckAgainstSchema(schema);

The CheckAgainstSchema() method also calls the PopulateColumns() method if the ExistingColumns property is not populated. It then updates the ExistingFieldColumns dictionary, which maps each field name to its column index location. Two additional HashSet properties are also populated: MissingFields and MissingRequiredFields.
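The two steps described above can be sketched without the library. The following is only an illustration of the idea, under stated assumptions: FieldSpec and SchemaCheckSketch are hypothetical names, a field is resolved through its name or one of its alternatives, and a missing required field is counted in both sets. The library's actual resolution rules (for example the alternativeFields property) are not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a schema field; not the library's own type.
public record FieldSpec(string Name, string[] Alternatives, bool Required);

public static class SchemaCheckSketch
{
    // Maps each header label to its zero-based position, ignoring case.
    public static Dictionary<string, int> IndexColumns(IReadOnlyList<string> header)
    {
        var columns = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < header.Count; i++)
            columns[header[i].Trim()] = i; // a repeated label keeps its last position
        return columns;
    }

    // Resolves each field to a column index through its name or one of its alternatives.
    // Fields that cannot be resolved are collected in the two "missing" sets.
    public static (Dictionary<string, int> FieldColumns,
                   HashSet<string> Missing,
                   HashSet<string> MissingRequired)
        Check(Dictionary<string, int> columns, IEnumerable<FieldSpec> fields)
    {
        var fieldColumns = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var missing = new HashSet<string>();
        var missingRequired = new HashSet<string>();

        foreach (var field in fields)
        {
            // First candidate label (name, then alternatives) that exists in the header.
            string? match = new[] { field.Name }
                .Concat(field.Alternatives)
                .FirstOrDefault(columns.ContainsKey);

            if (match is not null)
            {
                fieldColumns[field.Name] = columns[match];
            }
            else
            {
                missing.Add(field.Name);
                if (field.Required) missingRequired.Add(field.Name);
            }
        }
        return (fieldColumns, missing, missingRequired);
    }
}
```

For a header with the labels "Product ID" and "weight", a required ProductID field with the alternative "Product ID" resolves to index 0, a required Weight field resolves to index 1 thanks to the case-insensitive lookup, and an optional Volume field ends up in the missing set only.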

Lines and ParsedValue

The most important property updated by the ReadFromFile() call is Lines, which is a List of TokenizedLine? objects. The TokenizedLine struct contains the Tokens property, which is a List of string objects. The strength of this library is that a single TokenizedLine may span more than one line of the file. This happens when a quoted string continues onto the next line. In general, wherever quoted strings appear, a simple string.Split() cannot work. That is why the FromLine and ToLine properties exist; they are important for debugging purposes.

The GetDouble/GetFloat/GetString/GetInt/GetByte/GetLong/GetDateTime/GetDateTimeOffset methods return a ParsedValue<T> struct. ParsedValue is a useful wrapper that contains a Value, an IsParsed and an IsNull property.
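As a rough illustration of why a quoted field defeats string.Split() and why it is worth recording the first and last physical line of each record, here is a small self-contained tokenizer. It is a sketch only, not the library's implementation: LogicalLine and QuotedTokenizerSketch are hypothetical names, the separator is assumed to be a semicolon by default, and a doubled quote inside a quoted field is assumed to stand for a literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical record type; the library's TokenizedLine may differ.
public record LogicalLine(int FromLine, int ToLine, List<string> Tokens);

public static class QuotedTokenizerSketch
{
    public static List<LogicalLine> Tokenize(string text, char separator = ';')
    {
        var result = new List<LogicalLine>();
        var tokens = new List<string>();
        var token = new StringBuilder();
        bool inQuotes = false;
        int line = 1, fromLine = 1; // 1-based physical line counters

        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (inQuotes)
            {
                if (ch == '"' && i + 1 < text.Length && text[i + 1] == '"')
                {
                    token.Append('"'); // doubled quote: literal quote character
                    i++;
                }
                else if (ch == '"') inQuotes = false; // closing quote
                else
                {
                    if (ch == '\n') line++; // newline inside quotes stays in the token
                    token.Append(ch);
                }
            }
            else if (ch == '"') inQuotes = true; // opening quote
            else if (ch == separator)
            {
                tokens.Add(token.ToString());
                token.Clear();
            }
            else if (ch == '\n')
            {
                // Unquoted newline: the logical record ends here.
                tokens.Add(token.ToString());
                token.Clear();
                result.Add(new LogicalLine(fromLine, line, tokens));
                tokens = new List<string>();
                line++;
                fromLine = line;
            }
            else if (ch != '\r') token.Append(ch); // ignore carriage returns outside quotes
        }

        // Flush the last record when the text does not end with a newline.
        if (token.Length > 0 || tokens.Count > 0)
        {
            tokens.Add(token.ToString());
            result.Add(new LogicalLine(fromLine, line, tokens));
        }
        return result;
    }
}
```

For the four-line input "id;note\n1;\"first\nsecond\"\n2;ok", splitting on '\n' yields four fragments, whereas the sketch yields three records; the second one spans physical lines 2 to 3 and its second token contains the embedded newline.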

var c = file.ExistingFieldColumns;

//we can use the following instead, in case we want to use the original column names from the header of the CSV file
//var c = file.ExistingColumns;

foreach (var line in file.Lines)
{
    TokenizedLine l = line.Value;
    
    //for strings we can immediately retrieve the token based on the field name
    string name = l.Tokens[c["ProductName"]];

    var weightValue = l.GetDouble("Weight", c);
    if (!weightValue.IsParsed)
        _logger.LogError("Cannot parse Weight {value} at line {line}.", weightValue.Value, l.FromLine);
    else
    {
        //implicit conversion to double if value exists
        double weight = weightValue;
    ...
    }

    //or implicit conversion to double? - the result may be null
    double? weight2 = weightValue;
...
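The implicit conversions used above can be understood through a small stand-alone wrapper. This is a sketch built from the description in this section, not the library's ParsedValue<T>: ParsedSketch<T> and ParseSketch are hypothetical names, the wrapper is restricted to value types, numbers are parsed with the invariant culture, and an empty token is assumed to count as parsed but null.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper modelled on the description above; not the library's ParsedValue<T>.
public readonly struct ParsedSketch<T> where T : struct
{
    public T? Value { get; }
    public bool IsParsed { get; }
    public bool IsNull => Value is null;

    public ParsedSketch(T? value, bool isParsed)
    {
        Value = value;
        IsParsed = isParsed;
    }

    // Conversion to T? never throws: the result is null when there is no value.
    public static implicit operator T?(ParsedSketch<T> p) => p.Value;

    // Conversion to T throws when there is no value, so check IsParsed/IsNull first.
    public static implicit operator T(ParsedSketch<T> p) =>
        p.Value ?? throw new InvalidOperationException("No value was parsed.");
}

public static class ParseSketch
{
    public static ParsedSketch<double> GetDouble(string? token)
    {
        // Assumption: an empty token is a successfully parsed null.
        if (string.IsNullOrWhiteSpace(token))
            return new ParsedSketch<double>(null, true);

        return double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double d)
            ? new ParsedSketch<double>(d, true)
            : new ParsedSketch<double>(null, false);
    }
}
```

With this sketch, double? w = ParseSketch.GetDouble("abc"); assigns null and the wrapper reports IsParsed as false, while double w = ParseSketch.GetDouble("20.5"); assigns 20.5. Keeping the failure flag separate from the value is what lets the calling code log a parsing problem with the line number instead of throwing.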

Example 1 - Simple case without schema

Let's assume that we have a simple csv file with known headers. The simplest case is to use the ExistingColumns property. This is populated after the call to ReadFromFile when the withHeader argument is set to true.

Suppose that there are 3 labels in the header, namely: FullName, DoubleValue and IntValue representing a string, double and int field for each record. The sample content of the file is the following:

FullName;DoubleValue;IntValue
name1;20.0;4
name2;30.0;5

The full code to read them is then:

//build the app
var host = Host.CreateDefaultBuilder(args).ConfigureServices((c, s) => s.AddCsvReader(c.Configuration));
var app = host.Build();

//read the file
string path = @".\samples\hard.csv";
var file = app.Services.GetCsvFile();
file.ReadFromFile(path, Encoding.UTF8, withHeader: true);

//get the values
var c = file.ExistingColumns;
foreach (var l in file.Lines!)
{
    if (!l.HasValue) continue; //skip empty entries instead of leaving the loop
    var t = l.Value.Tokens;
    string? v1 = l.Value.GetString("FullName", c);
    double? v2 = l.Value.GetDouble("DoubleValue", c);
    int? v3 = l.Value.GetInt("IntValue", c);
    ...
}

IS THAT ALL? Of course not. More examples are pending. The library is more powerful than it seems!

Compatible and additional computed target framework versions

.NET: net7.0 is compatible. The following were computed: net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.


Version Downloads Last updated
2.4.0 89 11/15/2024
2.3.8 92 10/16/2024
2.3.7 138 7/27/2024
2.3.6 101 7/27/2024
2.3.5 81 7/26/2024
2.3.4 94 7/12/2024
2.3.3 112 6/22/2024
2.3.2 100 6/22/2024
2.3.1 263 3/1/2024
2.3.0 217 12/13/2023
2.2.1 152 11/28/2023
2.2.0 181 10/17/2023
2.1.1 147 10/15/2023
2.1.0 132 10/15/2023
1.3.3 125 10/14/2023
1.3.2 120 10/14/2023
1.3.0 139 10/13/2023
1.2.6 144 9/29/2023
1.2.5 196 7/18/2023
1.2.4 192 7/16/2023
1.2.2 159 7/16/2023
1.2.1 170 7/14/2023
1.2.0 165 7/14/2023
1.1.15 171 7/14/2023
1.1.14 170 7/14/2023
1.1.13 181 7/7/2023
1.1.12 273 7/6/2023
1.1.11 158 7/5/2023
1.1.10 180 7/5/2023
1.1.9 159 6/27/2023
1.1.8 146 6/26/2023
1.1.7 148 6/24/2023
1.1.6 144 6/24/2023
1.1.5 147 6/23/2023
1.1.2 154 6/23/2023
1.0.28 165 6/23/2023
1.0.27 157 6/23/2023
1.0.26 140 6/19/2023
1.0.25 167 6/18/2023
1.0.24 147 6/18/2023
1.0.23 166 6/18/2023
1.0.22 154 6/18/2023
1.0.21 153 6/17/2023
1.0.20 153 6/17/2023
1.0.19 151 6/17/2023
1.0.18 154 6/17/2023
1.0.17 164 6/17/2023
1.0.16 150 6/17/2023
1.0.15 152 6/17/2023
1.0.12 146 6/17/2023
1.0.11 155 6/17/2023
1.0.10 139 6/17/2023
1.0.9 148 6/17/2023
1.0.8 157 6/17/2023
1.0.7 143 6/17/2023
1.0.6 156 6/16/2023
1.0.5 167 6/16/2023
1.0.4 146 6/16/2023 (deprecated: no longer maintained)
1.0.2 137 6/16/2023 (deprecated: no longer maintained)