LinearTsvParser 1.0.7

Linear TSV Parser for .NET Core (read, write)

There is a newer version of this package available.
See the version list below for details.
Install-Package LinearTsvParser -Version 1.0.7
dotnet add package LinearTsvParser --version 1.0.7
<PackageReference Include="LinearTsvParser" Version="1.0.7" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add LinearTsvParser --version 1.0.7
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Linear TSV Parser

Reading and writing Linear TSV files in a binary-safe, lossless way.

NuGet package

Available at: https://www.nuget.org/packages/LinearTsvParser

To include it in a .NET Core project:

dotnet add package LinearTsvParser

Examples

Reading a .tsv.gz file:

using System.IO;
using System.IO.Compression;
using System.Collections.Generic;
using LinearTsvParser;

public class Example {
    public void ReadTsv() {
        // Open the compressed file, decompress it on the fly and parse it line by line
        using (var input = File.OpenRead("/tmp/test.tsv.gz"))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var tsvReader = new TsvReader(gzip)) {
            while(!tsvReader.EndOfStream) {
                // Each call returns the unescaped fields of one line
                List<string> fields = tsvReader.ReadLine();
            }
        }
    }
}

Writing a .tsv.gz file:

using System.IO;
using System.IO.Compression;
using System.Collections.Generic;
using LinearTsvParser;

public class Example {
    public void WriteTsv(List<string[]> data) {
        // Create the file and compress the output on the fly
        using (var outfile = File.Create("/tmp/test.tsv.gz"))
        using (var gzip = new GZipStream(outfile, CompressionMode.Compress))
        using (var tsvWriter = new TsvWriter(gzip)) {
            // First line; the TAB inside "Two\tTwo" is written to the file as \t
            tsvWriter.WriteLine(new List<string>{ "One", "Two\tTwo", "Three" });

            foreach(string[] fields in data) {
                // A string[] is accepted just like a List<string>
                tsvWriter.WriteLine(fields);
            }
        }
    }
}

The writer accepts any enumerable of strings, be it a string[] or a List<string>.
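To illustrate why any enumerable works, here is a minimal, self-contained sketch of how one record could be turned into a Linear TSV line. It is not the library's implementation: the class `LinearTsvEncodeSketch` and its methods are hypothetical, and the sketch assumes that no field is null. It applies the escaping rules from the format section in a single pass, joins the fields with TAB and ends the line with \n.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of LinearTsvParser: encodes one record
// following the Linear TSV rules described on this page.
public static class LinearTsvEncodeSketch
{
    // Escapes a single field (assumed non-null). All special characters are
    // handled in one pass, so an inserted backslash is never escaped again.
    public static string EscapeField(string field)
    {
        var sb = new StringBuilder(field.Length);
        foreach (char c in field)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Takes any IEnumerable<string>, so string[], List<string> and LINQ
    // query results are all handled the same way.
    public static string FormatLine(IEnumerable<string> fields)
    {
        // Fields are joined with TAB and the line always ends with "\n".
        return string.Join("\t", fields.Select(EscapeField)) + "\n";
    }
}
```

For example, `FormatLine(new[] { "One", "Two\tTwo", "Three" })` produces the three fields separated by TABs and followed by a newline, with the TAB inside the middle field turned into the two characters `\` and `t`.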

The Linear TSV format

  • Fields are separated by TAB characters.
  • Text is encoded as UTF-8.
  • The reader accepts any of three line endings: \n, \r\n and \r.
  • The writer always uses \n as the line ending.
  • Special characters inside fields are escaped on writing and unescaped on reading:
    • Newline => "\n"
    • Carriage return => "\r"
    • Tab => "\t"
    • "\" (backslash) => "\\"
  • Column counts are not validated; they can vary from line to line.
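The reading side of these rules can be sketched as follows. `LinearTsvDecodeSketch` is a hypothetical helper, not the LinearTsvParser API. It relies on `TextReader.ReadLine`, which already recognizes \n, \r\n and \r as line endings, and it unescapes each field with a single left-to-right scan. Treating unknown escape sequences and a trailing lone backslash as literal text is an assumption of this sketch; the library may handle them differently.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of LinearTsvParser: decodes Linear TSV
// records following the rules listed above.
public static class LinearTsvDecodeSketch
{
    // Reverses the field escaping in one scan. Chained string.Replace calls
    // would mis-decode input such as the escaped sequence "\\n".
    public static string UnescapeField(string field)
    {
        var sb = new StringBuilder(field.Length);
        for (int i = 0; i < field.Length; i++)
        {
            char c = field[i];
            if (c == '\\' && i + 1 < field.Length)
            {
                switch (field[i + 1])
                {
                    case 'n': sb.Append('\n'); i++; continue;
                    case 'r': sb.Append('\r'); i++; continue;
                    case 't': sb.Append('\t'); i++; continue;
                    case '\\': sb.Append('\\'); i++; continue;
                }
            }
            // Assumption: unknown sequences and a trailing lone backslash
            // are kept verbatim.
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Reads every record. TextReader.ReadLine splits on "\n", "\r\n" and "\r".
    public static List<List<string>> ReadAll(TextReader reader)
    {
        var records = new List<List<string>>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Column counts are not validated, so records may differ in length.
            var fields = new List<string>();
            foreach (string raw in line.Split('\t'))
                fields.Add(UnescapeField(raw));
            records.Add(fields);
        }
        return records;
    }
}
```

To read a UTF-8 file with it, the file stream can be wrapped in `new StreamReader(stream, Encoding.UTF8)` and passed to `ReadAll`.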

Benchmark

The benchmark test compares the performance of this library (lib) with "native" implementations that use string replace operations. Compared with the library, the native version is about 12% slower at reading and roughly 1.8 times slower at writing, and it allocates about 7% more memory when reading and about 53% more when writing. The benchmark test can be found here: BenchTest.cs

| Method | Mean | Error | StdDev | Allocated |
|---------------- |---------:|---------:|---------:|----------:|
| LibReadTest | 275.5 ms | 7.15 ms | 21.08 ms | 62.31 MB |
| NativeReadTest | 309.8 ms | 10.08 ms | 29.26 ms | 66.66 MB |
| LibWriteTest | 110.8 ms | 2.81 ms | 8.25 ms | 23.52 MB |
| NativeWriteTest | 195.9 ms | 4.16 ms | 11.99 ms | 36.06 MB |
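For comparison, a "native" writer built on `string.Replace` might look like the sketch below. It is a guess at the kind of code the native tests exercise, not a copy of BenchTest.cs, and `NativeReplaceSketch` is a hypothetical name. The order of the calls matters: the backslash has to be replaced first.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "native" escaping built from chained string.Replace calls.
// Not taken from BenchTest.cs.
public static class NativeReplaceSketch
{
    public static string EscapeField(string field)
    {
        // Backslash first; otherwise the backslashes introduced by the
        // later replacements would be doubled as well.
        return field
            .Replace("\\", "\\\\")
            .Replace("\n", "\\n")
            .Replace("\r", "\\r")
            .Replace("\t", "\\t");
    }

    // Joins the escaped fields with TAB and terminates the line with "\n".
    public static string FormatLine(IEnumerable<string> fields) =>
        string.Join("\t", fields.Select(EscapeField)) + "\n";
}
```

Each `Replace` call that finds a match returns a new string, so a field containing several kinds of special characters is copied more than once. This may contribute to the higher allocation figures, although the benchmark above does not isolate that effect.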

Dependencies

  • .NETCoreApp 3.1
    • No dependencies.

NuGet packages (1)

Showing the top 1 NuGet package that depends on LinearTsvParser:

  • FatCatDB: Zero configuration, high performance database library for ETL workflows

GitHub repositories

This package is not used by any popular GitHub repositories.

Version History

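| Version | Downloads | Last updated |
|---------|----------:|--------------|
| 1.1.7 | 613 | 1/18/2020 |
| 1.1.6 | 230 | 1/14/2020 |
| 1.1.5 | 129 | 1/14/2020 |
| 1.1.4 | 177 | 1/12/2020 |
| 1.1.3 | 191 | 1/12/2020 |
| 1.1.2 | 196 | 1/12/2020 |
| 1.1.1 | 204 | 1/12/2020 |
| 1.1.0 | 196 | 1/12/2020 |
| 1.0.7 | 146 | 1/12/2020 |
| 1.0.6 | 142 | 1/12/2020 |
| 1.0.5 | 160 | 1/12/2020 |
| 1.0.4 | 159 | 1/12/2020 |
| 1.0.3 | 162 | 1/12/2020 |
| 1.0.2 | 220 | 1/12/2020 |
| 1.0.1 | 232 | 1/11/2020 |
| 1.0.0 | 242 | 1/11/2020 |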