TikaOnDotnet.TextExtractor 1.17.1

Package Manager:
Install-Package TikaOnDotnet.TextExtractor -Version 1.17.1

.NET CLI:
dotnet add package TikaOnDotnet.TextExtractor --version 1.17.1

PackageReference (for projects that support PackageReference, copy this XML node into the project file to reference the package):
<PackageReference Include="TikaOnDotnet.TextExtractor" Version="1.17.1" />

Paket CLI:
paket add TikaOnDotnet.TextExtractor --version 1.17.1

Script & Interactive (the #r directive can be used in F# Interactive, C# scripting, and .NET Interactive; copy it into the interactive tool or the script's source code to reference the package):
#r "nuget: TikaOnDotnet.TextExtractor, 1.17.1"

Cake:
// Install TikaOnDotnet.TextExtractor as a Cake Addin
#addin nuget:?package=TikaOnDotnet.TextExtractor&version=1.17.1

// Install TikaOnDotnet.TextExtractor as a Cake Tool
#tool nuget:?package=TikaOnDotnet.TextExtractor&version=1.17.1

Classes for running Apache Tika through **TikaOnDotNet**. Just use TextExtractor.Extract() and you'll be on your way.
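
As a rough illustration of the `TextExtractor.Extract()` call mentioned above, here is a minimal sketch. It assumes the `TikaOnDotNet.TextExtraction` namespace and a result object exposing `Text`, `ContentType`, and a string-to-string `Metadata` dictionary; those names are recalled from the library rather than stated on this page, so check them against the installed version. The file name `document.pdf` is a placeholder.

```csharp
using System;
using TikaOnDotNet.TextExtraction; // assumed namespace for TextExtractor

public static class Program
{
    public static void Main(string[] args)
    {
        // Placeholder path: pass a real file as the first argument.
        var path = args.Length > 0 ? args[0] : "document.pdf";

        // Extract text and metadata from the file in one call.
        var result = new TextExtractor().Extract(path);

        // Property names below are assumptions about the result type.
        Console.WriteLine(result.ContentType);
        Console.WriteLine(result.Text);

        foreach (var entry in result.Metadata)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

The default result keeps a single string per metadata name; the custom assembler overload described in the release notes is the route to take when multi-valued metadata (such as several authors) matters.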

NuGet packages (5)

Showing the top 5 NuGet packages that depend on TikaOnDotnet.TextExtractor:

Package Downloads
Contrib.Sitecore.ContentSearch.TikaOnDotnet

Contribution project for Sitecore ContentSearch

Cogworks.ExamineFileIndexer

An examine indexer that uses Apache TIKA

DevelopmentHelpers.FileContentReader

This package combines many open-source packages behind one interface for reading many types of content files. For example, it uses open.xml to read docx files.

Skybrud.Umbraco.Search.DocumentIndexer

This package makes it possible to index and search a wide variety of filetypes in Umbraco, including .pdf and .docx

Jetsons.JetPack.Text

The wrapper library that provides smart extension methods to convert document formats to high quality text.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last updated |
|---|---|---|
| 1.17.1 | 210,407 | 4/3/2018 |
| 1.17.0 | 11,003 | 2/15/2018 |
| 1.16.0 | 111,623 | 7/30/2017 |
| 1.15.0 | 712 | 7/30/2017 |
| 1.14.2 | 38,134 | 4/22/2017 |
| 1.14.2-pre | 655 | 4/15/2017 |
| 1.14.1 | 9,522 | 1/13/2017 |
| 1.14.0 | 2,100 | 12/8/2016 |
| 1.13.1 | 2,518 | 8/16/2016 |
| 1.13.0 | 3,706 | 6/30/2016 |
| 1.12.2 | 2,401 | 4/12/2016 |
| 1.12.1 | 682 | 4/12/2016 |
| 1.12.0 | 731 | 4/11/2016 |

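- Add new overloads of `TextExtractor.Extract` that let users provide their own extraction result assemblers. Example:
```cs
using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using NUnit.Framework;
using org.apache.tika.metadata;      // Tika's Metadata class (namespace assumed)
using TikaOnDotNet.TextExtraction;   // TextExtractor (namespace assumed)

public class CustomResult
{
    public string Text { get; set; }
    public IDictionary<string, string[]> Metadata { get; set; }
}

[TestFixture]
public class CustomResultTests // fixture name is illustrative
{
    // Assembler: builds a CustomResult from the extracted text and Tika metadata.
    public static CustomResult CreateCustomResult(string text, Metadata metadata)
    {
        // Keep every value for each metadata name, not just the first.
        var metaDataDictionary = metadata.names().ToDictionary(name => name, metadata.getValues);
        return new CustomResult
        {
            Metadata = metaDataDictionary,
            Text = text,
        };
    }

    [Test]
    public void should_extract_author_list_from_pdf()
    {
        var textExtractionResult = new TextExtractor().Extract("file_with_authors.pdf", CreateCustomResult);
        textExtractionResult.Metadata["meta:author"].Should().ContainInOrder("Fred Jones, M. D.", "Donald Evans D. M.");
    }
}
```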