Tabula 0.1.3
There is a newer prerelease version of this package available.
See the version list below for details.
dotnet add package Tabula --version 0.1.3
NuGet\Install-Package Tabula -Version 0.1.3
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Tabula" Version="0.1.3" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Tabula --version 0.1.3
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Tabula, 0.1.3"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Tabula as a Cake Addin
#addin nuget:?package=Tabula&version=0.1.3
// Install Tabula as a Cake Tool
#tool nuget:?package=Tabula&version=0.1.3
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
tabula-sharp
tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.
- Supports .NET 5, .NET Core 3.1, .NET Standard 2.0, .NET Framework 4.5, 4.5.1, 4.5.2, 4.6, 4.6.1, 4.6.2, 4.7
- No Java bindings
NuGet packages are available on the releases page and on www.nuget.org.
Differences with tabula-java
- Uses PdfPig, and not PdfBox.
- Coordinate system starts from the bottom left point (going up) of the page, and not from the top left point (going down).
- The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because the original requires an image management library.
- Table results might be different because of the way PdfPig builds the bounding boxes of letters.
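The coordinate difference listed above matters when reusing areas defined for tabula-java. The following is a minimal sketch of the conversion, assuming an unrotated page whose origin is at (0, 0) and a known page height in PDF points. The names CoordinateConversion and FromTopLeft are hypothetical helpers written for illustration; they are not part of Tabula or PdfPig.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Tabula or PdfPig.
public static class CoordinateConversion
{
    // Converts an area given with a top-left origin (y grows downward,
    // as in tabula-java) into a bottom-left origin (y grows upward).
    // Assumes an unrotated page whose lower-left corner is at (0, 0).
    public static (double Left, double Bottom, double Right, double Top) FromTopLeft(
        double top, double left, double bottom, double right, double pageHeight)
    {
        // Horizontal coordinates are unchanged; vertical ones are mirrored
        // around the page height.
        return (left, pageHeight - bottom, right, pageHeight - top);
    }
}
```

For example, on a page 792 points high, a top-left area with top = 100, left = 50, bottom = 300, right = 400 maps to Left = 50, Bottom = 492, Right = 400, Top = 692.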
Usage
Stream mode - BasicExtractionAlgorithm
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1);

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    List<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;
}
Lattice mode - SpreadsheetExtractionAlgorithm
using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1);

    // extract tables from the whole page using its ruling lines
    IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
    List<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;
}
Results
Stream mode - BasicExtractionAlgorithm
Lattice mode - SpreadsheetExtractionAlgorithm
HELP WANTED
- The original Java implementation uses STR trees in RectangleSpatialIndex. This is not the case here, so it might be a bit slower. Any help implementing a similar approach is welcome.
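As a starting point for the STR approach mentioned above, here is a minimal, self-contained sketch of the leaf-packing step of Sort-Tile-Recursive bulk loading. It is not the tabula-java or tabula-sharp implementation: Box and StrPacking are hypothetical names introduced for illustration, and a full index would also need to build parent nodes recursively and support range queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rectangle type for illustration; not a Tabula or PdfPig type.
public readonly struct Box
{
    public Box(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }
    public double CenterX => (Left + Right) / 2;
    public double CenterY => (Bottom + Top) / 2;
}

public static class StrPacking
{
    // Groups boxes into leaves of at most `capacity` entries:
    // sort by center x, cut into vertical strips, then sort each
    // strip by center y and cut it into leaves.
    public static List<List<Box>> PackLeaves(IEnumerable<Box> boxes, int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));

        var sortedByX = boxes.OrderBy(b => b.CenterX).ToList();
        var leaves = new List<List<Box>>();
        if (sortedByX.Count == 0) return leaves;

        // Number of leaves needed, and number of vertical strips (about sqrt of that).
        int leafCount = (int)Math.Ceiling(sortedByX.Count / (double)capacity);
        int stripCount = (int)Math.Ceiling(Math.Sqrt(leafCount));
        int stripSize = stripCount * capacity;

        for (int i = 0; i < sortedByX.Count; i += stripSize)
        {
            var strip = sortedByX.Skip(i).Take(stripSize).OrderBy(b => b.CenterY).ToList();
            for (int j = 0; j < strip.Count; j += capacity)
            {
                leaves.Add(strip.Skip(j).Take(capacity).ToList());
            }
        }

        return leaves;
    }
}
```

Packing spatially close rectangles into the same leaf is what lets a tree built on top of these leaves discard whole groups during an intersection query, instead of testing every rectangle.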
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 is compatible. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net452 is compatible. net46 is compatible. net461 is compatible. net462 is compatible. net463 was computed. net47 is compatible. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
- .NETCoreApp 3.1
  - PdfPig (>= 0.1.6)
- .NETFramework 4.5.2
  - PdfPig (>= 0.1.6)
- .NETFramework 4.6
  - PdfPig (>= 0.1.6)
- .NETFramework 4.6.1
  - PdfPig (>= 0.1.6)
- .NETFramework 4.6.2
  - PdfPig (>= 0.1.6)
- .NETFramework 4.7
  - PdfPig (>= 0.1.6)
- .NETStandard 2.0
  - PdfPig (>= 0.1.6)
- net5.0
  - PdfPig (>= 0.1.6)
- net6.0
  - PdfPig (>= 0.1.6)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Tabula:
Package | Downloads |
---|---|
Tabula.Json: Extract tables from PDF files (port of tabula-java using PdfPig). Json writer. | |
Tabula.Csv: Extract tables from PDF files (port of tabula-java using PdfPig). Csv and Tsv writers. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
0.1.4-alpha001 | 4,552 | 10/19/2023 |
0.1.3 | 133,986 | 6/1/2022 |
0.1.2 | 14,824 | 1/29/2022 |
0.1.1 | 8,274 | 7/18/2021 |
0.1.1-alpha001 | 459 | 3/6/2021 |
0.1.0 | 15,952 | 1/17/2021 |
0.1.0-alpha002 | 458 | 10/26/2020 |