Bytescout.PDFExtractor 13.1.0.4386

.NET Core 2.1 .NET Framework 2.0
Install-Package Bytescout.PDFExtractor -Version 13.1.0.4386
dotnet add package Bytescout.PDFExtractor --version 13.1.0.4386
<PackageReference Include="Bytescout.PDFExtractor" Version="13.1.0.4386" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Bytescout.PDFExtractor --version 13.1.0.4386
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Bytescout.PDFExtractor, 13.1.0.4386"
#r directive can be used in F# Interactive, C# scripting and .NET Interactive. Copy this into the interactive tool or source code of the script to reference the package.
// Install Bytescout.PDFExtractor as a Cake Addin
#addin nuget:?package=Bytescout.PDFExtractor&version=13.1.0.4386

// Install Bytescout.PDFExtractor as a Cake Tool
#tool nuget:?package=Bytescout.PDFExtractor&version=13.1.0.4386

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX - extract data from PDF documents

Product Versions
.NET net5.0 net5.0-windows net6.0 net6.0-android net6.0-ios net6.0-maccatalyst net6.0-macos net6.0-tvos net6.0-windows
.NET Core netcoreapp2.1 netcoreapp2.2 netcoreapp3.0 netcoreapp3.1
.NET Framework net20 net35 net40 net403 net45 net451 net452 net46 net461 net462 net463 net47 net471 net472 net48

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Bytescout.PDFExtractor:

Package Downloads
BizDoc.Applications.Invoice-Scan

Invoice for BizDoc

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
13.1.0.4386 2,817 1/25/2022
13.0.1.4281 771 11/8/2021
13.0.0.4254 337 10/4/2021
12.1.5.4183 911 7/5/2021
12.1.5.4181 268 7/5/2021
12.1.4.4171 363 6/17/2021
12.1.4.4169 201 6/17/2021
12.1.3.4167 238 6/16/2021
12.1.2.4156 322 5/28/2021
12.1.1.4149 252 5/26/2021
12.1.1.4145 245 5/26/2021
12.1.0.4136 319 5/18/2021
12.0.0.4062 841 2/8/2021
11.3.0.3983 1,069 10/26/2020
11.2.1.3959 1,670 9/1/2020
11.2.1.3929 895 7/14/2020
11.2.1.3926 476 7/9/2020
11.2.0.3919 481 6/30/2020
11.1.0.3869 2,765 4/10/2020
11.1.0.3864 590 4/4/2020
11.1.0.3849 648 3/27/2020
11.1.0.3845 632 3/19/2020
11.0.0.3834 752 3/6/2020
11.0.0.3832 513 3/4/2020
11.0.0.3830 492 3/4/2020
11.0.0.3815 673 2/21/2020
11.0.0.3805 808 2/11/2020
10.8.0.3758 1,554 12/19/2019
10.8.0.3750 557 12/17/2019
10.8.0.3744 498 12/12/2019
10.8.0.3741 479 12/10/2019
10.8.0.3736 594 12/6/2019
10.8.0.3732 580 12/4/2019
10.7.2.3710 930 11/13/2019
10.7.1.3705 520 11/11/2019
10.7.0.3697 637 11/2/2019
10.6.0.3666 1,495 10/1/2019
10.5.0.3637 1,276 9/2/2019
10.4.0.3618 978 8/15/2019
10.4.0.3613 580 8/13/2019
10.4.0.3602 679 8/7/2019
10.3.0.3566 1,189 7/2/2019
10.2.0.3548 1,548 6/13/2019
10.2.0.3534 553 6/11/2019
10.2.0.3525 594 6/7/2019
10.2.0.3514 584 5/28/2019
10.1.0.3444 1,149 4/5/2019
10.1.0.3439 589 4/4/2019
10.0.0.3429 653 3/25/2019
10.0.0.3427 555 3/25/2019
10.0.0.3424 563 3/23/2019
10.0.0.3423 577 3/23/2019
10.0.0.3422 592 3/23/2019
10.0.0.3421 632 3/21/2019
9.4.0.3398 703 3/12/2019
9.3.0.3366 1,655 2/12/2019
9.3.0.3357 676 2/4/2019
9.3.0.3354 581 1/31/2019
9.2.0.3293 1,685 11/20/2018
9.2.0.3262 908 10/24/2018
9.2.0.3259 640 10/24/2018
9.1.0.3170 1,362 7/26/2018
9.1.0.3167 887 7/18/2018
9.1.0.3165 769 7/18/2018
9.1.0.3163 837 7/18/2018
9.0.0.3095 1,993 4/23/2018
9.0.0.3087 1,103 4/13/2018
9.0.0.3080 918 4/11/2018
8.8.1.3046 1,445 2/20/2018
8.8.1.3025 1,701 1/29/2018
8.8.0.3021 951 1/23/2018
8.7.0.2981 4,012 11/8/2017
8.6.0.2917 1,884 8/2/2017
8.6.0.2912 825 8/1/2017
8.5.0.2863 1,098 6/9/2017
8.5.0.2861 990 6/8/2017
8.5.0.2856 959 6/1/2017
8.4.1.2829 5,163 4/12/2017
8.4.0.2821 942 3/29/2017
8.3.0.2809 1,338 3/13/2017
8.3.0.2806 858 3/12/2017
8.3.0.2803 871 3/6/2017
8.3.0.2801 839 3/6/2017
8.3.0.2800 846 3/6/2017
8.3.0.2798 830 3/6/2017
8.3.0.2796 845 3/6/2017
8.3.0.2794 854 3/6/2017
8.2.0.2699 1,278 1/11/2017
8.1.1.2606 1,798 10/25/2016
8.1.0.2600 912 10/21/2016
8.0.0.2542 1,205 9/1/2016
8.0.0.2541 885 9/1/2016
8.0.0.2528 940 8/23/2016
8.0.0.2523 960 8/19/2016
7.0.0.2493 27,689 6/27/2016
7.0.0.2489 838 6/27/2016
7.0.0.2480 2,062 6/10/2016
7.0.0.2474 1,331 5/26/2016
6.30.0.2421 1,079 3/24/2016
6.20.0.2354 1,097 1/20/2016
6.12.0.2239 4,093 9/22/2015
5.20.0.1871 1,623 2/5/2015
5.0.0.1626 1,629 8/14/2014
4.0.0.1487 1,137 5/31/2014
3.40.0.1349 1,316 3/11/2014
3.20.0.1092 1,300 8/5/2013
3.20.0.1075 2,311 7/12/2013
3.10.0.1051 1,294 6/29/2013
3.0.0.839 1,250 3/26/2013
2.50.0.769 1,273 2/25/2013

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX.

ByteScout, Inc. (c) 2008-2022.

Compatibility: .NET Framework 2.0 or later; .NET Core 2.1 or later.
Works with: .NET, ASP.NET, ActiveX, Visual Basic 6, Classic ASP, Delphi and others.

Features:

- Extracts data from PDF files in TXT, CSV, XML, XLS, XLSX, JSON formats;
- Extracts embedded images, files and attachments from PDF files;
- Splits and merges PDF files, extracts a single page or range of pages;
- Extracts data from a whole document page or a specified rectangular region;
- Extracts PDF document information (author, subject, producer, etc.);
- Detects tables;
- Searches text inside document with regex support;
- Extracts data from PDF forms;
- Reads text from scanned PDF documents using OCR (Optical Character Recognition);
- Provides an ActiveX interface for use from legacy programming languages (Visual Basic 6, Delphi) and scripting languages (VBScript, JScript and others);
- And much more...

History of changes:

13.1.0.4386 (January 24, 2022)
==============================
+ DocumentMerger: Added property 'MergedDocumentTitle' that overrides the title of the merged document.
+ XLSExtractor: Added property 'CustomColumnWidths' that sets exact column widths in the generated Excel spreadsheet.
= JSONExtractor: The mode 'OutputStructure.Full' is renamed to 'OutputStructure.LegacyFixed', and its field names are made as compatible as possible with the mode 'OutputStructure.Legacy'.
+ Added support for UniKS-UCS2-H text encoding.
+ InfoExtractor: Added method 'GetFormFields()' that returns information about form fields in a PDF document.
= Improved COM/ActiveX interfaces for in-memory processing without file operations.
+ Extractors and SearchablePDFMaker: Added property 'OCRDisableAutoSegmentation' to solve segmentation issues in the OCR engine.
= The minimum required .NET Core version is now 2.1 (was 2.0).
- Line grouping was not affected by the 'ConsiderFontSizes' and 'ConsiderFontColors' properties. Fixed now.
- Fixed disposing issue in 'SearchablePDFMaker'.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.

13.0.0.4253 (October 4, 2021)
=============================
+ New column detection mode 'ColumnDetectionMode.ContentGroupsAI' that works better on tables without borders and on pages with multiple tables.
= Greatly improved table detection in 'TableDetector2'.
= Improved filtering of shadow-like text ('ExtractShadowLikeText' option).
= Improved the 'LineGroupingMode.JoinOrphanedRows'.
= 'DocumentMerger': Improved merging of PDF forms. Now it can link fields with matching names or rename them to avoid unwanted linking. See the property 'RenameMatchingFieldsDuringMerge'.
= 'JSONExtractor' and 'XMLExtractor' now output the page size for each page.
= All extractor classes now support extraction of page ranges.
+ Added properties 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' to 'CSVExtractor' and 'XLSExtractor'. They help prevent underlined text from affecting the line grouping in table cells.
= Improved background color detection for the option 'ConsiderBackgroundColors'.
+ Added property 'NormalizeText' to all extractors. It replaces Unicode spaces and hyphens in the extracted text with normal ' ' and '-' characters.
- 'Remover2': fixed handling of PDF page rotation.
- 'Remover2': pages are now made unsearchable only if they were edited.
+ 'XMLExtractor': Added property 'IndentedXML' to control indentation.
+ 'JSONExtractor': Added property 'IndentedJSON' to control indentation.
- 'Stamper': fixed stamping of rotated pages.
+ Added new OCR mode - 'OCRMode.AutoRepairFonts'. It automatically tries to detect PDF documents with corrupted text and forces OCR font repair for them. Works only for English texts.
+ Added property 'PageSeparator' to CSV and XLS extractors.
= 'XLSExtractor': improved negative numbers detection.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option. Fixed now.
+ Added property 'OCRDetectLines' that helps to detect table structure in scanned documents.
+ 'JSONExtractor' and 'XMLExtractor' now output the number of pages in the result and the number of pages for which OCR was performed.
+ Added property 'OCRPageCount' to extractors that contains the number of pages for which OCR was performed during the last extraction.
+ 'JSONExtractor': Added property 'OutputStructure' that selects the structure of the output JSON.
+ 'JSONExtractor': Added property 'OutputTransformation' that applies a JSONPath expression to the output JSON.
= Performance improvements.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
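
The 'NormalizeText' behavior described above can be illustrated with a short, language-neutral sketch in Python. It is not the SDK's implementation: the function name `normalize_text` is hypothetical, and the choice of which characters count as "Unicode spaces and hyphens" (here, the Unicode categories Zs and Pd) is an assumption made for illustration.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Replace Unicode space and dash characters with ASCII ' ' and '-'.

    Assumption: "spaces" means category Zs (space separators) and
    "hyphens" means category Pd (dash punctuation). The SDK may use a
    different character set.
    """
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Zs":      # e.g. NO-BREAK SPACE, THIN SPACE
            out.append(" ")
        elif category == "Pd":    # e.g. HYPHEN (U+2010), EN DASH (U+2013)
            out.append("-")
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, `normalize_text("a\u00a0b\u2010c")` yields `"a b-c"`, so downstream string comparisons and searches see only plain ASCII separators.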

12.1.0.4136 (May 18, 2021)
==========================
+ Added property 'TextExtractor.FuzzySearch' that enables a 'fuzzy' text search algorithm. It finds 'approximately equal' strings.
+ Added 'DocumentSplitter2' class that splits a document by found text.
+ Added 'CSVExtractor.NormalizeCSV' property. It makes CSV data produced from different document pages contain the same number of columns.
+ Added property 'JSONExtractor.OutputStructure' that changes the structure of the generated JSON to one of the predefined variants for easier postprocessing.
+ Added property 'JSONExtractor.OutputTransformation' that applies a JSONPath expression to the generated JSON.
+ Added property 'OCRPageCount' to extractor classes that contains the number of pages for which OCR was performed.
+ 'JSONExtractor' and 'XMLExtractor' now add the number of processed pages and the number of pages for which OCR was performed to the generated JSON and XML result.
+ Added property 'OCRDetectLines' to extractor classes that improves column detection in scanned documents.
+ Added property 'ConsiderBackgroundColors' to extractor classes that enables detection of background color under text objects. It may help improve row and column detection in tables without borders but with color stripes.
+ Added properties 'DocumentMerger.GenerateBookmarks' and 'DocumentMerger.BookmarkTitles' to enable automatic generation of bookmarks pointing to the merged parts.
= Improved PDF optimization in 'DocumentSplitter'.
= 'DocumentMerger' now uses the first input document as the base for the merged document. This makes it possible to keep the document information properties and outlines.
= DocumentMerger: added support for profiles.
= MultimediaExtractor: added support for more media types.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option.
- Fixed issue with junk empty temporary files generated during OCR.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
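
The effect of 'CSVExtractor.NormalizeCSV' described above (every page's CSV ends up with the same number of columns) can be sketched in Python as follows. This is only an illustration of the idea, not the SDK's code: the function name `normalize_csv` is hypothetical, and padding short rows with empty trailing cells is an assumed strategy.

```python
import csv
import io


def normalize_csv(pages: list[str], delimiter: str = ",") -> str:
    """Merge per-page CSV texts and pad every row to the widest row.

    Assumption: short rows are padded on the right with empty cells.
    """
    rows = []
    for page in pages:
        rows.extend(csv.reader(io.StringIO(page), delimiter=delimiter))

    # Width of the widest row across all pages (0 if there are no rows).
    width = max((len(row) for row in rows), default=0)

    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()
```

For example, a two-column page followed by a three-column page produces output in which every row has three cells, which keeps the result loadable as a single rectangular table.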

12.0.0.4062 (February 8, 2021)
==============================
+ Added public 'BaseExtractor.ExtractionArea' property (in addition to 'SetExtractionArea()' method) for more intuitive use.
= Added the new property 'ColumnDetectionByTextAlignment' to extractors that affects the detection of table columns that have no separating lines between them.
+ Added support for simplified profiles.
+ DocumentOptimizer: Added the property 'OptimizationOptions.GrayscaleImages' that converts all color images to grayscale.
+ UnsearchablePDFMaker: Added the new property 'KeepSkippedPages' that keeps pages excluded from the processing in the output document.
+ UnsearchablePDFMaker: Added the new property 'Grayscale' that converts all processed pages to grayscale.
+ Added the property 'BaseTextExtractor.TextAnalysisCorruptedTextThreshold' to fine-tune the text analysis.
= Member names in profiles are case-insensitive now.
= Improved filtering of invisible objects.
= Improved detection of bold fonts.
= Improved OCR rotation detection.
= Added missing OCR mode 'OCRMode.TextFromVectorsAndRepairedFonts'.
= RTL fonts detection is now enabled by default.
= JSON extractor now generates clean JSON (without the @ and # characters for attributes).
= Improved support for external Chinese fonts.
= Improved positioning of rotated PDF objects.
= Damaged CCITT and JBIG2 images are now skipped during rendering to avoid crashes.
= SearchablePDFMaker: improved OCR when 'DiscardExistingDocumentText' is enabled.
= 'SearchablePDFMaker.GetPageOCRCells()' now detects text color.
= OCR in all extractors now detects text color if the 'ConsiderFontColors' property is enabled.
= 'LineGroupingMode.JoinOrphanedRows' now separates rows of different color if 'ConsiderFontColors' property is enabled.
- InfoExtractor: Fixed a crash if the input document is an image.
- Fixed OCR crash on rotated text.
- 'IsOCRRecommendedForPage()' now skips text objects outside the page crop box.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.

...