Scraping framework containing:
- a web client able to simulate a web browser.
- an HtmlAgilityPack extension to select elements using CSS selectors (like jQuery).
dcsoup is a .NET library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
This library is essentially a port of jsoup, a Java HTML parser library. See also: http://jsoup.org/
API reference is...
Iron WebScraper is a C# web scraping library that lets developers simulate and automate human browsing behavior to extract content, files, and images from web applications as native .NET objects. Iron WebScraper manages politeness and multithreading in the background, leaving a developer’s own...
Linear-progressive text discovery engine exposing functionality through simple service APIs. Break plain text into a sequence of slices which can be reconstituted as annotated text. Generate meta-rich tokens from a search expression to then be used to annotate source text matches; noise-word...
TextDiscovery HtmlAgilityPack implementations of IDomInterpreter, IDomNodeFactory, and IHtmlConverter. Enables the following capabilities: mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and more.
TextDiscovery AngleSharp implementations of IDomInterpreter, IDomNodeFactory, and IHtmlConverter. Enables the following capabilities: mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and more.
.NET library to scrape content from the Internet. Use it to extract information from Web pages in your own application. Extracted data is written to a CSV file. Supports paging and can cycle through all combinations of any number of replacement tags.
Now targets .NET Standard 2.0 or .NET 5.0, and...
The WebScraper_dotNET Class Library is a high-level wrapper around WebRequest.
The library is currently compiled against .NET 4.0, so it should work with any application running .NET 4.0 or later.
WebScraper_dotNET is a library for scraping web data. It converts HTML code into a structured array of...