GitHubCrawler 1.0.0
GitHubCrawler
GitHubCrawler is a lightweight C# library for recursively discovering and downloading files from GitHub repositories via the GitHub REST API v3. It provides simple asynchronous access to repository contents with support for cancellation, proper resource management, and modern .NET async streams.
New in v1.x
- Authentication Support - Use personal access tokens for private repos and higher rate limits
- Async Enumerable - Modern async streaming API for efficient memory usage
- Cancellation Support - All operations support CancellationToken for graceful termination
- Proper Resource Management - Implements IDisposable for clean HttpClient disposal
- Recursive Discovery - Automatically traverses entire repository structure
- Metadata Included - Returns full HTTP response metadata alongside file content
- Minimal Dependencies - Lightweight with minimal external dependencies
Installation
dotnet add package GitHubCrawler
Or via Package Manager:
Install-Package GitHubCrawler
Quick Start
using GitHubCrawler;
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
class Program
{
    static async Task Main(string[] args)
    {
        // Create crawler with optional GitHub token
        using var crawler = new GitHubRepoCrawler("your-github-token");

        // Enumerate all files in a repository
        using var cts = new CancellationTokenSource();
        await foreach (var url in crawler.GetRepositoryContentsAsync(
            "https://github.com/owner/repo",
            cts.Token))
        {
            Console.WriteLine(url);
        }

        // Download a specific file and print it as UTF-8 text
        var file = await crawler.GetFileContentsAsync(
            "https://raw.githubusercontent.com/owner/repo/main/file.txt",
            cts.Token);
        Console.WriteLine(Encoding.UTF8.GetString(file.Content));
    }
}
API Reference
Constructor
public GitHubRepoCrawler(string token = null)
Creates a new crawler instance. Supply a personal access token for:
- Access to private repositories
- Higher API rate limits (5,000 requests/hour vs 60 for unauthenticated)
- Avoiding rate limit errors in large repositories
Methods
GetRepositoryContentsAsync
public async IAsyncEnumerable<string> GetRepositoryContentsAsync(
    string gitUrl,
    CancellationToken cancellationToken = default)
Recursively discovers all file download URLs in a repository.
Parameters:
- gitUrl: Repository URL. Supported formats:
  - https://github.com/owner/repo
  - https://github.com/owner/repo.git
  - git@github.com:owner/repo.git
- cancellationToken: Optional cancellation token
Returns: An async enumerable of raw file download URLs
Exceptions:
- ArgumentException: Invalid repository URL format
- ObjectDisposedException: Crawler has been disposed
- OperationCanceledException: Operation was cancelled
- Exception: API errors (rate limits, network issues, etc.)
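Because an unsupported URL surfaces as an ArgumentException, it can help to check the URL before calling the crawler. The following is a minimal sketch of such a check for the three formats listed above. RepoUrlSketch and TryParse are hypothetical names and are not part of GitHubCrawler; the library's own parsing rules may be stricter or looser than this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of GitHubCrawler): pre-validates a repository
// URL and extracts the owner and repository name.
public static class RepoUrlSketch
{
    // Accepts https://github.com/owner/repo, the same with a ".git" suffix or a
    // trailing slash, and the SSH form git@github.com:owner/repo.git.
    private static readonly Regex Pattern = new Regex(
        @"^(?:https://github\.com/|git@github\.com:)(?<owner>[A-Za-z0-9-]+)/(?<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string gitUrl, out string owner, out string repo)
    {
        owner = string.Empty;
        repo = string.Empty;
        if (string.IsNullOrWhiteSpace(gitUrl))
        {
            return false;
        }

        Match match = Pattern.Match(gitUrl.Trim());
        if (!match.Success)
        {
            return false;
        }

        owner = match.Groups["owner"].Value;
        repo = match.Groups["repo"].Value;
        return true;
    }
}
```

A caller could run RepoUrlSketch.TryParse(gitUrl, out _, out _) first and only pass the URL to GetRepositoryContentsAsync when it returns true.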
GetFileContentsAsync
public async Task<GitHubFileResponse> GetFileContentsAsync(
    string url,
    CancellationToken cancellationToken = default)
Downloads file content from a GitHub raw URL.
Parameters:
- url: Raw file URL (e.g., from GetRepositoryContentsAsync)
- cancellationToken: Optional cancellation token
Returns: GitHubFileResponse containing:
- byte[] Content: Raw file bytes
- string ContentType: MIME type
- HttpStatusCode StatusCode: HTTP response status
- Uri FinalUrl: Final URL after redirects
- Dictionary<string, IEnumerable<string>> Headers: Response headers
Exceptions:
- ArgumentException: URL is null or empty
- ObjectDisposedException: Crawler has been disposed
- OperationCanceledException: Operation was cancelled
- Exception: Download failed
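Since Content is a raw byte array, text files have to be decoded by the caller. The sketch below picks an encoding from a content-type string and falls back to UTF-8. It assumes that ContentType may carry a charset parameter (for example "text/plain; charset=utf-8"), which the documentation above does not confirm. ContentDecodingSketch and DecodeText are hypothetical names, not part of GitHubCrawler.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper (not part of GitHubCrawler): decodes raw bytes to text
// using the charset from a content-type string when one is present.
public static class ContentDecodingSketch
{
    public static string DecodeText(byte[] content, string contentType)
    {
        Encoding encoding = Encoding.UTF8; // default when no usable charset is given

        if (!string.IsNullOrWhiteSpace(contentType) &&
            MediaTypeHeaderValue.TryParse(contentType, out var parsed) &&
            !string.IsNullOrEmpty(parsed.CharSet))
        {
            try
            {
                // Charset values are sometimes quoted, e.g. charset="utf-8".
                encoding = Encoding.GetEncoding(parsed.CharSet.Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the UTF-8 default.
            }
        }

        return encoding.GetString(content ?? Array.Empty<byte>());
    }
}
```

With the response type described above, a call might look like ContentDecodingSketch.DecodeText(file.Content, file.ContentType).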
Resource Management
The crawler implements IDisposable and should be used with a using statement:
using var crawler = new GitHubRepoCrawler(token);
// Use crawler...
// Automatically disposed when leaving scope
Advanced Examples
Handling Cancellation
using var cts = new CancellationTokenSource();

// Cancel after 30 seconds
cts.CancelAfter(TimeSpan.FromSeconds(30));

// Or cancel on user input
Console.CancelKeyPress += (s, e) =>
{
    e.Cancel = true;
    cts.Cancel();
};

try
{
    await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl, cts.Token))
    {
        Console.WriteLine(url);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Operation cancelled");
}
Filtering Files
// Get only C# source files
await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
{
    if (url.EndsWith(".cs"))
    {
        var file = await crawler.GetFileContentsAsync(url);
        // Process C# file...
    }
}
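A plain EndsWith check is case-sensitive and would miss a URL that carries a query string. As a possible refinement, the sketch below compares the extension of the URL path only, ignoring case. UrlFilterSketch and HasExtension are hypothetical names, not part of GitHubCrawler.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of GitHubCrawler): tests whether the path of a
// URL ends in one of the given extensions, ignoring case and any query string.
public static class UrlFilterSketch
{
    public static bool HasExtension(string url, params string[] extensions)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            return false; // not an absolute URL
        }

        // AbsolutePath excludes the query and fragment parts of the URL.
        string extension = Path.GetExtension(uri.AbsolutePath);
        return extensions.Any(candidate =>
            string.Equals(candidate, extension, StringComparison.OrdinalIgnoreCase));
    }
}
```

Inside the loop above, the condition could then read UrlFilterSketch.HasExtension(url, ".cs"), or list several extensions such as ".cs" and ".csproj".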
Error Handling
try
{
    await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
    {
        Console.WriteLine(url);
    }
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Invalid URL: {ex.Message}");
}
catch (Exception ex) when (ex.Message.Contains("rate limit"))
{
    Console.WriteLine("GitHub API rate limit exceeded. Please authenticate or wait.");
}
catch (Exception ex)
{
    Console.WriteLine($"Error: {ex.Message}");
}
Progress Tracking
int fileCount = 0;
await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
{
    fileCount++;
    Console.Write($"\rDiscovered {fileCount} files...");
}
Console.WriteLine($"\nTotal files: {fileCount}");
Best Practices
- Always use authentication for production applications to avoid rate limits
- Implement cancellation for user-facing applications
- Handle rate limit errors gracefully with retry logic
- Dispose properly with using statements
- Consider memory usage when downloading large files
- Validate URLs before passing to the crawler
Rate Limits
Authentication | Requests per Hour |
---|---|
None | 60 |
Personal Access Token | 5,000 |
GitHub App | 5,000-15,000 |
When rate limited, the GitHub API typically responds with status code 403 (or, in some cases, 429) and a "rate limit exceeded" message.
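One way to handle such errors with retry logic is a small wrapper that retries an operation with exponential backoff. The sketch below is generic and does not depend on GitHubCrawler. RetrySketch and WithBackoffAsync are hypothetical names, and the caller supplies the rule that decides which exceptions are worth retrying.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of GitHubCrawler): retries an async operation
// with exponentially growing delays between attempts.
public static class RetrySketch
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isRetryable,
        int maxAttempts = 4,
        TimeSpan? baseDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(2);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts
                                       && ex is not OperationCanceledException
                                       && isRetryable(ex))
            {
                // Wait, then double the delay for the next attempt.
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay *= 2;
            }
            // On the last attempt, or for non-retryable errors, the exception
            // propagates to the caller unchanged.
        }
    }
}
```

A download could be wrapped as RetrySketch.WithBackoffAsync(ct => crawler.GetFileContentsAsync(url, ct), ex => ex.Message.Contains("rate limit")), reusing the message check from the Error Handling example. Delays of a few seconds will not outlast an exhausted hourly quota, so this mainly helps with short-lived failures.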
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Product | Compatible and computed target frameworks |
---|---|
.NET | net8.0 is compatible. net9.0, net10.0, and the platform-specific variants of net8.0, net9.0, and net10.0 (android, browser, ios, maccatalyst, macos, tvos, windows) were computed. |
Dependencies (net8.0):
- Inputty (>= 1.0.12)
Version | Downloads | Last Updated |
---|---|---|
1.0.0 | 438 | 7/24/2025 |
Initial release