GitHubCrawler 1.0.0

GitHubCrawler

GitHubCrawler is a lightweight C# library for recursively discovering and downloading files from GitHub repositories via the GitHub REST API v3. It provides simple asynchronous access to repository contents with support for cancellation, proper resource management, and modern .NET async streams.

NuGet License: MIT

New in v1.x

  • Authentication Support - Use personal access tokens for private repos and higher rate limits
  • Async Enumerable - Modern async streaming API for efficient memory usage
  • Cancellation Support - All operations support CancellationToken for graceful termination
  • Proper Resource Management - Implements IDisposable for clean HttpClient disposal
  • Recursive Discovery - Automatically traverses entire repository structure
  • Metadata Included - Returns full HTTP response metadata alongside file content
  • Minimal Dependencies - Lightweight with minimal external dependencies

Installation

dotnet add package GitHubCrawler

Or via Package Manager:

Install-Package GitHubCrawler

Quick Start

using GitHubCrawler;
using System;
using System.Text;
using System.Threading;

// Top-level program (C# 9 or later), so no Main method or class is needed.

// Create the crawler; the token is optional (omit it for unauthenticated access)
using var crawler = new GitHubRepoCrawler("your-github-token");
using var cts = new CancellationTokenSource();

// Enumerate all files in a repository
await foreach (var url in crawler.GetRepositoryContentsAsync(
    "https://github.com/owner/repo",
    cts.Token))
{
    Console.WriteLine(url);
}

// Download a specific file
var file = await crawler.GetFileContentsAsync(
    "https://raw.githubusercontent.com/owner/repo/main/file.txt",
    cts.Token);

// Decode the raw bytes as UTF-8 (only meaningful for text files)
Console.WriteLine(Encoding.UTF8.GetString(file.Content));

API Reference

Constructor

public GitHubRepoCrawler(string token = null)

Creates a new crawler instance. Supply a personal access token for:

  • Access to private repositories
  • Higher API rate limits (5,000 requests/hour vs 60 for unauthenticated)
  • Avoiding rate limit errors in large repositories

Methods

GetRepositoryContentsAsync
public async IAsyncEnumerable<string> GetRepositoryContentsAsync(
    string gitUrl, 
    CancellationToken cancellationToken = default)

Recursively discovers all file download URLs in a repository.

Parameters:

  • gitUrl: Repository URL (supports multiple formats):
    • https://github.com/owner/repo
    • https://github.com/owner/repo.git
    • git@github.com:owner/repo.git
  • cancellationToken: Optional cancellation token
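
The three accepted URL formats all carry the same owner and repository name. The sketch below shows one way to normalize them before calling the crawler, for example to validate user input up front. RepoUrlSketch and its Parse method are hypothetical names and not part of GitHubCrawler; the library's own parsing may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of GitHubCrawler): reduces the documented
// URL formats to an (owner, repo) pair.
public static class RepoUrlSketch
{
    // Accepts https://github.com/owner/repo[.git] and git@github.com:owner/repo[.git].
    private static readonly Regex Pattern = new Regex(
        @"^(?:https://github\.com/|git@github\.com:)(?<owner>[^/\s]+)/(?<repo>[^/\s]+?)(?:\.git)?/?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static (string Owner, string Repo) Parse(string gitUrl)
    {
        if (string.IsNullOrWhiteSpace(gitUrl))
            throw new ArgumentException("Repository URL is null or empty.", nameof(gitUrl));

        Match match = Pattern.Match(gitUrl.Trim());
        if (!match.Success)
            throw new ArgumentException($"Unrecognized repository URL format: {gitUrl}", nameof(gitUrl));

        // The lazy repo group leaves a trailing ".git" to the optional suffix.
        return (match.Groups["owner"].Value, match.Groups["repo"].Value);
    }
}
```

Throwing ArgumentException here mirrors the exception the crawler documents for an invalid repository URL, so callers can handle both the same way.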

Returns: An async enumerable of raw file download URLs

Exceptions:

  • ArgumentException: Invalid repository URL format
  • ObjectDisposedException: Crawler has been disposed
  • OperationCanceledException: Operation was cancelled
  • Exception: API errors (rate limits, network issues, etc.)

GetFileContentsAsync
public async Task<GitHubFileResponse> GetFileContentsAsync(
    string url, 
    CancellationToken cancellationToken = default)

Downloads file content from a GitHub raw URL.

Parameters:

  • url: Raw file URL (e.g., from GetRepositoryContentsAsync)
  • cancellationToken: Optional cancellation token

Returns: GitHubFileResponse containing:

  • byte[] Content: Raw file bytes
  • string ContentType: MIME type
  • HttpStatusCode StatusCode: HTTP response status
  • Uri FinalUrl: Final URL after redirects
  • Dictionary<string, IEnumerable<string>> Headers: Response headers

Exceptions:

  • ArgumentException: URL is null or empty
  • ObjectDisposedException: Crawler has been disposed
  • OperationCanceledException: Operation was cancelled
  • Exception: Download failed
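
When saving downloaded content to disk, the raw URL can be mapped to a relative path that mirrors the repository layout. The following is a minimal sketch under two assumptions: the URL has the shape https://raw.githubusercontent.com/owner/repo/ref/path shown in the Quick Start, and the ref (branch or tag) is a single path segment. RawUrlSketch and ToLocalPath are hypothetical names, not part of GitHubCrawler.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of GitHubCrawler): maps a raw file URL to a
// local path under outputRoot, preserving the repository's directory layout.
public static class RawUrlSketch
{
    public static string ToLocalPath(string rawUrl, string outputRoot)
    {
        var uri = new Uri(rawUrl, UriKind.Absolute);

        // AbsolutePath looks like "/owner/repo/ref/dir/file.txt".
        string[] segments = uri.AbsolutePath
            .Split('/', StringSplitOptions.RemoveEmptyEntries)
            .Select(s => Uri.UnescapeDataString(s))
            .ToArray();

        if (segments.Length < 4)
            throw new ArgumentException("Expected /owner/repo/ref/path in the raw URL.", nameof(rawUrl));

        // Assumption: the ref is one segment, so the file path starts at index 3.
        string[] fileSegments = segments.Skip(3).ToArray();

        // Refuse segments that could escape outputRoot.
        if (fileSegments.Any(s => s == "." || s == ".."))
            throw new ArgumentException("Raw URL contains a relative path segment.", nameof(rawUrl));

        return Path.Combine(new[] { outputRoot }.Concat(fileSegments).ToArray());
    }
}
```

A caller could then create the parent directory with Directory.CreateDirectory and write file.Content with File.WriteAllBytesAsync. Branch names that contain a slash would break the single-segment assumption and need different handling.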

Resource Management

The crawler implements IDisposable and should be disposed after use, most simply with a using declaration:

using var crawler = new GitHubRepoCrawler(token);
// Use crawler...
// Automatically disposed when leaving scope

Advanced Examples

Handling Cancellation

using var cts = new CancellationTokenSource();

// Cancel after 30 seconds
cts.CancelAfter(TimeSpan.FromSeconds(30));

// Or cancel on user input
Console.CancelKeyPress += (s, e) => {
    e.Cancel = true;
    cts.Cancel();
};

try 
{
    await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl, cts.Token))
    {
        Console.WriteLine(url);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Operation cancelled");
}

Filtering Files

// Get only C# source files
await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
{
    if (url.EndsWith(".cs", StringComparison.OrdinalIgnoreCase))
    {
        var file = await crawler.GetFileContentsAsync(url);
        // Process C# file...
    }
}

Error Handling

try 
{
    await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
    {
        Console.WriteLine(url);
    }
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Invalid URL: {ex.Message}");
}
catch (Exception ex) when (ex.Message.Contains("rate limit", StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine("GitHub API rate limit exceeded. Please authenticate or wait.");
}
catch (Exception ex)
{
    Console.WriteLine($"Error: {ex.Message}");
}

Progress Tracking

int fileCount = 0;
await foreach (var url in crawler.GetRepositoryContentsAsync(gitUrl))
{
    fileCount++;
    Console.Write($"\rDiscovered {fileCount} files...");
}
Console.WriteLine($"\nTotal files: {fileCount}");

Best Practices

  1. Always use authentication for production applications to avoid rate limits
  2. Implement cancellation for user-facing applications
  3. Handle rate limit errors gracefully with retry logic
  4. Dispose of the crawler with a using statement or declaration
  5. Consider memory usage when downloading large files
  6. Validate URLs before passing to the crawler
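
Item 3 can be approached with a small generic retry wrapper. The sketch below retries an operation with exponential backoff; RetrySketch, WithBackoffAsync, and the isTransient predicate are hypothetical names, and the default delay and attempt count are illustrative values rather than anything the library prescribes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of GitHubCrawler): retries an async operation
// with exponential backoff when the caller classifies the failure as transient.
public static class RetrySketch
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 4,
        TimeSpan? initialDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(2);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            // Cancellation is never retried; the final attempt's exception propagates.
            catch (Exception ex) when (
                ex is not OperationCanceledException && attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay *= 2; // double the wait before the next attempt
            }
        }
    }
}
```

A single download could be wrapped as RetrySketch.WithBackoffAsync(ct => crawler.GetFileContentsAsync(url, ct), ex => ex.Message.Contains("rate limit", StringComparison.OrdinalIgnoreCase)). Retrying a partially consumed GetRepositoryContentsAsync enumeration is a different problem, since restarting it would yield already-seen URLs again.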

Rate Limits

Authentication           Requests per Hour
None                     60
Personal Access Token    5,000
GitHub App               5,000-15,000

When rate limited, the API returns status code 403 (or, in some cases, 429) with a "rate limit exceeded" message, and the x-ratelimit-remaining response header is 0.
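
GitHub REST API responses report the remaining quota in the x-ratelimit-remaining header and the reset time, as UTC epoch seconds, in x-ratelimit-reset. The sketch below computes how long to wait from a header dictionary of the same shape as GitHubFileResponse.Headers. RateLimitSketch and GetWaitTime are hypothetical names, and whether a given response from the crawler actually carries these headers is an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of GitHubCrawler): derives a wait time from
// GitHub's rate limit response headers.
public static class RateLimitSketch
{
    // Returns the time until the quota resets, or null when the headers do not
    // indicate an exhausted quota (or are missing or malformed).
    public static TimeSpan? GetWaitTime(
        IReadOnlyDictionary<string, IEnumerable<string>> headers,
        DateTimeOffset now)
    {
        string remaining = Find(headers, "x-ratelimit-remaining");
        string reset = Find(headers, "x-ratelimit-reset");

        if (remaining != "0" || reset is null)
            return null;

        if (!long.TryParse(reset, NumberStyles.None, CultureInfo.InvariantCulture, out long epochSeconds))
            return null;

        TimeSpan wait = DateTimeOffset.FromUnixTimeSeconds(epochSeconds) - now;
        return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
    }

    // Header names are case-insensitive, so scan rather than index directly.
    private static string Find(IReadOnlyDictionary<string, IEnumerable<string>> headers, string name)
    {
        foreach (var pair in headers)
        {
            if (string.Equals(pair.Key, name, StringComparison.OrdinalIgnoreCase))
                return pair.Value?.FirstOrDefault()?.Trim();
        }
        return null;
    }
}
```

Passing the current time as a parameter keeps the function deterministic; production code would typically pass DateTimeOffset.UtcNow.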

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Target frameworks

net8.0 is compatible. Computed as compatible: net9.0 and net10.0, together with the android, browser, ios, maccatalyst, macos, tvos, and windows variants of net8.0, net9.0, and net10.0.

Version Downloads Last Updated
1.0.0 438 7/24/2025

Initial release