ChatGptNet.Tokenizer 1.0.1

Suggested alternative: ChatGptNet.Tokenizer 1.0.3

There is a newer version of this package available.
.NET CLI:
dotnet add package ChatGptNet.Tokenizer --version 1.0.1

Package Manager Console:
NuGet\Install-Package ChatGptNet.Tokenizer -Version 1.0.1
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="ChatGptNet.Tokenizer" Version="1.0.1" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add ChatGptNet.Tokenizer --version 1.0.1

Script & Interactive:
#r "nuget: ChatGptNet.Tokenizer, 1.0.1"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source code of the script to reference the package.

Cake:
// Install ChatGptNet.Tokenizer as a Cake Addin
#addin nuget:?package=ChatGptNet.Tokenizer&version=1.0.1

// Install ChatGptNet.Tokenizer as a Cake Tool
#tool nuget:?package=ChatGptNet.Tokenizer&version=1.0.1

ChatGPT Tokenizer for .NET

A ChatGPT tokenizer implementation for .NET. You can use this library to estimate the cost (in terms of tokens) of your request to ChatGPT.

About

This is a C# implementation of OpenAI's original Python encoder/decoder. It was strongly inspired by the JavaScript version created by latitudegames.

Setup

The library can be used in any .NET application built with .NET 6.0 or later. Just create an instance of the GptTokenizer and you are ready to go.

You can either create the default tokenizer:

var tokenizer = await GptTokenizer.CreateTokenizerAsync();

Or, optionally, you can create a tokenizer from custom vocab.bpe and encodings.json source files:

// Open the custom source files; the streams are disposed when they go out of scope.
using var vocabStream = File.OpenRead(@"path/to/vocab.bpe");
using var encodingsStream = File.OpenRead(@"path/to/encodings.json");

var tokenizer = await GptTokenizer.CreateTokenizerAsync(vocabStream, encodingsStream);

The default vocab.bpe and encodings.json are already included in this library as gpt_tokenizer_vocab.bpe and gpt_tokenizer_encodings.json.

Usage

Once you obtain an instance of the GptTokenizer, you can perform the tokenization of your text.

// Encode the text into tokens.
var tokens = tokenizer.GetTokens(@"Lorem ipsum dolor sit amet");

// Count the tokens and decode them back to text.
var tokensCount = tokens.Length;
var tokensAsText = tokenizer.GetTextFromTokens(tokens);
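
As an illustration of the cost-estimation use case, the calls above can be combined into a small console program that checks a prompt against a token budget. This is only a sketch: the `ChatGptNet.Tokenizer` namespace is an assumption based on the package name, and `maxPromptTokens` is a hypothetical limit chosen for the example, not a value taken from the library.

```csharp
// Sketch only: the namespace below is assumed from the package name.
using ChatGptNet.Tokenizer;

// Create the default tokenizer, which uses the bundled vocabulary and encodings.
var tokenizer = await GptTokenizer.CreateTokenizerAsync();

var prompt = "Lorem ipsum dolor sit amet";
var tokens = tokenizer.GetTokens(prompt);

// Hypothetical budget for illustration; use the limit of your target model.
const int maxPromptTokens = 4000;

Console.WriteLine($"Token count: {tokens.Length}");
Console.WriteLine(tokens.Length <= maxPromptTokens
    ? "The prompt fits within the budget."
    : "The prompt exceeds the budget.");

// Decode the tokens back to text to inspect the round trip.
var decoded = tokenizer.GetTextFromTokens(tokens);
Console.WriteLine($"Decoded text: {decoded}");
```

Counting tokens locally before sending a request lets you shorten or split a prompt that would exceed the budget.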

Contribute

Contributions are welcome. Feel free to file issues and pull requests on the repo and we'll address them as we can.

Target frameworks

Compatible (included in the package): net6.0, net7.0.

Computed as compatible: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.

Dependencies

  • net6.0: no dependencies.
  • net7.0: no dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.