SharpToken 1.2.12
See the version list below for details.
dotnet add package SharpToken --version 1.2.12
NuGet\Install-Package SharpToken -Version 1.2.12
<PackageReference Include="SharpToken" Version="1.2.12" />
paket add SharpToken --version 1.2.12
#r "nuget: SharpToken, 1.2.12"
// Install SharpToken as a Cake Addin
#addin nuget:?package=SharpToken&version=1.2.12
// Install SharpToken as a Cake Tool
#tool nuget:?package=SharpToken&version=1.2.12
SharpToken
SharpToken is a C# library that serves as a port of the Python tiktoken library. It provides functionality for encoding and decoding tokens using GPT-based encodings. This library is built for .NET 6 and .NET Standard 2.0, making it compatible with a wide range of frameworks.
Installation
To install SharpToken, use the NuGet package manager:
Install-Package SharpToken
Or, if you prefer using the .NET CLI:
dotnet add package SharpToken
For more information, visit the NuGet package page.
Usage
To use SharpToken in your project, first import the library:
using SharpToken;
Next, create an instance of GptEncoding by specifying the desired encoding or model:
// Get encoding by encoding name
var encoding = GptEncoding.GetEncoding("cl100k_base");
// Get encoding by model name
var encodingForModel = GptEncoding.GetEncodingForModel("gpt-4");
You can then use the Encode method to encode a string:
var encoded = encoding.Encode("Hello, world!"); // Output: [9906, 11, 1917, 0]
And use the Decode method to decode the encoded tokens:
var decoded = encoding.Decode(encoded); // Output: "Hello, world!"
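Putting these pieces together, a minimal top-level program can report how many tokens a prompt will consume. This is only a sketch; the prompt text and variable names are illustrative, and the values in the comments come from the cl100k_base example above.
using System;
using SharpToken;
// Resolve the encoding for a model, encode a prompt, and round-trip it back to text.
var encoding = GptEncoding.GetEncodingForModel("gpt-4");
var prompt = "Hello, world!";
var tokens = encoding.Encode(prompt);
Console.WriteLine($"Token count: {tokens.Count}"); // 4
Console.WriteLine($"Tokens: [{string.Join(", ", tokens)}]"); // [9906, 11, 1917, 0]
Console.WriteLine($"Decoded: {encoding.Decode(tokens)}"); // Hello, world!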
Supported Encodings
SharpToken currently supports the following encodings:
r50k_base
p50k_base
p50k_edit
cl100k_base
You can use any of these encodings when creating an instance of GptEncoding:
var r50kBaseEncoding = GptEncoding.GetEncoding("r50k_base");
var p50kBaseEncoding = GptEncoding.GetEncoding("p50k_base");
var p50kEditEncoding = GptEncoding.GetEncoding("p50k_edit");
var cl100kBaseEncoding = GptEncoding.GetEncoding("cl100k_base");
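The same text generally produces a different token sequence under each encoding. A quick way to compare them is to encode one string with every encoding and print the counts; this is a sketch, and the sample string is arbitrary:
using System;
using SharpToken;
var text = "Hello world!";
// Encode the same string with each supported encoding and compare token counts.
foreach (var name in new[] { "r50k_base", "p50k_base", "p50k_edit", "cl100k_base" })
{
    var tokens = GptEncoding.GetEncoding(name).Encode(text);
    Console.WriteLine($"{name}: {tokens.Count} tokens");
}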
Model Prefix Matching
Apart from specifying direct model names, SharpToken also provides functionality to map model names based on specific prefixes. This allows users to retrieve an encoding based on a model's prefix.
Here are the current supported prefixes and their corresponding encodings:
Model Prefix | Encoding |
---|---|
gpt-4- | cl100k_base |
gpt-3.5-turbo- | cl100k_base |
gpt-35-turbo | cl100k_base |
Examples of model names that fall under these prefixes include:
- For the prefix gpt-4-: gpt-4-0314, gpt-4-32k, etc.
- For the prefix gpt-3.5-turbo-: gpt-3.5-turbo-0301, gpt-3.5-turbo-0401, etc.
- For the Azure deployment name gpt-35-turbo.
To retrieve the encoding name based on a model name or its prefix, you can use the GetEncodingNameForModel method:
string encodingName = GetEncodingNameForModel("gpt-4-0314"); // This will return "cl100k_base"
If the provided model name doesn't match any direct model names or prefixes, the method will return null.
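Because an unrecognized model name yields null, callers can supply an explicit fallback. The sketch below assumes GetEncodingNameForModel is accessible exactly as in the snippet above; the hypothetical model name and the choice of cl100k_base as the fallback are only examples.
// Fall back to a default encoding when the model name is not recognized.
var encodingName = GetEncodingNameForModel("my-custom-model") ?? "cl100k_base";
var encoding = GptEncoding.GetEncoding(encodingName);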
Understanding Encoded Values
When you encode a string using the Encode method, the returned value is a list of integers that represent tokens in the specified encoding. These tokens are a compact way of representing the input text and can be processed more efficiently by various algorithms.
For example, encoding the text "Hello world!" using the cl100k_base encoding might produce the following list of integers:
var encoded = cl100kBaseEncoding.Encode("Hello world!"); // Output: [9906, 1917, 0]
You can then use the Decode method to convert these tokenized integer values back into the original text:
var decoded = cl100kBaseEncoding.Decode(encoded); // Output: "Hello world!"
With SharpToken, you can seamlessly switch between different encodings to find the one that best suits your needs. Just remember to use the same encoding for both the Encode and Decode methods to ensure accurate results.
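To see why the encodings must match, try decoding cl100k_base tokens with a different encoding. The sketch below makes no claim about the exact output of the mismatched decode; it only illustrates that the same token ids map onto a different vocabulary:
using System;
using SharpToken;
var cl100k = GptEncoding.GetEncoding("cl100k_base");
var p50k = GptEncoding.GetEncoding("p50k_base");
var tokens = cl100k.Encode("Hello world!");
// Decoding with the matching encoding restores the input...
Console.WriteLine(cl100k.Decode(tokens)); // Hello world!
// ...while decoding with a different encoding interprets the same ids
// against another vocabulary and typically yields garbled text.
Console.WriteLine(p50k.Decode(tokens));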
Advanced usage
Custom Allowed Sets
SharpToken allows you to specify custom sets of allowed special tokens when encoding text. To do this, pass a HashSet<string> containing the allowed special tokens as a parameter to the Encode method:
const string encodingName = "cl100k_base";
const string inputText = "Some Text <|endofprompt|>";
var allowedSpecialTokens = new HashSet<string> { "<|endofprompt|>" };
var encoding = GptEncoding.GetEncoding(encodingName);
var encoded = encoding.Encode(inputText, allowedSpecialTokens);
var expectedEncoded = new List<int> { 8538, 2991, 220, 100276 };
Assert.Equal(expectedEncoded, encoded);
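Decoding the result restores the special token as literal text, which is a quick way to confirm the allowed set was honored. A small, self-contained sketch based on the values above:
using System;
using System.Collections.Generic;
using SharpToken;
var encoding = GptEncoding.GetEncoding("cl100k_base");
var allowedSpecialTokens = new HashSet<string> { "<|endofprompt|>" };
// Encoding with the special token allowed, then decoding, round-trips the literal text.
var encoded = encoding.Encode("Some Text <|endofprompt|>", allowedSpecialTokens);
Console.WriteLine(encoding.Decode(encoded)); // Some Text <|endofprompt|>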
Custom Disallowed Sets
Similarly, you can specify custom sets of disallowed special tokens when encoding text. Pass a HashSet<string>
containing the disallowed special tokens as a parameter to the Encode method:
const string encodingName = "cl100k_base";
const string inputText = "Some Text";
var encoding = GptEncoding.GetEncoding(encodingName);
void TestAction()
{
    encoding.Encode(inputText, disallowedSpecial: new HashSet<string> { "Some" });
}
Assert.Throws<ArgumentException>(TestAction);
In this example, an ArgumentException is thrown because the input text contains "Some", which was passed as a disallowed special token.
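If caller-supplied input may contain disallowed tokens, the exception can be handled explicitly rather than letting it propagate. A minimal sketch based on the behavior shown above:
using System;
using System.Collections.Generic;
using SharpToken;
var encoding = GptEncoding.GetEncoding("cl100k_base");
try
{
    // Throws when the input contains a token from the disallowed set.
    encoding.Encode("Some Text", disallowedSpecial: new HashSet<string> { "Some" });
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Rejected input: {ex.Message}");
}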
Testing and Validation
SharpToken includes a set of test cases in the TestPlans.txt file to ensure its compatibility with the Python tiktoken library. These test cases validate the functionality and behavior of SharpToken, providing a reliable reference for developers. Running the unit tests and verifying the test cases helps maintain consistency between the C# SharpToken library and the original Python implementation.
Contributions and Feedback
If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the project's repository.
We hope you find SharpToken useful for your projects and welcome any feedback you may have.
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0: No dependencies.
- net6.0: No dependencies.
NuGet packages (17)
Showing the top 5 NuGet packages that depend on SharpToken:
Package | Description |
---|---|
EachShow.AI | OpenAI, ChatGPT |
AICentral | Package Description |
Encamina.Enmarcha.SemanticKernel.Abstractions | Package Description |
MyIA.SemanticKernel.Connectors.AI.MultiConnector | Extend your Semantic Kernel-powered apps with a fleet of specialized connectors, managed by a superior LLM as your fleet captain. |
Musoq.DataSources.OpenAI | Package Description |
GitHub repositories (6)
Showing the top 5 popular GitHub repositories that depend on SharpToken:
Repository | Description |
---|---|
microsoft/semantic-kernel | Integrate cutting-edge LLM technology quickly and easily into your apps |
axzxs2001/Asp.NetCoreExperiment | All earlier projects have been moved to the **OleVersion** directory for archiving. New examples mainly target .NET 5.0; some upgrade the earlier examples, while others distill previous working experience for reference. |
PowerShell/AIShell | An interactive shell to work with AI-powered assistance providers |
AIDotNet/Thor | Thor is a powerful AI model management tool whose main purpose is the unified management and use of many different AI models. With Thor, users can easily manage and use a wide range of AI models; it is also compatible with the OpenAI API format, which makes it even more convenient to use. |
aiqinxuancai/TiktokenSharp | Token calculation for OpenAI models, using `o200k_base` `cl100k_base` `p50k_base` encoding. |
Version | Downloads | Last updated |
---|---|---|
2.0.3 | 285,275 | 5/17/2024 |
2.0.2 | 56,342 | 4/8/2024 |
2.0.1 | 34,948 | 3/26/2024 |
1.2.33 | 2,888 | 3/25/2024 |
1.2.17 | 111,517 | 2/19/2024 |
1.2.16 | 9,390 | 2/15/2024 |
1.2.15 | 29,811 | 2/5/2024 |
1.2.14 | 214,992 | 12/10/2023 |
1.2.13 | 196 | 12/10/2023 |
1.2.12 | 286,392 | 9/12/2023 |
1.2.10 | 6,970 | 9/7/2023 |
1.2.8 | 32,191 | 8/28/2023 |
1.2.7 | 4,307 | 8/23/2023 |
1.2.6 | 37,665 | 8/2/2023 |
1.2.5 | 2,062 | 8/1/2023 |
1.2.2 | 27,020 | 7/1/2023 |
1.2.1 | 206 | 7/1/2023 |
1.1.3 | 221 | 7/1/2023 |
1.0.30 | 4,041 | 6/26/2023 |
1.0.29 | 818 | 6/25/2023 |
1.0.28 | 76,204 | 4/25/2023 |
1.0.27 | 2,318 | 4/20/2023 |
1.0.26 | 278 | 4/18/2023 |
1.0.25 | 16,557 | 3/28/2023 |
1.0.24 | 277 | 3/28/2023 |
1.0.23 | 275 | 3/28/2023 |
1.0.22 | 281 | 3/28/2023 |
1.0.21 | 277 | 3/28/2023 |
1.0.20 | 274 | 3/28/2023 |
1.0.19 | 310 | 3/28/2023 |
1.0.18 | 311 | 3/28/2023 |
1.0.17 | 302 | 3/28/2023 |
1.0.16 | 288 | 3/28/2023 |
1.0.12 | 289 | 3/28/2023 |
1.0.11 | 287 | 3/28/2023 |
1.0.2 | 5,886 | 7/1/2023 |