Luthor 2.3.0
dotnet add package Luthor --version 2.3.0
NuGet\Install-Package Luthor -Version 2.3.0
<PackageReference Include="Luthor" Version="2.3.0" />
paket add Luthor --version 2.3.0
#r "nuget: Luthor, 2.3.0"
// Install Luthor as a Cake Addin
#addin nuget:?package=Luthor&version=2.3.0
// Install Luthor as a Cake Tool
#tool nuget:?package=Luthor&version=2.3.0
Luthor
Extract structure from any text using a tokenising lexer.
Using Luthor you can convert any single-line or multi-line text into a collection of tokens, each holding a token type and the run of content it covers. This provides access to the content at a higher level of abstraction, allowing further processing without having to worry about the specifics of the raw text.
For each token you get the offset, the line number, the column within the line, and the content.
For example:
Sample text.
Across 3 lines.
With a "multi 'word' string".
This produces a list of tokens like the following (each token also carries its offset, line number, and column):
Letters : "Sample"
Whitespace : " "
Letters : "text"
Symbols : "."
EOL : \n
Letters : "Across"
Whitespace : " "
Digits : "3"
Whitespace : " "
Letters : "lines"
Symbols : "."
EOL : \n
Letters : "With"
Whitespace : " "
Letters : "a"
Whitespace : " "
String : ""multi 'word' string""
Symbols : "."
EOF : ""
- Note the difference between Letters and String, the latter of which is quoted (single quotes, double quotes, or backticks) and can have other quotation symbols embedded within it.
Usage
To get the tokens from a given source text:
var tokens = new Lexer(sourceAsString).GetTokens();
tokens.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));
To do the same, but with each whitespace run compressed to a single space:
var tokens = new Lexer(sourceAsString).GetTokens(true);
tokens.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));
To get the tokens from a given source text as a collection of lines:
var lines = new Lexer(sourceAsString).GetTokensAsLines();
foreach (var line in lines)
{
    Console.WriteLine($"Line: {line.Key}");
    line.Value.ForEach(x => Console.WriteLine($" {x.Location.Column,3}: {x.TokenType} => {x.Content}"));
}
GetTokensAsLines() also accepts the optional whitespace compression argument.
The output tokens
Token types
These are the default definitions of the available tokens.
- Whitespace - spaces, tabs
- Letters - upper and lower case English alphabet
- Digits - 0 to 9
- Symbols - any of !£$%^&*()-_=+[]{};:'@#~,.<>/?\|
- String - anything enclosed in either ", ', or a backtick
- Other - input characters not covered by other types
- EOL - an LF (\n); any CRs (\r) are ignored
- EOF - automatically added
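As a rough illustration of the default definitions, the sketch below maps a single character to the name of the token type it would start. It is not Luthor's implementation: the `DefaultTokenSets` class and its `Classify` method are hypothetical, and testing quote characters before symbols is an assumption made here because the single quote appears in both default sets. CR handling and the EOF token are not modelled.

```csharp
using System;

// Hypothetical helper, not part of Luthor: names the default token type
// that a single character belongs to (or starts, for quote characters).
public static class DefaultTokenSets
{
    public const string Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    public const string Digits = "0123456789";
    public const string Symbols = "!£$%^&*()-_=+[]{};:'@#~,.<>/?\\|";
    public const string Whitespace = " \t";
    public const string Quotes = "'\"`";

    public static string Classify(char c)
    {
        if (c == '\n') return "EOL";
        // Assumption: quotes are checked before symbols, since the single
        // quote is present in both default sets.
        if (Quotes.IndexOf(c) >= 0) return "String";
        if (Whitespace.IndexOf(c) >= 0) return "Whitespace";
        if (Letters.IndexOf(c) >= 0) return "Letters";
        if (Digits.IndexOf(c) >= 0) return "Digits";
        if (Symbols.IndexOf(c) >= 0) return "Symbols";
        // Anything outside the sets above falls through to Other.
        return "Other";
    }
}
```

A character such as `é` is outside every default set, so this sketch reports it as Other.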
Redefining the tokens
You can change the characters underlying the different token types:
var lexer = new Lexer(sourceAsString)
{
    Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    Digits = "0123456789",
    Symbols = "!£$%^&*()-_=+[]{};:'@#~,.<>/?\\|",
    Whitespace = " \t",
    Quotes = "'\"`",
};
var tokens = lexer.GetTokens();
The Quotes characters are handled differently from the others. Each one represents a valid start/end character ('terminator'), and the same character must be used to close the string as to open it.
Other quote characters within the string (i.e. between the terminators) are considered plain content within the current string rather than terminators for new strings in their own right.
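The terminator rule can be sketched in a few lines of plain C#. This is only an illustration of the behaviour described above, not Luthor's own code: `QuoteSketch.ReadQuoted` is a hypothetical helper, it does not handle escape sequences, and treating an unterminated string as running to the end of the input is an assumption.

```csharp
using System;

// Hypothetical helper, not part of Luthor.
public static class QuoteSketch
{
    // Returns the quoted run that begins at 'start', including both terminators.
    // Assumption: an unterminated string runs to the end of the input.
    public static string ReadQuoted(string source, int start, string quotes = "'\"`")
    {
        char terminator = source[start];
        if (quotes.IndexOf(terminator) < 0)
            throw new ArgumentException("No quote character at the start position.", nameof(start));

        // Only the opening character can close the string; any other quote
        // characters met on the way are ordinary content.
        int end = source.IndexOf(terminator, start + 1);
        return end < 0
            ? source.Substring(start)
            : source.Substring(start, end - start + 1);
    }
}
```

Called on the third line of the earlier sample, with `start` at the opening double quote, the sketch returns the whole `"multi 'word' string"` run because the embedded single quotes do not match the opening terminator.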
General comments
- Linux/Unix, macOS, and Windows all have a \n (LF) in their line endings, so \r (CR) is discarded and won't appear in any tokens.
- There will always be a final EOF token, even for an empty input string.
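To see why discarding CR makes the platforms behave alike, here is a minimal sketch in plain C#. The `LineEndingSketch` class and its methods are hypothetical and only mimic the effect described above; they are not how Luthor is implemented.

```csharp
using System;

// Hypothetical helper, not part of Luthor.
public static class LineEndingSketch
{
    // Drops every CR so Windows (CRLF) and Unix (LF) input look the same.
    public static string DropCarriageReturns(string source)
    {
        return source.Replace("\r", string.Empty);
    }

    // Counts the LF characters left once CRs are gone, i.e. the number of
    // line endings a lexer following the rule above would see.
    public static int CountLineEndings(string source)
    {
        int count = 0;
        foreach (char c in DropCarriageReturns(source))
        {
            if (c == '\n') count++;
        }
        return count;
    }
}
```

With this rule, `"one\r\ntwo"` and `"one\ntwo"` reduce to the same text and each contains one line ending.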
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.1
  - No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.