MetaphysicsIndustries.Giza 0.4.0

Install with the .NET CLI:

    dotnet add package MetaphysicsIndustries.Giza --version 0.4.0

Install with the Package Manager Console in Visual Studio (uses the NuGet module's version of Install-Package):

    NuGet\Install-Package MetaphysicsIndustries.Giza -Version 0.4.0

For projects that support PackageReference, copy this XML node into the project file to reference the package:

    <PackageReference Include="MetaphysicsIndustries.Giza" Version="0.4.0" />

Install with Paket:

    paket add MetaphysicsIndustries.Giza --version 0.4.0

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source of a script to reference the package:

    #r "nuget: MetaphysicsIndustries.Giza, 0.4.0"

In a Cake build script:

    // Install MetaphysicsIndustries.Giza as a Cake Addin
    #addin nuget:?package=MetaphysicsIndustries.Giza&version=0.4.0

    // Install MetaphysicsIndustries.Giza as a Cake Tool
    #tool nuget:?package=MetaphysicsIndustries.Giza&version=0.4.0

MetaphysicsIndustries.Giza

A parser

Intro

Giza is a parser, but not in the same vein as what you're used to. For one, it's not a parser-generator. Rather than turning a grammar into a bunch of source code, it converts the grammar into a ready-to-use form and can start parsing immediately. No compiling necessary!

One of the cool benefits of this is that you can try out and edit your grammar in real time. Take a look at the included giza program. It's actually a REPL for editing grammars and parsing text. How cool is that!?

Giza would probably be classified as a kind of GLR parser. It goes through all possible paths without the need for backtracking. It can also handle ambiguities by producing multiple parse trees, if needed. The format of its grammars is not quite BNF; instead, the syntax is more inspired by programming languages and regexes. The syntax of grammars is described entirely by the "Supergrammar", which is written in its own syntax. Take a look at Supergrammar.txt.

Giza maintains a distinction between "tokenized" and "non-tokenized" processing. The former is what parsers usually do, constructing a parse tree from a stream of input tokens. The latter is more akin to what regexes do, matching against a stream of input characters instead. Herein, non-tokenized processing often goes by the name "spanning" and is done by a "spanner".

Example

Here's how you would work with a grammar in the REPL:

>>> expr = ( mult-expr | add-expr | sub-expr );
>>> mult-expr = sub-expr ( [*/%] sub-expr )+;
>>> add-expr = ( mult-expr | sub-expr ) ( [-+] ( mult-expr | sub-expr ) )+;
>>> sub-expr = ( number | var | paren );
>>> <token> number = [\d]+;
>>> <token> var = [\l]+;

Then you can check the definitions for errors like this:

>>> check --tokenized
There are errors in the grammar:
  Definition 'sub-expr' references a definition 'paren' which is not defined.

Oops, we forgot to define paren:

>>> paren = '(' expr ')';
>>> check --tokenized
There are no errors or warnings.

Once that's settled, we can parse some text:

>>> parse expr '1 + 2 * 3 - four / 5 % 6 + seven'
There is 1 valid parse of the input.

To get more info, use the --verbose flag to print out the parse tree:

>>> parse expr --verbose '1 + 2 * 3 - four / 5 % 6 + seven'
There is 1 valid parse of the input.
          expr
            add-expr
              sub-expr
1               number
+             $implicit char class +-
              mult-expr
                sub-expr
2                 number
*               $implicit char class %*/
                sub-expr
3                 number
-             $implicit char class +-
              mult-expr
                sub-expr
four              var
/               $implicit char class %*/
                sub-expr
5                 number
%               $implicit char class %*/
                sub-expr
6                 number
+             $implicit char class +-
              sub-expr
seven           var

The left column shows the sequence of tokens from the input. On the right is the parse tree, indented to show hierarchy, with each token printed on the same line as the node that matched it.

But that's not all. There's tons of stuff the REPL can do. It has a built-in help system to explain everything:

>>> help
Usage:
    >>> [options]
    >>> help [command_or_topic]
    >>> command [args...]

Commands:

    help       Display general help, or help on a specific topic.
    list       List all of the definitions currently defined.
    print      Print definitions as text in giza grammar format
    delete     Delete the specified definitions.
    save       Save definitions to a file as text in giza grammar format
    load       Load definitions from a file
    check      Check definitions for errors
    parse      Parse one or more inputs with a tokenized grammar, starting with a given definition, and print how many valid parse trees are found
    span       Span one or more inputs with a non-tokenized grammar, starting with a given definition, and print how many valid span trees are found
    render     Convert definitions to state machine format and render the state machines to a C# class.

Example in Code

Once you've worked out your language's grammar, you typically want to use it in some other application. To do that, follow these steps:

  1. Save all relevant definitions to a file. By convention, the file extension is .giza, but you can use anything. See an example here.
  2. Use the render command. This takes your grammar, converts all of the definitions into state machine format, and then emits the C# code for a class that creates the same state machine representation [1]. See example here.
  3. Create a class that takes your *Grammar class and plugs it into a Parser object. Whenever you want to parse some input text, pass it to the Parser.Parse method to get a list of parse trees. See example here, and the sketch below.
  4. [Optional] If there's more than one parse tree, then there's some ambiguity in your grammar with that particular input. You can do some kind of semantic analysis to choose which of the parse trees is the 'right' one. Or you can just pick the first one, if you're lazy.
  5. Convert the generalized parse tree(s) into whatever domain objects you need. See example here.

[1] This kinda violates the "not turning a grammar into a bunch of source code" assertion above, but not entirely. The code so generated is not capable of parsing anything. It's basically a data structure serialized as C#. We could just as well store the raw text of the grammar file and run it through SupergrammarSpanner to generate the state machine representation, which is what the render command does for you. Whatever. We're working on making it prettier.
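
To make steps 3 through 5 concrete, here is a minimal C# sketch. It is illustrative only: the ExprGrammar class name, its def_expr member, the ToCharacterSource() extension, the Error list, the Span node type, and the exact Parser constructor and Parse signature are assumptions about the rendered output and the library API, and may differ from the real thing.

    // Hypothetical sketch of steps 3-5; names and signatures are assumptions.
    using System;
    using System.Collections.Generic;
    using MetaphysicsIndustries.Giza;

    public class ExprLanguage
    {
        readonly Parser _parser;

        public ExprLanguage()
        {
            // Step 3: plug the rendered grammar class into a Parser,
            // starting from the 'expr' definition.
            var grammar = new ExprGrammar();
            _parser = new Parser(grammar.def_expr);
        }

        public MyExpression ParseExpression(string input)
        {
            var errors = new List<Error>();
            var trees = _parser.Parse(input.ToCharacterSource(), errors);
            if (errors.Count > 0 || trees.Length < 1)
                throw new Exception("The input does not match the grammar.");

            // Step 4: more than one tree means this input is ambiguous under
            // the grammar; here we simply take the first one.
            var tree = trees[0];

            // Step 5: walk the generic parse tree and build domain objects.
            return ConvertToDomainObject(tree);
        }

        MyExpression ConvertToDomainObject(Span tree)
        {
            // Application-specific conversion of the parse tree goes here.
            throw new NotImplementedException();
        }
    }

    // Placeholder domain type for this sketch.
    public class MyExpression { }

Alternatively, as the footnote says, you could skip the rendered class entirely, keep the .giza text as a resource, and run it through SupergrammarSpanner at startup to build the same state machine representation.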

State of the project

Unfortunately, you caught us right in the middle of a major architecture overhaul. Much of the system's internals are being completely reworked. In particular, we're trying to treat tokenized and non-tokenized processing as special cases of a generalized pattern-matching system. Hence the hideous Spanner1/Spanner2 dichotomy. It's a work in progress. Code and API may change at any time, although the command-line tool should stay pretty stable. There's a lot of cosmetic stuff to be fixed up as well, such as the fact that Supergrammar.txt should really be named Supergrammar.giza. Stay tuned!

Compatible and additional computed target framework versions

.NET: net6.0 and net7.0 are compatible. net8.0, net8.0-browser, and the android, ios, maccatalyst, macos, tvos, and windows variants of net6.0, net7.0, and net8.0 are computed.

.NET Framework: net40, net45, net451, net452, net46, net461, net462, net47, net471, net472, and net48 are compatible. net403, net463, and net481 are computed.

Included target frameworks (in package), none of which have dependencies:

  • .NETFramework 4.0
  • .NETFramework 4.5
  • .NETFramework 4.5.1
  • .NETFramework 4.5.2
  • .NETFramework 4.6
  • .NETFramework 4.6.1
  • .NETFramework 4.6.2
  • .NETFramework 4.7
  • .NETFramework 4.7.1
  • .NETFramework 4.7.2
  • .NETFramework 4.8
  • net6.0
  • net7.0

NuGet packages (1)

The following NuGet package depends on MetaphysicsIndustries.Giza:

  • MetaphysicsIndustries.Solus (Mathematics system)


Version    Downloads    Last updated
0.4.0      473          12/10/2022
0.3.0      1,497        7/21/2021
0.2.0      1,081        7/16/2021
