Axis.Pulsar.Core
0.0.1
See the version list below for details.
dotnet add package Axis.Pulsar.Core --version 0.0.1
NuGet\Install-Package Axis.Pulsar.Core -Version 0.0.1
<PackageReference Include="Axis.Pulsar.Core" Version="0.0.1" />
paket add Axis.Pulsar.Core --version 0.0.1
#r "nuget: Axis.Pulsar.Core, 0.0.1"
// Install Axis.Pulsar.Core as a Cake Addin #addin nuget:?package=Axis.Pulsar.Core&version=0.0.1 // Install Axis.Pulsar.Core as a Cake Tool #tool nuget:?package=Axis.Pulsar.Core&version=0.0.1
Axis.Pulsar.Core
Pulsar.Core
is a library built to recognize context-free grammars.
Contents
Introduction
<a id="Introduction"></a>
Pulsar.Core
is at it's core a grammar recognition library. It does it's business in two steps:
- It presents units that when assembled together, can define any context-free grammar. Each of these units represents specific recognition logic, to be applied according to the assembling order.
- Given any input string, the assembled grammar recognition structure is executed against the input, and a result, success or failure, is reported.
This simplified process is what underpins the working of Pulser.Core
. Internally, it is modelled to closely resemble
the canonical representation of context-free grammar, the famous 4-element tuple: (N, T, P, S)
, where:
N
is a finite set of non-terminals symbols, whichPulsar.Core
refers loosely to asComposites
;T
is a finite set of terminals, whichPulsar.Core
refers to asAtoms
;P
is the finite set of productions;S
is the starting non-terminal symbol, an element ofN
.
Tying all of these structues together is the concept of a Rule, represented internally by the interface Axis.Pulsar.Core.Grammar.IRule
.
A rule is a unit that can ingest a Token
, report if it contains a sequence of characters that conform to a pattern. Pulsar.Core
implements
all elements of the context-free grammar's 4-element tuple as rules.
Tokens
<a id="Tokens"></a>
A good time to mention the Tokens
structure is now. Strings are ingested into Pulsar.Core
in the form of token instances, represented
internally by the Axis.Pulsar.Core.Utils.Tokens
struct. This is akin to the ReadOnlySpan<char>
struct type, without the restriction of being
a ref struct
, and with the addition that the Tokens
instance publicly holds on to the original input string. This affords
Pulsar.Core
a lot of performance benefits as, rather than having new string instances generated by all of the string comparison
operations, the Tokens
are passed around, much like ReadOnlySpan<char>
instances.
Symbol
<a id="Symbol"></a>
A symbol is a string that conforms to the pattern: ^[a-zA-Z]([a-zA-Z0-9-])*\\z
.
Concrete Syntax Trees (CST)
<a id="Concrete-Syntax-Trees-(CST)"></a>
The result of recognizing a string as a product of a grammar, is an instance of Pulsar.Core
's implimentation of a concrete syntax tree.
The Tree is made of nodes which can be one of two types:
Atom
Node: a successfully recognized terminal.Composite
Node: a successfully recognized non-terminal.
Each Node holds at a minimum, the symbol it is designated to. Individually, Atoms
hold a reference to the Tokens
instance, a section of the
input string that was recognized by the terminal; while Composites
hold a reference to all the other nodes that were recognized by the rule
of it's designated symbol. See composites for more details.
Walking the Concrete Syntax Tree
<a id="Walking-the-Concrete-Syntax-Tree"></a>
Pulsar.Core
provides a minimalistic API for searching for nodes within the CST. Rather than using a Tree Walking/Visiting implementation,
Interaction with the CST
is achieved by calling:
FindNodes(...)
: This takes aNodePath
instance, which is converted from it'sNode Query Language
string representation. If the query succeeds, all nodes matching the query are returned.FindAllNodes(...)
: This takes a string representing the node symbol. And the entire tree is searched for all nodes with the given symbol.
Node Query Langauage (NQL)
<a id="Node-Query-Language-(NQL)"></a> NQL is an extremely simple query language used to describe conditions to match nodes against. Considering it is used against a tree of nodes to retrieve nodes that match conditions, it is expressed in heirarchical segments, each segment describing conditions for nodes at that level to meet. It is akin to the file-path format, where each file name can be thought of as a node symbol name, or, conditions that nodes at that level must match.
E.g, node-a/node-b/node-c
. This is a simple query that checks the given node if it has direct children nodes with the name node-a
,
whatever is returned is in turn checked if they have direct children nodes with node-b
as the symbol name, and so on. Executing the query is
done in the following fashion:
var node = ...
var resultNodes = node.FindNodes("node-a/node-b/node-c");
In the example above, the string "node-a/node-b/node-c"
is implicitly converted into an instance of Axis.Pulsar.Core.CST.NodePath
, and the
search operation is run using the instance.
NQL Syntax
<a id="NQL-Syntax"></a>
NQL
can be visualized as a Path
, which consists of Segment
instances, each of which in turn consists of Filter
instances.
Path
: A path is the overall query expression. It contains one or moreSegment
instances, separated by a/
character. Paths represent criteria that the descendants of the target node must meet.Syntax:
'segment'[/'segment']*
Segment
: A segment is just a group of one or more filters. The filters in a segment represent alternative criteria against which each node is tested - in layman's terms, segments constitute a logicalOR
combination of predicates, executed in the order they appear.Syntax:
'filter'[|'filter']*
Filter
: A filter is a predicate that has at least one of a few optional components that the nodes will be tested against. These components include:Node type: a character depicting the type of the node being sought. There 3 possible values for the node type:
t
orT
- forAtom
/Termianl
nodes.n
orN
- forComposite
/Non Terminal
nodes.u
orU
- for when the node type is unspecified, and as such all nodes are accepted.
Syntax:
@'node-character'
e.g@n
or@U
.Symbol name: a literal string value that the name of the symbol must exactly match. Valid names conform to the pattern described in the Symbol section. Note that the delimiter
:
is optional when there is no node-type specified.Syntax:
:'symbol-name'
e.g:'symbol-1'
,'symbol-2'
Tokens: a literal string value that the tokens of the symbol must exactly match. Considering that tokens are wrapped in the delimiters
<
and>
, if the character>
must be included in the tokens, it must be escaped using\>
.Syntax:
<'token text'>
e.g<the quick brown fox>
.
Complete filter syntax examples:
@n:decimal-part<.000034>
- matches nodes that are non-terminal, with the namedecimal-part
, and have the tokens.000034
.day-of-the-week
or:day-of-the-week
- matches any node with the symbol nameday-of-the-week
.@t
- matches any terminal/atomic node.<protected>
, or@u<protected>
- matches any node with the tokenprotected
.
Production
<a id="Production"></a>
A production is, simply put, a rule that maps from a symbol name, to either an AtomicRule
, or a CompositeRule
.
Atomic Symbols
<a id="Atomic-Symbols"></a>
An atom is a symbol whose rule matches the ingested Tokens
to an internally encapsulated pattern. Pulsar.Core
comes with
5 Atom symbols/rules out of the box, all of which are located in the Axis.Pulsar.Core.Grammar.Atomic
namespace:
EOF
: This matches the end of the inputTokens
- meaning, an attempt to read another character from the input will yield "nothing".Literal
: This matches from the current position in the inputTokens
, a string of characters that are equal to a literal string encapsulated by the symbol/rule. By default, this uses case-sensitive matching, but this can be turned off by passing the appropriate argument to the symbol upon creation.Pattern
: This matches input characters against an internally encapsulated regular expression. The pattern rule operates under two configurations:Open Match
: In this state, characters are read from the input, one at a time, and matched against the regular expression, until a given number of failures is encountered, at which point, whatever was matched prior to the first failure is reported as the matched tokens.Closed Match
: In this state, a given max number of characters are read from the input, then matched against the encapsulated regular expression. With each failure, a character is removed from the end of the tokens, till either a match is found, or a minimum number of characters is reached.
CharRange
: This is setup of with 2 lists, each of disjointed ranges of characters; one list represents an "include" list, while the other represetns an "exclude" list. TheCharRange
rule works by reading a character from the input, and making sure it is in the include list, and not in the exclude list. Note that either of the lists may be absent, but not both. A character range is simply a 2-element tuple of characters, with one character being a lower boundary, and the other being the upper boundary.DelimitedString
: This rule matches matches tokens that conform to a pattern that contains the following elements:- Start Delimiter: a sequence of characters that marks the start of this parttern.
- End Delimiter: a sequence of characters that marks the end of this pattern.
- Accepts empty value: boolean indicating if the start and end delimiters can exist without any characters between them.
- End Delimiter Escape Sequence: a sequence of characters that "escape" the end-delimiter, allowing the rule to recognize cases where an existence of the end-delimiter is not recognized as the end-delimiter.
- Legal/Illegal sequeces: sequences of characters that must be found or must not be found bewteen the start and end delimiters.
- Legal/Illegal character ranges: ranges of characters that must or must not be found between the start and end delimiters.
The point of extension is the Axis.Pulsar.Core.Grammar.Atomic.IAtomicRule
interface. Any instance of this interface is accepted
while building up the Grammer
via the Axis.Pulsar.Core.Lang.ILanguageImporter
API.
Composite and Group Symbols
<a id="Composite-and-Group-Symbols"></a>
A composite symbol is a rule that recognizes one or more rules according to a specified order. Composite rules are Pulsar.Core
's
implementation of Non-Terminal symbols, and as such, there is only one implementation of the ICompositeRule
interface, the
Axis.Pulsar.Core.Grammar.Composite.NonTerminal
class.
The non-terminal simply delegates recognition responsibilities to it's encapculated IGroupRule
instance.
Group Symbols
<a id="Group-Symbols"></a>
Group symbols are what implement the specialized ordering of encapsulated symbols; they each contain one or more Group Element
.
Each element possesses a Cardinality
property that facilitates recognizing repetitions of the element. Pulsar.Core
comes with 5
types of groups:
Choice
: A choice functions by searching for the first of it's element rules to successsfully recognize characters from the input tokens. Soon as a successful recognition is found, it is reported.Sequence
: A sequence functions by ensuring all element rules successfully recognize characters from the input string, in the order that they exist internally.Set
: A set functions by ensureing all element rules successfully recognize characters from the input string, but unlike it's sibling,Sequence
,Set
does not enforce any order. It only enforces that a given minimum recognition-count be successful.Production Ref
: A production ref is a reference to another production. This is howPulsar.Core
is able to nest symbols inside other symbols, constituting the composition of symbols to make a production.Atom Ref
:Pulsar.Core
enables inline-expression of atomic symbols/rules. What this means is, rather than ONLY having atomic symbols appearing as the single symbol of a production, atoms can be placed right within a composite rule.
Cardinality
<a id="Cardinality"></a>
Cardinality
expresses the number of occurences expected for a Group Element
. Cardinality provides two values, a min occurence
,
and a max occurence
value. Obviously, the min cannot be greater than the max. However, the max value can also indicate "infinity",
signifying a situation where no limit is specified. This is modeled similar to cardinality in regular expressions.
Syntax Validation
<a id="Syntax-Validation"></a>
A simple validation API is provided by Pulsar.Core
, enabling users plug into the recognition execution and provide validation
services. Valitation is triggered after every Production
is successfully recognized, and the recognized CST Node
is passed into
the validation function, for which any exceptions thrown signals a an invalid recognition.
Recognition
<a id="Recognition"></a>
Recognition within Pulsar.Core
is a process where every Rule
reads a sufficient number of characters from the TokenBuffer
that conforms to the rules internal expectations. The following assumptions/invariants MUST always be true for every recognition
operation:
- The
TokenBuffer
is initially at a position where the next read character has not be recognized by any previous rule. - Upon successful recognition, the
TokenBuffer
is in a position pointing to the last recognized character of the current rule. - Upon failed recognition, the
TokenBuffer
is left in a state exactly as it was initially passed into the rule. - There are 3 recognition results
Success - returns the designated success object, e.g the
CST Node
.FailedRecognition - signifies that a recognition was not possible, and other alternative rules may be tried if possible.
PartialRecognition - signifies that recognition was only partially possible. This is a fatal error and all further recongition operations are aborted. This usually happens when the unique "delimiters" for symbols were recognized, but further characters failed to be recognized. E.g, a rule that must match a signed decimal integer defined by the following pseudo-production:
decimal-integer = {/[+-]/, \d+\}
Given the input
"+abc"
, the production will return aPartialRecognition
result, because a+
was recognized, but the rest of the expected characters weren't found.
Language IO
<a id="Language-IO"></a>
Pular.Core
only facilitates building grammars with the programatic structures described above, as such, it has no language of
its own to enable the declaritive expression of context-free grammar. It however provides an extension point for any
grammar-expression language to plug into it's APIs. This happens through the interfaces in the Axis.Pulsar.Core.Lang
namespace.
ILanguageContext
: At the heart ofPulsar.Core
is the language context. This contains all information needed to ingest a string of characters, and return aCST
representation of the recognized grammar.ILanguageExporter
: This interface provides an API for grammar-expression languages to convert theIGrammar
within aILanguageContext
into a string representation of it's grammar-expression langauge.ILanguageImporter
: This interfaces provides an API for grammar-expression languages to convert their string representations into the various constructs provided byPulsar.Core
.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
-
net7.0
- Axis.Luna.Common (>= 6.0.29)
- System.Collections.Immutable (>= 8.0.0)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Axis.Pulsar.Core:
Package | Downloads |
---|---|
Axis.Pulsar.Core.XBNF
eXtended BNF implementation of the Axis.Pulsar.Core grammar recognizer |
|
Axis.Dia.PathQuery
Package Description |
GitHub repositories
This package is not used by any popular GitHub repositories.