AsmGrader 0.0.0
Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.
#include <expression_inspection.hpp>
Public Types | |
| enum class | Kind { Unknown , StringLiteral , RawStringLiteral , CharLiteral , BoolLiteral , IntBinLiteral , IntOctLiteral , IntDecLiteral , IntHexLiteral , FloatLiteral , FloatHexLiteral , Identifier , Grouping , BinaryOperator , Operator , EndDelimiter } |
| The kind of token. | |
Public Member Functions | |
| constexpr bool | operator== (const Token &) const =default |
Public Attributes | |
| Kind | kind |
| std::string_view | str |
Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.
Tokenization is essentially implemented as a fancy lexer: no syntax tree is built, and the result is a simple 1D stream of tokens.
For instance, parsing the expression x + y > "abc" would generate the following stream of tokens: Identifier, Operator, Identifier, Operator, StringLiteral
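As a rough illustration of such a flat stream, the sketch below hand-builds the tokens one might expect for that expression. The Kind and Token types here are trimmed local stand-ins, not the library's definitions. Two assumptions are made: that each str is a slice of the original expression text (suggested by the std::string_view member, but not confirmed), and that '+' and '>' are tagged BinaryOperator, following the enumerator table.

```cpp
#include <array>
#include <string_view>

// Hypothetical local mirror of the documented type, trimmed to the
// enumerators this example needs. Not the real definition.
enum class Kind { Identifier, BinaryOperator, StringLiteral };

struct Token {
    Kind kind;
    std::string_view str;
};

// The expression from the text above: x + y > "abc"
inline constexpr std::string_view expr = "x + y > \"abc\"";

// Hand-written expected stream. Each str is a non-owning slice of expr
// (assumption), so the tokens are only valid while expr is alive.
inline constexpr std::array<Token, 5> expected{{
    {Kind::Identifier,     expr.substr(0, 1)},  // x
    {Kind::BinaryOperator, expr.substr(2, 1)},  // +
    {Kind::Identifier,     expr.substr(4, 1)},  // y
    {Kind::BinaryOperator, expr.substr(6, 1)},  // >
    {Kind::StringLiteral,  expr.substr(8, 5)},  // "abc", quotes included
}};
```

Note that whitespace produces no tokens in this sketch, and nothing records that '+' binds tighter than '>': the stream is purely sequential.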
The kind of token.
A modified version of EBNF is used to document the enumerators. See https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form for the basic version.
For bounded repetition (as in {,16}), low and high are both inclusive and either may be omitted.
All definitions implicitly follow the maximal munch rule: https://en.wikipedia.org/wiki/Maximal_munch
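To make the maximal munch rule concrete, here is a minimal sketch that picks the longest operator spelling matching the start of the input. The function name munch_operator and the small operator table are hypothetical and only illustrate the rule; they are not taken from the library.

```cpp
#include <array>
#include <string_view>

// Illustrative subset of operator spellings (not the library's table).
inline constexpr std::array<std::string_view, 8> kOperators{{
    "<", "<<", "<<=", "<=", "<=>", "=", "==", ">"}};

// Returns the longest spelling in kOperators that is a prefix of `input`,
// or an empty view when none matches.
constexpr std::string_view munch_operator(std::string_view input) {
    std::string_view best{};
    for (std::string_view op : kOperators) {
        // substr clamps the length, so short inputs are safe.
        if (op.size() > best.size() && input.substr(0, op.size()) == op) {
            best = op;
        }
    }
    return best;
}
```

With this rule, the input `<<= 1` yields the single token `<<=` rather than `<` followed by `<=`, because the longest candidate always wins.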
| Enumerator | |
|---|---|
| Unknown | Under normal cases, this should be impossible. It's a saner option for a default, though, in case of a bad parse. |
| StringLiteral | https://en.cppreference.com/w/cpp/language/string_literal.html StringLiteral = [ strlike-prefix ] '"' { character } '"'
strlike-prefix = 'L' | 'u'i [ '8' ]
character = ANY_CHAR / '"\' | ESCAPE_SEQ
|
| RawStringLiteral | https://en.cppreference.com/w/cpp/language/string_literal.html RawStringLiteral = [ strlike-prefix ] 'R"' d-char-seq '(' { character } ')' d-char-seq '"'
strlike-prefix = 'L' | 'u'i [ '8' ]
d-char-seq = ( character / '\()' - WHITESPACE ){,16}
character = ANY_CHAR
|
| CharLiteral | https://en.cppreference.com/w/cpp/language/character_literal.html CharLiteral = [ strlike-prefix ] "'" char "'" | c-multi-char
strlike-prefix = 'L' | 'u'i [ '8' ]
char = ANY_CHAR / "'\" | ESCAPE_SEQ
c-multi-char = [ 'L' ] "'" { char } "'"
I can't think of any good reason to use a multi-char literal, but it is supported anyway since it's
trivial to implement.
|
| BoolLiteral | 'true' or 'false'. That's it. |
| IntBinLiteral | https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral |
| IntOctLiteral | https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral. This includes '0'. |
| IntDecLiteral | https://en.cppreference.com/w/cpp/language/integer_literal.html Not all terminals are defined, but they should be rather obvious anyway.
IntLiteral = ( '0x'i hex-seq | dec-seq | '0' oct-seq | '0b'i bin-seq ) [ integer-suffix ]
hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
dec-digits = ( '1'..'9' ) { '0'..'9' | DIGIT_SEP }
oct-digits = ( '0'..'7' ) { '0'..'7' | DIGIT_SEP }
bin-digits = ( '0' | '1' ) { '0' | '1' | DIGIT_SEP }
integer-suffix = 'u'i [ 'l'i | 'll'i ]
DIGIT_SEP = "'"
|
| IntHexLiteral | https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral |
| FloatLiteral | https://en.cppreference.com/w/cpp/language/floating_literal.html FloatLiteral = dec-value floating-point-suffix
dec-value = dec-digits dec-exp
| dec-digits '.' [ dec-exp ]
| [ dec-digits ] '.' dec-digits [ dec-exp ]
dec-digits = ( '1'..'9' ) { '0'..'9' | DIGIT_SEP }
dec-exp = 'e'i [ SIGN ] dec-seq
SIGN = '+' | '-'
floating-point-suffix = 'f'i | 'l'i
|
| FloatHexLiteral | https://en.cppreference.com/w/cpp/language/floating_literal.html FloatHexLiteral = hex-value floating-point-suffix
hex-value = '0x'i hex-val-nopre
hex-val-nopre = hex-digits hex-exp
| hex-digits '.' hex-exp
| [ hex-digits ] '.' hex-digits hex-exp
hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
hex-exp = 'p'i [ SIGN ] dec-seq
SIGN = '+' | '-'
floating-point-suffix = 'f'i | 'l'i
|
| Identifier | https://en.cppreference.com/w/cpp/language/identifiers.html Identifier = ident-start { ident-start | '0'..'9' }
ident-start = 'a'i..'z'i | '_'
|
| Grouping | Imperatively defined as: '{', '}'; '(', ')' when not used as a function call; '<', '>' in a template context. |
| BinaryOperator | https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this implementation has no concept of operator precedence.) Only operators with 2 operands. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping. Perhaps a little confusingly, since the token stream is flat, operators like '::'; '.', '->'; '.*', '->*'; '+', '-', '*', '/', '%', '<<', '>>', '^', '|', '&', '&&', '||'; '==', '!=', '<=>', '<', '<=', '>', '>='; '=', '+=', '-=', '*=', '/=', '%=', '<<=', '>>=', '&=', '^=', '|=', ',' |
| Operator | https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this implementation has no concept of operator precedence.) Unary, ternary, and (n>3)-ary (i.e., function call) operators. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping. Perhaps a little confusingly, since the token stream is flat, operators like |
| EndDelimiter | Delimits the end of the token sequence. Also serves to obtain a count of the number of token kinds, as this is guaranteed to be defined as the last enumerator. |
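The four integer literal kinds in the table above differ only in their prefix, so they can be told apart by inspecting the first one or two characters. The sketch below shows that idea under stated assumptions: IntKind and classify_int_prefix are hypothetical names, and the function looks at the prefix only. It validates neither the digits nor the suffix, and it does not try to distinguish floating-point literals.

```cpp
#include <string_view>

// Hypothetical stand-ins for the Int*Literal enumerators.
enum class IntKind { NotInt, Bin, Oct, Dec, Hex };

// Classifies an integer literal by prefix, following the IntLiteral rule:
// '0x'i -> hex, '0b'i -> binary, other leading '0' -> octal, '1'..'9' -> decimal.
constexpr IntKind classify_int_prefix(std::string_view s) {
    if (s.empty()) {
        return IntKind::NotInt;
    }
    if (s[0] == '0') {
        if (s.size() > 1) {
            if (s[1] == 'x' || s[1] == 'X') return IntKind::Hex;
            if (s[1] == 'b' || s[1] == 'B') return IntKind::Bin;
        }
        return IntKind::Oct;  // a lone '0' is octal, as the table notes
    }
    if (s[0] >= '1' && s[0] <= '9') {
        return IntKind::Dec;
    }
    return IntKind::NotInt;
}
```

A caller using this approach would need to rule out float literals such as 0.5 first, since those also begin with a digit.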
| Kind asmgrader::inspection::Token::kind |
| std::string_view asmgrader::inspection::Token::str |
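Since the stated use case is console syntax coloring, the two attributes might be consumed roughly as in the sketch below. It uses a trimmed local mirror of Kind and Token rather than the real header, and the colorize function, the kColors table, and the specific ANSI colors are all assumptions made for illustration. It also shows how EndDelimiter, being the last enumerator, can size a per-kind lookup table.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical trimmed mirror of the documented types.
enum class Kind { Unknown, StringLiteral, Identifier, BinaryOperator, EndDelimiter };

struct Token {
    Kind kind;
    std::string_view str;
};

// EndDelimiter is the last enumerator, so its value is the kind count.
inline constexpr std::size_t kNumKinds = static_cast<std::size_t>(Kind::EndDelimiter);

// ANSI SGR color sequences indexed by Kind (arbitrary choices).
inline constexpr std::array<std::string_view, kNumKinds> kColors{{
    "\x1b[0m",   // Unknown: terminal default
    "\x1b[32m",  // StringLiteral: green
    "\x1b[36m",  // Identifier: cyan
    "\x1b[33m",  // BinaryOperator: yellow
}};

// Wraps the token text in its color sequence, followed by a reset.
// Kinds outside the table (e.g. EndDelimiter) are returned uncolored.
inline std::string colorize(const Token& tok) {
    const auto index = static_cast<std::size_t>(tok.kind);
    if (index >= kNumKinds) {
        return std::string(tok.str);
    }
    std::string out(kColors[index]);
    out += tok.str;
    out += "\x1b[0m";
    return out;
}
```

Sizing the table from EndDelimiter means that adding an enumerator without adding a color leaves the new entry as an empty view, so that kind is emitted uncolored (followed by a reset) rather than causing an out-of-bounds read.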