Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.

#include <expression_inspection.hpp>

Public Types
enum class	Kind { Unknown , StringLiteral , RawStringLiteral , CharLiteral , BoolLiteral , IntBinLiteral , IntOctLiteral , IntDecLiteral , IntHexLiteral , FloatLiteral , FloatHexLiteral , Identifier , Grouping , BinaryOperator , Operator , EndDelimiter }
	The kind of token.

Public Member Functions
constexpr bool	operator== (const Token &) const =default

Public Attributes
Kind	kind

std::string_view	str

Detailed Description

Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.

Tokenization is essentially implemented as a fancy lexer: no syntax tree is produced, only a simple 1D stream of tokens.

For instance, parsing the expression x + y > "abc" would generate the following stream of tokens: Identifier, Operator, Identifier, Operator, StringLiteral

Member Enumeration Documentation

◆ Kind

enum class asmgrader::inspection::Token::Kind

strong

The kind of token.

A modified version of EBNF is used to document enumerators.

".." is a contiguous alternation over ASCII encoded values, inclusive
An 'i' after a string terminal means case insensitive
Sequences are implicitly concatenated without ','
'/' denotes removal of the chars on the rhs from the lhs. Ex: "abcdef" / "cd" is equivalent to "abef"
A "{<low>,<high>}" qualifier after a token means limited repetition, where low and high are both inclusive and either may be omitted.
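To show how the ".." range and the 'i' suffix read in practice, here is a minimal sketch of the Identifier rule documented under the enumerators on this page (`Identifier = ident-start { ident-start | '0'..'9' }`, `ident-start = 'a'i..'z'i | '_'`). The helper names are hypothetical and are not part of the documented API.

```cpp
#include <cstddef>
#include <string_view>

// ident-start = 'a'i..'z'i | '_'
// 'a'i..'z'i reads as: any ASCII value in 'a'..'z' inclusive, in either case.
constexpr bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// '0'..'9' is the inclusive ASCII range of decimal digits.
constexpr bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

// Identifier = ident-start { ident-start | '0'..'9' }
// Returns the length of the identifier at the start of `text`, or 0 if there is none.
constexpr std::size_t match_identifier(std::string_view text) {
    if (text.empty() || !is_ident_start(text.front())) {
        return 0;
    }
    std::size_t len = 1;
    while (len < text.size() && (is_ident_start(text[len]) || is_digit(text[len]))) {
        ++len;
    }
    return len;
}

static_assert(match_identifier("foo_1 + 2") == 5);  // matches "foo_1"
static_assert(match_identifier("9lives") == 0);     // a digit cannot start an identifier
```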

All definitions are implicitly defined with the maximal munch rule. https://en.wikipedia.org/wiki/Maximal_munch

See this for the basic version: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
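The maximal munch rule mentioned above can be sketched as "always take the longest spelling that matches at the current position". The example below is an assumption-laden illustration, not the library's implementation: `kOperators` is a deliberately incomplete list of spellings taken from the operator lists on this page, and `longest_operator` is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// A few operator spellings from this page; deliberately incomplete.
constexpr std::array<std::string_view, 6> kOperators{">>=", ">>", ">=", ">", "=", "=="};

// Returns the longest spelling in kOperators that is a prefix of `text`,
// or an empty view if none matches.
constexpr std::string_view longest_operator(std::string_view text) {
    std::string_view best{};
    for (std::string_view op : kOperators) {
        // substr clamps the count, so this is safe when op is longer than text.
        if (text.substr(0, op.size()) == op && op.size() > best.size()) {
            best = op;
        }
    }
    return best;
}

static_assert(longest_operator(">>= 1") == ">>=");  // not ">", ">", "="
static_assert(longest_operator(">= 1") == ">=");
static_assert(longest_operator("> 1") == ">");
```

Without this rule, an input such as `a >>= 1` could be split into three separate tokens for '>', '>', and '=' rather than a single '>>=' token.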

Enumerator
Unknown	Under normal cases, this should be impossible. It's a saner option for a default, though, in case of a bad parse.
StringLiteral	https://en.cppreference.com/w/cpp/language/string_literal.html
	StringLiteral = [ strlike-prefix ] '"' { character } '"'
	strlike-prefix = 'L' | 'u'i [ '8' ]
	character = ANY_CHAR / '"\' | ESCAPE_SEQ
RawStringLiteral	https://en.cppreference.com/w/cpp/language/string_literal.html
	RawStringLiteral = [ strlike-prefix ] 'R"' d-char-seq '(' { character } ')' d-char-seq '"'
	strlike-prefix = 'L' | 'u'i [ '8' ]
	d-char-seq = ( character / '\()' - WHITESPACE ){,16}
	character = ANY_CHAR
CharLiteral	https://en.cppreference.com/w/cpp/language/character_literal.html
	CharLiteral = [ strlike-prefix ] "'" char "'" | c-multi-char
	strlike-prefix = 'L' | 'u'i [ '8' ]
	char = ANY_CHAR / "'\" | ESCAPE_SEQ
	c-multi-char = [ 'L' ] "'" { char } "'"
	I can't think of any good reason to use a multi-char literal, but it is supported anyway since it's trivial to implement.
BoolLiteral	'true' or 'false'. That's it.
IntBinLiteral	https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral.
IntOctLiteral	https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral. This includes '0'.
IntDecLiteral	https://en.cppreference.com/w/cpp/language/integer_literal.html Not all terminals are defined, but they should be rather obvious anyway.
	IntLiteral = ( '0x'i hex-seq | dec-seq | '0' oct-seq | '0b'i bin-seq ) [ integer-suffix ]
	hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
	dec-digits = ( '1'..'9' ) { '0'..'9' | DIGIT_SEP }
	oct-digits = ( '0'..'7' ) { '0'..'7' | DIGIT_SEP }
	bin-digits = ( '0' | '1' ) { '0' | '1' | DIGIT_SEP }
	integer-suffix = 'u'i [ 'l'i | 'll'i ]
	DIGIT_SEP = "'"
IntHexLiteral	https://en.cppreference.com/w/cpp/language/integer_literal.html See IntDecLiteral.
FloatLiteral	https://en.cppreference.com/w/cpp/language/floating_literal.html
	FloatLiteral = dec-value floating-point-suffix
	dec-value = dec-digits dec-exp | dec-digits '.' [ dec-exp ] | [ dec-digits ] '.' dec-digits [ dec-exp ]
	dec-digits = ( '1'..'9' ) { '0'..'9' | DIGIT_SEP }
	dec-exp = 'e'i [ SIGN ] dec-seq
	SIGN = '+' | '-'
	floating-point-suffix = 'f'i | 'l'i
FloatHexLiteral	https://en.cppreference.com/w/cpp/language/floating_literal.html
	FloatHexLiteral = hex-value floating-point-suffix
	hex-value = '0x'i hex-val-nopre
	hex-val-nopre = hex-digits hex-exp | hex-digits '.' hex-exp | [ hex-digits ] '.' hex-digits hex-exp
	hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
	hex-exp = 'p'i [ SIGN ] dec-seq
	SIGN = '+' | '-'
	floating-point-suffix = 'f'i | 'l'i
Identifier	https://en.cppreference.com/w/cpp/language/identifiers.html
	Identifier = ident-start { ident-start | '0'..'9' }
	ident-start = 'a'i..'z'i | '_'
Grouping	Imperatively defined as:
	'{', '}'
	'(', ')' - when not as a function call
	'<', '>' - in template context
BinaryOperator	https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this implementation has no concept of operator precedence.)
	Only operators with 2 operands. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping.
	Perhaps a little confusingly, since the token stream is flat, operators like `a[]` produce 2 separate operator tokens, '[' and ']'.
	'::'
	'.', '->'
	'.*', '->*'
	'+', '-', '*', '/', '%', '<<', '>>', '^', '|', '&', '&&', '||'
	'==', '!=', '<=>', '<', '<=', '>', '>='
	'=', '+=', '-=', '*=', '/=', '%=', '<<=', '>>=', '&=', '^=', '|='
	','
Operator	https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this implementation has no concept of operator precedence.)
	Unary, ternary, and (n>3)-ary (i.e., function call) operators. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping.
	Perhaps a little confusingly, since the token stream is flat, operators like `a[]` produce 2 separate operator tokens, '[' and ']'.
	'++', '--' (no distinction between pre and post)
	'(', ')', '[', ']'
	'.', '->'
	'+', '-' (unary only)
	'~', '!', '*', '&'
	'throw', 'sizeof', 'alignof', 'new', 'delete'
	'const_cast', 'static_cast', 'dynamic_cast', 'reinterpret_cast'
	'?', ':'
	Also includes literal operators.
EndDelimiter	Delimits the end of the token sequence. Also serves to obtain a count of the number of token types, as this is guaranteed to be defined as the last enumerator.
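Since EndDelimiter is guaranteed to be last, its underlying value can size a per-kind lookup table, which fits the console coloring use case. The following is a minimal sketch under stated assumptions: the enum is a local copy of the enumerator list documented above (not the real header), the ANSI color choices are arbitrary, and `to_index`, `make_palette`, `kPalette`, and `color_for` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Stand-in copy of the documented enumerator list; NOT the real asmgrader header.
enum class Kind {
    Unknown, StringLiteral, RawStringLiteral, CharLiteral, BoolLiteral,
    IntBinLiteral, IntOctLiteral, IntDecLiteral, IntHexLiteral,
    FloatLiteral, FloatHexLiteral, Identifier, Grouping,
    BinaryOperator, Operator, EndDelimiter
};

// EndDelimiter is last, so its value equals the number of real token kinds.
constexpr std::size_t kNumKinds = static_cast<std::size_t>(Kind::EndDelimiter);
static_assert(kNumKinds == 15);

constexpr std::size_t to_index(Kind k) {
    return static_cast<std::size_t>(k);
}

// Arbitrary palette of ANSI SGR escape sequences, one slot per kind.
constexpr std::array<std::string_view, kNumKinds> make_palette() {
    std::array<std::string_view, kNumKinds> colors{};
    for (std::string_view& c : colors) {
        c = "\033[0m";  // default: reset, i.e. no special color
    }
    colors[to_index(Kind::StringLiteral)] = "\033[32m";  // green
    colors[to_index(Kind::IntDecLiteral)] = "\033[36m";  // cyan
    colors[to_index(Kind::Identifier)] = "\033[33m";     // yellow
    return colors;
}

constexpr auto kPalette = make_palette();

// Looks up the escape sequence for a kind; EndDelimiter itself has no slot.
constexpr std::string_view color_for(Kind k) {
    return k < Kind::EndDelimiter ? kPalette[to_index(k)] : std::string_view{"\033[0m"};
}
```

Sizing the array from EndDelimiter means that adding a new enumerator before it grows the table automatically, with the new slot falling back to the default color until it is assigned one.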

Member Function Documentation

◆ operator==()

bool asmgrader::inspection::Token::operator== ( const Token & ) const

constexpr default

Member Data Documentation

◆ kind

Kind asmgrader::inspection::Token::kind

◆ str

std::string_view asmgrader::inspection::Token::str

The documentation for this struct was generated from the following file:

include/asmgrader/api/expression_inspection.hpp
