AsmGrader 0.0.0
asmgrader::inspection::Token Struct Reference

Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.

#include <expression_inspection.hpp>

Public Types

enum class  Kind {
  Unknown , StringLiteral , RawStringLiteral , CharLiteral ,
  BoolLiteral , IntBinLiteral , IntOctLiteral , IntDecLiteral ,
  IntHexLiteral , FloatLiteral , FloatHexLiteral , Identifier ,
  Grouping , BinaryOperator , Operator , EndDelimiter
}
 The kind of token.
 

Public Member Functions

constexpr bool operator== (const Token &) const =default
 

Public Attributes

Kind kind
 
std::string_view str
 

Detailed Description

Token of a very basic C++ expression. The primary use case is rudimentary console syntax coloring.

Tokenization is essentially a fancy lexer: it produces no syntax tree, only a simple 1D stream of tokens.

For instance, parsing the expression x + y > "abc" would generate the following stream of tokens: Identifier, Operator, Identifier, Operator, StringLiteral
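The stream above can be written out concretely. The following is a minimal sketch, not the AsmGrader implementation: Token here is a local stand-in for the documented struct (the real one comes from expression_inspection.hpp), and ansi_color, example_stream, colorize, and the color scheme itself are hypothetical names and choices made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Local stand-in mirroring the documented struct (assumption: same layout).
struct Token {
    enum class Kind {
        Unknown, StringLiteral, RawStringLiteral, CharLiteral,
        BoolLiteral, IntBinLiteral, IntOctLiteral, IntDecLiteral,
        IntHexLiteral, FloatLiteral, FloatHexLiteral, Identifier,
        Grouping, BinaryOperator, Operator, EndDelimiter
    };
    Kind kind;
    std::string_view str;
    // The documented operator== is defaulted (C++20); it is spelled out here
    // so that the sketch also builds as C++17.
    constexpr bool operator==(const Token& other) const {
        return kind == other.kind && str == other.str;
    }
};

// Assumed color scheme: one ANSI SGR escape per token kind.
constexpr std::string_view ansi_color(Token::Kind kind) {
    switch (kind) {
    case Token::Kind::StringLiteral:  return "\x1b[32m";  // green
    case Token::Kind::Identifier:     return "\x1b[36m";  // cyan
    case Token::Kind::BinaryOperator:
    case Token::Kind::Operator:       return "\x1b[33m";  // yellow
    default:                          return "\x1b[0m";   // terminal default
    }
}

// The flat stream listed above for: x + y > "abc"
inline std::array<Token, 5> example_stream() {
    using K = Token::Kind;
    return {{{K::Identifier, "x"}, {K::Operator, "+"}, {K::Identifier, "y"},
             {K::Operator, ">"}, {K::StringLiteral, "\"abc\""}}};
}

// Wraps each token's text in its color, followed by a reset and a space.
template <std::size_t N>
std::string colorize(const std::array<Token, N>& tokens) {
    std::string out;
    for (const Token& tok : tokens) {
        out += ansi_color(tok.kind);
        out += tok.str;
        out += "\x1b[0m ";
    }
    return out;
}
```

Because the stream is flat, coloring needs only a single pass over the tokens; no tree traversal is involved.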

Member Enumeration Documentation

◆ Kind

The kind of token.

A modified version of EBNF is used to document enumerators.

  • ".." is a contiguous alternation over ASCII encoded values, inclusive
  • An 'i' after a string terminal means case insensitive
  • Sequences are implicitly concatenated without ','
  • '/' denotes removal of the chars on the rhs from the lhs. Ex: "abcdef" / "cd" is equivalent to "abef"
  • A "{<low>,<high>}" qualifier after a token means limited repetition, where low and high are both inclusive and either may be omitted.

All definitions are implicitly defined with the maximal munch rule. https://en.wikipedia.org/wiki/Maximal_munch
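As an illustration of maximal munch, the sketch below picks the longest operator spelling that prefixes the input. The operator table is a small assumed subset and munch_operator is a hypothetical helper, not an AsmGrader function.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Assumed subset of operator spellings, for illustration only.
constexpr std::array<std::string_view, 7> kOperators = {
    "<", "<<", "<<=", "<=", "<=>", "=", "=="};

// Returns the longest operator spelling that is a prefix of `input`
// (maximal munch), or an empty view when none matches.
constexpr std::string_view munch_operator(std::string_view input) {
    std::string_view best{};
    for (std::string_view op : kOperators) {
        if (op.size() > best.size() && input.substr(0, op.size()) == op) {
            best = input.substr(0, op.size());
        }
    }
    return best;
}

static_assert(munch_operator("<<= 2") == "<<=");
static_assert(munch_operator("<=> b") == "<=>");
static_assert(munch_operator("< b") == "<");
```

Under this rule, "<<=" is one token rather than "<" followed by "<=", which is why the grammar definitions do not need to spell out how overlapping spellings are disambiguated.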

See this for the basic version: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form

Enumerator
Unknown 

Under normal circumstances, this should be impossible. It is a saner default, though, in case of a bad parse.

StringLiteral 

https://en.cppreference.com/w/cpp/language/string_literal.html

 StringLiteral =  [ strlike-prefix ] '"' { character } '"'
 strlike-prefix = 'L' | 'u'i  [ '8' ]
 character = ANY_CHAR / '"\' | ESCAPE_SEQ
RawStringLiteral 

https://en.cppreference.com/w/cpp/language/string_literal.html

 RawStringLiteral = [ strlike-prefix ] 'R"' d-char-seq '(' { character } ')' d-char-seq '"'
 strlike-prefix = 'L' | 'u'i  [ '8' ]
 d-char-seq = ( character / '\()' - WHITESPACE ){,16}
 character = ANY_CHAR
CharLiteral 

https://en.cppreference.com/w/cpp/language/character_literal.html

 CharLiteral = [ strlike-prefix ] "'" char "'" | c-multi-char
 strlike-prefix = 'L' | 'u'i  [ '8' ]
 char = ANY_CHAR / "'\" | ESCAPE_SEQ
 c-multi-char = [ 'L' ] "'" { char } "'"

 I can't think of any good reasons to use a multi-char literal, but let's support it anyways as it's
 trivial to implement.
BoolLiteral 

'true' or 'false'. That's it.

IntBinLiteral 

https://en.cppreference.com/w/cpp/language/integer_literal.html

See IntDecLiteral.

IntOctLiteral 

https://en.cppreference.com/w/cpp/language/integer_literal.html

See IntDecLiteral. This includes '0'.

IntDecLiteral 

https://en.cppreference.com/w/cpp/language/integer_literal.html

 Not all terminals are defined, but they should be rather obvious anyways.

 IntLiteral = ( '0x'i hex-seq | dec-seq | '0' oct-seq | '0b'i bin-seq ) [ integer-suffix ]

 hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
 dec-digits = ( '1'..'9' ) { '0'..'9'  | DIGIT_SEP }
 oct-digits =  ( '0'..'7' ) { '0'..'7'  | DIGIT_SEP }
 bin-digits = ( '0' | '1' ) { '0' | '1' | DIGIT_SEP }

 integer-suffix = 'u'i  [ 'l'i | 'll'i ]
 DIGIT_SEP = "'"
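The grammar above can be sketched as a small classifier. This is an illustrative reading of the grammar, not the AsmGrader implementation: Kind is a reduced stand-in for Token::Kind, and lower, is_digit, scan_digits, is_int_suffix, and classify_int are hypothetical helpers. It assumes that the leading '0' of an octal literal counts as part of the digit run, so that a lone '0' is octal as stated for IntOctLiteral, and that an absent suffix is accepted.

```cpp
#include <cstddef>
#include <string_view>

// Reduced stand-in for Token::Kind (assumption: only the kinds needed here).
enum class Kind { Unknown, IntBinLiteral, IntOctLiteral, IntDecLiteral, IntHexLiteral };

constexpr char lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// True if `c` is a digit of the given base (2, 8, 10, or 16).
constexpr bool is_digit(char c, int base) {
    c = lower(c);
    const int value = (c >= '0' && c <= '9') ? c - '0'
                    : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                                             : 99;
    return value < base;
}

// Consumes `digit { digit | DIGIT_SEP }`; returns chars consumed (0 on failure).
constexpr std::size_t scan_digits(std::string_view s, int base) {
    if (s.empty() || !is_digit(s[0], base)) return 0;
    std::size_t i = 1;
    while (i < s.size() && (is_digit(s[i], base) || s[i] == '\'')) ++i;
    return i;
}

// integer-suffix = 'u'i [ 'l'i | 'll'i ]; an empty suffix is also accepted.
constexpr bool is_int_suffix(std::string_view s) {
    if (s.empty()) return true;
    if (lower(s[0]) != 'u' || s.size() > 3) return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (lower(s[i]) != 'l') return false;
    }
    return true;
}

constexpr Kind classify_int(std::string_view s) {
    int base = 10;
    Kind kind = Kind::IntDecLiteral;
    if (s.size() >= 2 && s[0] == '0' && lower(s[1]) == 'x') {
        base = 16; kind = Kind::IntHexLiteral; s.remove_prefix(2);
    } else if (s.size() >= 2 && s[0] == '0' && lower(s[1]) == 'b') {
        base = 2; kind = Kind::IntBinLiteral; s.remove_prefix(2);
    } else if (!s.empty() && s[0] == '0') {
        base = 8; kind = Kind::IntOctLiteral;  // keep the '0' as the first digit
    }
    const std::size_t n = scan_digits(s, base);
    if (n == 0) return Kind::Unknown;
    return is_int_suffix(s.substr(n)) ? kind : Kind::Unknown;
}

static_assert(classify_int("0x1F") == Kind::IntHexLiteral);
static_assert(classify_int("0b101u") == Kind::IntBinLiteral);
static_assert(classify_int("017") == Kind::IntOctLiteral);
static_assert(classify_int("1'000ull") == Kind::IntDecLiteral);
static_assert(classify_int("08") == Kind::Unknown);
```

Checking the prefixes in the order hex, binary, octal matters, because all three start with '0'.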
IntHexLiteral 

https://en.cppreference.com/w/cpp/language/integer_literal.html

See IntDecLiteral.

FloatLiteral 

https://en.cppreference.com/w/cpp/language/floating_literal.html

 FloatLiteral = dec-value floating-point-suffix

 dec-value = dec-digits dec-exp
           | dec-digits '.' [ dec-exp ]
           | [ dec-digits ] '.' dec-digits [ dec-exp ]

 dec-digits = ( '1'..'9' ) { '0'..'9'  | DIGIT_SEP }
 dec-exp = 'e'i [ SIGN ] dec-seq

 SIGN = '+' | '-'

 floating-point-suffix = 'f'i | 'l'i
FloatHexLiteral 

https://en.cppreference.com/w/cpp/language/floating_literal.html

 FloatHexLiteral = hex-value floating-point-suffix

 hex-value = '0x'i hex-val-nopre
 hex-val-nopre = hex-digits hex-exp
               | hex-digits '.' hex-exp
               | [ hex-digits ] '.' hex-digits hex-exp

 hex-digits = ( '0'..'9' | 'a'i..'f'i ) { '0'..'9' | 'a'i..'f'i | DIGIT_SEP }
 hex-exp = 'p'i [ SIGN ] dec-seq

 SIGN = '+' | '-'

 floating-point-suffix = 'f'i | 'l'i
Identifier 

https://en.cppreference.com/w/cpp/language/identifiers.html

 Identifier = ident-start { ident-start | '0'..'9' }
 ident-start = 'a'i..'z'i | '_'
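A minimal sketch of the Identifier rule, combined with maximal munch, is shown here. is_ident_start and identifier_length are hypothetical helpers written for this example; they are not taken from AsmGrader.

```cpp
#include <cstddef>
#include <string_view>

// ident-start = 'a'i..'z'i | '_'
constexpr bool is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Length of the longest identifier that prefixes `s`, or 0 if `s` does not
// begin with an identifier.
constexpr std::size_t identifier_length(std::string_view s) {
    if (s.empty() || !is_ident_start(s[0])) return 0;
    std::size_t i = 1;
    while (i < s.size() &&
           (is_ident_start(s[i]) || (s[i] >= '0' && s[i] <= '9'))) {
        ++i;
    }
    return i;
}

static_assert(identifier_length("_tmp1 + 2") == 5);
static_assert(identifier_length("9lives") == 0);
```

Note that, as written, the rule covers ASCII identifiers only.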
Grouping 

Imperatively defined as:

  • '{', '}'
  • '(', ')' - when not used as a function call
  • '<', '>' - in template context

BinaryOperator 

https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this impl has no concept of operator precedence)

Only operators with 2 operands. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping. Perhaps a little confusingly, since the token stream is flat, operators like a[] produce 2 separate operator tokens of '[' and ']'.

  • '::'
  • '.', '->'
  • '.*', '->*'
  • '+', '-', '*', '/', '%', '<<', '>>', '^', '|', '&', '&&', '||'
  • '==', '!=', '<=>', '<', '<=', '>', '>='
  • '=', '+=', '-=', '*=', '/=', '%=', '<<=', '>>=', '&=', '^=', '|='
  • ','

Operator 

https://en.cppreference.com/w/cpp/language/operator_precedence.html (Note that, contrary to the title of this link, this impl has no concept of operator precedence)

Unary, ternary, and (n>3)-ary (i.e., function call) operators. Imperatively defined as any of the symbols in the list below when they are not part of a previously defined token and do not meet the requirements for Grouping. Perhaps a little confusingly, since the token stream is flat, operators like a[] produce 2 separate operator tokens of '[' and ']'.

  • '++', '--' (no distinction between pre and post)
  • '(', ')', '[', ']'
  • '.*', '->*'
  • '+', '-' (unary only)
  • '~', '!', '*', '&'
  • 'throw', 'sizeof', 'alignof', 'new', 'delete'
  • 'const_cast', 'static_cast', 'dynamic_cast', 'reinterpret_cast'
  • '?', ':'

Also includes literal operators.

EndDelimiter 

Delimits the end of the token sequence. Also serves as a count of the token kinds, as it is guaranteed to be the last enumerator.
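The counting idiom can be sketched as follows. Kind here is a local copy of the documented enumerator list; kind_count, kind_names, and kind_name are hypothetical names introduced for this example and are not part of AsmGrader.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Local copy of the documented enumerators, in declaration order.
enum class Kind {
    Unknown, StringLiteral, RawStringLiteral, CharLiteral,
    BoolLiteral, IntBinLiteral, IntOctLiteral, IntDecLiteral,
    IntHexLiteral, FloatLiteral, FloatHexLiteral, Identifier,
    Grouping, BinaryOperator, Operator, EndDelimiter
};

// Because EndDelimiter is last, its value equals the number of real kinds.
constexpr std::size_t kind_count = static_cast<std::size_t>(Kind::EndDelimiter);

// A lookup table sized by the count; adding an enumerator without adding a
// name here leaves an empty entry, which the static_assert below can catch.
constexpr std::array<std::string_view, kind_count> kind_names = {
    "Unknown", "StringLiteral", "RawStringLiteral", "CharLiteral",
    "BoolLiteral", "IntBinLiteral", "IntOctLiteral", "IntDecLiteral",
    "IntHexLiteral", "FloatLiteral", "FloatHexLiteral", "Identifier",
    "Grouping", "BinaryOperator", "Operator"};

constexpr std::string_view kind_name(Kind kind) {
    const auto index = static_cast<std::size_t>(kind);
    return index < kind_count ? kind_names[index] : std::string_view{"<invalid>"};
}

static_assert(kind_count == 15);
static_assert(!kind_names[kind_count - 1].empty());
```

The same count could size per-kind tables such as a color palette, which fits the syntax-coloring use case described for Token.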

Member Function Documentation

◆ operator==()

constexpr bool asmgrader::inspection::Token::operator== ( const Token & ) const = default

Member Data Documentation

◆ kind

Kind asmgrader::inspection::Token::kind

◆ str

std::string_view asmgrader::inspection::Token::str

The documentation for this struct was generated from the following file: