Tokens

Next: Reserved Identifiers Up: Concrete Syntax Previous: Ambiguity Resolution Contents

Tokens

Table A.10 gives a BNF description of the tokens of the language. The token classes Delim, Integer, Id, and Text, and the individual tokens in the classes Punc, TwoPunc, and Keyword, serve as terminals in the BNF of earlier sections.

**Table A.10:** BNF for the tokens of the Vesta SDL.
$\begin{table}{\footnotesize\begin{verbatim}TokenSeq ::= Token* Token ::= Integ... ... NonNewlineChar* Newline \vert \lq /' CommentBody \lq /'\end{verbatim}} \end{table}$

We define Newline as an ASCII new line sequence, either CR, LF, or CRLF. NonNewlineChar is any ASCII character other than CR and LF. CommentBody is any sequence of ASCII characters that does not contain `*/'. Tab is the ASCII TAB character.

The ambiguities in the token grammar are resolved as follows. The tokenizer interprets the program as a TokenSeq. It scans from left to right, repeatedly matching the longest possible Token beginning with the next unmatched character. The tokens Whitespace and Comment are discarded after matching; other tokens are passed on for parsing by the main grammar. When a string of characters matches both Integer and Id, it is tokenized as Integer. When a string matches both Keyword and Id, it is tokenized as Keyword.

Next: Reserved Identifiers Up: Concrete Syntax Previous: Ambiguity Resolution Contents

Allan Heydon, Roy Levin, Timothy Mann, Yuan Yu