The token definitions in the grammar file consist of a token name and a token pattern. The token name must consist of characters from the set [a-zA-Z0-9_] and may not conflict with any other token name, nor with any of the production names.
The token patterns can be specified either as a string, in double quotes ("), or as a regular expression, between the special delimiters << and >>. The regular expression syntax is largely the one supported by JDK 1.4, as documented in the Java API for the java.util.regex.Pattern class. See the figure below for two example token definitions.
STRING_TOKEN = "Value"
REGEXP_TOKEN = <<.>>
Figure 1. Two example token definitions. The first defines a simple verbatim string, and the second a regular expression.
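To make the naming rule and the two pattern kinds concrete, here is a minimal Java sketch built on java.util.regex.Pattern. It is not the parser generator's own code: the class TokenPatternSketch and its methods isValidName, stringToken, regexToken and matchLength are hypothetical names introduced only for illustration, and treating a string token as a quoted (literal) regular expression is an assumption about how verbatim matching could be modelled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the tool described in the text.
class TokenPatternSketch {
    // Token names may only use characters from the set [a-zA-Z0-9_].
    private static final Pattern NAME = Pattern.compile("[a-zA-Z0-9_]+");

    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    // A string token ("Value") matches its text verbatim.
    // Pattern.quote escapes any regex metacharacters in the text.
    static Pattern stringToken(String text) {
        return Pattern.compile(Pattern.quote(text));
    }

    // A regular expression token (<<.>>) is compiled as written.
    static Pattern regexToken(String regex) {
        return Pattern.compile(regex);
    }

    // Length of the match at the start of the input, or -1 if there is none.
    static int matchLength(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("STRING_TOKEN"));
        System.out.println(isValidName("bad-name"));
        System.out.println(matchLength(stringToken("Value"), "Value = 1"));
        System.out.println(matchLength(regexToken("."), "xyz"));
    }
}
```

The checks for name conflicts with other tokens and with production names are left out, since they depend on the rest of the grammar file.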
It is also possible to set an ignore or an error flag on a token definition. The ignore flag is used to signal that the token should be discarded after being read, whereas the error flag is used to cause a parse error whenever the token is found. Two example token declarations using these flags are listed in the figure below.
WHITESPACE = <<[ \t\n\r]+>> %ignore%
UNKNOWN_CHAR = <<.>> %error unexpected token%
Figure 3. Two example token definitions with ignore and error flags. The error flag also allows adding a specific error message to the parse error thrown when the token is encountered.
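The effect of the two flags can be sketched with a small hand-written tokenizer in Java. This is only an illustration under stated assumptions, not the generated tokenizer: the names TokenFlagSketch, TokenDef, exampleDefs and tokenize are hypothetical, the WORD token is added just to have something to keep, and the rule for choosing between competing tokens (longest match, earlier definition wins ties) is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the tool described in the text.
class TokenFlagSketch {
    // One token definition: name, pattern, ignore flag, optional error message.
    static final class TokenDef {
        final String name;
        final Pattern pattern;
        final boolean ignore;
        final String errorMessage; // null means the error flag is not set

        TokenDef(String name, String regex, boolean ignore, String errorMessage) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.ignore = ignore;
            this.errorMessage = errorMessage;
        }
    }

    // The two flagged definitions from the figure, plus an assumed WORD token.
    static List<TokenDef> exampleDefs() {
        return Arrays.asList(
            new TokenDef("WORD", "[a-z]+", false, null),
            new TokenDef("WHITESPACE", "[ \\t\\n\\r]+", true, null),
            new TokenDef("UNKNOWN_CHAR", ".", false, "unexpected token"));
    }

    static List<String> tokenize(List<TokenDef> defs, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            TokenDef best = null;
            int bestEnd = pos;
            // Assumption: longest match wins, earlier definitions win ties.
            for (TokenDef def : defs) {
                Matcher m = def.pattern.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = def;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalStateException("no token matches at position " + pos);
            }
            if (best.errorMessage != null) {
                // Error flag: finding the token is itself a parse error.
                throw new IllegalStateException(best.errorMessage + " at position " + pos);
            }
            if (!best.ignore) {
                // Ignore flag: the token is read but never handed on.
                tokens.add(best.name + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(exampleDefs(), "foo bar"));
    }
}
```

With these definitions, whitespace is consumed without producing a token, while any character that only UNKNOWN_CHAR can match aborts tokenizing with the configured message.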