net.percederberg.grammatica.parser
Class Tokenizer

java.lang.Object
  extended by net.percederberg.grammatica.parser.Tokenizer

public class Tokenizer
extends java.lang.Object

A character stream tokenizer. This class groups the characters read from the stream into tokens ("words"). The grouping is controlled by token patterns, each of which contains either a fixed string to search for or a regular expression. If the stream of characters doesn't match any of the token patterns, a parse exception is thrown.
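
The grouping described above can be illustrated with a minimal standalone sketch. It uses only java.util.regex and is not Grammatica's implementation: the GroupingSketch and Rule names are hypothetical, the first rule that matches is taken, and an IllegalStateException stands in for the parse exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone model of pattern-driven tokenizing; not Grammatica's implementation.
class GroupingSketch {

    // Hypothetical pattern type: a name plus a compiled regular expression.
    static final class Rule {
        final String name;
        final Pattern regex;

        private Rule(String name, Pattern regex) {
            this.name = name;
            this.regex = regex;
        }

        // A fixed string is quoted so that it only matches literally.
        static Rule fixed(String name, String text) {
            return new Rule(name, Pattern.compile(Pattern.quote(text)));
        }

        static Rule regexp(String name, String expression) {
            return new Rule(name, Pattern.compile(expression));
        }
    }

    // Splits the input into "NAME:image" strings, or throws if nothing matches.
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String found = null;
            for (Rule rule : rules) {
                Matcher m = rule.regex.matcher(input);
                m.region(pos, input.length());
                // Only accept non-empty matches that start at the current position.
                if (m.lookingAt() && m.end() > pos) {
                    found = rule.name + ":" + m.group();
                    pos = m.end();
                    break;
                }
            }
            if (found == null) {
                // Stands in for the parse exception described above.
                throw new IllegalStateException("unexpected character at offset " + pos);
            }
            tokens.add(found);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            Rule.fixed("ASSIGN", ":="),
            Rule.regexp("IDENT", "[a-z]+"),
            Rule.regexp("NUMBER", "[0-9]+"),
            Rule.regexp("SPACE", "\\s+"));
        // Prints [IDENT:x, SPACE: , ASSIGN::=, SPACE: , NUMBER:42]
        System.out.println(tokenize("x := 42", rules));
    }
}
```

Calling tokenize("x := ?", rules) in this sketch would throw, since no rule matches the '?' character.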


Field Summary
protected  boolean ignoreCase
          The ignore character case flag.
 
Constructor Summary
Tokenizer(java.io.Reader input)
          Creates a new case-sensitive tokenizer for the specified input stream.
Tokenizer(java.io.Reader input, boolean ignoreCase)
          Creates a new tokenizer for the specified input stream.
 
Method Summary
 void addPattern(TokenPattern pattern)
          Adds a new token pattern to the tokenizer.
 int getCurrentColumn()
          Returns the current column number.
 int getCurrentLine()
          Returns the current line number.
 java.lang.String getPatternDescription(int id)
          Returns a description of the token pattern with the specified id.
 boolean getUseTokenList()
          Checks if the token list feature is used.
protected  Token newToken(TokenPattern pattern, java.lang.String image, int line, int column)
          Factory method for creating a new token.
 Token next()
          Finds the next token on the stream.
 void reset(java.io.Reader input)
          Resets this tokenizer for usage with another input stream.
 void setUseTokenList(boolean useTokenList)
          Sets the token list feature flag.
 java.lang.String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ignoreCase

protected boolean ignoreCase
The ignore character case flag.

Constructor Detail

Tokenizer

public Tokenizer(java.io.Reader input)
Creates a new case-sensitive tokenizer for the specified input stream.

Parameters:
input - the input stream to read

Tokenizer

public Tokenizer(java.io.Reader input,
                 boolean ignoreCase)
Creates a new tokenizer for the specified input stream. The tokenizer can be set to process tokens in either case-sensitive or case-insensitive mode.

Parameters:
input - the input stream to read
ignoreCase - the character case ignore flag
Since:
1.5

Method Detail

getUseTokenList

public boolean getUseTokenList()
Checks if the token list feature is used. The token list feature makes all tokens (including ignored tokens) link to each other in a linked list. By default the token list feature is not used.

Returns:
true if the token list feature is used, or false otherwise
Since:
1.4
See Also:
setUseTokenList(boolean), Token.getPreviousToken(), Token.getNextToken()

setUseTokenList

public void setUseTokenList(boolean useTokenList)
Sets the token list feature flag. When active, the token list feature makes all tokens (including ignored tokens) link to each other in a linked list. By default the token list feature is not used.

Parameters:
useTokenList - the token list feature flag
Since:
1.4
See Also:
getUseTokenList(), Token.getPreviousToken(), Token.getNextToken()
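
As a rough illustration of the token list feature, the following standalone sketch links every token to its neighbours, including ignored ones, while handing back only the tokens that are not ignored. The TokenListSketch and Node names are hypothetical and the code does not use Grammatica's Token class.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of the token list feature; not Grammatica's implementation.
class TokenListSketch {

    // Hypothetical token node with links to its neighbours.
    static final class Node {
        final String image;
        final boolean ignored;
        Node previous;
        Node next;

        Node(String image, boolean ignored) {
            this.image = image;
            this.ignored = ignored;
        }
    }

    // Links all tokens (ignored ones too) when useTokenList is true, and
    // returns only the tokens that are not ignored.
    static List<Node> link(List<Node> all, boolean useTokenList) {
        List<Node> visible = new ArrayList<>();
        Node last = null;
        for (Node node : all) {
            if (useTokenList && last != null) {
                last.next = node;
                node.previous = last;
            }
            last = node;
            if (!node.ignored) {
                visible.add(node);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Node> all = List.of(
            new Node("a", false), new Node(" ", true), new Node("b", false));
        List<Node> visible = link(all, true);
        Node first = visible.get(0);
        System.out.println(visible.size());         // 2
        System.out.println(first.next.ignored);     // true
        System.out.println(first.next.next.image);  // b
    }
}
```

With the flag set to false the links stay null, which mirrors the default of the feature being off.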

getPatternDescription

public java.lang.String getPatternDescription(int id)
Returns a description of the token pattern with the specified id.

Parameters:
id - the token pattern id
Returns:
the token pattern description, or null if not present

getCurrentLine

public int getCurrentLine()
Returns the current line number. This number will be the line number of the next token returned.

Returns:
the current line number

getCurrentColumn

public int getCurrentColumn()
Returns the current column number. This number will be the column number of the next token returned.

Returns:
the current column number

addPattern

public void addPattern(TokenPattern pattern)
                throws ParserCreationException
Adds a new token pattern to the tokenizer. The pattern is added last in the list, so a previously added pattern is chosen if two patterns match the same string.

Parameters:
pattern - the pattern to add
Throws:
ParserCreationException - if the pattern couldn't be added to the tokenizer
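
The ordering rule can be sketched in a few lines of standalone Java. This is not Grammatica's implementation: the PrecedenceSketch class and its choose method are hypothetical, and the sketch assumes that a longer match beats a shorter one, with insertion order deciding only between matches of the same string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone model of pattern precedence; not Grammatica's implementation.
class PrecedenceSketch {
    private final List<String> names = new ArrayList<>();
    private final List<Pattern> regexes = new ArrayList<>();

    // New patterns always go last in the list.
    void addPattern(String name, String regex) {
        names.add(name);
        regexes.add(Pattern.compile(regex));
    }

    // Returns the name of the pattern chosen at the start of the input,
    // or null if no pattern matches there.
    String choose(String input) {
        String chosen = null;
        int bestLength = 0;
        for (int i = 0; i < regexes.size(); i++) {
            Matcher m = regexes.get(i).matcher(input);
            // Strictly greater: on a tie, the earlier pattern is kept.
            // Assumption of this sketch: longer matches win over shorter ones.
            if (m.lookingAt() && m.end() > bestLength) {
                bestLength = m.end();
                chosen = names.get(i);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        PrecedenceSketch sketch = new PrecedenceSketch();
        sketch.addPattern("KEYWORD_IF", "if");
        sketch.addPattern("IDENTIFIER", "[a-z]+");
        System.out.println(sketch.choose("if"));    // KEYWORD_IF
        System.out.println(sketch.choose("iffy"));  // IDENTIFIER
    }
}
```

Both patterns match the string "if", so the one added first is chosen. For "iffy" only the identifier pattern covers the whole word.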

reset

public void reset(java.io.Reader input)
Resets this tokenizer for usage with another input stream. This method will clear all the internal state in the tokenizer as well as close the previous input stream. It is normally called in order to reuse a parser and tokenizer pair with multiple input streams, thereby avoiding the cost of re-analyzing the grammar structures.

Parameters:
input - the new input stream to read
Since:
1.5
See Also:
Parser.reset(Reader)
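
A minimal standalone sketch of what resetting involves: close the previous reader, switch to the new one, and clear the position state. The ResetSketch class is hypothetical, it assumes 1-based line and column numbers, and unlike the method above it lets the IOException from close() propagate.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Standalone model of reset(); not Grammatica's implementation.
class ResetSketch {
    private Reader input;
    private int line = 1;     // assumption: positions are 1-based
    private int column = 1;

    ResetSketch(Reader input) {
        this.input = input;
    }

    // Reads one character and updates the position, or returns -1 at the end.
    int read() throws IOException {
        int c = input.read();
        if (c == '\n') {
            line++;
            column = 1;
        } else if (c >= 0) {
            column++;
        }
        return c;
    }

    // Closes the old stream and clears the position state.
    void reset(Reader newInput) throws IOException {
        input.close();
        input = newInput;
        line = 1;
        column = 1;
    }

    String position() {
        return line + ":" + column;
    }

    public static void main(String[] args) throws IOException {
        ResetSketch sketch = new ResetSketch(new StringReader("a\nbc"));
        while (sketch.read() >= 0) {
            // consume the whole first input
        }
        System.out.println(sketch.position());  // 2:3
        sketch.reset(new StringReader("d"));
        System.out.println(sketch.position());  // 1:1
    }
}
```

Anything that is expensive to build, such as the pattern list, would be kept across the call. Only the per-input state is discarded.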

next

public Token next()
           throws ParseException
Finds the next token on the stream. This method will return null when end of file has been reached. It will throw a parse exception if no token matched the input stream, or if a token pattern with the error flag set matched. Any tokens matching a token pattern with the ignore flag set will be silently ignored and the next token will be returned.

Returns:
the next token found, or null if end of file was encountered
Throws:
ParseException - if the input stream couldn't be read or parsed correctly
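
The calling convention, looping until null is returned, can be sketched with a standalone stand-in. The NextSketch class is hypothetical and far simpler than the real tokenizer: whitespace plays the role of an ignored pattern, runs of letters or digits are the only tokens, and an IllegalStateException stands in for ParseException.

```java
// Standalone model of the next() contract; not Grammatica's implementation.
class NextSketch {
    private final String input;
    private int pos = 0;

    NextSketch(String input) {
        this.input = input;
    }

    // Returns the next run of letters or digits, or null at end of input.
    // Whitespace is skipped silently and the following token is returned
    // instead. Any other character is reported as an error.
    String next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("unexpected character at offset " + pos);
        }
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        NextSketch tokenizer = new NextSketch("let  x1\n");
        String token;
        // Typical read loop: stop when null signals the end of the input.
        while ((token = tokenizer.next()) != null) {
            System.out.println(token);
        }
    }
}
```

The loop prints "let" and "x1" on separate lines. The trailing newline is skipped before the end of input is reported.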

newToken

protected Token newToken(TokenPattern pattern,
                         java.lang.String image,
                         int line,
                         int column)
Factory method for creating a new token. This method can be overridden to provide token implementations other than the default one.

Parameters:
pattern - the token pattern
image - the token image (i.e. characters)
line - the line number of the first character
column - the column number of the first character
Returns:
the token created
Since:
1.5
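
The factory-method hook can be sketched with standalone classes. All names here (FactorySketch, SimpleToken, BaseTokenizer, UpperCaseTokenizer) are hypothetical stand-ins rather than Grammatica's Token and Tokenizer, and the hook omits the pattern parameter for brevity.

```java
import java.util.Locale;

// Standalone model of the factory-method hook; not Grammatica's classes.
class FactorySketch {

    static class SimpleToken {
        final String image;
        final int line;
        final int column;

        SimpleToken(String image, int line, int column) {
            this.image = image;
            this.line = line;
            this.column = column;
        }

        String describe() {
            return image + "@" + line + ":" + column;
        }
    }

    // A custom token type that normalizes its image to upper case.
    static class UpperCaseToken extends SimpleToken {
        UpperCaseToken(String image, int line, int column) {
            super(image.toUpperCase(Locale.ROOT), line, column);
        }
    }

    static class BaseTokenizer {
        // The hook: subclasses may return their own token type.
        protected SimpleToken newToken(String image, int line, int column) {
            return new SimpleToken(image, line, column);
        }

        // All token creation goes through the hook.
        SimpleToken emit(String image, int line, int column) {
            return newToken(image, line, column);
        }
    }

    static class UpperCaseTokenizer extends BaseTokenizer {
        @Override
        protected SimpleToken newToken(String image, int line, int column) {
            return new UpperCaseToken(image, line, column);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseTokenizer().emit("select", 1, 1).describe());       // select@1:1
        System.out.println(new UpperCaseTokenizer().emit("select", 1, 1).describe());  // SELECT@1:1
    }
}
```

Because the tokenizer creates every token through the hook, a subclass can change the token type without touching the matching logic.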

toString

public java.lang.String toString()
Returns a string representation of this object. The returned string will contain the details of all the token patterns contained in this tokenizer.

Overrides:
toString in class java.lang.Object
Returns:
a detailed string representation