Grammatica Version Information
Version 1.6 (2015-05-17):
-
License modified to revised BSD
The license agreement for this library has been changed
to revised BSD license. This change was made to avoid any confusion
related to usage in commercial (closed source) applications.
-
Fixed minor API issues for reusing parsers
The Parser.reset method is now available in a new flavor that
allows replacing the Analyzer being used. The Tokenizer also
now disposes of the input buffer and references to parsed tokens
as soon as the end of file is encountered.
Version 1.5 (2009-03-07):
-
License modified to GNU LGPL version 3
The license agreement for this library has been changed
to GNU LGPL, version 3. This change was made for compatibility
reasons and most users will remain unaffected.
-
Added Visual Basic.NET code generation
A Visual Basic.NET source code generator has been
contributed by Adrian Moore. It uses the same run-time library
(DLL) as the C# .NET version. Bug #8199
-
Added support for case-insensitive parsing
A CASESENSITIVE grammar declaration has
been added to make it possible to parse in case-insensitive
mode. By default all parsing is case-sensitive. Thanks to
Adrian Moore for providing a partial implementation.
Bug #5060
-
Added support for reusing parsers and tokenizers
A reset() method has been added to both the Parser and the
Tokenizer classes, making it possible to reuse a parser
with a new input stream. By reusing a single parser and
tokenizer for several input files, speed and memory gains can
be made, especially for complex grammars.
Bug #4500
-
Added factory methods for creating various objects
New factory methods for creating tokens, productions,
tokenizers and analyzers have been added. These can be overridden
in subclasses to provide more specific classes if desired.
-
Improved the tokenizer processing speed
The tokenizer speed has been improved by using a custom
DFA implementation for regular expressions. The tokenization
phase now runs some 20-30% faster in the general case.
Bug #3603 & Bug #8202
-
Changed to use native Java regular expressions
Since the built-in regular expression library is not used
as frequently after the implementation of DFA regular expression
handling, the native Java regular expression classes are now
used. This implies improved support for advanced regular
expression syntax (at a performance penalty).
Bug #3597 & Bug #17189
-
Changed .NET parser namespace for C# 2.0
The C# parser namespace has been modified from
PerCederberg.Grammatica.Parser to PerCederberg.Grammatica.Runtime
due to the new C# 2.0 compiler strictness. Bug #14302
-
Changed .NET parser API for better integration
The .NET parser API has been modified for almost all
classes, introducing properties and indices instead of getter
and setter methods. The old methods have been deprecated, but
are still available to avoid breaking existing applications.
Bug #8693
-
Changed built-in regular expression API subtly
Due to fixes required to be able to reuse tokenizers,
the built-in regular expression class API had to be changed
subtly. In essence, all the methods can now throw IOException,
which breaks compilation of existing Java code that does not
handle it. Bug #4500
-
Fixed possible divide by zero in profiling
When profiling grammars with small input files, the time
measured could previously be zero (0). This caused a division
by zero exception when calculating the number of tokens parsed
per second. Bug #7998
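The release note does not say how the division was guarded. A minimal sketch of one possible guard follows; the class and method names are hypothetical and not taken from the Grammatica source.

```java
public final class ProfileStats {

    /**
     * Returns the number of tokens parsed per second, or 0 when the
     * measured time is too short to give a meaningful rate.
     * (Hypothetical helper, not part of the Grammatica API.)
     */
    public static long tokensPerSecond(long tokenCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0; // too fast to measure; avoid dividing by zero
        }
        return tokenCount * 1000L / elapsedMillis;
    }

    public static void main(String[] args) {
        System.out.println(tokensPerSecond(1500, 0));  // small file, 0 ms measured
        System.out.println(tokensPerSecond(1500, 30)); // 1500 tokens in 30 ms
    }
}
```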
-
Fixed parse error on some grammar comments
Grammar comments ending with "**/" previously caused a
parse error. The comment token regular expression has now been
corrected to fix this. Bug #12767
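The exact comment token expression used by Grammatica is not shown here. The sketch below only illustrates the kind of problem described, using java.util.regex and two assumed patterns: one that accepts a single '*' before the closing '/', and one that accepts any run of '*'.

```java
import java.util.regex.Pattern;

public final class CommentPattern {

    // Assumed naive pattern: allows only a single '*' before the closing '/'.
    public static final Pattern NAIVE =
        Pattern.compile("/\\*([^*]|\\*[^*/])*\\*/");

    // Assumed tolerant pattern: allows any run of '*' before the closing '/'.
    public static final Pattern TOLERANT =
        Pattern.compile("/\\*([^*]|\\*+[^*/])*\\*+/");

    public static void main(String[] args) {
        String comment = "/* note **/";
        System.out.println(NAIVE.matcher(comment).matches());
        System.out.println(TOLERANT.matcher(comment).matches());
    }
}
```

With these patterns, only the tolerant one matches a comment ending in "**/".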
-
Fixed premature EOF reported in some cases
The tokenizer previously assumed EOF had been encountered
when a full look-ahead buffer couldn't be read. Thanks to
Jeremy M Stone for finding this. Bug #23818
-
Fixed token match priority to use the grammar order
String tokens were previously always checked and considered
first, unless a regular expression token matched more characters.
This has now been changed to depend on the grammar file ordering
instead. Bug #13009
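The note does not spell out the complete matching rule. The sketch below assumes that the longest match still wins and that ties go to the token declared first in the grammar. It uses java.util.regex rather than the Grammatica tokenizer, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TokenPriority {

    /**
     * Returns the name of the token pattern that matches the start of
     * the input. The longest match wins; on a tie, the pattern declared
     * first wins. Returns null if nothing matches.
     */
    public static String match(Map<String, Pattern> tokens, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // Strictly greater, so an earlier declaration keeps a tie.
            if (m.lookingAt() && m.end() > bestLength) {
                best = e.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves the declaration order.
        Map<String, Pattern> tokens = new LinkedHashMap<>();
        tokens.put("IF", Pattern.compile(Pattern.quote("if")));
        tokens.put("IDENTIFIER", Pattern.compile("[a-z]+"));
        System.out.println(match(tokens, "if x")); // tie on "if", first declared wins
        System.out.println(match(tokens, "iffy")); // longer match wins
    }
}
```

Under this assumption, swapping the two declarations would make "if" tokenize as IDENTIFIER, which is why the ordering in the grammar file matters.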
Version 1.4 (2003-08-27):
-
Added Apache Ant task
An Apache Ant task has been added to the distribution.
See the reference manual for details on how to use it.
Bug #3620
-
Added support for retrieving all tokens
The Tokenizer class now supports enabling a useTokenList
feature that links all tokens in a doubly linked list. This can
be used to access whitespace and comment tokens, which are
otherwise ignored by the parser. By default this feature is off,
to avoid potential garbage collection problems when parsing
large files.
Bug #3605
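To illustrate why such a linked list is useful, here is a small self-contained sketch. The Tok class and its fields are hypothetical stand-ins, not the Grammatica Token class. It shows how ignored tokens in front of a regular token can be recovered by walking backwards through the list.

```java
public final class TokenListDemo {

    /** Hypothetical token node; not the Grammatica Token class. */
    public static final class Tok {
        final String image;
        final boolean ignored; // true for whitespace and comments
        Tok previous;
        Tok next;

        public Tok(String image, boolean ignored) {
            this.image = image;
            this.ignored = ignored;
        }
    }

    /** Links a token after the current tail and returns it as the new tail. */
    public static Tok append(Tok tail, Tok token) {
        if (tail != null) {
            tail.next = token;
            token.previous = tail;
        }
        return token;
    }

    /** Returns the text of the ignored tokens directly preceding a token. */
    public static String leadingIgnored(Tok token) {
        StringBuilder sb = new StringBuilder();
        for (Tok t = token.previous; t != null && t.ignored; t = t.previous) {
            sb.insert(0, t.image);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Tok tail = append(null, new Tok("/* doc */", true));
        tail = append(tail, new Tok(" ", true));
        Tok id = append(tail, new Tok("x", false));
        System.out.println("[" + leadingIgnored(id) + "]");
    }
}
```

The sketch also hints at the garbage collection concern: as long as any token is reachable, every token linked to it stays reachable too.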
-
Removed API documentation for unsupported API
Java API documentation for the unsupported APIs in
Grammatica is no longer created or distributed.
Bug #4969
-
Corrected a null pointer error in --profile
When using --profile, a null pointer exception was thrown
during the parsing of the file. This was due to the analyzer
being set to null, something that has now been made
impossible. Bug #4967
Version 1.3 (2003-07-27):
-
Added C# API documentation
C# API documentation has now been added to the
release documentation. The build file uses CppDoc to
generate the HTML documents. Bug #3612
-
Added source code examples to documentation
Some examples of Java source code for creating a
parser have been added to the reference manual.
Bug #4093
-
Improved and clarified the Analyzer API
The Analyzer class API has been clarified by improving
the generated comments slightly and by adding a
getChildValues() method. Bug #4181
-
Corrected generation of C# analyzer classes
The generation of C# analyzer classes was completely
broken in all previous releases, due to missing keywords and
erroneous casing of method names. Tests have now been added
to verify the generated analyzers. This was only an issue in
the C# code generation, as the Java version has been tested
continuously. Bug #4416 & Bug #4498
Version 1.2 (2003-06-10):
-
Added profiling for grammars
A new action --profile has been added that prints out
the statistics from a profiling run of the grammar. This is
useful for improving the grammar and parsing speed. Bug #3936
-
Improved tokenizer performance
The tokenizer performance has been much improved for
string tokens. The tokenization speed is now about twice as fast
(on average), meaning that the tokenizer will use half as much
time for most input and grammars. Bug #3603
-
Improved the parse tree output on error
The parse tree was previously only printed when a file
could be parsed correctly. Now the parse tree is printed
up to the first syntax error.
Bug #3930
-
Improved error message for missing '%' character
When the final '%' character was missing in the grammar
file, the error message printed was not logical. This has
been improved by printing an 'unterminated directive' error
instead. Bug #3931
-
Improved the reference manual
The reference manual has been improved to the point
where it is considered finished. There is still room for
much improvement, however, and such fixes will be added in
later versions. Bug #3598
-
Corrected unexpected token error messages
In some circumstances the unexpected token error
messages didn't list all possible tokens. This only occurred
for productions containing alternatives with ambiguities that
were resolved with one alternative being set as the default.
This has now been corrected, which should also improve the
error recovery in these cases. Bug #3929
Version 1.1 (2003-05-26):
-
Added automatic error recovery
Automatic error recovery is now always attempted
without any changes to the grammar. Unfortunately the parser
library API had to be changed a bit to make this possible.
Bug #3601
-
Added debug printout of the interpreted grammar
A new action --debug has been added that prints out the
internal representation of the grammar. This is useful for
debugging, as it also prints the look-ahead token sets
calculated. Bug #3692
-
Improved speed of the look-ahead calculation
Calculating look-ahead sequences longer than four (4)
tokens could be very slow in previous versions. This was
due to the algorithm being used. A new algorithm has now
been implemented, speeding up the process and allowing
calculation of look-aheads at least 10 tokens in length.
Bug #3674
-
Corrected stack overflow in regular expressions
Regular expressions containing reluctant quantifiers
(such as '*?') caused stack overflow errors when matching
long input strings (about 4096 characters). This has now
been fixed in both the C# and Java versions, and new
automatic test cases have been added.
Bug #3632
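For readers unfamiliar with reluctant quantifiers, the following sketch shows how '*?' differs from the greedy '*'. It uses java.util.regex, not the Grammatica regular expression library, and the helper name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ReluctantDemo {

    /** Returns the first match of the pattern in the input, or null. */
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: consumes as much as possible, then backs off.
        System.out.println(firstMatch("a.*b", "aXbYb"));  // aXbYb
        // Reluctant: consumes as little as possible, then extends.
        System.out.println(firstMatch("a.*?b", "aXbYb")); // aXb
    }
}
```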
-
Corrected failing regular expression match
Tokens with regular expression patterns sometimes
didn't find a proper match. This only occurred for expressions
containing quantifiers ('*', '+', '*?' and '+?'), and only
very rarely, due to a failure to reset previous matches.
This was only ever an issue in the Java version, where it has
now been fixed. Bug #3653
Version 1.0 (2003-05-14):
-
Improved grammar error reporting
Error messages printed by Grammatica have been much
improved in various ways. First, the exception traces are
no longer printed by default. Secondly, the faulting line
is printed on many errors (as done by javac). Thirdly,
ambiguity errors are now reported with a list of conflicting
tokens, making them easier to understand.
-
Corrected parse tree analyzer child handling
A bug in the base Analyzer class caused the child()
methods not to be called when processing a parse tree after
creation (both for C# & Java). When analyzing through
callbacks things did work as expected, though.
Version 0.4 (2003-05-08):
-
Added creation of C# parsers (for .NET)
The parser runtime library has been ported to C#, so it
is now possible to generate parsers for .NET languages. A C#
parser output has also been added to the main program.
-
Added unit tests for generated parsers
Validation of the parsers generated for both C# and
Java has been automated in unit tests. A bug in the previous
release was detected through these tests (see below).
-
Added a first draft reference manual
A first short reference manual has been added to the
documentation. The reference manual is generated to HTML from
XML just as the release notes.
-
Corrected parse exception creation bug
When creating a parse exception, the LL(k) parser in
some situations caused a NullPointerException. This has now
been corrected.
Version 0.3 (2003-05-04):
-
Created XML generated documentation
The release documentation is now available in both text
and HTML format. Both formats are generated with XSLT from a
single source.
-
Added look-ahead loop detection
The look-ahead calculation now detects grammar loops
and all grammar ambiguities. It should no longer be possible
to cause infinite loops with a malformed grammar.
-
Added ambiguity resolution inside alternatives
Some ambiguities inside productions couldn't previously
be resolved by the LL(k) parser. Productions like
["one"] "one" "two" contain an ambiguity between the first
element and the second, but it is not inherent. This
is now handled correctly by adding look-ahead sets for
individual production elements.
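A hand-written sketch of the idea for this particular production follows. It is not generated by Grammatica and the names are hypothetical. It assumes that two tokens of look-ahead decide whether the optional element is taken.

```java
import java.util.Arrays;
import java.util.List;

public final class OptionalLookAhead {

    /**
     * Parses the production ["one"] "one" "two" and returns true if
     * the token list matches it exactly.
     */
    public static boolean parse(List<String> tokens) {
        int pos = 0;
        // Look-ahead set for the optional element: it is only taken when
        // the next two tokens are "one" "one". One token of look-ahead
        // cannot decide, since both choices start with "one".
        if (peek(tokens, pos, "one") && peek(tokens, pos + 1, "one")) {
            pos++; // consume the optional "one"
        }
        if (!peek(tokens, pos, "one")) {
            return false;
        }
        pos++;
        if (!peek(tokens, pos, "two")) {
            return false;
        }
        pos++;
        return pos == tokens.size();
    }

    private static boolean peek(List<String> tokens, int pos, String expected) {
        return pos < tokens.size() && tokens.get(pos).equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("one", "two")));
        System.out.println(parse(Arrays.asList("one", "one", "two")));
        System.out.println(parse(Arrays.asList("two")));
    }
}
```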
-
Added expected tokens to parse errors
The parse error exceptions now contain a list of the
expected tokens for the unexpected token errors.
-
Added automatic unit tests for the parser
A JUnit class testing the LL(k) parser has been added.
These tests are executed upon every build to verify the
integrity of the parser.
Version 0.2 (2003-04-20):
-
Addition of an internal regular expression library
This improves tokenizer performance by at least 100%
compared to using the GNU RegExp library.
-
Addition of an analyzer framework with parser callbacks
A code generator has also been added to create default
methods for all tokens and productions in the grammar.
-
Major refactorings of the tokenizer and parser classes
Various other classes have also been improved to simplify
future feature additions.
Version 0.1 (2003-03-29):
-
Initial release
The first alpha release. This version was only available
to a limited audience.