It can be used either as a class library from hand-written code; as a parser for grammar specifications with hand-written action routines; or as a code generator for the same, creating stand-alone (but easy to read and debug) parsers.
Enough with this equivocation, on with some examples.
The ZIP file paths are all prefixed with the yaccl-1.0 prefix (or whatever the latest release is), so no need to worry about naming collisions.
Before you proceed with this tutorial, you need to configure the software. You need to build shell-scripts to launch the components in the correct manner.
This should require only execution of two commands:
configure bin\setup (or source bin/setup on Unix)
As with most installations, there is plenty that can go wrong. If you see a message such as the following:
C:\yaccl>configure WARNING: Unable to location 'tools.jar' on your path or classpath. WARNING: the 'yacclc' tool will probably fail to compile code. WARNING: java.home=C:\PROGRAM FILES\JAVA\J2RE1.4.0 WARNING: java.class.path=.;d:\Blazix\blazix.jar;classesyou will find that the examples which demonstrate automatic code compilation of grammar derived source will fail to build. In this instance the problem is that the 'java.exe' JVM runtime is from a JRE, not a JDK.
There are two ways to solve this problem:
YACCL is a developer tool, and you will need to have a JDK on your path for things to work properly. It was developed and tested with JDK 1.3.1-b24, the JDK which comes with JBuilder6, and should work properly with any later production-release JDK (avoid the beta's).
Before embarking on the tutorial it's a good idea to compile all the examples as follows:
cd src javac examples/*.java
At this point there are two tracks you can take in this introduction document:
com.simontuffs.yaccl
package:
// A simple four function calculator (+,-,*,/). RDP parser = new RDP(); Terminal sum = parser.new NonTerminal("sum"); Terminal mul = parser.new NonTerminal("mul"); Terminal integer = parser.new Chars('0','9').plus(); sum.match(mul.star(parser.terminal("+").or("-").and(mul))); mul.match(integer.star(parser.terminal("*").or("/").and(integer))); parser.expression("1+2*3-4/5"); System.out.println(parser.parse(sum));Now if you've used parser generators before, you will know that writing eight lines of Java to do this job isn't bad. First instantiate the
RDP
(Recursive Descent Parser) object called parser
, then use it to create two
NonTerminal
objects. NonTerminal
is an inner class of RDP
, as are most of the
useful classes you will come across in this software.
The non-terminals sum
and mul
represent addition/multiplication operations
as we'll see in a minute. The other non-terminal is integer
, which uses the
Chars
character matching class to match one-or-more integer characters.
Note: there is no need to worry about tokenizers in YACCL, tokenizing is performed
as part of the parsing operation, which is a simple but very powerful approach.
If you compile and run this program (in src/example1.java) you should see "true" showing that the parser accepted the expression.
$ cd src $ javac examples/example1.java $ java examples.example1 true
Not the most illuminating result, granted, but you can get a much more interesting view into the operation of the parser by dumping the parse-tree that it built, by adding the code shown below after the call to parser.parse() has been made:
$ java examples.example2 parser.root.dump(); dump: <root#0> dump: <sum#1> dump: <mul#2> dump: 1 dump: + dump: <mul#5> dump: 2 dump: * dump: 3 dump: - dump: <mul#10> dump: 4 dump: / dump: 5Reading this parse-tree, it shows a root node #0, followed by a single sum node, under which are three mul nodes that contain the higher precedence operations, and two terminals + and - which separate them, as expected from the input expression. It is clear that this tree could be evaluated as a stack-based expression:
-(+(1,2*3),4/5)as would be expected from the infix operations in the mathematical expression. (In case you're interested, with integer arithmetic the result is 7, as we will find out shortly).
Again, unlike some other parser-generators, YACCL automatically builds a parse-tree for an expression while it is parsing it. This parse-tree can be used to complete evaluation of the expression by traversing it after parsing is complete, but there is another (and often better) way.
Note: YACCL can also call action routines during intermediate matching, although you need to be careful if your grammar requires backtracking for its evaluation since the result of an action routine might need to be negated if the rule it was fired from was negated by backtracking -- more on this later in the backtracking section.
First, a couple of action routines must be created to handle nodes during the
creation of the grammar. These extend the abstract RDP.Action
class,
and implement the fire(Node)
method. We call them onSum
and onMul
to reflect the production/non-terminal names they are attached
to in the constructors below:
Action onSum = parser.new Action() { public void fire(Node node) { System.out.println("onSum(" + node.flatten() + ")"); } }; Action onMul = parser.new Action() { public void fire(Node node) { System.out.println("onMul(" + node.flatten() + ")"); } }; // A simple four function calculator (+,-,*,/). Terminal sum = parser.new NonTerminal("sum", onSum); Terminal mul = parser.new NonTerminal("mul", onMul);The output from this parser is now slightly more interesting:
$ java examples.example2 onMul(1) onMul(2*3) onMul(4/5) onSum(1+2*3-4/5) true ...As each of the action routines is fired, it prints out a flattened version of the node it is passed. At this point a way to actually evaluate the mathematical expression should suggest itself.
onMul()
we can peform
the multiplication, and replace the node with the result. Actually, all that
has to happen is to replace the node's token with the result, since that is
what the Node.token
is intended for. The following change
(example3.java) shows this.
Action onMul = parser.new Action() { public void fire(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("*")) result *= val; else result /= val; } node.token = "" + result; } };Note the subtleties here: there are potentially multiple children opf
node
,
for example 1*2*3 will result in five children, '1', '*', '2', '*' and '3'.
The resulting tree dump is almost complete now, showing that the mul non-terminals
have been evaluated:
dump: <sum#1> dump: 1 dump: 1 dump: + dump: 6 dump: 2 dump: * dump: 3 dump: - dump: 0 dump: 4 dump: / dump: 5
onSum
, and
the final result will be the token of that node, the value "7" (example4.java):
Action onSum = parser.new Action() { public void fire(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("+")) result += val; else result -= val; } node.token = "" + result; } }; $java examples.example4 dump: <root#0> dump: 7 dump: 1 dump: 1 dump: + dump: 6 dump: 2 dump: * dump: 3 dump: - dump: 0 dump: 4 dump: / dump: 5
RDP
class directly. However, it more convenient (and often less confusing) to use
a textual grammar to build the parser. YACCL supports this through a class called
(not too surprisingly) YACCL
, which takes a textual grammar and
builds an in-memory parser for that grammar.
The simplest way to use YACCL
is to
feed it a textual grammar using the yaccl
script, and let it
build an in-memory parser which can then be used to parse an expression or file.
$ yaccl examples/example5.g
Note: if you get an exception regarding class not found for examples.example5b
,
you need to compile the examples code as per the introduction to this
document:
$ javac examples/*.java
The grammar file in this case is a BNF-like representation of the Java code we wrote in example1.
// This is the grammar file for example5.java. sum: mul (('+'|'-') mul)*; mul: integer (('*'|'/') integer)*; integer: ['0'-'9']+; // Specify the action class. action-class = "examples.example5"; expression = "1+2*3";
The structure of this file should be familiar to anyone who has worked with tools like YACC or JavaCC. This example shows two kinds entities, productions, and controls. Productions are of the form "x: a (b|c);", meaning that production/non-terminal 'x' is matched if the input stream contains 'a' followed by either 'b' or 'c'. Repeat operations include '*', '?' and '+' with the usual meanings of match zero-or-more, zero-or-one, one-or more. Character patterns are indicated by use of square brackets [].
Controls are directives to the parser controlling things such as (in this case) the expression to be parsed, and in this case which action-class to use.
An action-class is a class that contains public fields or methods which
accept a single Node argument and which follow a simple naming convention. The RDP
parser will automatically
invoke an action routine when a production is matched, if it has a name starting
with 'on', and the rest of the name matches the name of the production. So in
our example we need methods called onSum(Node)
and onMul(Node)
.
The code in example5.java is just a reworking of the action routines in example4.java
RDP parser = new RDP(); public Action onSum = parser.new Action() { public void fire(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("+")) result += val; else result -= val; } node.token = "" + result; node.dump(); } }; public Action onMul = parser.new Action() { public void fire(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("*")) result *= val; else result /= val; } node.token = "" + result; } };Pay particular attention to the somewhat strange-looking use of an instance of the RDP class to create inner-class objects which resulted from the cut-and-paste of hand-written code. It is actually possible to simplify the action routines so that they only need be public methods accepting a single Node argument, as shown below (this code is in example5b.java), and to hook it up you need to change the
action-class
specifiation in example5.g to
action-class="examples.example5b"
public class example5b { public void onSum(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("+")) result += val; else result -= val; } node.token = "" + result; node.dump(); } public void onMul(Node node) { // Evaluate the expression below here. int result = new Integer(node.tokenAt(0)).intValue(); for (int i=2; i<node.children.size(); i+=2) { int val = new Integer(node.tokenAt(i)).intValue(); if (node.tokenAt(i-1).equals("*")) result *= val; else result /= val; } node.token = "" + result; } }
yacclc
tool on the sum.g
grammar in the examples directory:
$ cd .. $ yacclc src/examples/sum.gThe grammar is parsed and is (or should be) converted into Java code in the file src/examples/sum/Sum.java. The
Sum
class contains a method called parser
which returns an instance of an RDP
. The action routines are
also compiled into the Sum
class, and the result is compiled into
a stand-alone parser which can be used directly from the command line as follows:
Failures during compilation at this point mean that tools.jar
is
not on your classpath: go back and follow the configuration instructions
at the start of this document.
$ java -classpath ./gen:./jars/yaccl.jar examples.sum.Sum src/examples/sum.txtNote that the choice of ':' or ';' as a classpath separator depends on your platform (Windows or Unix). Cygwin counts as Windows, so use '\;' to escape ';' from the shell.
The entire YACCL grammar is given below
bootstrap: (production | control)+; production: prod action? ':' any ';'; action: '{' literal '}' ; any: or ( '%' or )*; or: and ( '|' and )*; and: ( (item action? | '!'? '(' or ')' {onRp} action? ) repeat)+; item: ( ( '!'? (string | literal | char | range) ) repeat | caction); repeat: ["*+?"] | limit | empty; literal: ['a'-'z' 'A'-'Z' '0'-'9' "_.-$"]+; prod: ['a'-'z' 'A'-'Z' '0'-'9' "_.-$"]+; ctl: ['a'-'z' 'A'-'Z' '0'-'9' "_.-$"]+; string: '"' [' '-'~' ~'"']* '"'; char: ''' ('\\')? [' '-'~'] '''; int: ( "0x" ['0'-'9' 'a'-'f' 'A'-'F']+ | ['0'-'9']+); crange: ('~')? ( (char | int) '-' (char | int) | string | char); range: ('~')? '[' (crange | literal)* (stop)* ']' (["*+?"])?; stop: ('!')? (string | literal); limit: '<' int ('-' (int | "inf"))? '>'; control: ctl '=' (bool | string | literal) ';'; bool: ("true" | "false"); caction: '<' control '>'; linecomment: "//" [' '-'~' ~"\n"]* '\n'; blockcomment: "/*" [' '-'~' " \t\n\r\f" !"*/"]* "*/";
A mere 23 productions, three of which are duplicates to simplify the
implementation of action routines (literal
, prod
and
ctl
).
This code is taken from the file src/grammars/yaccl.g
,
which forms the core of the yaccl
tool (take a look at bin/yaccl
).
By including the yaccl.g
grammar you can easily build new
grammars, which is how the yacclc
compiler works: it uses the file
src/grammars/actions.g
to extend yaccl.g
with a new
section that allows in-line Java action routines to be specified as a class.
The extra productions needed to parse the Java class are shown below:
action-class: "action" '{' class '}' ; class: package? import* class-spec '{' block '}' ; package: nl sp "package" ws name ';'; class-spec: "public" ws "class" ws lit ws ; import: nl sp "import" ws name ".*"? ';' ws ; name: lit ('.' lit)* ; oc: '{' ; cc: '}' ; ws: [" \n"]+ ; sp: [' ']* ; nl: ['\n']? ; block: ( (ws | '.' | name | string | char | oper | oc block cc ) )* ; lit: ['a'-'z' 'A'-'Z' '0'-'9' "_-$"]+; oper: ["+-/*&|%();[]:?!=<>,"]+;In the above production-set, a pass-through parser for the Java language is specified in 13 productions! This allows Java action routines to be placed inside the grammar, and parsed through to the output code with processing of package and class-names.
So I did (IMHO) the right thing, and put full LL(inf) backtracking into YACCL. What does this mean to you as a grammar writer? Simple. Don't worry about it. YACCL will do the right thing, sometimes faster than others :) You can turn on backtracking messages if you're worried about performance, or turn it off if you can get away with an LL(1) parser:
backtrack = false;but really, just forget it and concentrate on writing your grammar. At some future point I may install a LL(k) capability, which is really just a limitation on the decision history maintained by the parser.
Unable to resist the challenge, I decided to implement a recursive-descent parser in Java, and the end result is YACCL. It was both easier and harder than I'd imagined: easier at the beginning, then a lot harder at the end. Typical, enough early success to suck you in, then enough pain to remind you why you don't do this stuff very often.
YACCL was done for the best of all reasons: for the sheer challenge.
Oh, and by the way, thanks for reading this far.
YACCL is released under the terms of a BSD-style license, as found in the header of source-code and grammar files. See bsd-license. In addition, if you use YACCL to develop a product, or embed it in one (especially if you are building a commercial product), I require you to contact me (simon@simontuffs.com) to let me know that you are doing so, and help satisfy my curiosity as to whether this effort was worthwhile. Enjoy!