Another consideration would be to use a parser combinator framework such as Jparsec. This is a classic Java implementation of a combinator parser exhibiting all the usual OO characteristics. It is also a very large library with many classes, many methods, etc.
While seeking a solution to some of these issues I also sought one that would exhibit some of the strengths of Groovy. The approach described in [Combinator Parser] exploits Groovy closures. Each closure defines a simple combinator parser that can be combined into more elaborate parsers. The parser is then transformed with yet another combinator into a compiler [External Domain Specific Languages]. Writing a parser/compiler is then no more complex than writing assignments that reflect the DSL grammar. For example:Here, the altC combinator parsers alternatives, while the sepByC parses one or more occurences of the first parser separated by the second parser. The sepByC combinator is, in fact, an assembly of the more primitive combinators seqC (a sequence of two parsers) and noneOrMoreC (none or more occurrences of a parser).
def factor = altC(identifier, integer) // factor: identifier integer
def term = sepByC(factor, multiply) // term: factor { '*' factor }*
def expr = sepByC(term, add) // expr: term { '+' term }*
Each combinator is defined as a method that returns a closure (the parser). For example, the satisfyC combinator appears as:
def satisfyC = { pred ->The closure determines if the first item in the input stream satisfies some predicate (a closure with a single parameter, returning a boolean). Observe how the predicate is a 'free' variable in the closure definition and is set by the call to satisfyC. Examples might include:
return { inp -> if(pred(inp[0])) ... success ... else ... fail ... }
}
def isLetter = { ch -> ... is ch a letter? ... }
def isDigit = { ch -> ... is ch a digit? ... }
def letter = satisfyC(isLetter)
def digit = satisfyC(isDigit)
from which it is easy to define:
def identifier = seqC(letter, noneOrMoreC(altC(letter, digit)))
def integer = oneOrMoreC(digit)
Pleasingly, each combinator is only a few lines of Groovy code [ Combinators.groovy]. The complete set is defined in less than 200 LOC! The combinators are generic insofar as the input stream may be a character sequence (a character parser) or a token stream ( a parser).
Currently, I am working to extend the code with support for error reporting. Additionally, I have defined a combinator parser that reads a BNF definition and generates the combinator parser for that grammar - a kind of combinator parser compiler compiler.
I should be pleased to receive feedback from other developers.
Ken Barclay