Another consideration would be to use a parser combinator framework such as Jparsec. This is a classic Java implementation of a combinator parser exhibiting all the usual OO characteristics. It is also a very large library with many classes, many methods, etc.
While seeking a solution to some of these issues I also sought one that would exhibit some of the strengths of Groovy. The approach described in [Combinator Parser] exploits Groovy closures. Each closure defines a simple combinator parser that can be combined into more elaborate parsers. The parser is then transformed with yet another combinator into a compiler [External Domain Specific Languages]. Writing a parser/compiler is then no more complex than writing assignments that reflect the DSL grammar. For example:Here, the altC combinator parsers alternatives, while the sepByC parses one or more occurences of the first parser separated by the second parser. The sepByC combinator is, in fact, an assembly of the more primitive combinators seqC (a sequence of two parsers) and noneOrMoreC (none or more occurrences of a parser).
def factor = altC(identifier, integer) // factor: identifier integer
def term = sepByC(factor, multiply) // term: factor { '*' factor }*
def expr = sepByC(term, add) // expr: term { '+' term }*
Each combinator is defined as a method that returns a closure (the parser). For example, the satisfyC combinator appears as:
def satisfyC = { pred ->The closure determines if the first item in the input stream satisfies some predicate (a closure with a single parameter, returning a boolean). Observe how the predicate is a 'free' variable in the closure definition and is set by the call to satisfyC. Examples might include:
return { inp -> if(pred(inp[0])) ... success ... else ... fail ... }
}
def isLetter = { ch -> ... is ch a letter? ... }
def isDigit = { ch -> ... is ch a digit? ... }
def letter = satisfyC(isLetter)
def digit = satisfyC(isDigit)
from which it is easy to define:
def identifier = seqC(letter, noneOrMoreC(altC(letter, digit)))
def integer = oneOrMoreC(digit)
Pleasingly, each combinator is only a few lines of Groovy code [ Combinators.groovy]. The complete set is defined in less than 200 LOC! The combinators are generic insofar as the input stream may be a character sequence (a character parser) or a token stream ( a parser).
Currently, I am working to extend the code with support for error reporting. Additionally, I have defined a combinator parser that reads a BNF definition and generates the combinator parser for that grammar - a kind of combinator parser compiler compiler.
I should be pleased to receive feedback from other developers.
Ken Barclay
6 comments:
Hi Ken, nice work!
Here are some comments from my side:
Closures are fine, but is there a real benefit from closure usage instead of members of the Combinators class?
As far as I understand,
static satisfyC = { pred -> ...
is equivalent to the
static satisfyC(pred){...} function
Some thoughts on the gparsec -
You've mentioned the powerful jparsec lib, and the gparsec that you've started is like a groovy equivalent of the jparsec, right?
I've just thought of Groovy nature - Groovy can really benefit from re-using Java libs, so probably a Groovy binding to the JParsec can be a good idea, what do you think?
It would be really great to add some syntax sugar to JParsec by introducing some Groovy classs with operators defined that will build JParsec parsers as the result, like that will be perfect:
def group = '(' >> expression >> ')';
def factor = integer | group;
...
where integer is an instance of the parser that accepts integers and has the rightShift operator defined
Thanks,
Alex
Alex
Thanks for this. Yes, a method is equally valid. My first implementation used methods. I switched to closures, partly to give consistency, but also as objects I can curry them, pass as parameters, put in lists, etc.
I am now working on the implementation to provide better error reporting. It also includes code with overloaded operators for alt(|), seq (+) and (n)oneOrMore (*). This works fine.
Ken
Hi Ken
What's the licence for your linked-to GParsec code?
I want to embed it in some Apache 2.0 licensed code I'm writing. Is that a problem?
Thanks, Gavin
Gavin
There is no licensing associated with gparsec. You are free to use it as-is.
I will shortly issue a new release. This includes the tryC combinator which can be used to handle ambiguous grammars.
Ken
Mr. Barclay,
I purchased your Groovy Programming book and it made mention to the following website: http://www.dcs.napier.ac.uk/~kab/groovy/groovy.html. The websites were to contain working scripts of examples, case studies and end-of-chapter exercises. The site is no longer valid. Any chance I could obtain this information elsewhere?
kelle.hester@ttu.edu
Hi Barclay,
The link to externalDSL.zip seems broken.
Can I found that file anywhere?
Post a Comment