This post introduces several parser generators for C#. All of the text is excerpted from Wikipedia.
Coco/R
Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form (EBNF) grammar of a source language and generates a scanner and a parser for that language.
The scanner works as a deterministic finite-state machine. It supports Unicode characters in UTF-8 encoding and can be made case-sensitive or case-insensitive. It can also recognize tokens based on their right-hand-side context. In addition to terminal symbols the scanner can also recognize pragmas, which are tokens that are not part of the syntax but can occur anywhere in the input stream (e.g. compiler directives or end-of-line characters).
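As a rough illustration of what "the scanner works as a deterministic finite-state machine" means, here is a minimal table-driven sketch. The state names, character classes, and token kinds are assumptions made for this example; this is not code generated by Coco/R.

```python
def char_class(ch):
    """Map a character to the class used as a column of the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# (state, character class) -> next state. A missing entry means "stop here".
# Hypothetical table for two token kinds: identifiers and integers.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}


def scan(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = "start", pos
        # Follow transitions for as long as the table allows (longest match).
        while pos < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[pos])))
            if nxt is None:
                break
            state = nxt
            pos += 1
        if state == "start":
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((state, text[start:pos]))
    return tokens
```

Calling scan("x1 42 foo") yields three tokens: an identifier, a number, and another identifier. A generated scanner would build a much larger table from the token declarations in the grammar, but the driving loop has the same shape.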
The parser uses recursive descent; LL(1) conflicts can be resolved by either a multi-symbol lookahead or by semantic checks. Thus the class of accepted grammars is LL(k) for an arbitrary k. Fuzzy parsing is supported by so-called ANY symbols that match complementary sets of tokens. Semantic actions are written in the same language as the generated scanner and parser. The parser's error handling can be tuned by specifying synchronization points and "weak symbols" in the grammar. Coco/R checks the grammar for completeness, consistency, non-redundancy as well as for LL(1) conflicts.
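The idea of resolving an LL(1) conflict with multi-symbol lookahead can be sketched by hand. In the hypothetical grammar below, both alternatives of Statement begin with an identifier, so the parser peeks at the second token to choose. The class and method names are assumptions for this example and do not reflect Coco/R's generated code.

```python
class Parser:
    """Recursive-descent parser for:  Statement = ident "=" number | ident "(" ")" ."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]
        self.pos = 0

    def peek(self, k=1):
        """Return the k-th upcoming token without consuming it."""
        idx = min(self.pos + k - 1, len(self.tokens) - 1)
        return self.tokens[idx]

    def expect(self, predicate, what):
        """Consume the next token if it satisfies predicate, else report an error."""
        tok = self.peek()
        if not predicate(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        # Both alternatives start with an identifier, so one token of
        # lookahead cannot decide. Looking at the second token can.
        if self.peek(2) == "=":
            return self.assignment()
        return self.call()

    def assignment(self):
        name = self.expect(str.isidentifier, "identifier")
        self.expect(lambda t: t == "=", "'='")
        value = self.expect(str.isdigit, "number")
        return ("assign", name, int(value))

    def call(self):
        name = self.expect(str.isidentifier, "identifier")
        self.expect(lambda t: t == "(", "'('")
        self.expect(lambda t: t == ")", "')'")
        return ("call", name)
```

For example, Parser(["x", "=", "42"]).statement() takes the assignment branch, while Parser(["f", "(", ")"]).statement() takes the call branch. Each nonterminal becomes one method, which is the defining trait of recursive descent.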
There are versions of Coco/R for most modern languages (Java, C#, C++, Pascal, Modula-2, Modula-3, Delphi, VB.NET, Python, Ruby and others). The latest versions from the University of Linz are those for C#, Java and C++. For the Java version, there is an Eclipse plug-in and for C#, a Visual Studio plug-in. There are also sample grammars for Java and C#.
Coco/R was originally developed at the University of Linz and is distributed under the terms of a slightly relaxed GNU General Public License.
ANTLR
In computer-based language recognition, ANTLR (pronounced Antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.
ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. At the moment, ANTLR supports generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, and Ruby. A language is specified using a context-free grammar expressed in Extended Backus–Naur Form (EBNF).
ANTLR allows generating parsers, lexers, tree parsers, and combined lexer-parsers. Parsers can automatically generate abstract syntax trees which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. This is in contrast with other parser/lexer generators and adds greatly to the tool's ease of use.
By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e. a program that reads an input stream and generates an error if the input stream does not conform to the syntax specified by the grammar). If there are no syntax errors, then the default action is to simply exit without printing any message. In order to do something useful with the language, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is being generated, the actions are embedded in the source code of the recognizer at the appropriate points. Actions can be used to build and check symbol tables and to emit instructions in a target language, in the case of a compiler.
As well as lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees which can be automatically generated by parsers. These tree parsers are unique to ANTLR and greatly simplify the processing of abstract syntax trees.
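To make the notion of processing an abstract syntax tree more concrete, here is a hand-written tree walker. ANTLR would generate such a walker from a tree grammar; this sketch only mimics the idea, and the tuple-based tree shape is an assumption made for the example.

```python
def evaluate(node):
    """Walk an AST of nested (operator, left, right) tuples and compute its value."""
    if isinstance(node, (int, float)):
        return node  # leaf: a numeric literal
    op, left, right = node
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unknown operator {op!r}")
```

The tree ("+", 1, ("*", 2, 3)) stands for 1 + 2 * 3. Operator precedence is already encoded in the tree's shape, so the walker needs no precedence rules of its own.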
ANTLR 3 is free software, published under a three-clause BSD License. Prior versions were released as public domain software.[1]
Although ANTLR itself is free, the documentation necessary to use it is not. The ANTLR manual is a commercial book, The Definitive ANTLR Reference. Free documentation is limited to a handful of tutorials, code examples, and very basic API listings.
Several plugins have been developed for the Eclipse development environment to support the ANTLR grammar. There is ANTLR Studio, a proprietary product, as well as the ANTLR 2 and 3 plugins for Eclipse hosted on SourceForge.
JavaCC (for Java rather than C#, but included for reference)
JavaCC (Java Compiler Compiler) is an open source parser generator and lexical analyzer generator for the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation, except the output is Java source code. Unlike yacc, however, JavaCC generates top-down parsers, which limits it to the LL(k) class of grammars (in particular, left recursion cannot be used). JavaCC also generates lexical analyzers in a fashion similar to lex. The tree builder that accompanies it, JJTree, constructs its trees from the bottom up.
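The restriction on left recursion is easiest to see in a top-down parser. A rule such as expr = expr "-" number | number would make the method for expr call itself before consuming any input. The usual workaround is to rewrite the rule as expr = number { "-" number } and parse it with a loop. The sketch below is a hand-written illustration under that assumption, not JavaCC output.

```python
def parse_expr(tokens):
    """Parse  expr = number { "-" number }  and return a left-leaning tree."""
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    tree = number()
    # The loop replaces the left-recursive alternative. Folding each new
    # operand into the tree built so far keeps "-" left-associative.
    while pos < len(tokens) and tokens[pos] == "-":
        pos += 1
        tree = ("-", tree, number())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With the input ["8", "-", "3", "-", "2"], the result groups as (8 - 3) - 2, which is what the original left-recursive rule would have expressed.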
JavaCC is licensed under a BSD license.
MinosseCC
A lexer/parser generator for C#, derived from JavaCC.
Please refer to the following URL:
http://www.codeproject.com/KB/recipes/minossecc.aspx
The author says:
Originally, I thought of fully porting the source code of JavaCC to have a pure C# project, compiled in .NET/Mono bytecode, embeddable in applications written with these frameworks. After making a few attempts, I was unable to figure out how to implement it, the project being self referenced (it requires a previous version of JavaCC to produce the actual parser for the grammars), I decided to move away from this operation plan. Then I started rewriting the original code of the project, modifying the JavaCC grammar and the code where the Java parsers and lexers were generated, to produce C# sources, fully compatible (from version 0.7.1) with both Mono and .NET Frameworks. This helped me create a *.jar application that works quite similar to the original JavaCC.
The Gardens Point Parser Generator (GPPG)
GPPG is a generator for LALR(1) parsers. It accepts a “YACC/BISON-like” input specification and produces a C# output file. The parsers that it produces are thread-safe, with all parser state held within the parser instance.
GPPG parsers are designed to be used with scanners constructed with the Gardens Point Scanner Generator (GPLEX), but have also been used with both hand-written scanners and scanners constructed by other tools. Both GPPG and the parsers which it produces use the generic types defined in C# 2.0.
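The following sketch shows the general shape of a table-driven shift-reduce parser whose state lives entirely in the instance. The tables were built by hand for the tiny grammar E = E "+" n | n and are assumptions for this example. GPPG would compute LALR(1) tables from a specification and emit C#, so none of the names below come from GPPG.

```python
class SumParser:
    """Shift-reduce parser for  E = E "+" n | n  that sums the numbers."""

    # Hand-built LR tables (hypothetical example, shared and read-only).
    ACTION = {
        (0, "n"): ("shift", 2),
        (1, "+"): ("shift", 3),
        (1, "$"): ("accept", None),
        (2, "+"): ("reduce", 2),
        (2, "$"): ("reduce", 2),
        (3, "n"): ("shift", 4),
        (4, "+"): ("reduce", 1),
        (4, "$"): ("reduce", 1),
    }
    GOTO = {(0, "E"): 1}
    RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (head, body length)

    def __init__(self):
        # All mutable state is per instance, so separate parser objects
        # can run on separate threads without sharing anything mutable.
        self.states = [0]
        self.values = []

    def parse(self, tokens):
        stream = list(tokens) + ["$"]
        pos = 0
        while True:
            tok = stream[pos]
            kind = "n" if isinstance(tok, int) else tok
            action = self.ACTION.get((self.states[-1], kind))
            if action is None:
                raise SyntaxError(f"unexpected token {tok!r}")
            if action[0] == "shift":
                self.states.append(action[1])
                self.values.append(tok)
                pos += 1
            elif action[0] == "reduce":
                head, length = self.RULES[action[1]]
                popped = self.values[-length:]
                del self.values[-length:]
                del self.states[-length:]
                # Semantic action: rule 1 adds, rule 2 passes the number through.
                result = popped[0] + popped[2] if action[1] == 1 else popped[0]
                self.values.append(result)
                self.states.append(self.GOTO[(self.states[-1], head)])
            else:  # accept
                return self.values[-1]
```

For instance, SumParser().parse([1, "+", 2, "+", 3]) reduces the input step by step and returns the sum. Unlike the top-down approach, this bottom-up style handles the left-recursive rule directly.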
GPLEX and GPPG are released in open source form under “Free-BSD” style licence arrangements. The distribution is a zip archive which contains executable files, source files, documentation and examples.
The Gardens Point Scanner Generator (GPLEX)
GPLEX is a generator for lexical scanners. It accepts a “LEX-like” input specification and produces a C# output file. The scanners that it produces are thread-safe, with all scanner state held within the scanner instance.
GPLEX scanners are designed to be used with parsers constructed with the Gardens Point Parser Generator (GPPG). Both GPLEX and the scanners which it produces use the generic types defined in C# 2.0.
Version 1.0 now supports Unicode, including surrogate pairs and fallback code pages, and allows multiple input files using yywrap.
Managed Language Tools in Visual Studio 2005 SDK
In the Visual Studio 2005 SDK Version 3, we've included a toolset which should be of value to you if you've ever considered integrating a language into Visual Studio using C#.
In the past, if you wanted to do this it was up to you to "wire-up" your lexer and parser to the Visual Studio language service interfaces or the MPF classes (Microsoft.VisualStudio.Package.LanguageService). Starting with this SDK, we're including tools that will allow you to do this in a much easier fashion.
Toolset
The tools we're including are called MPPG & MPLex (which stand for Managed Package Parser Generator and Managed Package Lexer). They are derivative works from the open-source GPPG/GPLex tools developed at the Queensland University of Technology. In fact, the MP* versions of the tools were also developed by Dr. Wayne Kelly and Prof. John Gough at QUT and share a very similar code base.
Also, Professor Gough was at the Microsoft Campus a few weeks ago at the Lang.NET symposium. I was fortunate enough to chat with him in person for a few minutes, but the Port 25 folks did a two-part interview where he talks about compilers, virtual machines, and work on Ruby.NET (a version of Ruby which compiles to IL).
Architecture
I gave a short presentation on these tools at the DevLab, and included the following graphic which should make things a bit clearer:
While not a perfect representation, this should give you a basic idea of how things work together. When you provide a lex/yacc style grammar, the MPPG and MPLex tools will produce a C# lexer & parser for you at build time. These make use of a set of classes we are calling "Managed Babel" which in turn provide your language features to Visual Studio via the MPF.
Samples & Getting Started
In the Visual Studio 2005 SDK Version 3.0, there is one sample which uses these tools called Example.ManagedMyC. It supports the following language service features:
- Error Checking
- Syntax Highlighting (Colorizing)
- Brace Matching
We might include another sample showing off more features in a future version of the Visual Studio SDK.















