Generator of lexical analyzers in C and C++. Unicode supported.
The goal of this project is to provide a generator for lexical analyzers of maximum computational efficiency and the widest possible range of applications. This includes support for Unicode (UTF-8, UTF-16, ...) and a large variety of other encodings, both directly and via nested converters such as ICU(tm) and IConv. Sophisticated buffer handling allows the analyzer to operate on plain file streams, on sockets, or on manually fed buffer content. 'Ready-to-build' examples explain related concepts and facilitate practical applications.
Parser generator, targeting C, C++, Python, JavaScript, JSON and XML
UniCC (UNIversal Compiler-Compiler) compiles an augmented grammar definition into program source code that parses the described grammar. Because UniCC is intended to be target-language independent, it can be configured via template definition files to emit parsers in almost any programming language.
UniCC comes with out-of-the-box support for the programming languages C, C++, Python (both 2.x and 3.x) and JavaScript. Parsers can also be generated as JSON and XML.
...It has unique features such as automatic derivation of depth grammar; production of the derivation tree, including its C interface, which provides access to the abstract syntax tree; preservation of full source information and pretty printing to facilitate source-to-source translation; and persistence to aid rapid interpreter writing.
For use in contemporary computing environments, it supports Unicode, is reentrant, and offers thread safety.