gorille-announce Mailing List for Gorille
Status: Alpha
Brought to you by:
simonstl
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 | 4 | | | | | | | | | | | |
|
From: Simon St.L. <sim...@si...> - 2002-01-20 17:26:29
|
I'm happy to announce a fourth release of Gorille. Gorille 0.4 incorporates a code-generator for building compilable rules classes and includes compiled classes for both XML 1.0 and XML 1.1. (Thanks to Niels Peter Strandberg for the suggestion.)

Gorille is a small Java package designed to let developers of various kinds of XML processors test the content and names of XML structures in their XML documents to see if they match lists of acceptable Unicode characters, including surrogates. Developers may create customized rules or rely on the rule files and classes provided.

I believe that Gorille's functionality is complete at this point, though the code could certainly use more testing and documentation. Future releases will likely focus on added testing, documentation, and improvements in command-line interfaces.

More information on Gorille is available from:
http://gorille.sourceforge.net

The download is available from:
http://sourceforge.net/project/showfiles.php?group_id=43446

Thanks for the questions, suggestions, and support I've received so far. Comments and suggestions are still quite welcome.

Simon St.Laurent
"Every day in every way I'm getting better and better." - Emile Coue |
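
Editor's sketch: the 0.4 announcement describes a code generator that turns character rules into compilable classes. The Java below illustrates that general idea only. It is not Gorille's generator: the class name `RulesClassGeneratorSketch`, the `generate` method, and the shape of the emitted `isAllowed` method are all hypothetical, and the input is a plain array of inclusive code point ranges rather than Gorille's XML rule files.

```java
final class RulesClassGeneratorSketch {

    /**
     * Emits Java source for a class whose isAllowed(int) method returns true
     * when the code point falls inside one of the inclusive ranges.
     * Hypothetical output format, chosen for illustration only.
     */
    static String generate(String className, int[][] ranges) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        src.append("    public static boolean isAllowed(int cp) {\n");
        src.append("        return ");
        if (ranges.length == 0) {
            src.append("false"); // no ranges means nothing is allowed
        }
        for (int i = 0; i < ranges.length; i++) {
            if (i > 0) {
                src.append("\n            || ");
            }
            src.append(String.format("(cp >= 0x%X && cp <= 0x%X)",
                    ranges[i][0], ranges[i][1]));
        }
        src.append(";\n    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // The XML S (whitespace) production: #x20 | #x9 | #xD | #xA.
        int[][] whitespace = {{0x9, 0xA}, {0xD, 0xD}, {0x20, 0x20}};
        System.out.print(generate("WhitespaceRules", whitespace));
    }
}
```

Generating a chain of range comparisons keeps the emitted class free of any runtime dependency on a rule file, which seems to be the point of offering compiled classes alongside loadable XML rules.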
|
From: Simon St.L. <sim...@si...> - 2002-01-19 02:27:46
|
>Date: Thu, 10 Jan 2002 13:29:04 -0500
>To: xm...@li...
>From: "Simon St.Laurent" <sim...@si...>
>Subject: ANN: Gorille 0.3
>
>Gorille, a Java library for testing XML document parts against the
>productions defined in XML 1.0, XML 1.1, or your favorite flavor, has just
>reached its third release.
>
>This release adds support for Unicode surrogate pairs, as permitted by XML
>1.1. It also fixes a few glitches in representing characters with values
>greater than 0xFFFF, as Java's char primitive has no built-in
>understanding of such things.
>
>Many thanks to Elliotte Rusty Harold and John Cowan for pointing out both
>problems and solutions in this field. (Additional thanks to my parents for
>getting me the Unicode Standard 3.0 book for Christmas.)
>
>Surrogate pairs are very tricky critters that seem to me to require
>substantially more programming care than any other aspect of Unicode, and
>I suspect that developers will be cursing them for a long time to come.
>
>More information, downloads, CVS, etc. are available at:
>http://gorille.sourceforge.net
>
>The testing I've been able to perform so far is pretty crude stuff. If
>anyone with more experience in Unicode or better tools for creating test
>documents has time to explore this work, I'd greatly appreciate it. As
>XML 1.0 parsers already perform some of this testing, creating tests that
>go outside of those bounds and reach Gorille (not just the parser) is tricky.
>
>Also, I'm planning to create a code generator that generates compile-able
>rules classes from the XML files for people who are uncomfortable with the
>notion of specifying productions in loadable and modifiable XML documents.
>
>Gorille 0.3 is still an alpha version, but it's improving rapidly.

Simon St.Laurent
"Every day in every way I'm getting better and better." - Emile Coue |
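
Editor's sketch: the 0.3 announcement notes that Java's `char` cannot hold values above 0xFFFF, so such characters arrive as surrogate pairs. The Java below shows one way to fold pairs into single code points and to reject unpaired halves. It relies on `Character.isHighSurrogate`, `Character.isLowSurrogate`, and `Character.toCodePoint`, which were added in Java 5, after this message was written, so it cannot be how Gorille 0.3 did it. The class name `SurrogateSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SurrogateSketch {

    /** Decodes UTF-16 text into code points; throws on an unpaired surrogate. */
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 >= text.length()
                        || !Character.isLowSurrogate(text.charAt(i + 1))) {
                    throw new IllegalArgumentException(
                            "Unpaired high surrogate at index " + i);
                }
                result.add(Character.toCodePoint(c, text.charAt(i + 1)));
                i += 2;
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException(
                        "Unpaired low surrogate at index " + i);
            } else {
                result.add((int) c);
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // U+10000 is encoded in UTF-16 as the pair D800 DC00.
        List<Integer> cps = codePoints("a\uD800\uDC00");
        System.out.println(Integer.toHexString(cps.get(1))); // prints 10000
    }
}
```

Once text is reduced to code points, a range check against a character list can treat values above 0xFFFF the same way as any other character.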
|
From: Simon St.L. <sim...@si...> - 2002-01-19 02:27:20
|
>Date: Mon, 07 Jan 2002 13:51:10 -0500
>To: xm...@li...
>From: "Simon St.Laurent" <sim...@si...>
>Subject: ANN: Gorille 0.2
>
>I'm happy to announce a second release of the configurable Gorille
>XML/Unicode character tester. Like the earlier release, it uses XML-based
>configuration files to specify which characters should be permitted in
>particular XML contexts. This version adds support for both namespaces
>and public identifiers, as well as an experimental SAX filter.
>
>Gorille is a small Java package designed to let developers of various
>kinds of XML processors test the content and names of XML structures in
>their XML documents. While Gorille ships with test files for both XML 1.0
>and the draft XML 1.1, you can create your own configuration files as well.
>
>Gorille uses an XML format to specify lists of characters according to
>either XML 1.0 conventions (with its BaseChar, Ideographic, CombiningChar,
>Digit, and Extender productions) or XML 1.1 conventions (NameStartChar,
>NameChar). Both forms permit specification of the Char and S production
>for content characters and whitespace. I've included sample lists for both
>XML 1.0 and XML 1.1, as well as an ASCII-only version of XML 1.0.
>
>Gorille is now hosted at SourceForge, complete with mailing lists and CVS:
>http://gorille.sourceforge.net
>
>I would especially like to hear from developers who can give Gorille more
>thorough testing on a wider range of Unicode than I have been able to do
>so far. The SAX Filter in particular needs some tire-kicking, as most SAX
>parsers already perform the XML 1.0 version of Gorille's tests, making it
>difficult to get large numbers of faulty events into Gorille.
>
>Despite my interests in Unicode and character encoding issues in XML, I
>still live in a largely ASCII universe, and no doubt some subtleties have
>escaped me.
>
>Contributions, bug reports, and general comments on the usefulness or lack
>thereof of this tool are all quite welcome.

Simon St.Laurent
"Every day in every way I'm getting better and better." - Emile Coue |
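
Editor's sketch: the 0.2 announcement mentions an experimental SAX filter and observes that parsers already enforce the XML 1.0 rules, so a filter mostly earns its keep with stricter rules such as an ASCII-only list. The Java below is a minimal illustration built on the standard `org.xml.sax.helpers.XMLFilterImpl`. It is not Gorille's filter: `CharCheckFilterSketch`, its `accepts` helper, and the use of an `IntPredicate` in place of Gorille's rule files are assumptions made for the example, and it checks character data only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.IntPredicate;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: rejects character data that fails a caller-supplied rule. */
final class CharCheckFilterSketch extends XMLFilterImpl {
    private final IntPredicate allowed;

    CharCheckFilterSketch(XMLReader parent, IntPredicate allowed) {
        super(parent);
        this.allowed = allowed;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Walk code points rather than chars so a surrogate pair is tested once.
        // Caveat: a pair split across two characters() calls is seen as two halves.
        String text = new String(ch, start, length);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!allowed.test(cp)) {
                throw new SAXException("Disallowed character U+"
                        + Integer.toHexString(cp).toUpperCase());
            }
            i += Character.charCount(cp);
        }
        super.characters(ch, start, length); // pass the event downstream unchanged
    }

    /** Parses xml through the filter; returns false if any SAXException is raised. */
    static boolean accepts(String xml, IntPredicate allowed) {
        try {
            XMLReader parser =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            new CharCheckFilterSketch(parser, allowed)
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IntPredicate asciiOnly = cp -> cp < 0x80;
        System.out.println(accepts("<a>cafe</a>", asciiOnly));
        System.out.println(accepts("<a>caf\u00E9</a>", asciiOnly));
    }
}
```

Because the filter sits between the parser and the application's handler, well-formed documents that the parser accepts can still be rejected by the narrower rule, which is one way to get "faulty events" for testing.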
|
From: Simon St.L. <sim...@si...> - 2002-01-19 02:26:55
|
>Date: Sat, 22 Dec 2001 09:20:29 -0500
>To: xm...@li...
>From: "Simon St.Laurent" <sim...@si...>
>Subject: ANN: Gorille (alpha)
>
>Gorille [1] is a very simple Java library for testing XML content and
>labels against lists of allowable Unicode characters like those provided
>in XML 1.0 [2] and XML 1.1 [3]. Gorille is available under the Mozilla
>Public License.
>
>Gorille uses an XML format to specify lists of characters according to
>either XML 1.0 conventions (with its BaseChar, Ideographic, CombiningChar,
>Digit, and Extender productions) or XML 1.1 conventions (NameStartChar,
>NameChar). Both forms permit specification of the Char and S production
>for content characters and whitespace. I've included sample lists for
>both XML 1.0 and XML 1.1, as well as an ASCII-only version of XML 1.0.
>
>Gorille performs checking of Name, Names, NMTOKEN, and NMTOKENS, as well
>as character checking for any of the productions listed above. This
>checking is performed by XML parsers as documents are parsed, but Gorille
>may be useful for checking XML documents generated by programs or to
>restrict documents to subsets of the characters allowed by XML. Gorille
>relies completely on Java's built-in support for Unicode strings and
>characters, though it doesn't use any of the Unicode property information
>Java provides.
>
>Gorille does provide for some rather perverse modifications of the
>productions - you could, for instance, require that all content be in
>control characters while all names be ideographic - but my hope is that
>developers will use it in reasonable ways which don't create arbitrary
>explosions as programs reject bad information.
>
>I'll be using Gorille to provide name- and content-checking for MOE [4],
>but hope to also create a SAXFilter which uses it and perhaps a Java
>FilterReader for preprocessing content before it reaches a parser.
>
>Gorille is currently in alpha. I believe the basic functionality is
>complete, but there's still potential for improvement, expansion, and as
>always, better documentation. (Including RDDL documents for the character
>list and test files!)
>
>[1] http://simonstl.com/projects/gorille
>[2] http://www.w3.org/TR/REC-xml
>[3] http://www.w3.org/TR/xml11/
>[4] http://moe.sourceforge.net

Simon St.Laurent
"Every day in every way I'm getting better and better." - Emile Coue |
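
Editor's sketch: the original announcement lists checks for Name, Names, NMTOKEN, and NMTOKENS against configurable character lists. The Java below illustrates how those four productions relate, using the XML 1.0 (Second Edition) grammar: a Name is a name-start character followed by name characters, an Nmtoken is one or more name characters, and Names and Nmtokens are whitespace-separated sequences of each. The class `ProductionSketch`, its methods, and the `asciiOnly` rule set are hypothetical stand-ins; Gorille's own API and its XML rule files are not shown in these messages.

```java
import java.util.function.IntPredicate;

final class ProductionSketch {
    private final IntPredicate nameStart;
    private final IntPredicate nameChar;

    ProductionSketch(IntPredicate nameStart, IntPredicate nameChar) {
        this.nameStart = nameStart;
        this.nameChar = nameChar;
    }

    /** ASCII-only subset of the XML 1.0 name rules (an illustrative rule set). */
    static ProductionSketch asciiOnly() {
        IntPredicate start = cp ->
                (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
                        || cp == '_' || cp == ':';
        IntPredicate rest = start.or(cp ->
                (cp >= '0' && cp <= '9') || cp == '.' || cp == '-');
        return new ProductionSketch(start, rest);
    }

    /** Nmtoken ::= (NameChar)+ */
    boolean isNmtoken(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(nameChar);
    }

    /** Name ::= name-start character followed by (NameChar)* */
    boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        return nameStart.test(first)
                && s.substring(Character.charCount(first))
                        .codePoints().allMatch(nameChar);
    }

    /** Names ::= Name (S Name)* */
    boolean isNames(String s) {
        return allTokens(s, true);
    }

    /** Nmtokens ::= Nmtoken (S Nmtoken)* */
    boolean isNmtokens(String s) {
        return allTokens(s, false);
    }

    private boolean allTokens(String s, boolean names) {
        // S is one or more of space, tab, CR, LF. The -1 limit keeps empty
        // leading or trailing tokens so stray whitespace is rejected.
        for (String token : s.split("[ \\t\\r\\n]+", -1)) {
            if (names ? !isName(token) : !isNmtoken(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ProductionSketch rules = asciiOnly();
        System.out.println(rules.isName("xml:lang"));
        System.out.println(rules.isName("1st"));
        System.out.println(rules.isNmtoken("1st"));
    }
}
```

Passing the two character tests in as predicates mirrors the configurability the message describes: swapping in a different pair changes which names are accepted without touching the production logic.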