Currently the parser has a couple of problems that do not come from the parser per se, but from the underlying lexer (StreamTokenizer) we're using.
The differences between the current implementation and the RFC are as follows:
* Line endings are not actually checked. E.g. if an .ics file has no folded lines and each line is terminated with a bare LF (\n) rather than CRLF, the parser will not complain.
* Empty lines between components are skipped by default, but if an empty line contains some other whitespace character such as TAB (\t), FF or SPACE, the parser will choke (which is probably more in line with the spec).
* Parameter names can contain characters that are illegal in names, and the parser will not throw an exception. An example of this is the file /etc/samples/invalid/3.ics, which has a calendar property named "X-INVALID(NAME)". This is obviously an invalid name because of the parentheses, but the parser accepts it.
* Quoted strings allow many more escape sequences than the RFC defines. This is by design of StreamTokenizer, which unescapes quoted strings as in the Java(TM) language, so constructs like \n, \t and \b will be unescaped by StreamTokenizer, and illegal Java constructs like \o will cause StreamTokenizer to throw an exception.
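To make the list above concrete, here is a small, self-contained Java sketch. It is an illustration, not code from the patch, and the class TokenizerQuirks and its methods are hypothetical. It shows java.io.StreamTokenizer turning the two characters \t inside a quoted string into a real TAB, and contrasts that with two stricter checks a content-line lexer could make, assuming the RFC 2445 grammar: names limited to letters, digits and '-' (so "X-INVALID(NAME)" is rejected), and every line required to end in CRLF.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.regex.Pattern;

/** Hypothetical illustration of the StreamTokenizer quirks and stricter alternatives. */
public class TokenizerQuirks {

    /** Returns the first quoted-string token as StreamTokenizer sees it (escapes already processed). */
    public static String streamTokenizerQuoted(String input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        // By default '"' is a quote character, so the whole quoted run becomes one token
        if (st.nextToken() != '"') {
            throw new IllegalArgumentException("expected a quoted string");
        }
        return st.sval;
    }

    // Assumption: RFC 2445 names (iana-token / x-name) use only ALPHA, DIGIT and '-'
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9-]+");

    public static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    /**
     * Reads one line and requires it to end with CRLF.
     * Returns null at end of input; throws on a bare LF, a lone CR or an unterminated last line.
     */
    public static String readStrictLine(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return sb.toString();
            }
            if (c == '\n') {
                throw new IOException("bare LF line ending");
            }
            sb.append((char) c);
        }
        if (sb.length() > 0) {
            throw new IOException("last line is not terminated by CRLF");
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // The two-character sequence \t inside quotes comes back as a single TAB
        System.out.println(streamTokenizerQuoted("\"a\\tb\"").equals("a\tb")); // true
        System.out.println(isValidName("X-INVALID(NAME)")); // false
        System.out.println(readStrictLine(new StringReader("BEGIN:VCALENDAR\r\n"))); // BEGIN:VCALENDAR
    }
}
```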
The proposed parser implementation gets around the StreamTokenizer quirks by not using it. I have created a new Lexer class that tokenizes the stream in the context of content lines as defined in the RFC. By using the new lexer, custom unescaping code and careful use of the Reader instance passed in, the proposed Parser implementation is more conformant to the RFC, and it also performs better: in tests I ran on my machine, the new parser implementation performs roughly 20% better than the old one.
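The content-line approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's Lexer: ContentLineLexer, split and unescapeText are hypothetical names. split breaks an already unfolded line into name, raw parameters and value, treating ';' and ':' inside double quotes as literal characters; unescapeText accepts only the TEXT escapes listed in RFC 2445 (\\, \;, \,, \n, \N) and rejects anything else, such as \t.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of content-line tokenizing and TEXT unescaping (not the patch's Lexer). */
public class ContentLineLexer {

    /** One unfolded content line: name *(";" param) ":" value. */
    public static final class ContentLine {
        public final String name;
        public final List<String> params; // raw "NAME=value" strings, quotes kept
        public final String value;

        ContentLine(String name, List<String> params, String value) {
            this.name = name;
            this.params = params;
            this.value = value;
        }
    }

    /** Splits on unquoted ';' and the first unquoted ':'; both are literal inside double quotes. */
    public static ContentLine split(String line) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (!inQuotes && (c == ';' || c == ':')) {
                parts.add(current.toString());
                current.setLength(0);
                if (c == ':') {
                    // Everything after the first unquoted ':' is the value, taken verbatim
                    List<String> params = new ArrayList<>(parts.subList(1, parts.size()));
                    return new ContentLine(parts.get(0), params, line.substring(i + 1));
                }
            } else {
                current.append(c);
            }
        }
        throw new IllegalArgumentException("missing ':' before value: " + line);
    }

    /** Unescapes a TEXT value, accepting only the escapes \\ \; \, \n and \N. */
    public static String unescapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char next = s.charAt(++i);
            switch (next) {
                case '\\': case ';': case ',':
                    sb.append(next);
                    break;
                case 'n': case 'N':
                    sb.append('\n');
                    break;
                default:
                    // Unlike StreamTokenizer, anything outside the RFC list is an error
                    throw new IllegalArgumentException("illegal escape \\" + next);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ContentLine cl = split("ATTENDEE;CN=\"Doe; John\":mailto:john@example.com");
        System.out.println(cl.name);   // ATTENDEE
        System.out.println(cl.params); // [CN="Doe; John"]
        System.out.println(cl.value);  // mailto:john@example.com
        System.out.println(unescapeText("one\\, two\\; three")); // one, two; three
    }
}
```

Keeping splitting and unescaping separate means the value is only unescaped once its type is known, since the TEXT escapes do not apply to every value type.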
Also, all unit tests pass when the new parser implementation is used instead of the old one, except for one particular lineNumber test that checks where an exception was thrown. Looking at that test file, it appears that the old parser implementation incorrectly incremented the line number at which the exception occurred, so I created a fix for that as well.
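For the line-number fix, one way to keep the count honest is to advance it only after a complete CRLF has been consumed, so an error is reported against the line it actually sits on. The LineTrackingReader below is a hypothetical sketch, not the patch's PositionReader (whose API isn't shown here). In this sketch a bare LF does not advance the count, which matches a strict CRLF reading of the RFC.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch: a 1-based line counter that advances only after a complete CRLF. */
public class LineTrackingReader extends FilterReader {
    private int line = 1;
    private boolean lastWasCR = false;

    public LineTrackingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read(); // FilterReader delegates straight to the wrapped reader
        if (c == '\n' && lastWasCR) {
            line++; // CRLF finished: characters read from now on belong to the next line
        }
        lastWasCR = (c == '\r');
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Route bulk reads through read() so no character escapes the count
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                return n == 0 ? -1 : n;
            }
            buf[off + n++] = (char) c;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronise the counter, so advertise no support
    }

    /** Returns the 1-based line currently being read. */
    public int getLineNumber() {
        return line;
    }

    public static void main(String[] args) throws IOException {
        LineTrackingReader r = new LineTrackingReader(new StringReader("A:1\r\nB:2\r\nC(:3\r\n"));
        int c;
        while ((c = r.read()) != -1 && c != '(') {
            // consume until the offending character
        }
        System.out.println("'(' found on line " + r.getLineNumber()); // '(' found on line 3
    }
}
```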
Ben, please let me know as soon as you can about the future of this code; if you do not want to include it in 1.0, it could certainly be a good starting point for 1.1.
NewParserImplementation-ver2
user_id=855223
Originator: YES
I just replaced the patch with a new one. In the old one, a couple of methods in the lexer were public by accident.
P.S.
This patch also changes CalendarBuilder in a way that is compatible with the previous version. The main change is that if one is using CalendarParserImpl or a subclass, CalendarBuilder will wrap the input stream or reader in an UnfoldingReader, but if the CalendarParser implementation in use does not descend from CalendarParserImpl, the Reader will be passed to the parser unmodified. This is to support PositionReader in the new parser.
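The CalendarBuilder rule described above amounts to a type check before parsing. In this sketch Parser, LegacyParser (standing in for CalendarParserImpl and its subclasses), PositionAwareParser (standing in for an implementation outside that hierarchy) and build (standing in for CalendarBuilder) are hypothetical stubs, not iCal4j's real signatures, and the input is unfolded in one pass over a String instead of through UnfoldingReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stubs illustrating the wrap-or-pass-through rule; not iCal4j's API. */
public class BuilderDispatchSketch {

    /** Stand-in parser interface; it just returns the text it was given. */
    public interface Parser {
        String parse(Reader in) throws IOException;
    }

    /** Stands in for CalendarParserImpl and its subclasses: expects unfolded input. */
    public static class LegacyParser implements Parser {
        public String parse(Reader in) throws IOException {
            return readAll(in);
        }
    }

    /** Stands in for a parser outside that hierarchy, which reads the raw stream itself. */
    public static class PositionAwareParser implements Parser {
        public String parse(Reader in) throws IOException {
            return readAll(in);
        }
    }

    /** RFC 2445 unfolding: remove each CRLF together with the space or TAB right after it. */
    public static String unfold(String text) {
        return text.replaceAll("\r\n[ \t]", "");
    }

    /** Mirrors the described rule: unfold only for the legacy hierarchy, otherwise pass through. */
    public static String build(Parser parser, Reader in) throws IOException {
        Reader source = in;
        if (parser instanceof LegacyParser) {
            // Buffering the whole input keeps the sketch short; UnfoldingReader wraps the stream instead
            source = new StringReader(unfold(readAll(in)));
        }
        return parser.parse(source);
    }

    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String folded = "SUMMARY:Hello\r\n World\r\n";
        System.out.println(build(new LegacyParser(), new StringReader(folded)).equals("SUMMARY:HelloWorld\r\n")); // true
        System.out.println(build(new PositionAwareParser(), new StringReader(folded)).equals(folded)); // true
    }
}
```

Passing the raw Reader through for parsers outside the hierarchy means such a parser sees the original line breaks, which is presumably what position tracking in the new parser relies on.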
File Added: NewParserImplementation.patch
user_id=14058
Originator: NO
Hi Ivan,
I've had some trouble applying this patch (it seems like only one in a hundred patches actually works in Eclipse), the biggest problem being that PositionReader doesn't import at all (it creates an empty file). That aside, I am probably leaning towards inclusion in the 1.1 release, mainly because the required changes to the CalendarParser interface (see comments on patch #1655584) will break source compatibility with the current beta releases.
Thanks again for this contribution, and I do hope we can get it integrated shortly.
regards,
ben