From: Jimmy Z. <cra...@co...> - 2006-10-03 17:32:53
Mark,

I will keep your suggestion in mind and see what can be done to accommodate it in the future...

Jimmy

----- Original Message -----
From: "Mark Swanson" <ma...@Sc...>
To: "Tatu Saloranta" <cow...@ya...>
Cc: <vtd...@li...>
Sent: Monday, October 02, 2006 10:14 PM
Subject: Re: [Vtd-xml-users] 1.6 bug:

> Tatu Saloranta wrote:
>> --- Mark Swanson <ma...@Sc...> wrote:
>>
>> ...
>>> Well, allow me try to make a stronger case:
>>>
>>> In the real world, data isn't perfect. One can either toss back illegal
>>> data or try your best to work with it. A common best practice is to be
>>> as friendly and as considerate as you can to the incoming data, and
>>> produce the most accurate and conforming (to whatever standard) outgoing
>>> data.
>>
>> This is common practice for some applications ("be
>> conservative at what you send, liberal at what you
>> accept"), but notably not with xml processing.
>
> I think the need for it is greatly diminished, but that's as far as I'll
> go.
>
> <snip>
>> Having said that, I would think that if specific
>> lenient modes could be enabled (and were disabled
>> by default), that might be reasonable.
>
> Cool.
>
>> ...
>>> 4. (at least some of) VTDs competitors already scrub the data by
>>> default. The XPP (Xml Pull Parser) already does this. In fact, I was in
>>> the middle of switching away from XPP when I ran into this VTD
>>> limitation. For my particular use case, using VTD is now slower than XPP
>>> because of this scrubbing issue.
>>
>> Really? I wouldn't have though xpp would do that, since
>> I thought it aims to be an actual xml conformant
>> parser...
>>
>> What kind of scrubbing does it do?
>
> I haven't tested it exhaustively, but I do know that it silently ignores
> 0x0c (form feed) because this is where I noticed my old code parsed some
> XML properly (contained 0x0c) and my replacement code based on vtd failed.
>
>>> A single if{} could allow the pedantic behaviour (as it is currently) or
>>> a more friendly and considerate (I would argue more industry standard)
>>> behaviour.
>>
>> Which industries rely on broken xml content being
>> processed? (an honest question, no sarcasm intended)
>
> It wasn't that long ago that some systems used these control characters
> and some devices/software are still using them. Some financial systems I
> work with today still use FS/GS/STX/ETX, and the 0x0c data is coming
> directly from Outlook MAPI data (event descriptions). I'm not sure
> exactly how someone is copy/pasting 0x0c characters into the Outlook
> description field, but it happened yesterday/today. Also, cell phones
> that I work with (I support any SyncML-capable cell phone ever made)
> wrap data inside XML (SyncML is an XML protocol). All sorts of control
> characters wind up in the XML that have been taken care of through other
> scrubbers. I wish I didn't have to do that for the reasons mentioned.
>
> The problem is real; it's ugly, and I hope VTD will add this optional
> feature to help developers deal with it.
>
> Thanks for listening.
>
> Cheers.
>
> --
> http://www.ScheduleWorld.com/tg/
> Free Google Calendar synchronization with Outlook, Evolution,
> cell phones, BlackBerry, PalmOS, Exchange, Mozilla, Thunderbird,
> Pocket PC/Windows Mobile. Also sync tasks, notes and contacts!
> WebDAV, vfreebusy, RSS, LDAP, iCalendar, iTIP, iMIP support.
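
For anyone reading this thread in the archive: the kind of external scrubber Mark describes, guarded by the "single if" he asks for, might look roughly like the sketch below. It is not part of VTD-XML or XPP; the class name `XmlControlScrubber`, the method names, and the `lenient` flag are all made up for illustration. It assumes the document bytes are in an ASCII-compatible encoding (UTF-8, ISO-8859-x), where every byte below 0x20 really is a control character; that assumption does not hold for UTF-16. It also overwrites the offending bytes with a space rather than dropping them as XPP reportedly does, so the buffer length and all byte offsets stay the same.

```java
/**
 * Hypothetical pre-parse scrubber; not an API of VTD-XML or XPP.
 * Assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.
 */
final class XmlControlScrubber {

    private XmlControlScrubber() {}

    /**
     * True for the C0 control bytes that XML 1.0 does not allow:
     * everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
     * This covers form feed (0x0C), STX, ETX, FS and GS.
     */
    static boolean isIllegalControl(byte b) {
        int v = b & 0xFF; // treat the byte as unsigned
        return v < 0x20 && v != 0x09 && v != 0x0A && v != 0x0D;
    }

    /**
     * Overwrites each illegal control byte with a space, in place.
     * When lenient is false the buffer is left untouched, which keeps
     * the strict behaviour as the default.
     *
     * @return the number of bytes that were replaced
     */
    static int scrubInPlace(byte[] doc, boolean lenient) {
        if (!lenient) {
            return 0; // strict mode: leave the data for the parser to reject
        }
        int replaced = 0;
        for (int i = 0; i < doc.length; i++) {
            if (isIllegalControl(doc[i])) {
                doc[i] = (byte) ' ';
                replaced++;
            }
        }
        return replaced;
    }
}
```

The buffer would be scrubbed once, before it is handed to whichever parser is in use. Replacing with a space is a design choice for this sketch: it changes the text content slightly, but it avoids copying the buffer and keeps offsets stable, which seems relevant for an offset-based parser.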