Thread: [Gpsbabel-code] Really stupid patch to gpx.c
Brought to you by:
robertl
From: Ron P. <ro...@pa...> - 2003-12-01 19:29:03
|
Why "really stupid?" Because we shouldn't have to do it. However, in case Robert feels it would be worthwhile, I have written a little patch to gpx.c that tries to find strings of the form � through  and eliminate them from the input stream before Expat gets them. It's probably not perfect - some such sequences can overlap the boundary between reads, and I only mostly handle that situation - but it does make GPSBabel successfully read at least three of my files that are infected with Groundspeak-itis (a condition characterized by odd growths of nonprinting characters on the end of log text.) So, what do y'all say? Should we put pragmatism before perfection, as we did with the stupid EasyGPS parser bug? Should I check this mess in? |
From: Robert L. <rob...@us...> - 2003-12-01 20:03:46
|
Ron Parker wrote: > Why "really stupid?" Because we shouldn't have to do it. I'm sure I've pointed a couple hundred people to co...@ge... to impress upon Jeremy and Elias that this is a real problem (no, I don't honestly believe that all have done so) in the year since we first analyzed it. We're seemingly getting nowhere, we might as well break the ice. It's getting pretty lonely up here on the high road. > have written a little patch to gpx.c that tries to find > strings of the form � through  and eliminate > them from the input stream before Expat gets them. It's So it reduces the window of croakage to that string spanning a MY_CBUF buffer. That works for a WAY larger percentage (though not 100) of users than it will now and the remainder is no worse off than they are now. This presumably isn't good for performance since we now end up scanning the input stream twice, although the second pass surely dwarfs that first pass... If the error text for the remaining cases happens to suggest emailing contact@groundspeak, I wouldn't reject that, either. I might issue extra credit if it actually parsed <email> tags instead of using a constant string. > So, what do y'all say? Should we put pragmatism before > perfection, as we did with the stupid EasyGPS parser bug? > Should I check this mess in? After a year, I suppose it's time to quit tormenting our mutual users since we seemingly can't get the problem fixed upstream. It seems that every program that reads groundspeak GPX has to implement a similarly distasteful hack. (Ron, I trust you will read that in the way it was intended - the hack is that it's necessary at all, not a critique on your unseen patch.) So much for standards and conformance. :-( Go for it, Ron. In fact, if you do it (and Rick gets the rest of geoniche in) I'll spin a 1.2.1 just to be out of this business... RJL |
From: Ron P. <ro...@pa...> - 2003-12-01 20:19:03
|
At 02:03 PM 12/1/2003 -0600, Robert Lipe wrote: >So it reduces the window of croakage to that string spanning a MY_CBUF >buffer. Actually, it does just a little better than that. It reduces the number of characters read on each pass to sizeof(buf)-10, then reads 6 more characters if an & character appears anywhere in the last 6. You can still get the occasional bad reference that crosses the line, say if you had something like ">" and the buffer boundary fell between the 'g' and the 't', or the (sadly, more likely) case where you have more than one bad ref in a row and the first one spans the buffer boundary, but the holdouts should, on average, be reduced by this hack. We could get a better hack that would fix it entirely, but it'd involve very special handling of those last 6 characters. Maybe I'll do something about that in my next bout of free time. >If the error text for the remaining cases happens to suggest emailing >contact@groundspeak, I wouldn't reject that, either. I might issue >extra credit if it actually parsed <email> tags instead of using a >constant string. There is warning text, but it's rather generic. Someone else can get the extra credit if they really want it. :) >Go for it, Ron. Consider it gone for (committed.) |
From: Robert L. <rob...@us...> - 2003-12-02 03:23:55
|
> number of characters read on each pass to sizeof(buf)-10, then reads > 6 more characters if an & character appears anywhere in the last 6. I'm down with that. > There is warning text, but it's rather generic. Someone else > can get the extra credit if they really want it. :) I may tweak the text before cutting 1.2.1. Maybe. > Consider it gone for (committed.) Thank you, Sir. RJL |
From: Ron P. <ro...@pa...> - 2003-12-02 03:25:57
|
At 02:59 PM 12/1/2003 -0600, Robert Lipe wrote: >> number of characters read on each pass to sizeof(buf)-10, then reads >> 6 more characters if an & character appears anywhere in the last 6. > >I'm down with that. I've just committed a better fix that should get *all* of the boundary conditions. Basically, if it finds an & in the last 6 chars, it reads a char at a time until it either reads a ; or it reads 6 chars. Thus, the boundary will always fall at the end of an entity of 6 chars or less. I tested it with a file with 1000 bad characters in a row and it didn't puke. >> There is warning text, but it's rather generic. Someone else >> can get the extra credit if they really want it. :) > >I may tweak the text before cutting 1.2.1. Maybe. Tweaked and committed. "Consider emailing (author) at (email) about bad characters in their GPX output." |
From: Robert L. <rob...@us...> - 2003-12-02 20:59:07
|
> Tweaked and committed. Groovy. Thanx. You get a gold star. To relate this to the thread from last week about non-ASCII locales, does this make us play less well with GPX input that has those escaped characters in places where they ARE legal? Do we have anyone with a pocket query of largely international caches to spot-check this? RJL |
From: Ron P. <ro...@pa...> - 2003-12-02 03:25:01
|
At 03:55 PM 12/1/2003 -0600, Robert Lipe wrote: >To relate this to the thread from last week about non-ASCII locales, >does this make us play less well with GPX input that has those escaped >characters in places where they ARE legal? I don't think so. Those characters (00-1f) are all unprintable control characters, so they should be illegal nearly everywhere, and in particular they won't be used just because it's a non-ASCII locale. Actually, 	, 
, and 
 should be allowed, but I strip them, too; I don't see this being a real problem. |
From: Robert L. <rob...@us...> - 2003-12-05 22:41:17
|
Here's an interesting test of ones problem solving skills. When merging two GPX files like pocket queries (but not our redistributable test case) , GPSBabel 1.2.1 sigsegv's down in the bowels of libexpat for me. The problem is a double free on the email string (once in the rd_deinit of the first file, the second time ont he closing tag in the parse of the second one) that croaks the parser. Oddly, electric fence didn't catch that... Fixed thusly: Index: gpx.c =================================================================== RCS file: /cvsroot/gpsbabel/gpsbabel/gpx.c,v retrieving revision 1.55 diff -p -u -r1.55 gpx.c --- gpx.c 1 Dec 2003 22:13:34 -0000 1.55 +++ gpx.c 5 Dec 2003 22:38:41 -0000 @@ -742,12 +742,15 @@ gpx_rd_deinit(void) { if ( cdatastr ) { xfree(cdatastr); + cdatastr = NULL; } if ( gpx_email ) { xfree(gpx_email); + gpx_email = NULL; } if ( gpx_author ) { xfree(gpx_author); + gpx_author = NULL; } if (fd) { fclose(fd); |