Thread: [Skunkdav-dev] Question on filename with special characters
From: David S. <ds...@ap...> - 2002-11-08 18:40:04
If the file system has the following full path

/abc/Documents/Alison Krauss & the u#A8ECC.mp3

I see that a DAV PROPFIND on /abc/Documents returns an entry where the filename gets encoded for XML as follows:

<D:href>/abc/Documents/Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3</D:href>

However, I would expect that the DAVFile getName would have received the decoded file name "Alison Krauss & the u#A8ECC.mp3". But it does not; it instead has the encoded filename "Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3".

I thought I had a fix for this by using the java.net.URLDecoder.decode() method, but I just discovered that method does not help with the &amp; type encodings.

So,
Q1: Is this a bug with the XML parser? Shouldn't this encoding for the XML transport be transparent to me?
Q2: What Java class & method can help me decode all of these possible encodings here?

thanks,
-dave
From: Jacob S. <smu...@br...> - 2002-11-08 20:01:33
On Fri, Nov 08, 2002 at 01:38:48PM -0500, David Scheck wrote:
> If the file system has the following fullpath
>
> /abc/Documents/Alison Krauss & the u#A8ECC.mp3
>
> I see that a DAV PROPFIND on /abc/Documents returns an entry where the
> filename gets encoded for XML as follows:
>
> <D:href>/abc/Documents/Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3</D:href>
>
> However, I would expect that the DAVFile getName would have received the
> decoded file name "Alison Krauss & the u#A8ECC.mp3"
> But it does not, it instead has the encoded filename
> "Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3"
>
> I thought I had a fix for this by using the
> java.net.URLDecoder.decode() method but I just discovered that method
> does not help with the &amp; type encodings.
>
> So,
> Q1: Is this a bug with the XML parser? Shouldn't this encoding for the
> XML transport be transparent to me?

Probably it should, for &amp; &quot; &lt; &gt; &apos;. Clearly the behavior you are seeing is wrong. I have a vague feeling that somewhere I deal with this symptomatically, but I can't find the spot, and perhaps my memory is misleading me, and I dealt symptomatically somewhere with some other crucial problem :).

> Q2: What Java class & method can help me decode all of these possible
> encodings here??

It isn't connected at all with URL-encoding; you just need to do string replacements for the entities listed above. There may be code in the Java core that does it in 1.4, but I don't think there was in 1.3 (I have not used Java since before 1.4 came out, and have followed it so little that I don't even know what the install base is for the various versions).

If you have time to do this yourself before I get to it, I'd suggest starting by patching org.skunk.minixml.XMLParser.getPCData(). Basically, all it needs to do to solve your immediate problem is perform the five string replacements above.
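For illustration, the five replacements could look roughly like the sketch below. This is not code from minixml: the class name SimpleEntityDecoder and its decode method are hypothetical, the internals of getPCData() are not shown in this thread, and the sketch uses String.replace(CharSequence, CharSequence), which only exists in Java 5 and later. The one detail worth noting is the ordering of the replacements.

```java
// Hypothetical helper for illustration only; not part of org.skunk.minixml.
final class SimpleEntityDecoder {
    private SimpleEntityDecoder() {}

    /** Decodes the five predefined XML entities using successive passes. */
    static String decode(String pcdata) {
        return pcdata
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            // &amp; has to go last. If it went first, the escaped text
            // "&amp;lt;" would become "&lt;" and then be decoded a second
            // time to "<", which is not what the sender wrote.
            .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String href = "/abc/Documents/Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3";
        // The entity is decoded; the percent escapes are left untouched.
        System.out.println(decode(href));
    }
}
```

The percent escapes (%20, %23) are a separate layer and would still need URL-decoding afterwards.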
To do this efficiently, of course, it shouldn't scan the string five times, but the first thing to do would be to put very simple successive string replacements there, to see if it fixed the problem without breaking anything, and then optimize either by adding another stage of lexing in getPCData or augmenting the lexer itself. (Offhand I think that getPCData is the only place you need to worry about these entities, but I'm rushing as usual and I might be wrong.) Oh, and I wouldn't use regular expressions, convenient though they might be, because minixml has no dependencies on any regex package, and that is a good thing.

On the other hand, the minixml parser was a hack that I wrote in an afternoon, and you might be better off modifying it so that it could delegate to another parser. It wouldn't be hard to do; probably the parse(Reader, boolean) method would be changed so it could use various lexers, and the lexer would be the part that would use a more conformant parser. That way, the dav library would get the objects it expected to get, and you wouldn't have to be in the business of fixing an XML parser.

On the other hand, if small distribution size matters to you, fixing the parser to the extent necessary may be the way to go (and in fact needs to be done regardless).

js
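A single-pass version of the entity decoding, without regular expressions, might look something like the following. Again this is only a sketch under assumptions: SinglePassEntityDecoder is a hypothetical name, it handles only the five predefined entities (no numeric character references such as &#38;), and the final URL-decoding step assumes the server percent-encodes paths as UTF-8, which this thread does not confirm.

```java
import java.net.URLDecoder;

// Hypothetical helper for illustration only; not part of org.skunk.minixml.
final class SinglePassEntityDecoder {
    private static final String[] ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};
    private static final char[] CHARS = {'&', '<', '>', '"', '\''};

    private SinglePassEntityDecoder() {}

    /** Decodes the five predefined XML entities in one left-to-right scan. */
    static String decode(String pcdata) {
        int amp = pcdata.indexOf('&');
        if (amp < 0) {
            return pcdata; // nothing to decode, avoid copying
        }
        StringBuilder out = new StringBuilder(pcdata.length());
        int pos = 0;
        while (amp >= 0) {
            out.append(pcdata, pos, amp); // copy text before the '&'
            boolean matched = false;
            for (int i = 0; i < ENTITIES.length; i++) {
                if (pcdata.startsWith(ENTITIES[i], amp)) {
                    out.append(CHARS[i]);
                    pos = amp + ENTITIES[i].length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append('&'); // unknown or bare '&': keep it as-is
                pos = amp + 1;
            }
            // Resume after the consumed entity, so decoded output is never rescanned.
            amp = pcdata.indexOf('&', pos);
        }
        out.append(pcdata, pos, pcdata.length());
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String href = "/abc/Documents/Alison%20Krauss%20&amp;%20the%20u%23A8ECC.mp3";
        // Entity decoding first, then percent decoding (UTF-8 is an assumption).
        String path = URLDecoder.decode(decode(href), "UTF-8");
        System.out.println(path);
    }
}
```

Because the scan never revisits its own output, the ordering problem of the successive-replacement approach does not arise here. One caveat on the last step: URLDecoder follows form-encoding rules and turns a literal '+' into a space, so a file name containing '+' would need different handling.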