From: Eduard D. <ed...@fr...> - 2020-11-22 22:22:49
Dear all,

I tested with a file called za%ak.xml. When I try to upload it via the eXide DB manager, nothing happens and no errors are shown. This confirms one of the things David sees, and I think it is one for a GitHub issue. I did not test the Java admin client, but I suspect this is also one for an issue. It feels like taking a close look at how file and document name handling is done in the various components of eXist would be wise.

Regards

-----Original Message-----
From: David Birnbaum <dj...@gm...>
To: Michael Westbay <wes...@ja...>
Cc: Eduard Drenth <ed...@fr...>, exi...@li... <exi...@li...>
Subject: Re: [Exist-open] Filenames with awkward characters in eXist-db?
Date: Sun, 22 Nov 2020 16:09:39 -0500

Dear exist-open,

Thank you, Eduard and Michael, for the quick responses. I verified that when I launch the eXist-db Java admin client, it is launched with -Dfile.encoding=UTF-8 among the JAVA_OPTS settings. That seems to be the default; it's part of the client launch script that was installed with eXist-db.

Because I am not able to write my own Java code, my upload methods are limited to the interfaces that eXist-db provides, such as the Java admin client and eXide, as well as the access supported through <oXygen/>. The short version is that only <oXygen/> is able to manage my files in a natural way, that is, it can upload and access them. Here are the details about the Java admin client, eXide, <oXygen/>, and the eXist-db REST interface:

1. The Java admin client refuses to upload files with filenames that include percent signs and square brackets, that is, certain ASCII characters that have special meaning in URIs. The error message in the Java admin client is "/Users/djb/repos/cz/pos/verb/test-1a%7%.xml could not be encoded as a URI".
I think this is incorrect, because this filename can be encoded as a URI; for example, I can encode it with either encode-for-uri() or, in XProc, p:urify(). When I try to upload the file by using the file manager component of eXide, there is no error (or other) message, but the file is not uploaded.

2. The Java admin client does upload files with non-ASCII characters in their filenames, such as "test-1, ё.xml", which contains a Cyrillic character (and also a space, which is ASCII but requires special handling in URIs), as long as they do not also contain percent signs or square brackets. These show up in the Java admin client file listing in their percent-encoded form (test-1,%20%D1%91.xml) and in the "Manage files" option in eXide in their string form (test-1, ё.xml). However, the eXide Manage files interface cannot upload these files; as with the preceding example, there is no error, but the file is not uploaded. Once the file is on the system, it opens if I double-click on it in the Java admin client listing. If I try to open it in eXide by clicking inside the File manager interface, I get an error message: "Failed to load document /db/apps/cz/xml//by-form/test-1, ё.xml: 400 Bad Request". But if I address the file inside a doc() function in eXide (e.g., doc("/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml")), it is accessible.

3. <oXygen/> can upload and open both types of files.

4. When I use the REST API in a browser or with curl at the command line to show a directory listing (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/), the filename is listed in its percent-encoded form (test-1,%20%D1%91.xml).
But if I add this filename to the end of the URI in the browser or on the command line with curl (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml), I get an error report:

HTTP ERROR 404 Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
URI: /exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml
STATUS: 404
MESSAGE: Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
SERVLET: EXistServlet

I can work around the limitations in a variety of ways, from using a different file-naming strategy to managing the files only through <oXygen/>, but neither of those is consistent with my normal workflow. I would also like to understand whether my expectations with respect to access through the Java admin client, the eXide File manager interface, and the REST API are realistic, that is, whether the problems reflect a misunderstanding on my part or inconsistent or incorrect behavior from the eXist-db file management resources. What I expect is that any interface should be able to manage any file with any legal filename that can be converted to a URI using encode-for-uri(). As far as I can tell, only <oXygen/> is able to do this; the Java admin client, the eXide file manager, and the eXist-db REST API run into trouble (of various types) with one or both of these types of filenames.

When I look at Eduard's and Michael's examples in this thread, their filenames contain non-ASCII characters, but they do not appear to contain percent signs or square brackets, and I wonder whether that explains why my filenames are problematic for me while theirs are not for them. Michael reports no problems with <oXygen/>, and that was my experience as well (including with filenames that contain percent signs and square brackets), so my issues are with the Java admin client, eXide, and the REST API.
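David's expectation rests on the fact that every such filename has a well-defined encode-for-uri() form. As a rough illustration only (this is not eXist-db code, and the class and method names are hypothetical), the Java sketch below applies the encode-for-uri() rule using nothing but the JDK: characters in the unreserved set A-Z, a-z, 0-9, '-', '_', '.', '~' are kept, and every other character is written as its percent-encoded UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class EncodeForUriSketch {

    // Hypothetical helper following the fn:encode-for-uri() rule: keep the
    // unreserved ASCII characters and percent-encode every other UTF-8 byte
    // as %XX with uppercase hex digits.
    static String encodeForUri(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // read the byte as an unsigned value
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Percent signs become %25: prints test-1a%257%25.xml
        System.out.println(encodeForUri("test-1a%7%.xml"));
        // Square brackets become %5B and %5D: prints test%5B1%5D.xml
        System.out.println(encodeForUri("test[1].xml"));
        // Comma, space and Cyrillic U+0451 become %2C, %20 and %D1%91:
        // prints test-1%2C%20%D1%91.xml
        System.out.println(encodeForUri("test-1, \u0451.xml"));
    }
}
```

Note that the listings quoted in the thread show the stored name as test-1,%20%D1%91.xml, with the comma left as is, so the encoding eXist-db applies evidently differs from encode-for-uri() on at least that character. That may matter when a name built with one rule is compared against a name produced by the other.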
Best,
David

On Wed, Nov 18, 2020 at 7:26 AM Michael Westbay <wes...@ja...> wrote:

On Wed, Nov 18, 2020 at 19:59, Eduard Drenth <ed...@fr...> wrote:

In my experience, if you standardize everything in the whole chain of components/programs/processes on UTF-8, you won't have problems. Don't forget JAVA_OPTS=-Dfile.encoding=UTF-8. I use XmldbURI.xmldbUriFor(path.toFile().getName()); to store documents in a collection. I use URLDecoder.decode(u, "UTF-8"); to get the original filename.

I second this. I use Japanese and Korean for filenames with no problems, having everything set to UTF-8. It works fine with the oXygen and Nova editors' file trees this way.

Take care.
--
Michael Westbay
Writer/System Administrator
http://www.japanesebaseball.com/

--
Eduard Drenth, Software Architect
ed...@fr...
Doelestrjitte 8
8911 DX Ljouwert
+31 58 234 30 47
+31 62 094 34 28 (private)
skype: eduarddrenth
https://github.com/eduarddrenth
frisian.eu
gpg: https://pgp.surfnet.nl/pks/lookup?search=eduarddrenth
On Fridays I am at home / I work less
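Eduard's recipe is to store documents under a name built with XmldbURI.xmldbUriFor(...) and to recover the original filename with URLDecoder.decode(u, "UTF-8"). The decoding half can be sketched with the JDK alone; XmldbURI is eXist-db's own class and is deliberately not used here, and the class and method names below are hypothetical. The sketch also shows two documented properties of java.net.URLDecoder that are worth keeping in mind with filenames like David's: it performs HTML form decoding, so a literal '+' becomes a space, and it rejects a raw '%' that is not followed by two hex digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeNameSketch {

    // Recovers a filename from its percent-encoded form with the call quoted
    // in the thread: URLDecoder.decode(u, "UTF-8").
    static String originalName(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Name as listed by the Java admin client and the REST interface;
        // decodes to "test-1, " + U+0451 + ".xml", so this prints true.
        System.out.println(originalName("test-1,%20%D1%91.xml").equals("test-1, \u0451.xml"));

        // Form decoding turns a literal '+' into a space; a '+' only
        // survives if it was encoded as %2B.
        System.out.println(originalName("a+b.xml"));   // prints a b.xml
        System.out.println(originalName("a%2Bb.xml")); // prints a+b.xml

        // A raw '%' that is not followed by two hex digits is rejected.
        try {
            originalName("test-1a%7%.xml");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In other words, decoding is only safe on names that were consistently percent-encoded first; applying it to a raw filename such as test-1a%7%.xml fails rather than returning the name unchanged.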