From: Michael W. <wes...@ja...> - 2020-11-23 02:50:32
Hi David,

All files with UTF-8 names are put into my eXist instances in one of three ways:

1. WebDAV -- oXygen and other WebDAV clients, dragging and dropping from the macOS Finder to the client. Note: I don't use the macOS built-in WebDAV client, as it inserts all kinds of junk into the database when I do.

2. Restore -- I use the following restore script from the command line in the eXist backup directory I want to restore:

    export JAVA_OPTIONS="-Xms256M -Xmx512M -Dfile.encoding=UTF-8"
    export EXIST_HOME=~/exist
    ${JAVA_HOME}/bin/java ${JAVA_OPTIONS} -Dexist.home=$EXIST_HOME -jar $EXIST_HOME/start.jar backup -r `pwd`/__contents__.xml -u admin

3. XQuery -- Most of the XML files in my database are placed there through some sort of XQuery process, concluding in a call to xmldb:store($col,$name,$page,"text/xml"). $name in this case is the UTF-8 encoded name (such as 松田-宣浩.xml). I *do not* URI-encode it, so the kanji are *not* converted to %E6%9D%BE%E7%94%B0-%E5%AE%A3%E6%B5%A9.xml before the call. That leads me to believe that xmldb:store does the URI encoding internally.

I know these methods work, so I have stuck with them. Do you think any of them could fit into your workflow?

On Mon, Nov 23, 2020, 6:09 David Birnbaum <dj...@gm...> wrote:
> Dear exist-open,
>
> Thank you, Eduard and Michael, for the quick responses. I verified that
> when I launch the eXist-db Java admin client, it is launched
> with -Dfile.encoding=UTF-8 among the JAVA_OPTS settings. That seems to be
> the default; it's part of the client launch script that was installed with
> eXist-db. Because I am not able to write my own Java code, my upload
> methods are limited to the interfaces that eXist-db provides, such as the
> Java admin client and eXide, as well as the access supported through
> <oXygen/>. The short version is that only <oXygen/> is able to manage my
> files in a natural way, that is, it can upload and access them.
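Michael's point 3 suggests that xmldb:store percent-encodes a name such as 松田-宣浩.xml internally. The Java sketch below does not call eXist at all; it only compares two JDK ways of producing such an encoding: a hand-written encoder following the rule XPath's encode-for-uri() specifies (keep A-Z, a-z, 0-9, "-", ".", "_", "~"; escape every other UTF-8 byte), and java.net.URI's multi-argument constructor followed by toASCIIString(). The class and method names are hypothetical, and whether eXist uses anything like either of them internally is an assumption. Both should yield %E6%9D%BE%E7%94%B0-%E5%AE%A3%E6%B5%A9.xml for the kanji name; for "test-1, ё.xml" they differ only in the comma, and the java.net.URI form matches the test-1,%20%D1%91.xml listing David reports in his item 2.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

class FilenameEncoding {

    // Percent-encodes every byte outside the RFC 3986 "unreserved" set
    // (A-Z a-z 0-9 - . _ ~), working on the UTF-8 bytes, with uppercase hex.
    // This follows the rule that XPath's fn:encode-for-uri() specifies.
    static String encodeForUri(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // java.net.URI's multi-argument constructor always quotes '%' and quotes
    // space, but leaves ',' and non-ASCII letters alone; toASCIIString() then
    // UTF-8-encodes the non-ASCII letters. Only meant for plain filenames
    // (no '/' or ':').
    static String encodeWithJavaNetUri(String name) {
        try {
            return new URI(null, null, name, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"松田-宣浩.xml", "test-1, ё.xml"}) {
            System.out.println(name);
            System.out.println("  encode-for-uri rule: " + encodeForUri(name));
            System.out.println("  java.net.URI:        " + encodeWithJavaNetUri(name));
        }
    }
}
```

Both results are valid path segments, because "," is allowed unescaped in a URI path; they are simply different spellings of the same name, which is one reason tools can disagree about which form to expect.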
>
> Here are the details about the Java admin client, eXide, <oXygen/>, and
> the eXist-db REST interface:
>
> 1. The Java admin client refuses to upload files with filenames that
> include percent signs and square brackets, that is, certain ASCII
> characters that have special meaning in URIs. The error message in the Java
> admin client is "/Users/djb/repos/cz/pos/verb/test-1a%7%.xml could not be
> encoded as a URI". I think this is incorrect because this filename can be
> encoded as a URI; for example, I can encode it as a URI with either
> encode-for-uri() or, in XProc, p:urify(). When I try to upload the file by
> using the file manager component of eXide, there is no error (or other)
> message, but the file is not uploaded.
>
> 2. The Java admin client does upload files with non-ASCII characters in
> their filenames, such as "test-1, ё.xml", which contains a Cyrillic
> character (also a space, which is ASCII but requires special handling in
> URIs), as long as they do not also contain percent signs or square
> brackets. These show up in the Java admin client file listing in their
> percent-encoded form (test-1,%20%D1%91.xml) and in the "Manage files"
> option in eXide in their string form (test-1, ё.xml). However, the eXide
> Manage files interface cannot upload these files; as with the preceding
> example, there is no error, but the file is not uploaded. Once I get the
> file onto the system, if I double-click on it in the Java admin client
> listing, it opens. If I try to open it in eXide by clicking inside the File
> manager interface, I get an error message: "Failed to load document
> /db/apps/cz/xml//by-form/test-1, ё.xml: 400 Bad Request". But if I address
> the file inside a doc() function in eXide (e.g.,
> doc("/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml")), it is accessible.
>
> 3. <oXygen/> can upload and open both types of files.
>
> 4. When I use the REST API in a browser or with curl at the command line
> to show a directory listing
> (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/), the filename
> is listed in its percent-encoded form (test-1,%20%D1%91.xml). But if I add
> this filename to the end of the URI in the browser or on the command line
> with curl
> (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml),
> I get an error report:
>
> HTTP ERROR 404 Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
> URI: /exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml
> STATUS: 404
> MESSAGE: Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
> SERVLET: EXistServlet
>
>
> I can work around the limitations in a variety of ways, from using a
> different file-naming strategy to managing the files only through
> <oXygen/>, but neither of those is consistent with my normal workflow. I
> would also like to understand whether my expectations with respect to
> access through the Java admin client, the eXide File manager interface, and
> the REST API are realistic, that is, whether the problems reflect a
> misunderstanding on my part or inconsistent or incorrect behavior from the
> eXist-db file management resources. What I expect is that any interface
> should be able to manage any file with any legal filename that can be
> converted to a URI using encode-for-uri(). As far as I can tell, only
> <oXygen/> is able to do this, and the Java admin client, the eXide file
> manager, and the eXist-db REST API run into trouble (of various types) with
> one or both of these types of filenames. When I look at Eduard's and
> Michael's examples in this thread, their filenames contain non-ASCII
> characters, but they do not appear to contain percent signs or square
> brackets, and I wonder whether perhaps that explains why my filenames are
> problematic for me while theirs are not for them.
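On the question of percent signs and square brackets: the Java sketch below (hypothetical class and helper names; only java.net.URI from the JDK) shows that java.net.URI's single-argument parser rejects the raw names test-1a%7%.xml and test[1].xml, while their escaped forms parse and decode back to the original names. A name like test-1a%7%.xml is ambiguous to any code that assumes "%" already begins an escape sequence; whether that is what happens inside the Java admin client is an assumption, not something checked against eXist's source. The hypothetical restUrl helper rebuilds exactly the URL David requested, which suggests, without proving it, that the 404 is not caused by a malformed client-side URL.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RawVersusEscapedNames {

    // True if java.net.URI accepts the string unchanged as a relative URI reference.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Escapes one plain filename (no '/' or ':') for use as a path segment.
    // The multi-argument constructor always quotes '%' and also quotes
    // '[', ']' and space; toASCIIString() UTF-8-encodes non-ASCII characters.
    static String escapeSegment(String name) {
        try {
            return new URI(null, null, name, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    // Hypothetical helper: REST base + collection path + escaped filename.
    static String restUrl(String base, String collection, String name) {
        return base + collection + "/" + escapeSegment(name);
    }

    public static void main(String[] args) throws URISyntaxException {
        String[] names = {"test-1a%7%.xml", "test[1].xml", "test-1, ё.xml"};
        for (String name : names) {
            String escaped = escapeSegment(name);
            System.out.println(name
                    + " | raw parses: " + parsesAsUri(name)
                    + " | escaped: " + escaped
                    + " | decodes back to: " + new URI(escaped).getPath());
        }
        System.out.println(restUrl("http://localhost:8080/exist/rest",
                "/db/apps/cz/xml/by-form", "test-1, ё.xml"));
    }
}
```

The raw test-1a%7%.xml fails with a "malformed escape pair" error because "%7%" is not "%" followed by two hex digits; escaping "%" itself as %25 removes the ambiguity.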
> Michael reports no problems with <oXygen/>, and that was my experience as
> well (including with filenames that contain percent signs and square
> brackets), so my issues are with the Java admin client, eXide, and the
> REST API.
>
> Best,
>
> David
>
>
> On Wed, Nov 18, 2020 at 7:26 AM Michael Westbay <wes...@ja...> wrote:
>
>> On Wed, Nov 18, 2020, 19:59 Eduard Drenth <ed...@fr...> wrote:
>>
>>> In my experience, if you standardize everything in the whole chain of
>>> components/programs/processes on UTF-8, you won't have problems. Don't
>>> forget JAVA_OPTS=-Dfile.encoding=UTF-8.
>>>
>>> I use XmldbURI.xmldbUriFor(path.toFile().getName()); to store documents
>>> in a collection.
>>> I use URLDecoder.decode(u, "UTF-8"); to get the original filename.
>>>
>>
>> I second this. I use Japanese and Korean for filenames with no problem,
>> having everything set to UTF-8. It works fine with the oXygen and
>> Nova editors' file trees this way.
>>
>> Take care.
>>
>> --
>> Michael Westbay
>> Writer/System Administrator
>> http://www.japanesebaseball.com/
>>
>
--
Michael Westbay
Writer/System Administrator
http://www.japanesebaseball.com/
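Eduard's decoding step, URLDecoder.decode(u, "UTF-8"), round-trips the filenames discussed in this thread, with one caveat: URLDecoder applies HTML form rules and also turns "+" into a space. The sketch below uses java.net.URI as a stand-in for eXist's XmldbURI.xmldbUriFor (its exact escaping is not verified here), escapes a few names, and decodes them both with URLDecoder and with java.net.URI.getPath(). The class and method names are hypothetical, and it uses the Charset overload of URLDecoder.decode (Java 10 and later), which avoids the checked UnsupportedEncodingException of the "UTF-8" string overload.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeRoundTrip {

    // Escapes a plain filename (no '/' or ':') with java.net.URI: '%', space,
    // '[' and ']' are quoted and non-ASCII is UTF-8 percent-encoded, but
    // characters such as ',' and '+' are left as they are.
    static String escape(String name) {
        try {
            return new URI(null, null, name, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    // Decoding in the style Eduard describes. URLDecoder follows
    // application/x-www-form-urlencoded rules, so '+' becomes a space.
    static String decodeWithUrlDecoder(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.UTF_8);
    }

    // java.net.URI.getPath() only undoes %XX escapes; '+' stays '+'.
    static String decodeWithUri(String escaped) {
        try {
            return new URI(escaped).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(escaped, e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"松田-宣浩.xml", "test-1, ё.xml", "test-1a%7%.xml", "a+b.xml"};
        for (String name : names) {
            String escaped = escape(name);
            System.out.println(name + " -> " + escaped
                    + " -> URLDecoder: " + decodeWithUrlDecoder(escaped)
                    + " | URI: " + decodeWithUri(escaped));
        }
    }
}
```

With these inputs, only a.b-style names containing a literal "+" decode differently: URLDecoder yields "a b.xml", while getPath() returns the original "a+b.xml". If the encoding step leaves "+" unescaped, as java.net.URI does, a decoder that undoes only %XX escapes is the safer counterpart.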