From: David B. <dj...@gm...> - 2020-11-23 17:05:44
Dear All,

Thanks again to those who have responded quickly and helpfully.

To answer Peter's question, the <oXygen/> interface is through WebDAV, and it works for me, as it does for Michael. While I'm unhappy to hear that the problems I reported with the Java admin client and eXide were bugs (user error, although embarrassing, is easier to fix), that information tells me that I should stop trying to get those interfaces to work for my purposes and fall back on a workaround.

Michael's most recent posting segues into a more general question about workflow: the three methods he mentions invite me to consider why I've been using the Java admin client to upload files instead of an alternative. In case that larger context is helpful, whether for filename management or just for considering eXist-db development in general: I've tried a lot of different approaches to developing for eXist-db, including eXide, <oXygen/>, Atom, and VS Code, and I've settled recently on VS Code (with the eXist-db module). My reason is that I find it most natural to work in a Git repo that lives on my file system with automatic sync to my local eXist-db installation. When the parts are all working at their best, I find this approach simpler than working in a local Git repo and having to upload manually to my local eXist-db installation to test changes as I implement them. I also find it simpler than working directly inside eXist-db (whether with eXide or with <oXygen/>), because the synchronization with VS Code is automatic, so I don't have to sync the app within eXist-db with the local repo explicitly. In this way the files on the file system and the resources inside my local eXist-db are kept in sync automatically. Atom has similar support; I switched from it to VS Code because Wolfgang mentioned in the eXist-db Slack channel that he had made that change in his own workflow, which inspired me to try VS Code.

The workflow above does not do everything I need.
Specifically, it has the same problem as the Java admin client with filenames that contain percent signs and square brackets: when I create a file whose name contains those characters in my file system, the automatic synchronization with my running eXist-db instance fails. Additionally, the automatic synchronization seems to have a file-size limit, which the Java admin client doesn't have, so I have been using the Java admin client to upload files that 1) have acceptable filenames and 2) are too large for automatic sync. Some of those files are created in a different repo (they are part of a different project that I am reusing in my eXist-db app), and if I have understood correctly, I can't use xmldb:store() to store files from the file system into eXist-db. If my understanding of this detail is correct, that would preclude Michael's third approach. I'm working with individual files during development, rather than a restore operation, so Michael's second approach also doesn't match my situation closely.

What I can do, though, is consistent with Michael's first approach: use <oXygen/> to manage the uploads of files that the Java admin client (or automatic VS Code sync) refuses to accept. This is more awkward than having the automatic sync work with those files, and also more awkward than using the Java admin client, because it means opening the eXist-db instance inside <oXygen/> not to work with the files, as one usually does in <oXygen/>, but just to manage uploads. But it will get the job done, and I prefer it to dumbing down my self-documenting filenames, which was the only other potential workaround I had identified.

Best,

David

On Sun, Nov 22, 2020 at 9:50 PM Michael Westbay <wes...@ja...> wrote:

> Hi David,
>
> All files with UTF-8 names are put into my eXist instances in one of three
> ways:
>
> 1. WebDAV -- oXygen and other WebDAV clients, dragging and dropping
> from the MacOS Finder to the client.
>    Note: I don't use the MacOS built-in WebDAV client, as it inserts all
>    kinds of junk into the database when I do.
>
> 2. Restore -- I use the following restore script from the command line
> in the eXist backup directory I want to restore:
>
>     export JAVA_OPTIONS="-Xms256M -Xmx512M -Dfile.encoding=UTF-8"
>     export EXIST_HOME=~/exist
>     ${JAVA_HOME}/bin/java ${JAVA_OPTIONS} -Dexist.home=$EXIST_HOME -jar \
>         $EXIST_HOME/start.jar backup -r `pwd`/__contents__.xml -u admin
>
> 3. XQuery -- Most of the XML files in my database are placed there
> through some sort of XQuery process, concluding in a call to
> xmldb:store($col,$name,$page,"text/xml"). $name in this case is the UTF-8
> encoded name (such as 松田-宣浩.xml). I *do not* URI encode it, so the
> kanji are *not* converted to %E6%9D%BE%E7%94%B0-%E5%AE%A3%E6%B5%A9.xml
> before the call. That leads me to believe that xmldb:store does the URI
> encoding internally.
>
> I know these work, so have stuck with these methods. Do you think that any
> of these methods could fit into your workflow?
>
>
> On Mon, Nov 23, 2020 at 6:09, David Birnbaum <dj...@gm...> wrote:
>
>> Dear exist-open,
>>
>> Thank you, Eduard and Michael, for the quick responses. I verified that
>> when I launch the eXist-db Java admin client it is launched
>> with -Dfile.encoding=UTF-8 among the JAVA_OPTS settings. That seems to be
>> the default; it's part of the client launch script that was installed with
>> eXist-db. Because I am not able to write my own Java code, my upload
>> methods are limited to the interfaces that eXist-db provides, such as the
>> Java admin client and eXide, as well as the access supported through
>> <oXygen/>. The short version is that only <oXygen/> is able to manage my
>> files in a natural way, that is, it can upload and access them. Here are
>> the details about the Java admin client, eXide, <oXygen/>, and the eXist-db
>> REST interface:
>>
>> 1. The Java admin client refuses to upload files with filenames that
>> include percent signs and square brackets, that is, certain ASCII
>> characters that have special meaning in URIs. The error message in the Java
>> admin client is "/Users/djb/repos/cz/pos/verb/test-1a%7%.xml could not be
>> encoded as a URI". I think this is incorrect because this filename can be
>> encoded as a URI; for example, I can encode it as a URI with either
>> encode-for-uri() or, in XProc, p:urify(). When I try to upload the file by
>> using the file manager component of eXide, there is no error (or other)
>> message, but the file is not uploaded.
>>
>> 2. The Java admin client does upload files with non-ASCII characters in
>> their filenames, such as "test-1, ё.xml", which contains a Cyrillic
>> character (also a space, which is ASCII but requires special handling in
>> URIs), as long as they do not also contain percent signs or square
>> brackets. These show up in the Java admin client file listing in their
>> percent-encoded form (test-1,%20%D1%91.xml) and in the "Manage files"
>> option in eXide in their string form (test-1, ё.xml). However, the eXide
>> Manage files interface cannot upload these files; as with the preceding
>> example, there is no error, but the file is not uploaded. Once I get the
>> file onto the system, if I double-click on it in the Java admin client
>> listing, it opens. If I try to open it in eXide by clicking inside the File
>> manager interface, I get an error message: "Failed to load document
>> /db/apps/cz/xml//by-form/test-1, ё.xml: 400 Bad Request". But if I address
>> the file inside a doc() function in eXide (e.g.,
>> doc("/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml")), it is
>> accessible.
>>
>> 3. <oXygen/> can upload and open both types of files.
>>
>> 4. When I use the REST API in a browser or with curl at the command line
>> to show a directory listing
>> (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/), the filename
>> is listed in its percent-encoded form (test-1,%20%D1%91.xml). But if I add
>> this filename to the end of the URI in the browser or on the command line
>> with curl
>> (http://localhost:8080/exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml),
>> I get an error report:
>>
>> HTTP ERROR 404 Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
>> URI: /exist/rest/db/apps/cz/xml/by-form/test-1,%20%D1%91.xml
>> STATUS: 404
>> MESSAGE: Document /db/apps/cz/xml/by-form/test-1,%20?.xml not found
>> SERVLET: EXistServlet
>>
>>
>> I can work around the limitations in a variety of ways, from using a
>> different file-naming strategy to managing the files only through
>> <oXygen/>, but neither of those is consistent with my normal workflow, and
>> I would also like to understand whether my expectations with respect to
>> access through the Java admin client, the eXide File manager interface, and
>> the REST API are realistic, that is, whether the problems reflect
>> misunderstanding on my part or inconsistent or incorrect behavior from the
>> eXist-db file management resources. What I expect is that any interface
>> should be able to manage any file with any legal filename that can be
>> converted to a URI using encode-for-uri(). As far as I can tell, only
>> <oXygen/> is able to do this, and the Java admin client, the eXide file
>> manager, and the eXist-db REST API run into trouble (of various types) with
>> one or both of these types of filenames. When I look at Eduard's and
>> Michael's examples in this thread, their filenames contain non-ASCII
>> characters, but they do not appear to contain percent signs or square
>> brackets, and I wonder whether perhaps that explains why my filenames are
>> problematic for me while theirs are not for them. Michael reports no
>> problems with <oXygen/>, and that was my experience as well (including
>> with filenames that contain percent signs and square brackets), so my
>> issues are with the Java admin client, eXide, and the REST API.
>>
>> Best,
>>
>> David
>>
>>
>> On Wed, Nov 18, 2020 at 7:26 AM Michael Westbay <wes...@ja...> wrote:
>>
>>> On Wed, Nov 18, 2020 at 19:59, Eduard Drenth <ed...@fr...> wrote:
>>>
>>>> In my experience, if you standardize everything in the whole chain of
>>>> components/programs/processes on UTF-8, you won't have problems.
>>>> Don't forget JAVA_OPTS=-Dfile.encoding=UTF-8.
>>>>
>>>> I use XmldbURI.xmldbUriFor(path.toFile().getName()); to store
>>>> documents in a collection.
>>>> I use URLDecoder.decode(u, "UTF-8"); to get the original filename.
>>>>
>>>
>>> I second this. I use Japanese and Korean for filenames with no problem,
>>> having everything set to UTF-8. It works fine with oXygen and
>>> Nova editors' file trees this way.
>>>
>>> Take care.
>>>
>>> --
>>> Michael Westbay
>>> Writer/System Administrator
>>> http://www.japanesebaseball.com/
>>>
>>
>
> --
> Michael Westbay
> Writer/System Administrator
> http://www.japanesebaseball.com/
>
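The percent-sign and square-bracket question in this thread can be explored outside eXist-db. The Java sketch below is a minimal illustration under stated assumptions, not eXist-db's own code: the class `FilenameUriCheck` and its methods are hypothetical names. `encodeForUri` reimplements the escaping rule of XPath `fn:encode-for-uri()` (keep only `A-Z a-z 0-9 - _ . ~`, percent-encode the UTF-8 bytes of everything else), and `parsesAsUri` asks `java.net.URI` whether a string is acceptable unchanged. It assumes, without checking the eXist-db source, that the admin client's "could not be encoded as a URI" error arises when a name whose `%` and `[` characters were left unescaped reaches a strict URI parser. `verb[2].xml` is a made-up example name. Java 10 or later is needed for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of eXist-db.
public class FilenameUriCheck {

    // Percent-encode a name following the rule of XPath fn:encode-for-uri():
    // keep A-Z, a-z, 0-9, '-', '_', '.', '~' and replace every other character
    // with the uppercase %HH escapes of its UTF-8 bytes.
    static String encodeForUri(String name) {
        StringBuilder out = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Returns true if java.net.URI accepts the string unchanged as a
    // relative URI reference, false if it throws URISyntaxException.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The second name contains U+0451 (Cyrillic small letter io) and a space;
        // the third name is a hypothetical example with square brackets.
        String[] names = {"test-1a%7%.xml", "test-1, \u0451.xml", "verb[2].xml"};
        for (String name : names) {
            String encoded = encodeForUri(name);
            System.out.println(name + " -> raw parses: " + parsesAsUri(name)
                    + ", encoded: " + encoded + " (parses: " + parsesAsUri(encoded) + ")");
        }
        // URLDecoder performs HTML-form decoding, so a literal '+' becomes a space.
        System.out.println(URLDecoder.decode("a+b.xml", StandardCharsets.UTF_8));
    }
}
```

Under this assumption, `test-1a%7%.xml` is rejected because `%7%` looks like a malformed escape, and `verb[2].xml` because `[` is not allowed in a URI path; `test-1, ё.xml` is rejected only because of its space, which the admin client evidently does escape, since it uploads that file. The fully encoded forms (`test-1a%257%25.xml`, `test-1%2C%20%D1%91.xml`, `verb%5B2%5D.xml`) all parse. Note that `encode-for-uri()` also escapes the comma, whereas the eXist-db listings quoted above keep it literal (`test-1,%20%D1%91.xml`), so eXist-db evidently applies a less aggressive rule. The last line of `main` is a caution about the `URLDecoder.decode()` approach: it is a safe inverse only for names in which a literal `+` was itself encoded as `%2B`.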