From: Joe W. <jo...@gm...> - 2015-08-29 13:33:39
Thanks very much, Adam, for this full and interesting explanation. I can now
appreciate everyone's enthusiasm for this pull request!

Joe

Sent from my iPhone

On Aug 29, 2015, at 5:37 AM, Adam Retter <ad...@ex...> wrote:

>> Moving to NIO.2 improves eXist's performance and scalability in that it
>> allows eXist (1) to handle many more simultaneous connections (both file
>> system and network operations) and (2) to treat these connections as
>> non-blocking operations.
>>
>> Is that about right?
>
> Those can be advantages of NIO if you use async operations and
> channels; however, we have not made any changes in this area. eXist
> already uses channels for its BTree files, and for other files we would
> see little or no gain from switching. For network stuff, we rely on
> Jetty, and unfortunately we are still using the Servlet API and our
> approach is blocking; ideally that should be replaced.
>
>
>> Also, the pull request mentioned several improvements, and I have some
>> questions about each item (inline):
>>
>> 1. File moves are now atomic.
>>
>> Q: as opposed to ___?
>
> Previously, file or directory moves could be non-atomic. For
> example, when moving a file, the file would be copied first and then
> the original deleted as a separate operation. With a directory, this
> is amplified: a new directory entry is made, each file is copied and
> the original deleted, and finally the original directory entry is
> removed.
>
> You can probably imagine that if an error occurs or the system is
> stopped or crashes during such an operation, then previously you would
> have an inconsistency and possible corruption because the operation was
> not atomic, e.g. perhaps only some of the files were moved, or only
> part of a file was moved.
>
>
>> 2. File copying and data copying is now handled directly by the JVM/OS.
>>
>> Q: as opposed to being handled by eXist homegrown code?
>
> Exactly. Previously we would set up a buffer in memory, typically 4KB.
> We would then loop through the source file/stream, copying 4KB (or
> less) at a time into the in-memory buffer, then copying that out of the
> in-memory buffer to the destination file. This was a non-optimized and
> inefficient operation for many reasons, including: 1) assuming a 4KB
> buffer, which may be non-optimal; 2) having to make two copies of the
> data from source to destination; 3) not using faster DMA techniques
> already available in NIO.1 since Java 6, e.g. FileChannel.transferTo.
>
> We now simply call Files.copy, which passes control for the copy
> process to the JVM's NIO.2 implementation, which should, depending on
> the OS, choose the most efficient mechanism (hopefully DMA) that the
> OS supports.
>
>
>> 3. FileLock flush synchronisation is now guaranteed by the OS.
>>
>> Q: whereas before eXist's approach to this didn't provide this guarantee?
>
> Previously we used a java.io.RandomAccessFile to obtain a FileChannel,
> performed a number of operations, and then manually called
> FileChannel#force, which effectively calls `fsync` on the underlying
> OS to ensure the data is flushed to disk. The two issues here were:
> 1) you have to remember to call `force` appropriately, and hopefully
> we were indeed doing that; 2) we were performing several operations
> before calling `force`, so again a system crash or unexpected error
> could cause the system to end up with an inconsistent file, and then
> the lock file data would be corrupt.
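The three points above map onto a handful of NIO.2 calls. This is not code
from the pull request itself, just a minimal sketch of those calls; the class
name, paths and the appended newline are invented for illustration:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    public class Nio2Sketch {
        public static void main(final String[] args) throws IOException {
            // hypothetical paths, purely for illustration
            final Path source = Paths.get("data", "source.xml");
            final Path copy = Paths.get("data", "copy.xml");
            final Path moved = Paths.get("data", "archive", "source.xml");

            // 2. copy delegated to the JVM/OS (no hand-rolled 4KB buffer loop)
            Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);

            // 1. atomic move: the file either appears at the destination or it
            // doesn't (note ATOMIC_MOVE requires source and target to be on
            // the same file store, otherwise an exception is thrown)
            Files.move(copy, moved, StandardCopyOption.ATOMIC_MOVE);

            // 3. write, then force: FileChannel#force asks the OS to fsync the
            // written data to disk before we carry on
            try (final FileChannel channel = FileChannel.open(moved,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                channel.write(ByteBuffer.wrap("\n".getBytes(StandardCharsets.UTF_8)));
                channel.force(true);  // true also flushes file metadata
            }
        }
    }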
>> 4. Along the way I also found many Input and Output streams that were not
>> closed or could be left open by an exception; these have been fixed.
>>
>> Q: sounds like these could've led to memory leaks or file corruption?
>
> It is unlikely that these would have led to memory leaks. Used enough,
> they could lead to a leak of file descriptors, i.e. using up file
> handles from the underlying OS, which has a finite limit on these.
> Potentially a FileOutputStream that is not closed could result in a
> corrupt file: if a number of writes were performed, and neither flush
> nor close was called, and then the system stops unexpectedly because
> of a crash or some sort of error, then you could have a file where not
> all of the data was persisted to disk.
>
> I recently spent quite some time identifying and fixing many of these
> unclosed streams in eXist when I introduced the use of
> try-with-resources. I was actually surprised when I found even more
> during this piece of work; somehow these additional ones had escaped
> my notice in the past.
>
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb
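For anyone unfamiliar with the try-with-resources pattern mentioned in the
answer to item 4, here is a minimal, hypothetical sketch (the class name and
file names are invented) of a stream copy where both streams are closed even
if an exception is thrown part-way through:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class StreamCloseSketch {
        public static void main(final String[] args) throws IOException {
            // both streams are closed automatically when the block exits,
            // whether normally or via an exception, so no file descriptors leak
            try (final InputStream in = Files.newInputStream(Paths.get("in.bin"));
                 final OutputStream out = Files.newOutputStream(Paths.get("out.bin"))) {
                final byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.flush();  // push buffered data to the OS before close
            }
        }
    }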