From: Adam R. <ad...@ex...> - 2015-08-29 09:37:59
> Moving to nio.2 improves eXist's performance and scalability in that it
> allows eXist (1) to handle many more simultaneous connections (both file
> system but also network operations) and (2) to treat these connections as
> non-blocking operations.
>
> Is that about right?

Those can be advantages of NIO if you use async operations and channels; however, we have not made any changes in this area. eXist already uses channels for its BTree files; for other files we would see little or no gain from switching. For network operations we rely on Jetty, and unfortunately we still use the Servlet API with a blocking approach; ideally that should be replaced.

> Also, the pull request mentioned several improvements, and I have some
> questions about each item (inline):
>
> 1. File moves are now atomic.
>
> Q: as opposed to ___?

Previously, file or directory moves would or could be non-atomic. For example, when moving a file, the file would be copied first and then the original deleted as a separate operation. With a directory this is amplified: a new directory entry is made, each file is copied and then its original deleted, and finally the original directory entry is removed. If an error occurred, or the system was stopped or crashed during such an operation, you would previously have been left with an inconsistency and possible corruption because the operation was not atomic, e.g. perhaps only some of the files were moved, or only part of a file was moved.

> 2. File copying and data copying is now handled directly by the JVM/OS.
>
> Q: as opposed to being handled by eXist homegrown code?

Exactly. Previously we would set up a buffer in memory, typically 4KB. We would then loop through the source file/stream, copying 4KB (or less) at a time into the in-memory buffer, then copying that out of the in-memory buffer to the destination file.
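
A minimal sketch of the difference, not eXist's actual code: `legacyCopy` mirrors the buffer loop just described, while `nioCopy` and `atomicMove` show the kind of `java.nio.file.Files` calls that can replace hand-written copy and move logic. The class and method names are hypothetical, and the use of `StandardCopyOption.ATOMIC_MOVE` is an assumption about how an atomic move might be requested; the thread does not say which options eXist passes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names come from eXist's code base.
final class FileOpsSketch {

    // Old style: shuttle the data through a fixed 4 KB buffer in user space.
    static void legacyCopy(Path source, Path dest) throws IOException {
        byte[] buffer = new byte[4096];
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(dest)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    // New style: hand the whole copy to the JVM, which picks the mechanism.
    static void nioCopy(Path source, Path dest) throws IOException {
        Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    // Assumption: request an all-or-nothing move. This throws
    // AtomicMoveNotSupportedException if the file system cannot do it.
    static void atomicMove(Path source, Path dest) throws IOException {
        Files.move(source, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("file-ops-sketch");
        Path source = Files.write(dir.resolve("source.txt"),
                "hello".getBytes(StandardCharsets.UTF_8));
        legacyCopy(source, dir.resolve("legacy.txt"));
        nioCopy(source, dir.resolve("nio.txt"));
        atomicMove(source, dir.resolve("moved.txt"));
        System.out.println("source still exists: " + Files.exists(source));
    }
}
```

An atomic move is generally only possible within a single file system, so a caller would still need a fallback for moves across volumes.
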
This was a non-optimized and inefficient operation for many reasons, including: 1) assuming a 4KB buffer, which may be non-optimal, 2) having to make 2 copies of the data on its way from source to dest, 3) not using the faster DMA techniques already available in NIO.1 since Java 1.4, e.g. FileChannel.transferTo. We now simply call Files.copy, which passes control of the copy process to the JVM's NIO.2 implementation, which should choose the most efficient mechanism (hopefully DMA) supported by the OS.

> 3. FileLock flush synchronisation is now guaranteed by the OS.
>
> Q: whereas before eXist's approach to this didn't provide this guarantee?

Previously we used a java.io.RandomAccessFile to obtain a FileChannel, performed a number of operations and then manually called FileChannel#force, which effectively calls `fsync` on the underlying OS to ensure the data is flushed to disk. There were two issues here: 1) you have to remember to call `force` appropriately; hopefully we were indeed doing that. 2) We were performing several operations before calling `force`, so a system crash or unexpected error could leave the system with an inconsistent file, and then the lock file data would be corrupt.

> 4. Along the way I also found many Input and Output streams that were not
> closed or could be left open by an exception, these have been fixed.
>
> Q: sounds like these could've led to memory leaks or file corruption?

It is unlikely that these would have led to memory leaks. Used enough, they could lead to a leak of file descriptors, i.e. using up file handles from the underlying OS, which has a finite limit on these. Potentially, a FileOutputStream that is not closed could result in a corrupt file: if a number of writes were performed, and neither flush nor close was called, and then the system stopped unexpectedly because of a crash or some sort of error, you could have a file where not all of the data was persisted to disk.
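
To make the unclosed-stream problem concrete, here is a small sketch under stated assumptions; it is not code from eXist. `writeLeaky` shows the kind of pattern that can leave a file descriptor open when an exception is thrown, and `writeSafely` shows the same write with the stream closed on every exit path. The class and method names are hypothetical, and the `force(true)` call is an assumption about a caller that wants the data flushed to disk before returning.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not taken from eXist's code base.
final class StreamCloseSketch {

    // Leaky pattern: if write() throws, close() is never reached and the
    // file descriptor stays open at least until the stream is garbage collected.
    static void writeLeaky(Path target, String text) throws IOException {
        FileOutputStream out = new FileOutputStream(target.toFile());
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.close();
    }

    // try-with-resources closes the stream on every exit path,
    // including when write() or force() throws.
    static void writeSafely(Path target, String text) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target.toFile())) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            // Assumption: the caller wants the bytes on disk before returning.
            // close() alone does not request an fsync; force(true) does.
            out.getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("stream-close-sketch", ".txt");
        writeSafely(target, "hello");
        System.out.println(Files.size(target) + " bytes written");
    }
}
```
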
I recently spent quite some time identifying and fixing many of these unclosed streams in eXist when I introduced the use of try-with-resources. I was actually surprised to find even more during this piece of work; somehow these additional ones had escaped my notice in the past.

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb