From: Antoni M. <ant...@gm...> - 2008-07-27 14:51:13
Hello Aperturians,

I've added subcrawlers for tar and bzip2 archives. I did some refactoring that allowed most of the crawling logic to stay within the AbstractArchiverSubcrawler and AbstractCompressorSubCrawler classes. The tests have been refactored too; most of the meat is now in the AbstractSubCrawlerTest class.

The actual implementations and test cases have become short and simple.

I've taken a look at other compression formats.

7-zip - there is a Java SDK [1] for working with 7zip files, but it uses an approach completely different from java.util.zip. A guy named Christopher League wrote a layer on top of that SDK [2] that exposes an LzmaInputStream, but it lacks the getNextEntry() method, i.e. it assumes that the 'archive' contains a single file. Either we create our own wrapper from scratch or we extend the one from League. I'd like to hear your opinion on whether you think it's worthwhile.

Rar - the 7zip folks have written C++ decompression routines (in the source code of the 7zip Windows utility [3]). AFAIK nobody has tried to port them to Java. It's possible, but it will require quite a lot of work (a great opportunity to learn what compression is about BTW :-). There's a guy who boasts of having written a rar scanning utility [4]. It would give us the structure of the archive but no content. What do you think?

Any other ideas, formats, or APIs I've missed?

Antoni Mylka
ant...@gm...

[1] http://www.7-zip.org/sdk.html
[2] http://contrapunctus.net/league/haques/lzmajio/
[3] http://downloads.sourceforge.net/sevenzip/7z457.tar.bz2
[4] http://www.adarshr.com/papers/raroscope
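For reference, the java.util.zip approach mentioned above is a pull-style loop: getNextEntry() positions the stream at the next archive member, and read() then returns that member's bytes until it is exhausted. The sketch below shows that loop on a small in-memory zip; a multi-entry 7-zip wrapper would presumably need to offer something similar. The class name ZipEntryWalk and its helper methods are made up for illustration and are not part of Aperture, the LZMA SDK, or League's wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not taken from the Aperture code base.
class ZipEntryWalk {

    // Builds a two-entry zip archive in memory so the example needs no files.
    static byte[] sampleArchive() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("first".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("second".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Walks the archive member by member and returns "name:content" strings.
    static List<String> listEntries(byte[] archive) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive has no more members.
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                // read() returns -1 at the end of the current member only.
                while ((n = zip.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                result.add(entry.getName() + ":" + content.toString("UTF-8"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : listEntries(sampleArchive())) {
            System.out.println(line);
        }
    }
}
```

A stream that lacks getNextEntry() can only play the role of the inner read() loop, which is why a single-file LZMA stream is not enough for a subcrawler that has to enumerate archive members.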
From: Christiaan F. <chr...@ad...> - 2008-07-28 12:30:52
Antoni Myłka wrote:
> Hello Aperturians
>
> I've added subcrawlers for tar and bzip2 archives. Did some refactoring
> that allowed most of the crawling logic to stay within the
> AbstractArchiverSubcrawler and AbstractCompressorSubCrawler classes. The
> tests have been refactored too, now most of the meat is in the
> AbstractSubCrawlerTest class.
>
> The actual implementations and test cases have become short and simple.

Great! I will take a look at this later this week.

> I've taken a look at other compression formats.
>
> 7-zip - there is a Java SDK [1] for working with 7zip files, but it uses
> an approach completely different from java.util.zip. A guy named Christopher
> League wrote a layer on top of that SDK [2] that exposes a
> LzmaInputStream, but it lacks the getNextEntry() method i.e. it assumes
> that the 'archive' contains a single file. Either we create our own
> wrapper from scratch or extend the one from League. I'd like to hear
> your opinion if you think it's worthwhile.

Hard to judge without any idea of how long this will take. I'd say: if it's more than a matter of days, don't do it. Most important is to document where you've stopped and what the possible paths of future development are, just as you're doing now but perhaps in more detail. This allows others in the community who really need this to take it from there.

> Rar - the 7zip folks have written C++ decompression routines. (in the
> source code of the 7zip windows utility [3]) AFAIK nobody tried to port
> them into java. It's possible, but will require quite a lot of work (a
> great opportunity to learn what compression is about BTW :-). There's a guy
> who boasts having written a rar scanning utility [4]. It would give us
> the structure of the archive but no content. What do you think.

At [4] I read:

"Since RARoScope has been poised as more of a "RAR scanning library" rather than a "RAR decompressor" for Java, it can't do the decompressing part. However, this is open to change in the near future. Depending on the response I get, I might implement a decompressing logic."

Step one would be to send him a response :)

On that page I also read "RARoScope doesn't support reading multi-volume archives yet." Are these the rar archives that are split into multiple files (i.e. .rar, .r00, .r01, ...)? It's good to know what the limitations are. Also, how are these files currently classified by the MIME type identifier?

> Any other ideas,formats,APIs i've missed?

Here are some more: http://en.wikipedia.org/wiki/List_of_archive_formats :)

I think you have covered the most important ones. Mac users may appreciate support for .sit files, but I have no idea whether a Java library exists for it. For the rest, it will depend on people's actual use cases, so let's wait for those first.

Regards,

Chris
From: Antoni M. <ant...@gm...> - 2008-08-18 19:41:35
2008/7/28 Christiaan Fluit <chr...@ad...>:
> Antoni Myłka wrote:
>> Hello Aperturians
>>
>> I've added subcrawlers for tar and bzip2 archives. Did some refactoring
>> that allowed most of the crawling logic to stay within the
>> AbstractArchiverSubcrawler and AbstractCompressorSubCrawler classes. The
>> tests have been refactored too, now most of the meat is in the
>> AbstractSubCrawlerTest class.
>>
>> The actual implementations and test cases have become short and simple.
> ...
>
> I think you have covered the most important ones. Mac users may
> appreciate support for .sit files, no idea whether a Java library exists
> for it. For the rest, it will depend on actual use cases of people,
> let's wait for them first.
>

So what do we do with the SF ticket? I'd mark it as closed. The task described in the ticket description has been done. If we want additional subcrawlers, we could create additional tickets.

That would leave us with only two feature requests marked for the 1.2.0.beta release.

--
Antoni Myłka
ant...@gm...