In Windows, I have a compressed folder a.zip which contains a compressed folder b.zip which contains a file c.dat.
The command
7z e C:\Data -oC:\Data "*.dat" -r -y
does not retrieve c.dat since it can't recurse through compressed folders. I have to first unzip a.zip and only then run the command.
The problem is I'm working in a much more complicated directory structure with lots of layers of compressed folders. I need a command or script that will identify the locations of all the .dat files and do the necessary unzipping to get them.
A tool that lets me view the location of all .dat files without doing any unzipping would also be OK.
7-Zip cannot do that itself. You may write a batch file that extracts and lists nested archives recursively. Writing your own software will be more efficient, because you can extract archives directly to memory (see also the discussion of a similar problem). However, in general, you will have to extract at least the catalog of each nested archive.
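To make the in-memory idea concrete, here is a minimal sketch in Python (my own illustration, not anything 7-Zip provides) that uses the standard zipfile module to recurse through nested .zip members entirely in RAM and print the virtual path of every .dat file; the function name list_dat_files is made up:

import io
import zipfile

def list_dat_files(archive, prefix=""):
    # `archive` is a path or a file-like object; nested archives are
    # decompressed into memory (io.BytesIO), so nothing touches disk.
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith(".dat"):
                print(prefix + name)
            elif lower.endswith(".zip"):
                nested = io.BytesIO(zf.read(name))
                list_dat_files(nested, prefix + name + "/")

list_dat_files(r"C:\Data\a.zip")  # prints virtual paths such as b.zip/c.dat

Note that zf.read(name) still decompresses the whole nested archive - it just avoids writing it out - which matches the point above about having to extract at least the catalog of each nested archive.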
If you want to search .dat files regularly, it would be better to create an index beforehand and put it on the topmost level of the tree. You will still need to extract all archives down the tree (with possible in-memory optimizations if you write your own software), but the branches without .dat files will be effectively eliminated.
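As a rough sketch of that index idea, under the same Python/zipfile assumptions (build_index and index.txt are hypothetical names), the same recursion can write one line per .dat file to a text file kept next to the top-level archive, so a later pass only descends into branches the index actually mentions:

import io
import zipfile

def build_index(archive_path, index_path="index.txt"):
    entries = []  # virtual paths such as "b.zip/c.dat"
    def walk(archive, prefix):
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():
                lower = name.lower()
                if lower.endswith(".dat"):
                    entries.append(prefix + name)
                elif lower.endswith(".zip"):
                    walk(io.BytesIO(zf.read(name)), prefix + name + "/")
    walk(archive_path, "")
    with open(index_path, "w", encoding="utf-8") as f:
        f.write("\n".join(entries) + "\n")

build_index(r"C:\Data\a.zip")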
Thanks for the info! Do you know of a tool that will create an index showing the full contents of nested archives?
Unfortunately, no, I don't (and I suggested creating such an index before creating the complex archive structure). However, you can create such an index yourself by writing a batch file (or a program) as suggested above. By the way, does your unzipping of .dat files need to be done once, multiple times on the same archive, once on each of several different archives, or multiple times on different archives? The optimal strategy depends on the answer.
I see what you're saying. The problem is that we're not compressing the files ourselves; we have no control over how we receive them. We have a recursive solution that just brute-forces it, but it's resource-intensive and outputs a lot of material we don't need. I've been creating an index afterwards for double-checking purposes.
It only needs to be done once for each archive, with a new archive arriving each month or so. It would, however, be nice to get the names of all the .dat files without unzipping, just to double-check our work and make sure we haven't missed anything.
I noticed that in the 7-Zip GUI you can dig down into nested archives and extract a file without extracting the parent archives first - unlike with the command line.
7-Zip silently unpacks every nested archive you enter. Look in your temporary folder - you will see a directory named 7zblahblah containing the archive you just entered. As I said above, it is possible to unpack a nested archive into memory (and thus spare some I/O bandwidth), but unpacking is still necessary to get the list of files within it. The only exception is an archive stored with no compression - that one can be read directly from its parent.
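That stored-with-no-compression exception is easy to check programmatically; in Python's zipfile, for instance, a member's compress_type says whether it could be read in place (a sketch of the check, not how 7-Zip decides internally):

import zipfile

with zipfile.ZipFile(r"C:\Data\a.zip") as zf:
    for info in zf.infolist():
        if info.filename.lower().endswith(".zip"):
            if info.compress_type == zipfile.ZIP_STORED:
                print(info.filename, "- stored, readable straight from the parent")
            else:
                print(info.filename, "- compressed, must be unpacked first")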
If you don't want to develop your own software, it is sufficient to use your brute-force approach. If you want to double-check results, then you should create the index in the first pass - the second pass may be optimized by using the index.
A question to Igor Pavlov: do you set FILE_ATTRIBUTE_TEMPORARY when extracting a nested archive to be opened? This may increase speed if enough RAM is available for the cache.
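For reference, a program extracting a nested archive to a scratch file could request that attribute roughly like this - a Python/ctypes sketch of the Win32 call with a made-up scratch path, not 7-Zip's actual code:

import ctypes
from ctypes import wintypes

GENERIC_WRITE = 0x40000000
CREATE_ALWAYS = 2
FILE_ATTRIBUTE_TEMPORARY = 0x0100  # hint: keep the file's pages in the cache

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

handle = kernel32.CreateFileW(
    r"C:\Temp\nested_archive.tmp",  # hypothetical scratch path
    GENERIC_WRITE, 0, None, CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY, None)
if handle != INVALID_HANDLE_VALUE:
    # ... write the extracted archive here, then reopen it for reading ...
    kernel32.CloseHandle(handle)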