Info-ZIP project / Support Requests / #11 epub unzip problem

Peter Koellner - 2012-08-22

example-corrupt-fixed-screenshot.jpg

Steven Schweda - 2012-08-22

> example-corrupt-fixed-screenshot.jpg

Plain text might have been easier than a picture of plain text, but
that does look like garbage.

> [...] Since extract.c copies the central directory filename over the
> local filename in line 1323, [...]

As usual, information like a source-code line number might be more
useful if you revealed which version of UnZip you were using.

> [...] So I would like to add a modifier to the zip -F fix archive
> option that globally replaces the central directory filename with the
> local one or vice versa, so standard tools working will properly process
> the modified archive.

I'm not an expert in the "zip -F" code, but that sounds possible.
Before doing much work on a new feature like that, it would be nice to
know how the defective/misunderstood archive was created, and whether
the unexpected content has any value.

> [...] did I miss something in the zip format specifications that would
> explain the 'garbage' in the central directory [...]

No, to me, it looks like garbage. Can you ask the people who
provided the archive how it was made, and whether there's some reason
for the apparent defect(s)?

Peter Koellner - 2012-08-22

> Plain text might have been easier than a picture of plain text, but
> that does look like garbage.

Well, highlighting and comparing seemed easier that way.
In the other two instances, the truncated file name ends with 0x00 0x44 0x00 0x00 a couple of characters before the original filename ended. Only in the instance shown is there some sort of data after that.
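
The comparison of the two filename copies can also be done without a hex viewer. What follows is only a minimal sketch in Python, not anything taken from the Info-ZIP sources: the function name name_mismatches is made up for illustration, and it assumes a single-disk archive without ZIP64 records, without data prepended to the archive, and without the end-of-central-directory signature appearing inside the archive comment. It reads each central directory entry, follows its offset to the local file header, and reports the pairs of raw filename bytes that differ.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CEN_SIG = b"PK\x01\x02"   # central directory file header
LOC_SIG = b"PK\x03\x04"   # local file header


def name_mismatches(data):
    """Return (central_name, local_name) byte pairs that differ.

    Hypothetical helper: assumes no ZIP64, one disk, absolute offsets.
    """
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # EOCD: total entry count at offset 10, central directory offset at 16.
    total, = struct.unpack_from("<H", data, eocd + 10)
    pos, = struct.unpack_from("<I", data, eocd + 16)
    mismatches = []
    for _ in range(total):
        if data[pos:pos + 4] != CEN_SIG:
            raise ValueError("bad central directory signature")
        # Central header: name, extra and comment lengths at offsets 28, 30, 32.
        n, m, k = struct.unpack_from("<HHH", data, pos + 28)
        # Offset of the matching local header is stored at offset 42.
        loc, = struct.unpack_from("<I", data, pos + 42)
        cen_name = data[pos + 46:pos + 46 + n]
        if data[loc:loc + 4] != LOC_SIG:
            raise ValueError("bad local header signature")
        # Local header: name length at offset 26, name starts at offset 30.
        ln, = struct.unpack_from("<H", data, loc + 26)
        loc_name = data[loc + 30:loc + 30 + ln]
        if cen_name != loc_name:
            mismatches.append((cen_name, loc_name))
        pos += 46 + n + m + k  # advance to the next central entry
    return mismatches
```

Called as name_mismatches(open("book.epub", "rb").read()), where book.epub is a placeholder path, it returns an empty list for a consistent archive. Printing the pairs with repr() shows embedded zero bytes such as the 0x00 0x44 0x00 0x00 sequence directly.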

>> [...] Since extract.c copies the central directory filename over the
>> local filename in line 1323, [...]
>
> As usual, information like a source-code line number might be more
> useful if you revealed which version of UnZip you were using.

Hmmm... debian stable is not THAT old that it would use any other than the most recent stable release:
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
Debian package version 6.0-4, and the line number is from the latest sources from sourceforge.

> I'm not an expert in the "zip -F" code, but that sounds possible.
> Before doing much work on a new feature like that, it would be nice to
> know how the defective/misunderstood archive was created, and whether
> the unexpected content has any value.

No idea. Adobe Digital Editions manages to open the file, but complains about "minor errors". I don't know whether that refers to these defects. I guess it might be some sort of home-brewed serial number marker scheme by the publisher, or a bug in their publishing tools. The ePub specifications say nothing about such deviations from the container format.

>> [...] did I miss something in the zip format specifications that would
>> explain the 'garbage' in the central directory [...]
>
> No, to me, it looks like garbage. Can you ask the people who
> provided the archive how it was made, and whether there's some reason
> for the apparent defect(s)?

Probably not. It was published by Bantam Books, I don't know what their packaging process looks like, and my experience with the publishing industry is that they are not very forthcoming when ebook format details are being discussed, even if one might be able to reach someone who actually knows someone with the technical expertise...

Steven Schweda - 2012-08-23

> Hmmm... debian stable is not THAT old that it would use any other than
> the most recent stable release:
> UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
> Debian package version 6.0-4, and the line number is from the latest
> sources from sourceforge.

For the record, we don't track the source code as modified by
everyone else, and "latest" is not a useful description. My copy of the
source for:
UnZip 6.00 of 20 April 2009, by Info-ZIP. [...]
has this statement at line 1243:
zfstrcpy(G.filename, G.pInfo->cfilname);
and nothing relevant at line 1323.

Peter Koellner - 2012-08-23

> UnZip 6.00 of 20 April 2009, by Info-ZIP. [...]
> has this statement at line 1243:
> zfstrcpy(G.filename, G.pInfo->cfilname);
> and nothing relevant at line 1323.

OK, it seems that the https://sourceforge.net/projects/infozip/files/latest/download link does not go to the latest release but to 6.10beta, while Debian stable uses 6.0. So, yes, in 6.0 the filename check happens in the block starting with the comment about filename consistency checks at line 1225. Anyway, this would not be the place to fix the filenames, since that would be in the zip sources, not in unzip.

But I guess it will take a while to get more samples of this type of problem, since I can only check ebooks I bought. I'll try to contact the publisher, but I am not very optimistic about that.
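
For illustration only, the repair proposed earlier in this thread (globally replacing the central directory filename with the local one) could look roughly like the following Python sketch. This is not the zip -F code and not a patch for it. The function name copy_local_names_to_central is hypothetical, and the sketch assumes a single-disk archive without ZIP64 records and without prepended data. It only handles the case where both copies have the same declared length, so the name can be overwritten in place without moving any other bytes.

```python
import struct


def copy_local_names_to_central(data):
    """Return (patched_bytes, count) with local names copied into the
    central directory wherever the two copies differ but have equal length.

    Hypothetical helper: assumes no ZIP64, one disk, absolute offsets.
    """
    out = bytearray(data)
    eocd = data.rfind(b"PK\x05\x06")  # end of central directory record
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    total, = struct.unpack_from("<H", data, eocd + 10)
    pos, = struct.unpack_from("<I", data, eocd + 16)
    patched = 0
    for _ in range(total):
        if data[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        # Name, extra field and comment lengths of the central entry.
        n, m, k = struct.unpack_from("<HHH", data, pos + 28)
        loc, = struct.unpack_from("<I", data, pos + 42)
        if data[loc:loc + 4] != b"PK\x03\x04":
            raise ValueError("bad local header signature")
        ln, = struct.unpack_from("<H", data, loc + 26)
        loc_name = data[loc + 30:loc + 30 + ln]
        cen_name = data[pos + 46:pos + 46 + n]
        # Equal lengths mean an in-place overwrite keeps every offset valid.
        if ln == n and loc_name != cen_name:
            out[pos + 46:pos + 46 + n] = loc_name
            patched += 1
        pos += 46 + n + m + k  # advance to the next central entry
    return bytes(out), patched
```

Entries whose two name lengths differ are left untouched, because fixing them would require rewriting the central directory and the offsets stored in the end record. The sketch also trusts the local copy unconditionally. The "vice versa" direction would need a rule for deciding which copy is the damaged one.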

Peter Koellner - 2012-08-23

Ah, well, after looking at it from a different angle, I guess the problem could be reduced to the following (probably) fixable situation:

If the filename length fields in the central directory entry and the local file header are the same, but one of the two copies contains a zero-terminated string shorter than the given length, the shorter string is probably faulty.
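
As a rough sketch of that rule, assuming the two raw name fields have already been read from the archive as bytes (the function name plausible_name is made up for this example):

```python
def plausible_name(cen_name, loc_name):
    """Return the copy judged intact, or None if the rule cannot decide.

    cen_name and loc_name are the raw filename bytes from the central
    directory entry and the local file header (hypothetical inputs).
    """
    if cen_name == loc_name:
        return cen_name  # nothing to repair
    if len(cen_name) != len(loc_name):
        return None  # declared lengths differ: not the case described
    cen_short = b"\x00" in cen_name  # terminated before the declared length
    loc_short = b"\x00" in loc_name
    if cen_short and not loc_short:
        return loc_name
    if loc_short and not cen_short:
        return cen_name
    return None  # both or neither contain a zero byte: undecided
```

The test works on bytes rather than decoded characters. In UTF-8 a zero byte only ever encodes U+0000, so a byte-level check should not misfire on multi-byte characters. Names in other encodings are not considered here.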

I have sent a bug report to the retailer where I got the file, since it is a bit unclear who actually does the final DRM-armoured epub packaging. There might be an epub packaging tool out there that produces faulty zip containers. Well, if zip should be able to fix this type of problem, it would probably be a good idea to check for this type of error and apply a fix. I guess there might be some complications involved with UTF-8 filenames etc., so it might not be that trivial. So I guess unless someone with some experience tells me that it would be a good idea to take a look at the source, I won't waste any more time on that.
