From: Markus M. <me...@me...> - 2005-04-14 09:13:41
Hi,

in pyx/epsfile.py, EPS files are opened in 'r' mode. This has two results:

- When (e.g.) a Mac EPS file is opened on Windows, the bounding box cannot be read, as the line endings are not correctly parsed.
- Even if the bounding box could be read, the result of inserting this EPS file into a canvas would be inconsistent line endings in the output file.

The obvious solution would be to open these files in universal newline mode ('rU'), but this would destroy binary EPS files. So I propose the following:

1. When the bounding box is read, always open the file in mode 'rU'. Even if it were a binary file, the DSC comments will be in line-oriented format, so this mode is sufficient to parse the bounding box information. This change is backwards-compatible with the 'r' mode, as 'rU' is a superset of 'r'.

2. Add a new optional parameter 'istextfile' to the epsfile constructor, defaulting to False (old behaviour), and remember this parameter. In outputPS, use 'rU' mode to read the file when 'istextfile' is True, otherwise 'r'.

This will implement the following behaviour: when the user knows that the EPS file in question is a text file but may have line endings from another platform, she uses 'istextfile=1'. When the EPS file in question may be a binary file, the user can use 'istextfile=0' (the default), which has the drawback that line endings may be inconsistent in the output file. The bounding box is read correctly in every case.

Comments?

Markus
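Markus's observation that DSC comments stay line-oriented even in files with binary sections can be illustrated without any text-mode open at all. The following is a minimal Python 3 sketch, not PyX's code: `read_bbox` is a hypothetical helper that scans raw bytes, relying on `bytes.splitlines()`, which breaks on `\n`, `\r` and `\r\n` alike. (The 'rU' flag discussed in this thread is Python 2 era; Python 3 text mode uses universal newlines by default, and the 'U' flag was removed in Python 3.11.)

```python
import re

# "%%BoundingBox: llx lly urx ury" with integer coordinates.
# "%%BoundingBox: (atend)" deliberately does not match.
_BBOX_RE = re.compile(rb"%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def read_bbox(data: bytes) -> tuple[int, int, int, int]:
    """Return the first numeric %%BoundingBox of EPS data given as raw bytes.

    Hypothetical helper, not part of PyX's API. bytes.splitlines() splits
    on Unix (LF), Windows (CR LF) and classic Mac (CR) line endings, so no
    text-mode newline translation is needed to find the DSC comment.
    """
    for line in data.splitlines():
        match = _BBOX_RE.match(line)
        if match:
            return tuple(int(group) for group in match.groups())
    raise ValueError("no %%BoundingBox comment found")
```

A file with bare `\r` endings parses the same as a Unix one. Because `%%BoundingBox: (atend)` does not match the pattern, the scan simply continues to a later numeric `%%BoundingBox` line, for instance one in the trailer.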
From: Andre W. <wo...@us...> - 2005-04-14 10:54:25
Hi Markus,

On 14.04.05, Markus Meyer wrote:
> in pyx/epsfile.py, EPS files are opened in 'r' mode. This has two results:
>
> - When (e.g.) a Mac EPS file is opened on Windows, the bounding box
> cannot be read, as the line endings are not correctly parsed

I've stepped into this bug myself a few times already, but my workaround was to "fix" the PostScript files. ;-) None of our users has complained about it before. But you're totally right, this is bad and wrong and whatever else ...

> - Even if the bounding box could be read, the result of inserting this
> EPS file into a canvas would be inconsistent line endings in the output
> file.

Right, but I would not mind too much. In some sense (regarding binary data) it would be best to keep the data as it is. OTOH, proper EPS files *must* contain DSC commands surrounding any binary data. Hence, in principle, there is no problem handling this special case correctly as well and translating the line endings outside the binary regions.

> Of course, the solution would be to open these files in universal
> newline mode ('rU'), but this would destroy binary EPS files. So I
> propose to do the following:
>
> 1. When the bounding box is read, always open the file in mode 'rU'.
> Even if it were a binary file, the DSC comments will be in line-oriented
> format and so this mode is sufficient to parse the bounding box
> information. This change is backwards-compatible with the 'r' mode as
> 'rU' is a superset of 'r'.

We could think of using this mode for parsing all DSC commands, but it will fail in rare cases! The DSC commands surrounding binary data give you a number of bytes or a number of lines (there are different ways to mark the binary region). These numbers might become wrong when reading in 'rU' mode! The proper way would be to read everything in binary mode and handle the line endings ourselves. However, I'm not sure about the drawbacks in terms of execution speed. It's easy to introduce a bottleneck here ... ;-)

> 2. To the epsfile constructor, add a new optional parameter 'istextfile'
> which defaults to False (old behaviour). Remember this parameter. In
> outputPS, when the 'istextfile' parameter is True, use 'rU' mode to read
> the file, otherwise 'r'. This will implement the following behaviour:
> When the user knows that the EPS file in question is a textfile, but may
> have line endings from another platform, she uses 'istextfile=1'. When
> the EPS file in question may be a binary file, the user can use
> 'istextfile=0' (the default), which has the drawback that line endings
> may be inconsistent in the output file. The bounding box is correctly
> read in every case.

I would not like to see an istextfile flag, since it is the wrong thing to do. We might have a "unifylineendings" flag, if it is really important to unify them. The unifylineendings flag could recode the line endings for all non-binary areas, skipping the binary areas whose length is given as a number of bytes. (I think binary areas whose length is given as a number of lines might still be transformed to different line endings, but I have to read the specs about it; maybe it will not even be clear from the specs.) By default unifylineendings could be disabled, causing no trouble for any EPS file (even one without proper binary data markers), since it would not touch any line endings ...

For the moment I would suggest properly implementing the DSC parsing and keeping all line endings as they are. Later on we could still add a unifylineendings=True functionality ...

André

-- 
Dr. André Wobst
wo...@us..., http://www.wobsta.de/
PyX - High quality PostScript figures with Python & TeX
visit http://pyx.sourceforge.net/
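André's objection to parsing in 'rU' mode can be made concrete: a binary section announced as `%%BeginBinary: <bytecount>` must be copied byte for byte, while the text around it may have its line endings rewritten. The sketch below assumes that the byte count starts right after the end of the `%%BeginBinary` line, as the DSC 3.0 specification describes it. `copy_eps` and its `unify_line_endings` flag are hypothetical names modelled on the "unifylineendings" idea, not a PyX API, and the line-counted form of `%%BeginData` is deliberately left out.

```python
import re

# One line ending: CR LF, a bare CR (classic Mac) or a bare LF (Unix).
_EOL_RE = re.compile(rb"\r\n|\r|\n")
# DSC comment announcing a binary section of a given number of bytes.
_BEGIN_BINARY_RE = re.compile(rb"%%BeginBinary:\s*(\d+)")


def copy_eps(data: bytes, unify_line_endings: bool = False, eol: bytes = b"\n") -> bytes:
    """Copy EPS data, optionally rewriting line endings outside binary sections.

    Hypothetical helper, not part of PyX. The bytes announced by
    "%%BeginBinary: <bytecount>" are copied verbatim, so the count stays
    valid whether or not the surrounding line endings are unified.
    The line-counted form of %%BeginData is not handled here.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        match = _EOL_RE.search(data, pos)
        if match:
            line, ending, pos = data[pos:match.start()], match.group(), match.end()
        else:  # last line without a trailing line ending
            line, ending, pos = data[pos:], b"", len(data)
        out += line
        out += eol if (unify_line_endings and ending) else ending
        binary = _BEGIN_BINARY_RE.match(line)
        if binary:
            # Assumed: the byte count starts right after this line's ending.
            count = int(binary.group(1))
            out += data[pos:pos + count]
            pos += count
    return bytes(out)
```

With `unify_line_endings` left at False the output is identical to the input, matching the "keep all line endings as they are" default. A blanket `\r\n`/`\r` to `\n` replacement over a file whose binary section happens to contain `\r\n` would shorten that section and invalidate its byte count.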
From: Andre W. <wo...@us...> - 2005-04-14 15:45:10
Attachments:
epsfile.patch
Hi,

On 14.04.05, Andre Wobst wrote:
> The proper way would be to read everything
> in binary mode and properly handle the line endings ourselves.

I've checked in an appropriate fix. The enclosed patch, generated from CVS HEAD, should be applicable to recent versions of PyX as well.

I'm not going into the replace-end-of-line-marker business (to unify them) for the moment; I don't expect it to be crucial to anybody. PyX creates readable PostScript, but don't expect that to be true for other, included material. It stays as it is ...

André
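Reading in binary mode and leaving line endings alone still requires walking the file line by line to find the DSC comments. In Python 3, `readline()` on a binary file only splits on `b"\n"`, so a file with bare `\r` endings would again look like a single line. The following is a minimal sketch that assumes nothing about the contents of epsfile.patch, which isn't reproduced here: `iter_lines_keepends` is a hypothetical generator that reads in chunks, splits on all three line-ending conventions and keeps each ending, so joining its output reproduces the file byte for byte.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines_keepends(stream: BinaryIO, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield lines from a binary stream, each with its original line ending.

    Hypothetical helper, not part of PyX. Splits on CR LF, CR and LF;
    b"".join() over the yielded lines reproduces the input exactly.
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        lines = buffer.splitlines(keepends=True)
        # Hold back the last piece: it may be an incomplete line, or end in
        # a bare b"\r" whose b"\n" partner arrives with the next chunk.
        buffer = lines.pop() if lines else b""
        yield from lines
    if buffer:
        yield buffer


if __name__ == "__main__":
    data = b"%!PS\r%%BoundingBox: 0 0 10 10\r\n%%EOF\n"
    lines = list(iter_lines_keepends(io.BytesIO(data), chunk_size=3))
    print(lines)
    print(b"".join(lines) == data)
```

Holding back the final piece of every chunk is what keeps a `\r\n` pair that straddles a chunk boundary together. A very long line without any ending (a large binary section, say) is re-scanned on every chunk, so this is a readability sketch rather than a tuned implementation, which is exactly the kind of bottleneck André was wary of.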