Tree [dd475a] master gsdjvu_1_6 /
History



File Date Author Commit
att 2005-06-24 leonb leonb [a1641b] Added zoom annotation in gsdjvu.djvu.
.gitignore 2011-02-11 Leon Bottou Leon Bottou [41ac6a] Ignore build dirs.
COPYING 2005-07-08 leonb leonb [053219] Tom's changes.
COPYING.CPL 2005-06-25 leonb leonb [8afaf0] Initial release.
COPYING.GPL 2005-06-25 leonb leonb [8afaf0] Initial release.
README 2009-07-10 leonb leonb [7b7388] Fixed readme file
build-gsdjvu 2013-04-08 Leon Bottou Leon Bottou [dd475a] fix install of scrips gsdjvu and djvudigital
djvudigital 2009-07-10 leonb leonb [5d7ec3] Added script djvudigital in case it does not co...
gdevdjvu.c 2013-04-08 Leon Bottou Leon Bottou [dfe562] eliminate warning
gsdjvu 2005-07-01 leonb leonb [88f775] Works.
gsdjvu.mak 2011-01-25 leonb leonb [5a89e1] Undo last commit (that was wrong).
ps2utf8.ps 2009-03-02 leonb leonb [c77fc5] Fixed comment.

Read Me



GSDJVU contains two specialized drivers for the program GHOSTSCRIPT.  
These drivers process the PS or PDF file and produce an output that 
separates the foreground objects from the background objects.  
This separation can be used to generate DJVU documents.

The separation algorithm is described in the following paper:

* Leon Bottou, Patrick Haffner and Yann LeCun: "Conversion of 
Digital Documents to Multilayer Raster Formats", Proceedings 
of the Sixth International Conference on Document Analysis 
and Recognition, IEEE, September 2001.

An electronic version of this paper can be found at
<http://leon.bottou.com/publications>




LICENSING
---------

Things are not as simple as they should be.
Please read attentively the contents of file COPYING.
Make sure you understand it before building GSDJVU.



BUILDING GSDJVU
---------------

The following instructions have been tested on Linux machines, 
and should work reliably on most Unix based operating systems,
including MacOS X and Cygwin.

1) Download all necessary source files.

    - Create a directory 'BUILD'
    - Download the following files into this directory:
        
        ghostscript-8.64.tar.bz2
        [optional] ghostscript-fonts-std-8.11.tar.gz 
        [optional] jpegsrc.v6b.tar.gz 
        [optional] libpng-1.2.6.tar.gz
        [optional] zlib-1.2.1.tar.gz
      
      These files can be found under 
        <http://ghostscript.com/releases/>

      If you do not provide the fonts file, ghostscript 
      tries to find the fonts in standard places on your system.
      Of course things will break badly if it does not find them.

      If you do not provide any of the last three files,
      jpegsrc, libpng and zlib, the building script will
      attempt to use the system wide libraries.
      This only works if the corresponding development files
      are installed on your system. This usually is better 
      because the resulting program benefits from the latest 
      installed versions (including bug fix and security patches.)

      The ghostscript versions suggested above were current at
      the time gsdjvu was released. You can try to build gsdjvu 
      using newer versions, but do not take it for granted.
      
      Note that the file "ghostscript-fonts-std-8.11.tar.gz"
      is needed for versions of ghostscript earlier than 8.57.


2) Run the shell script 'build-gsdjvu'.
   Use the full pathname.
   Example:
  
   $ /home/donald/gsdjvu/build-gsjvu

   The script unpacks the source files, patches ghostscript
   with the gdevdjvu driver, compiles ghostscript, and
   selects which files should be kept around.
   Please consult the script itself for more details.

   The script install the final gsdjvu files in
   a directory named 'BUILD/INST/gsdjvu'. 
   You can copy this directory wherever you want.
   This directory contains an executable command
   named 'gsdjvu'. The simplest way to use it
   is to create a symbolic link named 'gsdjvu'
   in a directory included in your PATH...

   Examples: 

    - To install gsdjvu in /usr/local/lib/gsdjvu as root.

        # cp -r BUILD/INST/gsdjvu  /usr/local/lib
        # cd /usr/local/bin
        # ln -s ../lib/gsdjvu/gsdjvu gsdjvu

    - To install gsdjvu in your home directory, assuming 
      that your command search path contains "$HOME/bin".

        # cp -r BUILD/INST/gsdjvu $HOME/gsdjvu
        # cd $HOME/bin
        # ln -s ../gsdjvu/gsdjvu gsdjvu




USING GSDJVU
------------

The simplest way to use gsdjvu is the 
script 'djvudigital' that comes with djvulibre.
This script provides for directly converting
a PS or PDF file into a DJVU file.
See the man page djvudigital(1).


It is also possible to directly use gsdjvu directly.
The gsdjvu executable is simply an instance of ghostscript
with two additional drivers named 'djvumask' and 'djvusep'.

* Driver 'djvumask' outputs the djvu segmentation mask as 
  one or several bitonal image in PBM format. Black pixels
  in the bitonal image indicate the foreground.

  The following gsdjvu command processes the pages 
  of 'myfile.ps' and stores the segmentation mask image(s) 
  as PBM files named 'myfile001.pbm', 'myfile002.pbm', etc.  
  Option '-r300' specify a resolution of 300 dots per inche.

  $  gsdjvu -sDEVICE=djvumask -sOutputFile='myfile%03d.pbm' \
       -r300 -dNOPAUSE -dBATCH -f myfile.ps -c quit

* Driver 'djvusep' generates a "separated image file".
  This file describes each page using two images:
   - a high resolution foreground image with a small color 
     palette and a transparent color.
   - a low resolution background image.
  This "separated image file" format is described in the 
  djvulibre man page csepdjvu(1).  

  Example:

  $ gsdjvu -sDEVICE=djvusep -sOutputFile='myfile.sep' \
       -r300 -dNOPAUSE -dBATCH -f myfile.ps -c quit

Besides all ghostscrip options, 
program 'gsdjvu' recignizes specific options
when using the 'djvumask' and 'djvusep' drivers.
The following documentation is copied from the
comments in the driver source code "gdevdjvu.c":

 ------------------------------------------------------------------------ 

   This driver implements two devices:

   * Device "djvumask" outputs the foreground mask as PBM files.

   * Device "djvusep" outputs separation files containing one or several
     pages. Each page contains the foreground encoded as a CRLE image
     (cf. comments on Color RLE image below), possibly followed by a
     subsampled PPM image representing the background, possibly followed by an
     arbitrary number of comment lines starting with character '#'.
     These files should be piped through the back-end encoder "csepdjvu".
   
   The following ghostscript command line options are supported:

   * "-sOutputFile=<string>"  Selects the name of the output file.
                              The usual Ghostscript conventions apply.

   * "-dThreshold=<0..100>"   Selects the separation threshold.  Larger
                              values put more things into the foreground
                              layer.  Default value is 80

   * "-dBgSubsample=<1-12>"   Selects the background subsampling factor.
                              Default value is 3.

   * "-dFgColors=<1-4000>"    Selects the maximal number of colors in the
                              foreground layer. Default value is 256.

   * "-dFgImgColors=<0-4000>" Images with less than the specified number of
                              colors might be moved into the foreground.  
                              Default value is 256.

   * "-dMaxBitmap=<num>"      Specifies the maximal amount of memory used
                              by the banding code when rendering the background
                              layer.  Default is 10000000.

   * "-dAutoHires"            This option enables an algorithm that lets
                              device "djvusep" ignore -dBgSubsample and 
			      generate a full resolution background layer
			      when the foreground is empty.

   * "-dExtractText"          This option causes the generation of sep file 
                              comments describing the textual information.

   * "-dReopenPerPage"        Specifies that the output file must be reopened
                              for each page.  Default is to reopen if the 
			      filename contains a "%d" specification for 
			      the page number.