
Release v1.2 files

  • PAIP-tesseract-300dpi-bw.pdf (2022-04-19, 24.6 MB)
  • A better scan source code.tar.gz (2022-04-18, 9.9 MB)
  • A better scan source code.zip (2022-04-18, 10.1 MB)
  • README.md (2022-04-18, 1.3 kB)

Totals: 4 items, 44.7 MB

About this copy

This is a scanned copy of the 4th printing (1998) of PAIP. It's shared for reading and for improving the Markdown copy in our GitHub repo.

How it was made

@pronoiac had the spine / binding removed and fed the pages through a scanner. Steps and software used:

  • the scanner produced 600 dpi grayscale images: 3.6 gigabytes of PNG files
  • ScanTailor Advanced (in Docker): deskew the pages and render them as 300 dpi black-and-white (1-bit) TIFFs, 30 megabytes in total
  • tiff2pdf and pdfunite: convert the many TIFFs to PDF and merge them into a single PDF
  • OCRmyPDF: OCR the text with Tesseract, add title and author metadata to the PDF, and apply lossless JBIG2 compression, for a final size of 24 megabytes
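The tiff2pdf, pdfunite, and OCRmyPDF steps above can be sketched as a small Python driver. This is a minimal sketch under stated assumptions, not the exact commands used for this release: the input directory (`out/`), the intermediate file names, the metadata strings, and the choice of `--optimize 1` are hypothetical, and it assumes jbig2enc is installed so that OCRmyPDF's lossless optimization can produce JBIG2. ScanTailor Advanced isn't scripted here. The driver builds every command line first and only executes them when `dry_run=False`.

```python
"""Hypothetical driver for the TIFF -> PDF -> OCR steps (a sketch, not the release script)."""
import shlex
import subprocess
from pathlib import Path


def build_commands(tiffs, merged_pdf, final_pdf, title, author):
    """Return the argv list for each step, in the order they must run."""
    tiffs = [Path(t) for t in tiffs]
    page_pdfs = [t.with_suffix(".pdf") for t in tiffs]
    commands = []
    # tiff2pdf (libtiff): one PDF per TIFF page; -o names the output file.
    for tiff, pdf in zip(tiffs, page_pdfs):
        commands.append(["tiff2pdf", "-o", str(pdf), str(tiff)])
    # pdfunite (poppler-utils): input PDFs first, the merged output PDF last.
    commands.append(["pdfunite", *(str(p) for p in page_pdfs), str(merged_pdf)])
    # OCRmyPDF: Tesseract OCR plus title/author metadata.
    # Assumption: with jbig2enc installed, --optimize 1 (lossless) recompresses
    # the 1-bit page images as lossless JBIG2.
    commands.append([
        "ocrmypdf",
        "--title", title,
        "--author", author,
        "--optimize", "1",
        str(merged_pdf),
        str(final_pdf),
    ])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical location of the 300 dpi 1-bit TIFFs from ScanTailor.
    pages = sorted(Path("out").glob("*.tif"))
    cmds = build_commands(
        pages,
        "merged.pdf",
        "PAIP-tesseract-300dpi-bw.pdf",
        title="Paradigms of Artificial Intelligence Programming",  # assumed metadata value
        author="Peter Norvig",  # assumed metadata value
    )
    run(cmds, dry_run=True)
```

Keeping command construction separate from execution makes it easy to inspect the exact arguments before touching the files; running it for real requires `tiff2pdf` (libtiff), `pdfunite` (poppler-utils), and `ocrmypdf` with Tesseract on the `PATH`.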

Other notes

  • This scan has a higher resolution than the previous scan (6th printing, 2001), though it's of an older printing (4th printing, 1998).
  • The OCR is better than in the previous scan: searching for keywords or phrases usually works.
  • Why not publish the grayscale PNGs? Space constraints on GitHub releases, and their value is dubious for the space they take.
  • Ebooks built from the Markdown version are getting closer.
  • See [#137] for some of the thinking behind this release.
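A rough back-of-the-envelope check of why the 300 dpi 1-bit output is so much smaller than the grayscale scans. It assumes 8-bit grayscale PNGs and decimal units (1 GB = 1000 MB); neither is stated in these notes. Halving the resolution in each dimension quarters the pixel count, and going from 8 bits to 1 bit per pixel divides the raw data by 8:

```latex
\frac{\text{raw size at 600 dpi, 8 bits/pixel}}{\text{raw size at 300 dpi, 1 bit/pixel}}
  = \left(\frac{600}{300}\right)^{2} \cdot \frac{8}{1} = 4 \cdot 8 = 32
\qquad
\frac{3.6\ \text{GB}}{30\ \text{MB}} = \frac{3600\ \text{MB}}{30\ \text{MB}} = 120
```

The observed reduction (about 120×) is larger than the 32× from pixel count and bit depth alone; the remaining factor of roughly 120/32 = 3.75 would have to come from ScanTailor's cropping and from the different compression formats involved, which these notes don't break down.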
Source: README.md, updated 2022-04-18