Give a tailor a pattern and some scraps of fabric, and you can expect back some clothing that makes you look good. Scan Tailor works in a similar way with scanned pages. Give it raw scans and you get back clean page images that are ready to be assembled into PDF or DjVu files.
Scan Tailor can split two-page scans or cut off parts of a neighboring page from a single-page scan. It corrects page skew and aligns pages with each other. It removes the shadow around a book’s spine and increases the contrast of content in that area. It can convert text to black and white while leaving gray-scale or color images intact. It will even automatically detect those images for you.
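For readers curious what "converting text to black and white" involves, one classic approach is Otsu's global thresholding: pick the gray level that best separates dark ink from light paper, then snap every pixel to one side or the other. The sketch below only illustrates that general technique. The article does not say which algorithm Scan Tailor uses, so this should not be read as its method, and the function names `otsuThreshold` and `binarize` are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Illustrative only: Otsu's method on an 8-bit grayscale buffer.
// Returns a threshold t; pixels with value <= t are treated as "ink".
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    // Histogram of the 256 possible gray levels.
    std::array<std::uint64_t, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double weightDark = 0.0;  // number of pixels at or below t
    double sumDark = 0.0;     // sum of their gray values
    double bestVariance = -1.0;
    int best = 0;

    for (int t = 0; t < 256; ++t) {
        weightDark += static_cast<double>(hist[t]);
        sumDark += t * static_cast<double>(hist[t]);
        if (weightDark == 0.0) continue;      // no dark pixels yet
        const double weightLight = total - weightDark;
        if (weightLight == 0.0) break;        // no light pixels left

        const double meanDark = sumDark / weightDark;
        const double meanLight = (sumAll - sumDark) / weightLight;
        const double diff = meanDark - meanLight;
        // Between-class variance; the best threshold maximizes it.
        const double variance = weightDark * weightLight * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            best = t;
        }
    }
    return best;
}

// Maps every pixel to pure black (0) or pure white (255).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray) {
    const int t = otsuThreshold(gray);
    std::vector<std::uint8_t> out;
    out.reserve(gray.size());
    for (std::uint8_t v : gray) out.push_back(v <= t ? 0 : 255);
    return out;
}
```

A single global threshold like this would also flatten photographs, which is why leaving gray-scale and color images intact, as the article describes, requires detecting those regions first.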
About the only thing Scan Tailor doesn’t do, outside of optical character recognition, is correct the geometric distortions you get when capturing a document with a camera rather than a scanner – but its developer says that functionality is coming.
Scan Tailor’s feature set beats what the applications bundled with most scanners offer. “In fact,” says developer Joseph Artsimovich, “I always advise people to turn off any post-processing features of their scanning software if they are going to use Scan Tailor for post-processing. From Scan Tailor’s point of view, any ‘improvements’ made by scanning software are considered as damage. For example, contrast enhancement may completely remove the spine of a book, forcing Scan Tailor to guess which parts it needs to cut off.”
Artsimovich, who grew up in the Soviet Union, used to read lots of books – “until I started this project in late 2007,” he says with a wink. In today’s Russia, many books are available online for free, in part because foreign works published in the USSR before 1974 are not protected by copyright in Russia, and in part because many volunteers spend their time digitizing books. “Having benefited greatly from the wide availability of books online, I wanted to give something back. My initial idea was to digitize one of my own books. That didn’t happen, because I quickly found out that existing post-processing software is so complex and unintuitive that it’s next to impossible to use, even for a technical person like myself. At that time I was looking for some new stuff to work on. That’s how Scan Tailor was born.”
Considering that he is a C++ fan, Artsimovich’s choice to create Scan Tailor using Qt and the Boost libraries was a natural. “People might be surprised by the fact that I don’t use any image processing or computer vision libraries. I did evaluate several of them, but I found flaws in every single one. A common flaw would be non-thread-safe code, which is unacceptable for Scan Tailor, which does use threads. So I ended up porting algorithms from other libraries if they were available, and implementing the missing ones by myself.”
Artsimovich says Scan Tailor’s next major release will bring correction of geometric distortions and improved despeckling functionality. “The former may require a couple more releases to make it handle complex cases. When that’s done, the project will have reached the state when there are no high priority tasks left, and at that point we might declare it a 1.0 release. After that we may start implementing various nice-to-have features and improving performance by rewriting some of the algorithms for OpenCL.”
Artsimovich appreciates help from other contributors. “Coming up with new image processing algorithms is important, as I am not a particularly bright guy. General purpose developers are even more necessary, as there is a huge amount of work that needs to be done that doesn’t require any special knowledge.
“Documentation is an area which I don’t touch myself and completely rely on volunteers to work on. It’s too painful for me to write large amounts of anything but code. So far we have decent documentation in Russian, but next to none in English. I’d like that to change.”
The best way to get in touch is to join Scan Tailor’s forums.