Author: Jascha Casadio
Current version: 1.0.20130608
Current dev version: 1.0.20130608
Official Webpage: https://sourceforge.net/projects/pyelib/
Facebook Official Webpage: https://www.facebook.com/pyelib/
Twitter Official Webpage: https://twitter.com/pyELib
pyELib is FREE and OPEN SOURCE and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
pyELib IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN NO EVENT SHALL THE AUTHOR(s) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Make a backup of your files before feeding them to pyELib!
pyELib relies on the following packages (and their dependencies!). Be sure to have them all installed on your machine if you want all the features of pyELib enabled.
+ python 2.6+ (not expected to work on 3.x)
+ mysql 5+ (be sure the service is running!)
+ djvulibre (python-djvu, djvulibre-bin)
+ tesseract-ocr (tesseract-ocr, tesseract-ocr-eng)
+ poppler 0.20.3+ (python-poppler)
+ libchm-bin 2:0.40+
+ GPL Ghostscript 8.71+
+ python libs: wx, html (you might want to use pip to install them)
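A quick way to check the external tools in the list above is to look for their executables on the PATH. The following is only a sketch and not part of pyELib: the executable names are assumptions about what each package typically installs, and the Python libraries (wx, html, python-poppler, python-djvu) are not covered.

```python
# Sketch only: the executable names below are assumptions, not taken from pyELib.
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2

# package -> an executable it is expected to install (assumed names)
REQUIRED_TOOLS = {
    "mysql": "mysql",
    "djvulibre-bin": "djvutxt",
    "tesseract-ocr": "tesseract",
    "libchm-bin": "extract_chmLib",
    "ghostscript": "gs",
}


def missing_tools(tools=REQUIRED_TOOLS, lookup=which):
    """Return the sorted list of packages whose executable is not on PATH."""
    return sorted(pkg for pkg, exe in tools.items() if lookup(exe) is None)


missing = missing_tools()
if missing:
    print("Missing: %s" % ", ".join(missing))
else:
    print("All external tools found.")
```

A tool reported as missing may simply be installed under a different name on your distribution, so treat the output as a hint rather than a verdict.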
How to run pyELib
1. Check the requirements
2. Prepare the Database*
3. $ python pyelib.py
*Either open the mysql console (mysql -u username -p) and copy/paste the commands found in docs/mysql_setup, or load the database dump found inside the db/ directory directly (mysql < /path/to/pyelib.dump).
Does pyELib work on Windows?
No, it does not. Even though Python is cross platform, pyELib relies on specific Unix system calls that are not available on Windows.
pyELib seems to run quite slowly. Is there any way to improve its performance?
Don't change the tmp dir unless you have a very good reason to.
Having the books on a removable device (USB) can slow down the process.
OCR'd books require a lot of CPU power. If many of them are processed at once, you might experience a slowdown.
In any case, don't run too many parallel jobs.
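How many jobs is "too many" depends on the machine. As a rough illustration, and not a description of how pyELib itself schedules work, the sketch below clamps a requested job count to the number of CPU cores. The function name capped_jobs is hypothetical.

```python
import multiprocessing


def capped_jobs(requested, cores=None):
    """Clamp a requested job count to the range 1..number of CPU cores."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return max(1, min(requested, cores))
```

With OCR-heavy collections you may want to stay below the core count, since each OCR job can keep a core busy on its own.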
Some books were renamed with funny/wrong names. What's the problem?
This mainly happens when the information retrieved from the internet is wrong. pyELib cannot tell whether a title is correct and assumes that Amazon provides good data.
Will .mobi books be supported by pyELib?
They might be in the future.
Will .lit books be supported by pyELib?
In August 2011, Microsoft announced they were discontinuing both Microsoft Reader and the use of the .lit format for ebooks at the end of August 2012. At present, I am not planning to add support for .lit files.
I see pyELib is open source. Can I do whatever I want with it?
Yes, pyELib is open source. The suite is free for any non-commercial use and can be freely distributed. Nevertheless, you are not free to do whatever you want with it. In particular, you are not currently allowed to modify it or build upon it, since it has been released under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Please be sure to check the LICENSE file that comes inside the archive, as well as the Copyright section of this documentation. Feel free to contact me should you have any doubt or request about the license.
I see several error messages in the console. What's wrong?
Several things can go wrong. First, pyELib is far from perfect. The major problem I've encountered is that books of the same type are encoded in many different ways, sometimes violating the file type standard. pyELib makes heavy use of try/except/finally clauses and clearly reports the file and method that raised an exception, so you can easily see whether it's a pyELib bug. Second, in many cases it's not pyELib that is wrong but one of the third-party tools it uses. Nothing can be done about that, since those programs are used as they are. For example, the error message "Syntax Error: Expected the optional content group list, but wasn't able to find it, or it isn't an Array" comes from poppler. Third, problems and errors can come from the web servers that provide information about the books. If their data contains errors, there's no way pyELib can cope with that. Note that sometimes it's better to have pyELib fail (either directly or indirectly) on one book than to have it work for that specific file and get everything wrong for thousands of others. Nevertheless, should you see a pyELib error, please report it so that it can be investigated. A list of known third-party errors is being prepared.
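To give an idea of what the try/except/finally reporting mentioned above can look like, here is a minimal sketch. It is not pyELib's actual code: the function process_book, the logger name and the scratch-directory handling are all assumptions made for illustration.

```python
import logging
import shutil
import tempfile

log = logging.getLogger("pyelib-sketch")  # hypothetical logger name


def process_book(path, extract):
    """Run `extract` on one book; log the file and step if it fails.

    Returns the extracted value, or None when the step raised.
    """
    workdir = tempfile.mkdtemp()
    try:
        return extract(path, workdir)
    except Exception as exc:
        # Report which step failed and on which file.
        name = getattr(extract, "__name__", repr(extract))
        log.error("%s failed on %s: %s", name, path, exc)
        return None
    finally:
        # Always remove the scratch directory, even on failure.
        shutil.rmtree(workdir, ignore_errors=True)
```

Because a failure on one book returns None instead of propagating, a batch run can continue with the remaining files, and the logged step name shows whether the error came from pyELib or from a third-party tool it called.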
Is it possible to decide which file format is better/worse than others?
At the moment the ranking is hardcoded, but a setting could be added in the future. You can change it directly in the code (file, line).
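For readers wondering what a hardcoded format ranking amounts to, the sketch below picks the preferred copy of a book from a list of files. The order shown is purely hypothetical and the names FORMAT_PRIORITY and best_copy do not come from pyELib; the ranking pyELib actually uses may differ.

```python
import os

# Hypothetical ranking, best first; the order pyELib actually uses may differ.
FORMAT_PRIORITY = ["pdf", "djvu", "chm"]


def best_copy(paths, priority=FORMAT_PRIORITY):
    """Return the path whose extension ranks highest in `priority`.

    Unknown extensions rank last; ties keep the earliest path.
    """
    def rank(path):
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        return priority.index(ext) if ext in priority else len(priority)

    return min(paths, key=rank)
```

Changing the preference then means reordering one list, which is roughly what editing the hardcoded value would involve. Note that the function raises ValueError when given an empty list.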
Is it possible to change the renaming style of the files?
At the moment it is hardcoded, but this could be implemented in the future.
I would like to give credit (in alphabetical order) to several people/entities who developed tools that I've somehow used during the development of pyELib, with the caveat that none of them is either required for pyELib to run or included in pyELib itself.
Creative Commons http://creativecommons.org
Free Logo Design http://www.freelogoservices.com
Tesseract OCR http://code.google.com/p/tesseract-ocr
WWW SQL Designer http://ondras.zarovi.cz/sql/demo
Google App Engine https://developers.google.com/appengine/