<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Home</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>Recent changes to Home</description><atom:link href="https://sourceforge.net/p/ebook-scraper/wiki/Home/feed" rel="self"/><language>en</language><lastBuildDate>Wed, 05 Mar 2025 15:10:19 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/ebook-scraper/wiki/Home/feed" rel="self" type="application/rss+xml"/><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v7
+++ v8
@@ -1,10 +1,12 @@
+## Ebook-Scraper
+
 My local public library uses the OverDrive platform to allow patrons to access ebooks and other digital resources.  I prefer to read ebooks on a Kindle, and happily many books are available in Kindle format.  However, there are some items that are available only in EPUB format with DRM protection, or as OverDrive Read books.  The EPUB format books can only be read using Adobe Digital Editions, which is not available for Linux, my preferred OS.  So my only option is to view the OverDrive Read book in a web browser.

 This simple Python program converts the ebook in the browser into a plain text file.  

-The mechanics work like this:  
+The program operates in this manner:
 1. Take a screenshot of a page of the book as displayed in the browser.
-2. Use Optical Character Recognition (OCR) to convert the screenshot into plain text (using Tesseract).
+2. Use Optical Character Recognition (OCR) to convert the screenshot into plain text.
 3. Save the text to a local file.
 4. Advance to the next page by simulating a key press.
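
The four steps above can be sketched as a single loop. This is a minimal illustration, not the code in `ebook_scraper.py`: the function name `scrape_pages` and its parameters are hypothetical, and the capture, OCR, and key-press actions are passed in as callables so the loop can be exercised without a display.

```python
from pathlib import Path
from typing import Any, Callable


def scrape_pages(
    capture: Callable[[], Any],
    recognize: Callable[[Any], str],
    advance: Callable[[], None],
    page_count: int,
    out_path: Path,
) -> int:
    """Run the screenshot, OCR, save, advance cycle page_count times.

    Returns the number of pages written to out_path.
    """
    written = 0
    with out_path.open("w", encoding="utf-8") as out:
        for _ in range(page_count):
            image = capture()                  # 1. screenshot of the visible page
            text = recognize(image)            # 2. OCR the screenshot into text
            out.write(text.rstrip() + "\n\n")  # 3. save the text to the local file
            advance()                          # 4. simulate the key press
            written += 1
    return written
```

With the modules from the Installation section, the callables could plausibly be `lambda: pyautogui.screenshot(region=region)`, `pytesseract.image_to_string`, and `lambda: pyautogui.press("right")`. The choice of the right-arrow key is an assumption, and a real run would likely need a short pause after each key press so the next page can render.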

@@ -24,15 +26,15 @@
 `python3 -m venv ebook-env`
 `source ebook-env/bin/activate`
  Install required python modules: 
-`pip install PySimpleGUI pyautogui pytesseract`
+`pip install pyautogui pytesseract`
 Run the application: 
 `python ebook_scraper.py`
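
If the application fails to start, a quick check like the following may help confirm that the installation steps above took effect. It is a hedged sketch: the helper name `missing_dependencies` is hypothetical, and it assumes the `tesseract-ocr` package provides an executable named `tesseract` on the PATH.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("pyautogui", "pytesseract"), binaries=("tesseract",)):
    """Return the names of required modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when no matching executable is on the PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing
```

Calling `missing_dependencies()` inside the activated virtual environment returns an empty list when everything is in place, or the names that still need installing.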


 ### Usage
-Open a web browser with the Overdrive book you want to read.  Launch the Ebook Scraper and position the opening dialog  side-by-side with the browser.  In browser, advance to the page you want to start scraping. On the top banner, click on the column control to select single column.  Click on the bottom to view the progress bar. Enter the desired start and end page numbers dialog form.  Click on the text to hide the progress bar.  Click OK in the dialog.  
+Open a web browser with the OverDrive book you want to read.  Launch the Ebook Scraper.  In the browser, advance to the page where you want to start scraping.  On the top banner, click on the column control to select single column.  Click on the bottom to view the progress bar.  Enter the desired start and end page numbers into the form fields.  Click on the text to hide the progress bar.  Enter an optional output filename or use the default.  Click OK in the dialog.  
 Next you will be prompted to locate the area of the screen that contains the text to be scraped.  Once the boundaries are located, the process commences automatically.  Sit back and watch as the browser pages flip by.  Don't touch the mouse or keyboard until the scraping is complete.  
-The output file named `ebook_content.txt`  will be found in the same directory as the program. 
+The default output file, `ebook_content.txt`, will be found in the same directory as the program. 
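
The boundary-locating step described above presumably ends with a rectangle that is handed to the screenshot call. As an illustration only, and assuming the boundaries are captured as two opposite corner points, the hypothetical helper below converts them into the `(left, top, width, height)` tuple that `pyautogui.screenshot(region=...)` accepts.

```python
def region_from_corners(corner_a, corner_b):
    """Convert two opposite corner points into (left, top, width, height)."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    # min/abs make the result independent of which corner was marked first.
    left, top = min(x1, x2), min(y1, y2)
    width, height = abs(x2 - x1), abs(y2 - y1)
    if width == 0 or height == 0:
        raise ValueError("the two corners must span a non-empty rectangle")
    return (left, top, width, height)
```

For example, `region_from_corners((100, 200), (900, 1000))` gives `(100, 200, 800, 800)`, and swapping the two points gives the same result.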

 [Video demonstration](https://www.youtube.com/watch?v=CoIZ4Xp8Ek8)

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Wed, 05 Mar 2025 15:10:19 -0000</pubDate><guid>https://sourceforge.net87b5a605c35d7b76c549f1cffa0cb6f637ed98d8</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v6
+++ v7
@@ -36,5 +36,17 @@

 [Video demonstration](https://www.youtube.com/watch?v=CoIZ4Xp8Ek8)

+### Is this legal?
+I'm not a lawyer nor have I consulted one on this issue.  My claim is that it *is* legal, for personal use, and I support that claim with several arguments.
+
+1. I obtained the original material, the ebook, by checking it out from the library, the lawful way to proceed.  So I am allowed to view the material assuming I do so within the time period for which I have checked out the book.  This tool simply facilitates my viewing the ebook on a different device (a Kindle reader) instead of a web browser.  
+
+2. This tool does not break or hack the DRM features of the ebook. 
+
+3. An analogy could be made with recording a song on the radio or using a DVR to record a TV show.  In the U.S. the "Audio Home Recording Act of 1992" permits recording audio from the radio, and the Betamax ruling of 1984 considered it a fair use to record programs off the television to be watched at a later date. 
+
+
+
+
 [[members limit=20]]
 [[download_button]]d
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Wed, 31 Jan 2024 02:43:12 -0000</pubDate><guid>https://sourceforge.net4a9b2ad6cd8e58f7f6780ef01fbb3a0da0b64f9c</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v5
+++ v6
@@ -34,5 +34,7 @@
 Next you will be prompted to locate the area of the screen that contains the text to be scraped.  Once the boundaries are located, the process commences automatically.  Sit back and watch as the browser pages flip by.  Don't touch the mouse or keyboard until the scraping is complete.  
 The output file named `ebook_content.txt`  will be found in the same directory as the program. 

+[Video demonstration](https://www.youtube.com/watch?v=CoIZ4Xp8Ek8)
+
 [[members limit=20]]
 [[download_button]]d
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Tue, 30 Jan 2024 18:48:31 -0000</pubDate><guid>https://sourceforge.net37b15189438ed105e06b18689e658197422d7fac</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v4
+++ v5
@@ -21,10 +21,8 @@
 Install dependencies:
 `sudo apt-get install tesseract-ocr`
 Create a virtual environment and activate it:
-~~~
-python3 -m venv ebook-env
-source ebook-env/bin/activate
-~~~
+`python3 -m venv ebook-env`
+`source ebook-env/bin/activate`
  Install required python modules: 
 `pip install PySimpleGUI pyautogui pytesseract`
 Run the application: 
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Tue, 30 Jan 2024 18:31:04 -0000</pubDate><guid>https://sourceforge.netea6e1d4113a1103f26e412132b12fed11c1e8b61</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v3
+++ v4
@@ -16,19 +16,25 @@
 Use the ticket system to submit defect reports or enhancement requests.

 ### Installation
-Required dependencies:
+Download the source code:
+`curl -o ebook_scraper.py "https://sourceforge.net/p/ebook-scraper/code/HEAD/tree/trunk/ebook_scraper.py?format=raw"`
+Install dependencies:
+`sudo apt-get install tesseract-ocr`
+Create a virtual environment and activate it:
 ~~~
- PySimpleGUI
- pyautogui
- pytesseract
+python3 -m venv ebook-env
+source ebook-env/bin/activate
 ~~~
-Download `ebook_scraper.py`.
-Execute: `python ebook_scraper.py`
+ Install required python modules: 
+`pip install PySimpleGUI pyautogui pytesseract`
+Run the application: 
+`python ebook_scraper.py`
+

 ### Usage
 Open a web browser with the Overdrive book you want to read.  Launch the Ebook Scraper and position the opening dialog  side-by-side with the browser.  In browser, advance to the page you want to start scraping. On the top banner, click on the column control to select single column.  Click on the bottom to view the progress bar. Enter the desired start and end page numbers dialog form.  Click on the text to hide the progress bar.  Click OK in the dialog.  
 Next you will be prompted to locate the area of the screen that contains the text to be scraped.  Once the boundaries are located, the process commences automatically.  Sit back and watch as the browser pages flip by.  Don't touch the mouse or keyboard until the scraping is complete.  
-At the moment the output file is named `ebook_content.txt` and will be found in the same directory as the program. 
+The output file named `ebook_content.txt`  will be found in the same directory as the program. 

 [[members limit=20]]
 [[download_button]]d
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Tue, 30 Jan 2024 18:27:33 -0000</pubDate><guid>https://sourceforge.netf109d699067aa5dedc5c7884a61b3925d9c26e9f</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v2
+++ v3
@@ -15,5 +15,20 @@
 The source code is pretty straightforward and has some explanatory comments so it should be easy to modify or enhance.
 Use the ticket system to submit defect reports or enhancement requests.

+### Installation
+Required dependencies:
+~~~
+ PySimpleGUI
+ pyautogui
+ pytesseract
+~~~
+Download `ebook_scraper.py`.
+Execute: `python ebook_scraper.py`
+
+### Usage
+Open a web browser with the Overdrive book you want to read.  Launch the Ebook Scraper and position the opening dialog  side-by-side with the browser.  In browser, advance to the page you want to start scraping. On the top banner, click on the column control to select single column.  Click on the bottom to view the progress bar. Enter the desired start and end page numbers dialog form.  Click on the text to hide the progress bar.  Click OK in the dialog.  
+Next you will be prompted to locate the area of the screen that contains the text to be scraped.  Once the boundaries are located, the process commences automatically.  Sit back and watch as the browser pages flip by.  Don't touch the mouse or keyboard until the scraping is complete.  
+At the moment the output file is named `ebook_content.txt` and will be found in the same directory as the program. 
+
 [[members limit=20]]
-[[download_button]]
+[[download_button]]d
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Sun, 28 Jan 2024 04:37:13 -0000</pubDate><guid>https://sourceforge.netaf7baea633ceba8f0b9be145afe166199ea81843</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v1
+++ v2
@@ -1,8 +1,19 @@
-Welcome to your wiki!
+My local public library uses the OverDrive platform to allow patrons to access ebooks and other digital resources.  I prefer to read ebooks on a Kindle, and happily many books are available in Kindle format.  However, there are some items that are available only in EPUB format with DRM protection, or as OverDrive Read books.  The EPUB format books can only be read using Adobe Digital Editions, which is not available for Linux, my preferred OS.  So my only option is to view the OverDrive Read book in a web browser.

-This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].
+This simple Python program converts the ebook in the browser into a plain text file.  

-The wiki uses [Markdown](/p/ebook-scraper/wiki/markdown_syntax/) syntax.
+The mechanics work like this:  
+1. Take a screenshot of a page of the book as displayed in the browser.
+2. Use Optical Character Recognition (OCR) to convert the screenshot into plain text (using Tesseract).
+3. Save the text to a local file.
+4. Advance to the next page by simulating a key press.
+
+When completed, the resulting text file can be converted to Kindle format using Calibre or another conversion tool. 
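
One way to script that conversion is through `ebook-convert`, the command-line tool that ships with Calibre. The sketch below assumes `ebook-convert` is on the PATH; the helper names and the default `azw3` output format are illustrative choices, not part of this project.

```python
import subprocess
from pathlib import Path


def build_convert_command(text_file, output_format="azw3"):
    """Build the ebook-convert argument list for a plain text input file."""
    source = Path(text_file)
    # ebook-convert infers the output format from the target file extension.
    target = source.with_suffix("." + output_format)
    return ["ebook-convert", str(source), str(target)]


def convert(text_file, output_format="azw3"):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(text_file, output_format), check=True)
```

Calling `convert("ebook_content.txt")` would write `ebook_content.azw3` next to the text file.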
+
+I wrote this utility for my own use and it is clearly not production-quality software.  It has only minimal error handling.  Like many automation tools that rely on simulating human input, it is somewhat brittle and easily confused by errant user actions.  The OCR is not perfect and doesn't perform well on multiple columns, sidebars, footnotes, tables, etc.  Despite these limitations, it has worked really well for me.  
+
+The source code is pretty straightforward and has some explanatory comments so it should be easy to modify or enhance.
+Use the ticket system to submit defect reports or enhancement requests.

 [[members limit=20]]
 [[download_button]]
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Sun, 28 Jan 2024 04:24:18 -0000</pubDate><guid>https://sourceforge.netf3fcb4c1761c6df1497483b53d28790f83807ee6</guid></item><item><title>Home modified by John Dalbey</title><link>https://sourceforge.net/p/ebook-scraper/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Welcome to your wiki!&lt;/p&gt;
&lt;p&gt;This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: &lt;span&gt;[SamplePage]&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The wiki uses &lt;a class="" href="/p/ebook-scraper/wiki/markdown_syntax/" rel="nofollow"&gt;Markdown&lt;/a&gt; syntax.&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;&lt;h6&gt;Project Members:&lt;/h6&gt;
	&lt;ul class="md-users-list"&gt;
		&lt;li&gt;&lt;a href="/u/jdalbey/"&gt;John Dalbey&lt;/a&gt; (admin)&lt;/li&gt;
		
	&lt;/ul&gt;&lt;br/&gt;
&lt;p&gt;&lt;span class="download-button-65b5d5d4f29810f887043373" style="margin-bottom: 1em; display: block;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Dalbey</dc:creator><pubDate>Sun, 28 Jan 2024 04:19:33 -0000</pubDate><guid>https://sourceforge.net3b0f875811e25d61770ea0c89e2f7590a9d71ece</guid></item></channel></rss>