Thread: [Rbmake-users] problems converting web site
From: Steve H. <ste...@sw...> - 2003-06-30 07:40:37
Hi,

First off, I want to say that rbmake is a very exciting find for me. I've recently purchased an RCA eBook 1100 from eBay and was thrilled to find rbmake.

Initially, I downloaded the 1.0 binaries. I was experiencing problems with rbmake crashing when I used 1.0. I thought I would try to build 1.1 (can't seem to find 1.1 binaries to download), to see if it solved the problems. After several iterations, I was able to get rbmake to build with image support. Now, 1.1 doesn't seem to crash for the most part. However, it is producing some rb files that my ebook doesn't like (and cannot read).

I am trying to convert the Christian Science Monitor web site using this command:

    rbmake -f1 -jio csm-depth1 http://www.christiansciencemonitor.com

It cranks away, seems to succeed, and produces a file, csm-depth1.rb. When I import this file into my ebook, it cannot read it. If I change the command to not follow links (removing the -f1), then it works, but there isn't much content because it's just the front page of the site.

I am running this on Windows XP, building under cygwin.

Any ideas?

Thanks,
Steve
From: Wayne D. <rb...@bl...> - 2003-06-30 16:49:20
On Mon, Jun 30, 2003 at 02:40:33AM -0500, Steve Harrington wrote:
> rbmake -f1 -jio csm-depth1 http://www.christiansciencemonitor.com

There's a bug in rbmake that causes it to sometimes not output the joined HTML page after fetching everything. I think it happens when there are failures in the fetched pages -- I should hopefully have time to investigate this soon.

In the meantime, you can work around the problem by excluding pages from being fetched. I recommend creating an option file and editing it. Start by running the above command with the -d option, and redirect the output to a file:

    rbmake -d -f1 -jio csm-depth1 http://www.christiansciencemonitor.com >csm.opt

Then, edit it and add these lines:

    Exclude-URLs-Matching: */event.ng*
    Exclude-URLs-Matching: */message.asp*
    Exclude-URLs-Matching: */spidertest*

At this point you can run this command to grab the CSM content:

    rbmake -l csm.opt

Hopefully that will work for you.

..wayne..
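As a rough illustration of what the three Exclude-URLs-Matching lines above are meant to do, here is a minimal Python sketch that applies the same wildcard patterns to a few URLs. It uses Python's fnmatch module as a stand-in, so rbmake's own matching rules may differ in detail. The sample URLs and the function name is_excluded are made up for the example.

```python
from fnmatch import fnmatchcase

# Patterns copied from the option-file lines in the message above.
EXCLUDE_PATTERNS = ["*/event.ng*", "*/message.asp*", "*/spidertest*"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL matches any shell-style wildcard pattern.

    Assumption: fnmatch-style matching approximates rbmake's behavior;
    this is not taken from rbmake's source.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


# Hypothetical URLs, for illustration only.
sample_urls = [
    "http://www.example.com/event.ng/Type=click",
    "http://www.example.com/message.asp?id=1",
    "http://www.example.com/2003/0630/p01s01.html",
]

# Keep only the URLs that no exclude pattern matches.
kept = [url for url in sample_urls if not is_excluded(url)]
print(kept)
```

Under these assumptions, only the last sample URL survives the filter, which is the intent of the workaround: pages that fail to fetch are never requested in the first place.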
From: Wayne D. <rb...@bl...> - 2003-07-01 17:01:14
On Mon, Jun 30, 2003 at 02:40:33AM -0500, Steve Harrington wrote:
> rbmake -f1 -jio csm-depth1 http://www.christiansciencemonitor.com
> [... invalid .rb file created ...]

OK, I fixed the root cause of the bug: it was dropping a page that it couldn't fetch which was still marked as "maybe HTML", and that combination wasn't decrementing the "to do" count for the joined-page group.

You can snag the latest source either directly from CVS, or via this tar file:

    http://rbmake.sourceforge.net/rbmake-cvs.tar.gz

..wayne..
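The bookkeeping problem described above can be pictured with a small Python model. This is only a sketch of the behavior as the message describes it, not rbmake's actual code; the class JoinGroup, its fields, and the fixed flag are hypothetical names chosen for the example.

```python
class JoinGroup:
    """Toy model of a joined-page group that waits on a 'to do' count."""

    def __init__(self, page_count, fixed):
        self.todo = page_count        # pages still outstanding
        self.fixed = fixed            # True models the corrected behavior
        self.joined_written = False   # set once the joined page is output

    def _finish_one(self):
        # The joined page is written only when the count reaches zero.
        self.todo -= 1
        if self.todo == 0:
            self.joined_written = True

    def page_fetched(self):
        self._finish_one()

    def page_dropped(self, maybe_html):
        # Modeled bug: a dropped page still marked "maybe HTML" skipped
        # the decrement, so the count could never reach zero.
        if maybe_html and not self.fixed:
            return
        self._finish_one()


def run(fixed):
    """Process three pages, one of which fails to fetch."""
    group = JoinGroup(page_count=3, fixed=fixed)
    group.page_fetched()
    group.page_dropped(maybe_html=True)  # the fetch failure
    group.page_fetched()
    return group.joined_written


print(run(fixed=False), run(fixed=True))  # prints: False True
```

In the unfixed model the count stalls at one, so the joined page is never written, which matches the symptom of a run that appears to succeed but leaves out the joined page.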
From: Steve H. <ste...@sw...> - 2003-07-03 06:28:38
I downloaded the fix, and it works like a champ - thanks.

Now, I have a question. When I convert a web site such as the Christian Science Monitor and import it into my ebook, some of the formatting is such that the text for paragraphs starts at the far right edge of the screen (so that there might be but one or two words on a line).

Is this something I can fix with one of the rbmake command line switches? I tried the -Ts (simple formatting), but that didn't seem to work. I'm not sure what other switches might have an effect on it. Or is that just something I need to live with?

Thanks for the bug fix and thanks for writing rbmake.

Steve
From: Wayne D. <wa...@bl...> - 2003-07-03 19:10:52
On Thu, Jul 03, 2003 at 01:28:27AM -0500, Steve Harrington wrote:
> When I convert a web site such as the Christian Science Monitor, when
> I import it into my ebook, some of the formatting is such that the
> text for paragraphs starts at the far right edge of the screen (so
> that there might be but one or two words on a line).

This is probably caused by all the table elements on the page. You should try converting their plain-text site instead:

    http://www.christiansciencemonitor.com/cgi-bin/redirect.pl?textEdition

That should look a lot better.

..wayne..