Using Gourmet 0.8.5.12 on Debian, the import from epicurious.com fails. Importing foodnetwork pages and others works fine; I seem to have a problem only with epicurious.com pages.
The dialog box shows:
Gourmet does not know how to import site http://www.epicurious.com/recipes/recipe_views/views/102709
Are you sure http://www.epicurious.com/recipes/recipe_views/views/102709 points to a page with a recipe on it?
and the console shows:
joels@debian2:~$ gourmet
/usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'RecRenderer' is assigned to before global declaration
def load_lprprint ():
/usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'SimpleWriter' is assigned to before global declaration
def load_lprprint ():
/usr/share/gourmet/importers/__init__.py:2: DeprecationWarning: Non-ASCII character '\xc3' in file /usr/share/gourmet/importers/rezkonv_importer.py on line 41, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
import gxml2_importer, rezkonv_importer
(Gourmet Recipe Manager:12545): Gnome-WARNING **: gnome-program.c:1190: cannot load modules after program is initialized
/usr/share/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
self.prog.set_fraction(prog)
Escaping Unable to import
Called hide_progress_dialog!
joels@debian2:~$ gourmet
Escaping Unable to import
Called hide_progress_dialog!
Any help or comments appreciated.
Got it!
The problem was the way Gourmet detects the headers of HTML pages. I plan to release the fix shortly. If you'd like to import Epicurious pages in the meantime, it's a simple enough fix (1 line) to do yourself:
Go to src/lib/importers/html_importer.py in the source.
Go to line 297.
Change the line:
if header == 'text/html':
to
if header.find('html') >= 0:
(keep the indentation as is in the file).
Hopefully this will cause Gourmet not only to recognize pages like Epicurious, which declare their encoding in the header -- text/html;blahblahblah -- but also to recognize xhtml, etc. as html.
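To see why the one-line change matters, here is a minimal sketch comparing the two checks. The function names and the sample header values (including the charset) are made up for illustration; only the two comparison expressions come from the fix above.

```python
def is_html_strict(header):
    # Original check: matches only the bare MIME type.
    return header == 'text/html'

def is_html_loose(header):
    # Patched check: matches any header value that mentions "html".
    return header.find('html') >= 0

# Sample header values (assumed, for illustration only).
samples = ('text/html',
           'text/html; charset=ISO-8859-1',
           'application/xhtml+xml',
           'image/jpeg')

for header in samples:
    print("%-32s strict=%s loose=%s"
          % (header, is_html_strict(header), is_html_loose(header)))
```

The strict check accepts only the first sample, while the loose check accepts the first three and still rejects the image type.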
Oh -- one more error I found once this was working...
In src/lib/importers/html_plugins/html_helpers.py,
at line 35,
underneath "def remove_comments....",
add:
if not text: return text
This prevents another error.
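As a rough sketch of what that guard does: the body of remove_comments isn't shown in this thread, so the regex below is only a hypothetical stand-in. The part that matters is the first line of the function, which hands back None or an empty string untouched instead of trying to process it.

```python
import re

# Hypothetical stand-in for the real function body (not shown in this thread).
_COMMENT = re.compile(r'<!--.*?-->', re.DOTALL)

def remove_comments(text):
    if not text: return text   # the added guard: None or '' passes straight through
    return _COMMENT.sub('', text)

print(repr(remove_comments(None)))
print(repr(remove_comments('flour<!-- ad banner -->, sugar')))
```

Without the guard, passing None into the regex substitution in this sketch would raise a TypeError.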
That biscotti looks good. There's one funny bit in the import (a <p> tag shows up in the ingredients), but it's functional. Fixing that last bit just means tweaking the ruleset for Epicurious, so I'll go ahead and close this bug (a major one), understanding that there are still (and probably always will be) many more minor bugs to fix when it comes to tweaking the webpage scrapers!
Tom
Sorry, but it didn't work for me, unless I pasted the fix wrong.
Also, should I be trying from the regular Epicurious page or their "print recipe" (printer-friendly) page? I've just tried both and neither worked for me.
Assuming that this eventually gets working, and I have confidence it will, my next project would be to get my current Koch-suite/PostgreSQL recipe database imported, since Gourmet has a few nicer features now, particularly the ability to add images.
Anyhow, here's my console output now:
joels@debian2:~/Documents/gourmet-0.8.5.12$ gourmet
/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
self.prog.set_fraction(prog)
/usr/lib/python2.3/site-packages/gourmet/reccard.py:1653: GtkWarning: gtk_combo_box_entry_set_text_column: assertion `entry_box->priv->text_column == -1' failed
self.keyBox.set_text_column(0)
Called hide_progress_dialog!
Escaping Error retrieving http://www.epicurious.com/recipes/recipe_views/printer_friendly/102709.
Traceback (most recent call last):
File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 907, in import_webpageg
self.run_import(i,url,display_errors=False)
File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 1100, in run_import
t.start()
File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 38, in start
self.target_func()
File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 43, in target_func
GourmetThreads.SuspendableThread.target_func(self)
File "/usr/lib/python2.3/site-packages/gourmet/GourmetThreads.py", line 36, in target_func
self.c.run()
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 344, in run
self.d = scrape_url(self.url, progress=self.prog)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 272, in scrape_url
return bss.scrape_url(url,progress=progress)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 84, in scrape_url
return self.scrape()
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 94, in scrape
self.apply_rule(rule)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 105, in apply_rule
self.store_tag(store_as,tag,retmethod,post_processing)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 217, in store_tag
val=self.post_process(post_processing, val, tag)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 129, in post_process
return post_processing(value,tag)
File "/usr/lib/python2.3/site-packages/gourmet/importers/html_plugins/html_helpers.py", line 45, in __call__
items = container.contents
AttributeError: 'NoneType' object has no attribute 'contents'
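The traceback ends because container is None at the point where .contents is read. One plausible reading, though the thread doesn't confirm it, is that a scraper rule did not find the tag it expected on the page. A minimal sketch of a defensive check, assuming nothing about the real html_helpers.py code (FakeTag and safe_contents are hypothetical names used only for illustration):

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed HTML tag with a .contents list."""
    def __init__(self, contents):
        self.contents = contents

def safe_contents(container):
    # Hypothetical helper: return an empty list instead of raising
    # AttributeError when no tag was found (container is None).
    if container is None:
        return []
    return container.contents

print(safe_contents(FakeTag(['1 cup flour', '2 eggs'])))
print(safe_contents(None))
```

A check like this would only hide the symptom; the ruleset would presumably still need adjusting so that the expected tag is actually found.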