Epicurious import fails

  • Joel Swartz

    Joel Swartz - 2005-09-14

    On Debian, importing from epicurious.com fails. Importing foodnetwork pages and others works fine. I seem to have a problem only with epicurious.com pages.

    The dialog box shows:

    Gourmet does not know how to import site http://www.epicurious.com/recipes/recipe_views/views/102709
    Are you sure http://www.epicurious.com/recipes/recipe_views/views/102709 points to a page with a recipe on it?

    and the console shows:

    joels@debian2:~$ gourmet
    /usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'RecRenderer' is assigned to before global declaration
      def load_lprprint ():
    /usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'SimpleWriter' is assigned to before global declaration
      def load_lprprint ():
    /usr/share/gourmet/importers/__init__.py:2: DeprecationWarning: Non-ASCII character '\xc3' in file /usr/share/gourmet/importers/rezkonv_importer.py on line 41, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
      import gxml2_importer, rezkonv_importer

    (Gourmet Recipe Manager:12545): Gnome-WARNING **: gnome-program.c:1190: cannot load modules after program is initialized
    /usr/share/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
    Escaping Unable to import
    Called hide_progress_dialog!

    joels@debian2:~$ gourmet
    Escaping Unable to import
    Called hide_progress_dialog!

    Any help/comments appreciated

    • tom

      tom - 2005-09-14

      Got it!
      The problem was the way Gourmet detects the headers of HTML pages. I plan to release the fix shortly. If you'd like to import Epicurious pages in the meantime, it's a simple enough fix (1 line) to do yourself:

      Go to src/lib/importers/html_importer.py in the source.
      Go to line 297.
      Change the line:
      if header == 'text/html':
      to:
      if header.find('html') >= 0:

      (keep the indentation as is in the file).

      Hopefully this will cause Gourmet not only to recognize pages like Epicurious, which declare their encoding in the header (text/html;blahblahblah), but also to recognize xhtml, etc. as html.
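      As a rough illustration of why the one-line change matters, here is a minimal sketch comparing the two checks. The function names and the sample header values (including the charset) are made up for this example and are not taken from Gourmet's source.

```python
def is_html_exact(header):
    # Original check: matches only a bare 'text/html' header value.
    return header == 'text/html'


def is_html_loose(header):
    # Suggested check: matches any header value that contains 'html'.
    return header.find('html') >= 0


# Illustrative header values (assumptions, not captured from real sites).
samples = [
    'text/html',
    'text/html; charset=ISO-8859-1',
    'application/xhtml+xml',
    'image/png',
]

for header in samples:
    print(header, is_html_exact(header), is_html_loose(header))
```

      With the exact comparison, a header that carries an encoding after 'text/html' is rejected, which would match the "does not know how to import site" dialog; the substring test accepts it, and accepts xhtml as well.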

      Oh -- one more error I found once this was working...

      In src/lib/importers/html_plugins/html_helpers.py,
      at line 35,
      underneath "def remove_comments....", add:
      if not text: return text

      This prevents another error.
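      To show what that guard does, here is a small sketch. Only the function name and the guard line come from the post above; the regex body is a hypothetical stand-in, not Gourmet's actual remove_comments.

```python
import re


def remove_comments(text):
    # Guard from the post: hand back empty or None input unchanged.
    if not text: return text
    # Hypothetical body (assumption): strip HTML comments with a regex.
    return re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)


print(repr(remove_comments(None)))
print(repr(remove_comments('')))
print(repr(remove_comments('flour<!-- ad banner -->, sugar')))
```

      Without the guard, passing None into a string operation like re.sub raises a TypeError, so returning early keeps a missing value from aborting the whole import.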

      That biscotti looks good. There's one funny bit in the import (a <p> tag shows up in the ingredients), but it's functional. Fixing that last bit just means tweaking the ruleset for Epicurious, so I'll go ahead and close this bug (a major one), understanding that there are still (and probably always will be) many more minor bugs to fix when it comes to tweaking the webpage scrapers!



    • Joel Swartz

      Joel Swartz - 2005-09-14

      Sorry, but it didn't work for me, unless I pasted the fix wrong.
      Also, should I be trying from the regular Epicurious page or their "print recipe" (printer-friendly) page? I've just tried both and neither worked for me.

      Assuming this eventually gets working, and I have confidence it will, my next project would be to get my current Koch-suite/Postgresql recipe database imported, since Gourmet has a few nicer features now, particularly the ability to add images.
      Anyhow, here's my console output now:

      joels@debian2:~/Documents/gourmet-$ gourmet
      /usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
      /usr/lib/python2.3/site-packages/gourmet/reccard.py:1653: GtkWarning: gtk_combo_box_entry_set_text_column: assertion `entry_box->priv->text_column == -1' failed
      Called hide_progress_dialog!
      Escaping Error retrieving http://www.epicurious.com/recipes/recipe_views/printer_friendly/102709.
      Traceback (most recent call last):
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 907, in import_webpageg
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 1100, in run_import
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 38, in start
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 43, in target_func
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetThreads.py", line 36, in target_func
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 344, in run
          self.d = scrape_url(self.url, progress=self.prog)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 272, in scrape_url
          return bss.scrape_url(url,progress=progress)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 84, in scrape_url
          return self.scrape()
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 94, in scrape
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 105, in apply_rule
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 217, in store_tag
          val=self.post_process(post_processing, val, tag)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 129, in post_process
          return post_processing(value,tag)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_plugins/html_helpers.py", line 45, in __call__
          items = container.contents
      AttributeError: 'NoneType' object has no attribute 'contents'

