Epicurious import fails

Help
2005-09-14
2013-05-14
  • Joel Swartz
    2005-09-14

    Using 0.8.5.12 on Debian, the import from epicurious.com fails. Importing foodnetwork pages and others works fine. I seem to have a problem only with epicurious.com pages.

    The dialog box shows:

    Gourmet does not know how to import site http://www.epicurious.com/recipes/recipe_views/views/102709
    Are you sure http://www.epicurious.com/recipes/recipe_views/views/102709 points to a page with a recipe on it?

    and the console shows:

    joels@debian2:~$ gourmet
    /usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'RecRenderer' is assigned to before global declaration
      def load_lprprint ():
    /usr/share/gourmet/exporters/printer.py:29: SyntaxWarning: name 'SimpleWriter' is assigned to before global declaration
      def load_lprprint ():
    /usr/share/gourmet/importers/__init__.py:2: DeprecationWarning: Non-ASCII character '\xc3' in file /usr/share/gourmet/importers/rezkonv_importer.py on line 41, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
      import gxml2_importer, rezkonv_importer

    (Gourmet Recipe Manager:12545): Gnome-WARNING **: gnome-program.c:1190: cannot load modules after program is initialized
    /usr/share/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
      self.prog.set_fraction(prog)
    Escaping Unable to import
    Called hide_progress_dialog!

    joels@debian2:~$ gourmet
    Escaping Unable to import
    Called hide_progress_dialog!

    Any help/comments appreciated

     
    • tom
      2005-09-14

      Got it!
      The problem was the way Gourmet detects the headers of HTML pages. I plan to release the fix shortly. If you'd like to import Epicurious pages in the meantime, it's a simple enough fix (one line) to do yourself:

      Go to src/lib/importers/html_importer.py in the source.
      Go to line 297.
      Change the line:
      if header == 'text/html':
      to
      if header.find('html') >= 0:

      (keep the indentation as is in the file).

      Hopefully this will cause Gourmet not only to recognize pages like Epicurious, which declare their encoding in the header -- text/html;blahblahblah -- but also to recognize xhtml, etc. as html.
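
      To illustrate the difference between the two checks, here is a minimal sketch. The helper names and the sample header values are made up for the example and are not Gourmet's actual code; only the two comparison expressions come from the fix above.

```python
# Hypothetical helpers (not part of Gourmet): compare the old exact-match
# test with the new substring test on a few Content-Type style values.

def is_html_exact(header):
    # Old check: only matches a bare 'text/html'.
    return header == 'text/html'

def is_html_substring(header):
    # New check: matches any header value that mentions 'html'.
    return header.find('html') >= 0

# Sample values chosen for illustration only.
samples = [
    'text/html',
    'text/html; charset=ISO-8859-1',
    'application/xhtml+xml',
    'image/jpeg',
]

for header in samples:
    print(header, is_html_exact(header), is_html_substring(header))
```

      Note that the substring test is deliberately loose: it accepts anything containing "html", which is what lets xhtml through as well.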

      Oh -- one more error I found once this was working...

      In src/lib/importers/html_plugins/html_helpers.py,
      at line 35,
      underneath "def remove_comments....",
      add
      if not text: return text

      This prevents another error.
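
      As a rough sketch of what that guard does: the function body below is an assumption (a simple regex that strips HTML comments), not the real remove_comments from html_helpers.py. Only the "if not text" line is the actual fix.

```python
import re

# Assumed stand-in for the real implementation: strip <!-- ... --> comments.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)

def remove_comments(text):
    # The added guard: pass None or an empty string straight back
    # instead of handing it to the regex, which would fail on None.
    if not text: return text
    return COMMENT_RE.sub('', text)
```

      With the guard in place, a tag that yields no text simply comes back unchanged rather than raising an error further down.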

      That biscotti looks good. There's one odd bit in the import (a <p> tag shows up in the ingredients), but it's functional. Fixing that last bit just means tweaking the ruleset for Epicurious, so I'll go ahead and close this bug (a major one), understanding that there are still (and probably always will be) many more minor bugs to fix when it comes to tweaking the webpage scrapers!

      Tom


       
    • Joel Swartz
      2005-09-14

      Sorry, but it didn't work for me, unless I pasted the fix wrong.
      Also, should I be trying from the regular Epicurious page or their "print recipe" (printer-friendly) page? I've just tried both and neither worked for me.

      Assuming this eventually gets working, and I have confidence it will, my next project would be to get my current Koch-suite/PostgreSQL recipe database imported, since Gourmet has a few nicer features now, particularly the ability to add images.
      Anyhow, here's my console output now:

      joels@debian2:~/Documents/gourmet-0.8.5.12$ gourmet
      /usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py:1191: GtkWarning: gtk_progress_set_percentage: assertion `percentage >= 0 && percentage <= 1.0' failed
        self.prog.set_fraction(prog)
      /usr/lib/python2.3/site-packages/gourmet/reccard.py:1653: GtkWarning: gtk_combo_box_entry_set_text_column: assertion `entry_box->priv->text_column == -1' failed
        self.keyBox.set_text_column(0)
      Called hide_progress_dialog!
      Escaping Error retrieving http://www.epicurious.com/recipes/recipe_views/printer_friendly/102709.
      Traceback (most recent call last):
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 907, in import_webpageg
          self.run_import(i,url,display_errors=False)
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetRecipeManager.py", line 1100, in run_import
          t.start()
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 38, in start
          self.target_func()
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetFauxThreads.py", line 43, in target_func
          GourmetThreads.SuspendableThread.target_func(self)
        File "/usr/lib/python2.3/site-packages/gourmet/GourmetThreads.py", line 36, in target_func
          self.c.run()
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 344, in run
          self.d = scrape_url(self.url, progress=self.prog)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 272, in scrape_url
          return bss.scrape_url(url,progress=progress)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 84, in scrape_url
          return self.scrape()
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 94, in scrape
          self.apply_rule(rule)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 105, in apply_rule
          self.store_tag(store_as,tag,retmethod,post_processing)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 217, in store_tag
          val=self.post_process(post_processing, val, tag)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_importer.py", line 129, in post_process
          return post_processing(value,tag)
        File "/usr/lib/python2.3/site-packages/gourmet/importers/html_plugins/html_helpers.py", line 45, in __call__
          items = container.contents
      AttributeError: 'NoneType' object has no attribute 'contents'