#122 OFDB integration broken

None
closed-wont-fix
nobody
None
2
2013-02-13
2013-02-04
lsmod
No

Importing data from OFDB fails in many cases.
In those cases, submitting or editing a dataset always gives you a blank screen with no content.
Some OFDB datasets are not imported into the database at all; only the ID is left.

Here are some examples:
ofdb:205401
ofdb:107720
ofdb:128883-223279 -> sometimes IDs like this are left behind

Importing the same film from IMDB sometimes works and sometimes does not.

Sorry, but without working IMDB and OFDB integration, videodb is hardly usable.

Discussion

  • lsmod

    lsmod - 2013-02-05

    I am not very good at PHP programming, but I found this:

    Example URL for one of these IDs:
    http://www.ofdb.de/film/128883,Stirb-langsam-40

    In .../videodb/engines/ofdb.php, line 44 reads:
    $url = $ofdbServer.'/view.php?page=suchergebnis&SText='.urlencode($title);

    For this film, that produces
    http://www.ofdb.de/view.php?page=suchergebnis&SText=Stirb+langsam+4.0&Kat=All
    which of course does not work.

    I see two problems here:
    1. The search is done by title, not by ID.
    2. urlencode() does not encode the title the way ofdb.de encodes it in its URLs.
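    The second point can be shown with a small sketch that contrasts urlencode() with the path-style title ("slug") in the new film URLs. The slug rule below keeps only ASCII letters, digits and spaces, then joins the words with hyphens. It is inferred from the single example above ("Stirb langsam 4.0" → "Stirb-langsam-40") and probably mishandles umlauts. The helper names ofdbSlug() and ofdbFilmUrl() are hypothetical and not part of videodb.

```php
<?php
// Hypothetical helper: rebuild the title part of a new-style ofdb.de film URL.
// Rule inferred from one example only: "Stirb langsam 4.0" -> "Stirb-langsam-40".
function ofdbSlug(string $title): string
{
    // Keep ASCII letters, digits and spaces; drop everything else (e.g. the dot in "4.0").
    $clean = preg_replace('/[^A-Za-z0-9 ]/', '', $title);
    // Turn each run of spaces into one hyphen and trim stray hyphens at the ends.
    return trim(preg_replace('/ +/', '-', $clean), '-');
}

// Hypothetical helper: film page URL built from the ID instead of a title search.
function ofdbFilmUrl(string $server, int $id, string $title): string
{
    return $server.'/film/'.$id.','.ofdbSlug($title);
}

$title = 'Stirb langsam 4.0';
echo urlencode($title), "\n";                                  // Stirb+langsam+4.0
echo ofdbFilmUrl('http://www.ofdb.de', 128883, $title), "\n";  // http://www.ofdb.de/film/128883,Stirb-langsam-40
```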

    If I understood better how variables and parameters flow through the engines, I might be able to solve the problem.
    But this is really not easy.
    I also couldn't find which page is supposed to be invoked after edit.php.
    I think Apache has the same problem.

     
  • lsmod

    lsmod - 2013-02-05

    I need to correct myself, because I understand a little more now.

    There are two more functions:

    function ofdbContentUrl($id) {
        global $ofdbServer;
        return $ofdbServer.'/view.php?page=film&fid='.$id;
    }
    This works; it produces:
    http://www.ofdb.de/view.php?page=film&fid=128883

    function ofdbDetailUrl($id) {
        global $ofdbServer;
        return $ofdbServer.'/view.php?page=film_detail&fid='.$id;
    }
    This does NOT work; it produces:
    http://www.ofdb.de/view.php?page=film_detail&fid=128883
    The correct URL is now
    http://www.ofdb.de/plot/128883,245226,Stirb-langsam-40

    So a new function is needed to build the correct page URLs for ofdb.de.
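    A replacement for ofdbDetailUrl() could assemble the plot URL from its parts. This sketch assumes the pattern /plot/<film ID>,<second ID>,<title slug> from the example above. Because it is unclear where the second ID (245226) comes from, the hypothetical ofdbPlotUrl() simply takes it as a parameter. The slug rule is inferred from this one example only.

```php
<?php
// Hypothetical replacement for ofdbDetailUrl(): builds /plot/<film id>,<second id>,<slug>.
// The meaning of the second ID is unknown, so the caller must supply it.
function ofdbPlotUrl(string $server, int $filmId, int $secondId, string $title): string
{
    // Assumed slug rule: keep ASCII letters, digits and spaces, join words with hyphens.
    $slug = trim(preg_replace('/ +/', '-', preg_replace('/[^A-Za-z0-9 ]/', '', $title)), '-');
    return sprintf('%s/plot/%d,%d,%s', $server, $filmId, $secondId, $slug);
}

echo ofdbPlotUrl('http://www.ofdb.de', 128883, 245226, 'Stirb langsam 4.0'), "\n";
// http://www.ofdb.de/plot/128883,245226,Stirb-langsam-40
```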

     
  • lsmod

    lsmod - 2013-02-05

    And that's not all!

    The second URL only works some of the time; that's why about 20% of the content is missing!

    And the URL for the description is completely different:
    function ofdbDescriptionUrl($id, $sid) {
        global $ofdbServer;
        return $ofdbServer.'/view.php?page=inhalt&fid='.$id.'&sid='.$sid;
    }
    This results in
    http://www.ofdb.de/view.php?page=inhalt&fid=128883
    but the correct URL is
    http://www.ofdb.de/plot/128883,245226,Stirb-langsam-40
    I have no idea where this second ID comes from.

     
  • lsmod

    lsmod - 2013-02-05

    There are timing problems with OFDB when it is accessed via HTML.

    So when accessing it via httpClient, you should use a longer timeout and perhaps retry the request several times.
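    Here is a minimal sketch of that idea. It uses a plain file_get_contents() fetch with a stream-context timeout rather than videodb's own httpClient(), whose options are not shown in this thread. fetchWithRetry() and fetchUrl() are hypothetical helpers. The 30-second timeout, three attempts and 500 ms delay are arbitrary example values.

```php
<?php
// Hypothetical helper: call $fetch up to $attempts times until it returns something
// other than false, pausing $delayMs milliseconds between attempts.
function fetchWithRetry(callable $fetch, int $attempts = 3, int $delayMs = 500)
{
    for ($i = 1; $i <= $attempts; $i++) {
        $data = $fetch();
        if ($data !== false) {
            return $data;                 // success: stop retrying
        }
        if ($i < $attempts) {
            usleep($delayMs * 1000);      // back off before the next try
        }
    }
    return false;                         // every attempt failed
}

// Hypothetical fetcher: plain HTTP GET with a longer read timeout (30 s).
// Returns the body, or false on failure (warnings suppressed with @).
function fetchUrl(string $url)
{
    $context = stream_context_create(['http' => ['timeout' => 30]]);
    return @file_get_contents($url, false, $context);
}

// Usage (performs a real request, so it is left commented out here):
// $html = fetchWithRetry(function () {
//     return fetchUrl('http://www.ofdb.de/view.php?page=film&fid=128883');
// });
```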

     
  • Andreas Goetz

    Andreas Goetz - 2013-02-07

    I have a semi-working XML version. Unfortunately, the XML interface doesn't export all data. If anybody is interested in continuing with the fixes, let me know.

     
  • lsmod

    lsmod - 2013-02-07

    The question is: what does "semi-working" mean?
    If this interface imports all the data the XML interface supplies, it is already better than the old solution! :-)
    Please share it!

    The old solution imports nothing at all in about 20% of cases.
    That is even less information.

    What information is missing from the XML, and should I ask the OFDB team for it?
    Maybe they don't want this interface to be used, and we should let sleeping dogs lie ...

     
  • lsmod

    lsmod - 2013-02-09

    Hmm, what about the new interface?

    I have edited the contributed script fetch_imdb_all.php so that it tries to fetch the missing information for datasets that are still empty.
    But of course this does not work either; it only shows that fetching the data simply fails.

     
  • Andreas Goetz

    Andreas Goetz - 2013-02-11

    Could you check if the current CVS version is working?

     
  • lsmod

    lsmod - 2013-02-11

    The search in the small search window does not pass the ID to the edit window.

    Trying to update some videos results in new errors.
    Here are some examples:

    ofdb:164590
    Engine Error
    Engine does not properly return encoding

    ofdb:230171
    Engine Error
    Engine does not properly return encoding

    I am going back to the previous version, because I can't add any new films via the search now ...

     
    Last edit: lsmod 2013-02-11
    • Andreas Goetz

      Andreas Goetz - 2013-02-13

      > O.K. - One example:
      > ofdb:225338
      > gives the output:
      > http://www.ofdbgw.org/movie/225338
      > FALSE
      > This link redirects to
      > http://ofdbgw.geeksphere.de/movie/225338
      > with this output:
      > <ofdbgw><status><rcode>2</rcode><rcodedesc>Fehler oder Timeout bei OFDB Anfrage</rcodedesc><modul>movie</modul><ofdbgwversion>1.23</ofdbgwversion><ofdbgwdate>2013-01-22</ofdbgwdate><verarbeitungszeit>0.0993</verarbeitungszeit></status></ofdbgw>
      > Of course this has no result.

      That's what I said: http://www.ofdbgw.org/ is unfortunately unreliable.

      > The correct URL is
      > http://www.ofdb.de/xml_film.php?ID=225338
      > Why don't you use this address?

      Have you looked at the result? It doesn't contain half the data you'd like to have...

      You want to use this address? Feel free; the code is there to be changed. I actually implemented this first, but found the data so poor that I looked for alternatives.

      > Another example:
      > http://www.ofdbgw.org/movie/2796
      > SimpleXMLElement Object ( [status] => SimpleXMLElement Object ( [rcode] => 2 [rcodedesc] => Fehler oder Timeout bei OFDB Anfrage [modul] => movie [ofdbgwversion] => 1.23 [ofdbgwdate] => 2013-01-22 [verarbeitungszeit] => 0.091 ) )
      > Engine Error
      > Engine does not properly return encoding
      >
      > [rcodedesc] => Fehler oder Timeout bei OFDB Anfrage ("error or timeout in OFDB request"): same problem as before
      >
      > If videodb's "standard" XML engine does not work well enough, why not use an additional one?
      > I am not used to PHP, but I found this nice example:
      > http://www.php.net/manual/en/simplexml.examples-basic.php

      You don't understand the problem. PHP is fine; OFDB doesn't have an XML data source that a) delivers all the desired data and b) is reliable. Feel free to talk to the OFDB guys.

      Closing as wontfix. I don't have the time to implement screenscraping for ever-changing websites. If anybody wants to pay me, feel free...

       
  • Andreas Goetz

    Andreas Goetz - 2013-02-12

    The basic problem is that ofdbgw.org is unstable: it only returns data some of the time. I do not have the time/will/resources to implement OFDB screenscraping myself and must therefore use the XML interface, which does not work reliably.
    The CVS version should work now, but you may have to reload time and again until OFDBGW doesn't return an error.

     
    • lsmod

      lsmod - 2013-02-13

      First, let me thank you for your work and your attempt to fix the problem!

      > I do not have the time/will/resources to implement OFDB screenscraping myself and must therefore use the XML interface, which does not work reliably.

      Does the XML interface have the same timeout problem?

      You have changed a lot of code in ofdb.php, but I can't see how the interface works now.
      Where is the URL defined?
      I only find the field assignment at line 172:
      foreach($xml->resultat->eintrag as $item)

      The encoding error is still present; where does it come from?

      ofdb:225083
      Engine Error
      Engine does not properly return encoding

      My modified dataset update also stops at these errors.

      Could the engine code be cleaned up so that it is easier to understand?
      Maybe I can help, but first I need to get a foothold.

       
      Last edit: lsmod 2013-02-13
  • Andreas Goetz

    Andreas Goetz - 2013-02-13

    Check, for example:

    function ofdbData($id):
    ...
    $url = $ofdbGW.'/movie/'.$id;
    // dump($url);

    $resp = httpClient($url, $cache);
    if (!$resp['success']) {
        $CLIENTERROR .= $resp['error']."\n";
        return false;
    }

    $xml = load_xml($resp['data']);
    // dump($xml);

    You can uncomment the dump() statements to see what's going on.

    The "Engine does not properly return encoding" message is misleading. It appears because the OFDB gateway either doesn't answer at all or answers with an error.

    Long story short:
    - OFDB doesn't provide a useful XML interface
    - alternative OFDB Gateway is unstable
    - old "screenscraping" code is still included (but commented out)
    - if anybody wants to fix that: have fun, I don't have the time...

     
  • lsmod

    lsmod - 2013-02-13

    OK, thanks, this helps.
    One example:

    ofdb:225338

    gives the output:

    http://www.ofdbgw.org/movie/225338
    FALSE

    This link redirects to

    http://ofdbgw.geeksphere.de/movie/225338

    with this output:

    <ofdbgw><status><rcode>2</rcode><rcodedesc>Fehler oder Timeout bei OFDB Anfrage</rcodedesc><modul>movie</modul><ofdbgwversion>1.23</ofdbgwversion><ofdbgwdate>2013-01-22</ofdbgwdate><verarbeitungszeit>0.0993</verarbeitungszeit></status></ofdbgw>

    Of course there is no result: "Fehler oder Timeout bei OFDB Anfrage" means "error or timeout in OFDB request".

    The correct URL is
    http://www.ofdb.de/xml_film.php?ID=225338

    Why don't you use this URL?

    Another example:

    http://www.ofdbgw.org/movie/2796
    SimpleXMLElement Object ( [status] => SimpleXMLElement Object ( [rcode] => 2 [rcodedesc] => Fehler oder Timeout bei OFDB Anfrage [modul] => movie [ofdbgwversion] => 1.23 [ofdbgwdate] => 2013-01-22 [verarbeitungszeit] => 0.091 ) )
    Engine Error
    Engine does not properly return encoding
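    Both gateway responses are well-formed XML that reports the failure only in its <status> block. An engine could check that block before reading any film data and show a meaningful error instead of the encoding message. Below is a minimal SimpleXML sketch. ofdbgwStatus() is a hypothetical name. Treating rcode 0 as success is an assumption, since only rcode 2 appears in this thread.

```php
<?php
// Hypothetical helper: read the <status> block of an OFDB gateway response.
// Assumption: rcode 0 means success (only rcode 2 is seen in this thread).
function ofdbgwStatus(string $xmlText): array
{
    libxml_use_internal_errors(true);           // collect parser errors instead of printing them
    $xml = simplexml_load_string($xmlText);
    if ($xml === false || !isset($xml->status->rcode)) {
        return ['ok' => false, 'code' => null, 'message' => 'no or malformed response'];
    }
    $code = (int) $xml->status->rcode;
    return [
        'ok'      => $code === 0,
        'code'    => $code,
        'message' => (string) $xml->status->rcodedesc,
    ];
}

$resp = '<ofdbgw><status><rcode>2</rcode><rcodedesc>Fehler oder Timeout bei OFDB Anfrage</rcodedesc>'
      . '<modul>movie</modul></status></ofdbgw>';
$status = ofdbgwStatus($resp);
if (!$status['ok']) {
    echo "OFDB gateway error {$status['code']}: {$status['message']}\n";
}
```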

    I just tried an experiment:
    in line 300 of ofdb.php I changed
    $data['encoding'] = 'utf-8';
    to 'iso-8859-1'.
    After this the error disappeared, but the umlauts were encoded incorrectly.
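    The umlaut symptom fits data that is already UTF-8. Declaring it as 'iso-8859-1' makes any later ISO-8859-1 → UTF-8 conversion encode it a second time. Whether videodb converts exactly this way is an assumption. The self-contained sketch below only shows the double encoding. The conversion is done by hand in the hypothetical latin1ToUtf8() helper, so no mbstring or iconv extension is needed.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 character and encode it as UTF-8.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                      // ASCII is identical in both encodings
        } else {
            $out .= chr(0xC0 | ($b >> 6))         // two-byte UTF-8 sequence for U+0080..U+00FF
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$title = "M\xC3\xA4nner";           // "Männer", already UTF-8 encoded
echo latin1ToUtf8($title), "\n";    // MÃ¤nner (the umlaut has been encoded twice)
```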

    The "Engine Error" message comes from engines.php, line 93:
    // make sure all engines properly return the encoding type
    if (empty($result['encoding'])) errorpage('Engine Error', 'Engine does not properly return encoding');

    I am not sure what's going on here, but somehow either the conversion to UTF-8 fails or the encoding is not declared correctly for the engine.
    Why is the result empty?

    So I changed your line back to 'utf-8' and commented out the check in engines.php.
    Now there is no error, and it works!
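    Commenting out the check hides the symptom but also disables it for engines that really forget the encoding. A possibly safer variant reports a failed fetch separately from a missing encoding. The sketch below is hypothetical: checkEngineResult() and its messages are not part of videodb, where the check is inline in engines.php and calls errorpage().

```php
<?php
// Hypothetical replacement for the inline check in engines.php.
// Returns an error message, or null if the engine result can be used.
function checkEngineResult($result): ?string
{
    // false or an empty result: the remote source (e.g. the OFDB gateway) delivered nothing.
    if (!is_array($result) || $result === []) {
        return 'Engine returned no data (remote source failed or timed out)';
    }
    // Data arrived, but the engine did not declare its encoding.
    if (empty($result['encoding'])) {
        return 'Engine does not properly return encoding';
    }
    return null;
}

// In engines.php this could be used roughly like this (sketch):
// if ($error = checkEngineResult($result)) errorpage('Engine Error', $error);
```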

    If videodb's "standard" XML engine does not work well enough, why not use an additional one?

    I am not used to PHP, but I found this nice example:
    http://www.php.net/manual/en/simplexml.examples-basic.php

     
    Last edit: lsmod 2013-02-13
  • Andreas Goetz

    Andreas Goetz - 2013-02-13
    • status: open --> closed-wont-fix
    • milestone: -->
     
  • lsmod

    lsmod - 2013-02-13

    > That's what I said: http://www.ofdbgw.org/ is unfortunately unreliable.

    Yes, and that's why another working solution should be used in addition or as an alternative.

    > Have you looked at the result? It doesn't contain half the data you'd like to have...

    But when the other URL is not working, you still get the important data here!

    > You want to use this address? Feel free; the code is there to be changed. I actually implemented this first, but found the data so poor that I looked for alternatives.

    Looking for the other URL is a really good idea!
    But when it is not working, another data source should be used.
    Maybe I can edit the code to get the other URL working.
    The best solution would be an automatic fallback to the basic information when the other source fails.
    I don't think I can get this running in videodb myself. :-(
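    The automatic fallback suggested here could take roughly the following shape: try each source in order and keep the first usable result. This is a sketch. fetchWithFallback() is hypothetical. The commented wiring uses made-up helpers ofdbgwMovie() and ofdbXmlFilm() standing in for the two URLs discussed in this thread.

```php
<?php
// Hypothetical helper: call each source in order and return the first non-empty
// array it produces, tagged with the source name; false if every source fails.
function fetchWithFallback(array $sources)
{
    foreach ($sources as $name => $fetch) {
        $data = $fetch();
        if (is_array($data) && $data !== []) {
            $data['source'] = $name;   // remember which source delivered the data
            return $data;
        }
    }
    return false;                      // every source failed
}

// Possible wiring (hypothetical helpers, not part of videodb):
// $data = fetchWithFallback([
//     'ofdbgw'   => function () use ($id) { return ofdbgwMovie($id); }, // http://www.ofdbgw.org/movie/<id>
//     'xml_film' => function () use ($id) { return ofdbXmlFilm($id); }, // http://www.ofdb.de/xml_film.php?ID=<id>
// ]);
```

    Ordering the richer but unreliable gateway first means the basic xml_film.php data is only used when the gateway returns nothing.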

    > You don't understand the problem. PHP is fine; OFDB doesn't have an XML data source that a) delivers all the desired data and b) is reliable. Feel free to talk to the OFDB guys.

    I will try to contact the OFDB guys.
    But it's pointless if the solution will not be implemented anyway. ;-)

    > I don't have the time to implement screenscraping for ever-changing websites. If anybody wants to pay me, feel free...

    I understand that!
    Screenscraping is really not the solution here!
    But making intelligent use of the XML interfaces would be a very good alternative.
    Let's please take some time to fix the problem.
    I will help if I can.

    > Closing as wontfix.

    Closing a bug will not solve it.

     
    Last edit: lsmod 2013-02-13
