
#203 Exception while scraping episodes from TMDB

1.0
closed
Rob
None
2019-07-19
2019-06-02
No

I'm looking to move from TVDB to TMDB for series scraping because of some limitations of the TVDB site (no hentai, no first air date for recent shows, ...).

I made a test folder with the most recent series I have added (8 series, 49 episodes in total).

While scraping the episodes, after 9 of them had been scraped correctly, I got the following error:

With File : X:\V-Series-TMDB - English\Good Omens\S01 - EN - STFR\Good.Omens.S01E05.VOSTFR.WEBRiP.XviD-EXTREME.avi
Detected  : Season : 01 Episode : 05
Using Settings: TMDBId: 71915 SortOrder: default Language: en Actor Source: tmdb
Scraping body of episode: 5 - OK
Scraping actors from TMDB
**WARNING: No Episode TMDB Id, Actors not able to be scraped from TMDB**
Saving episode
**Error - Running action [Scrape found Episodes] threw [Object reference not set to an instance of an object.]**

I then restarted the episode scraping, and it began by correctly scraping this same episode:

With File : X:\V-Series-TMDB - English\Good Omens\S01 - EN - STFR\Good.Omens.S01E05.VOSTFR.WEBRiP.XviD-EXTREME.avi
Detected  : Season : 01 Episode : 05
Using Settings: TMDBId: 71915 SortOrder: default Language: en Actor Source: tmdb
Scraping body of episode: 5 - OK
Scraping actors from TMDB
*WARNING: Actors not available to scraped from TMDB*
Saving episode
Episode Thumb downloaded
Renamed to: X:\V-Series-TMDB - English\Good Omens\S01 - EN - STFR\Good Omens - S01E05 - The Doomsday Option.avi
            X:\V-Series-TMDB - English\Good Omens\S01 - EN - STFR\Good Omens - S01E05 - The Doomsday Option.nfo
            X:\V-Series-TMDB - English\Good Omens\S01 - EN - STFR\Good Omens - S01E05 - The Doomsday Option-thumb.jpg

(Note that despite the warning "WARNING: Actors not available to scraped from TMDB", the actors appear to be scraped correctly.)

It then continued to scrape correctly for more than 20 episodes before stopping again with the same exception on another episode:

With File : X:\V-Series-TMDB - English\When They See Us\S01 - FR\When.They.See.Us.S01E02.FRENCH.NF.WEB-DL.XviD-EXTREME.avi
Detected  : Season : 01 Episode : 02
Using Settings: TMDBId: 81355 SortOrder: default Language: en Actor Source: tmdb
Scraping body of episode: 2 - OK
Scraping actors from TMDB
**WARNING: No Episode TMDB Id, Actors not able to be scraped from TMDB**
Saving episode
**Error - Running action [Scrape found Episodes] threw [Object reference not set to an instance of an object.]**

After restarting once more, it finished scraping all the episodes correctly, including this one.

I have attached my settings file.

1 Attachments

Discussion

  • Jean-Michel KIRSCH

    From the TMDB API FAQ:

    Are there limitations on the number of requests?
    We currently rate limit requests to 40 requests every 10 seconds. You can inspect the status of your limits by looking at the X-RateLimit response headers.
    

    My personal opinion is that we are sending requests too fast.

     
  • Rob

    Rob - 2019-06-03

    That timing comment makes sense. I will look into this very soon. Good Catch.

     
  • Jean-Michel KIRSCH

    Any news about this problem?

     
  • Rob

    Rob - 2019-06-25

    I have not had time to come back to this yet, but I have been looking into it.

     
  • Anthony Bennett

    Anthony Bennett - 2019-07-05

    I think I have just started hitting the same issue:

    ---Using MC TVDB api V2 Scraper---
    
    Scanned "1" Shows.
    Scanned "5" folders (includes Show and subfolders).
    Found: 7 files to scrape.
    
    Pre-Populating found episodes with Series info
    With File : \\antmicroserver\Videos\TV Shows\Stranger Things\Specials - Beyond Stranger Things\Stranger Things - s00e01 - Beyond Stranger Things Mind Blown.avi
    Detected  : Season : 00 Episode : 01
    Using Settings: TMDBId: 66732 SortOrder: default Language: en Actor Source: tmdb
    Scraping body of episode: 1 - OK
    Scraping actors from TMDB
    WARNING: No Episode TMDB Id, Actors not able to be scraped from TMDB
    Saving episode
    Error - Running action [Scrape found Episodes] threw [Object reference not set to an instance of an object.]
    
     
  • Rob

    Rob - 2019-07-07

    Hi Anthony

    No, that is a different issue, which I have now caught as well.

    Next release should fix the rate limit being hit when scraping from TMDb.

     
  • Jean-Michel KIRSCH

    I checked version 3.7.3.4.
    I no longer had the same setup as for the previous test, so I ran it on a folder that contains 11 French series.

    First, I ran the "Check roots" command to add the series.
    It correctly added the first 9 series but failed to add the last 2. The log file is attached (file heckroots.txt).

    I then tried to use the TVShow Selector to fix these 2 series. I got a red message saying that the "La Noiraude" show is not available in French (screenshot selector.png). Clicking the search button didn't change anything. Clicking "Scrape show with selected options" gave me a message saying that I need to select an existing show.

    I then deleted these 2 shows from MC and reran the "Check roots" command. This time they were added correctly.

    Next, I ran the "Search for new episodes" command. It ran slowly (there are 750 episodes) without any error messages. All the NFO files seem to be created correctly, but the left pane is a big mess (screenshot leftpane.png). I then closed and restarted MC, and the left pane was still a big mess. I tried running "Refresh All TVShows", but the left pane remained a big mess.

    I have also attached the configuration file.

     
  • Jean-Michel KIRSCH

    To try to help, I checked for differences between an episode NFO generated from TMDB and an episode NFO generated from TVDB (sorry, not the same show or episode).

    First difference: the "uniqueid" field.
    On a TVDB show, there is just one line, of type "tvdb", with default set to "true":
    <uniqueid type="tvdb" default="true">4099506</uniqueid>
    On a TMDB show, there are 2 lines: the first of type "tmdb" with a value, and the second of type "tvdb" with default set to "true" and no value:
    <uniqueid type="tmdb">1668183</uniqueid>
    <uniqueid type="tvdb" default="true">
    </uniqueid>
    Should the second line really be there?

    Second difference: the "showid" field.
    On a TVDB show, this field contains the TVDBId of the show:
    <showid>248741</showid>
    On a TMDB show, this field is empty:
    <showid>
    </showid>
    Shouldn't it contain the TMDBId of the show?

    Third and last difference: the "tmdbid" field.
    On a TVDB show, this field is empty:
    <tmdbid>
    </tmdbid>
    On a TMDB show, this field contains the TMDBId of the episode:
    <tmdbid>1668183</tmdbid>

     
  • Rob

    Rob - 2019-07-08

    Can you please post a screenshot of the episode filenames for Trois Femmes flics?

     
    • Jean-Michel KIRSCH

      Here they are...

      By the way, to add to my previous message, it seems that the showid is sometimes present in the NFO files, perhaps for the first episodes that were scraped.

       

      Last edit: Jean-Michel KIRSCH 2019-07-08
  • Rob

    Rob - 2019-07-09

    I cannot reproduce this issue.

    Can I assume that doing a Refresh All, or right-clicking the series and selecting Refresh this show from .nfo, removes the duplicate entries?

    I am at a loss as to why, in your case, scraping is creating duplicates.

     
    • Jean-Michel KIRSCH

      No change after a "Refresh from NFO".

      The exact content of the left pane is:

      • The only episodes that are listed at their correct location (before and after the refresh) are the ones that contain a value for the "showid" field in the NFO (and in tvcache.xml). They are the episodes from the series "Double Je" (6 episodes), "Eternelle" (6 episodes) and "La Noiraude" (30 episodes).
      • All the other episodes listed are stacked on the last show of the list ("Trois femmes flics" in this case).
      • On all the other series there are only empty seasons folders.

      It seems to me that the problem is related to the fact that the "showid" field of all these episodes is not stored in the NFO file or in the tvcache.xml file. What I don't know is whether this is the cause of the problem or already a consequence of something else.

      I'm on a very fast computer with 16GB of RAM, so it could possibly be a timing problem.

      I'm available to help you track down this issue.
      Which tool do you use to compile MC? Is there a free version of it?
      In which part of the code do you do the TMDB scraping?

       
  • Jean-Michel KIRSCH

    I just took a look at the code you use to implement the rate limiting. Here it is:

            if (ResponseHeaders.ContainsKey("X-RateLimit-Remaining"))
            {
                int limitval = Convert.ToInt16(ResponseHeaders["X-RateLimit-Remaining"]);
                if (limitval < 20)
                {
                    System.Threading.Thread.Sleep(250);
                }
            }
    

    This code will not work in every case.

    Here is an example where it will not work:
    You are allowed 40 API requests per 10-second period.
    Suppose you make 20 requests during the first second of the period. You are then only allowed 20 more requests during the next 9 seconds (the remainder of the 10-second period). With your implementation, you will make them in about 5 seconds (20 x 250 milliseconds), so all 40 requests are used up after about 6 seconds, and the 21st request of this second batch (the 41st of the period) will be rejected.
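
The arithmetic above can be checked with a small simulation of the quoted rule. This is a sketch under stated assumptions, not Media Companion code: requests are assumed to take no time, X-RateLimit-Remaining is assumed to report exactly 40 minus the number of requests made so far in the period, and the module and function names are hypothetical.

```vb
Imports System.Collections.Generic

Module HeaderThrottleSimulation
    Private Const Quota As Integer = 40     ' TMDB limit: 40 requests per 10-second period
    Private Const SleepMs As Integer = 250  ' delay used by the quoted code

    ' Returns the virtual send time (in ms) of each request. Assumes requests
    ' take no time and that the 10-second period starts at t = 0.
    Public Function SimulateSendTimes(burstCount As Integer, burstSpacingMs As Integer,
                                      totalRequests As Integer) As List(Of Integer)
        Dim sendTimes As New List(Of Integer)
        Dim clockMs As Integer = 0
        For i As Integer = 1 To totalRequests
            sendTimes.Add(clockMs)
            ' What X-RateLimit-Remaining would report after request i.
            Dim remaining As Integer = Quota - i
            If i < burstCount Then
                clockMs += burstSpacingMs   ' still inside the initial burst
            ElseIf remaining < 20 Then
                clockMs += SleepMs          ' the quoted Thread.Sleep(250)
            End If
        Next
        Return sendTimes
    End Function
End Module
```

With 20 requests in the first second (one every 50 ms), SimulateSendTimes(20, 50, 41) puts the 40th request at 5700 ms and the 41st at 5950 ms, well inside the 10-second period.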

    If you want to implement the limit with a simple delay, you need to call
    System.Threading.Thread.Sleep(250);
    on every API request (10 seconds / 40 requests = 250 milliseconds per request). This will not be optimal, but it will respect the limit.
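
A slightly better variant of this fixed delay measures the time since the previous request started, so time already spent waiting for a response counts towards the 250 ms. This is a minimal sketch; MinIntervalThrottle is a hypothetical name, not part of Media Companion. Since 41 request starts spaced at least 250 ms apart span at least 40 x 250 ms = 10 s, they cannot all fall inside one 10-second period (boundary effects aside; a small margin such as 260 ms would absorb clock differences).

```vb
Imports System
Imports System.Diagnostics
Imports System.Threading

' Hypothetical helper: keeps request starts at least minInterval apart.
Public Class MinIntervalThrottle
    Private ReadOnly _minInterval As TimeSpan
    Private ReadOnly _clock As Stopwatch = Stopwatch.StartNew()
    Private ReadOnly _lock As New Object()
    Private _hasPrevious As Boolean = False
    Private _previousStart As TimeSpan

    Public Sub New(minInterval As TimeSpan)
        _minInterval = minInterval
    End Sub

    ' Call immediately before each API request. Blocks until at least
    ' minInterval has passed since the previous request started.
    Public Sub WaitForTurn()
        SyncLock _lock
            If _hasPrevious Then
                Do
                    Dim remaining As TimeSpan = _minInterval - (_clock.Elapsed - _previousStart)
                    If remaining <= TimeSpan.Zero Then Exit Do
                    ' Round up, and re-check in case Sleep returns early.
                    Thread.Sleep(CInt(Math.Ceiling(remaining.TotalMilliseconds)))
                Loop
            End If
            _previousStart = _clock.Elapsed
            _hasPrevious = True
        End SyncLock
    End Sub
End Class
```

A single shared instance, New MinIntervalThrottle(TimeSpan.FromMilliseconds(250.0)), would be created once, with WaitForTurn() called before every TMDB request.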

    There is an interesting thread on TMDB about implementing the limit here: https://www.themoviedb.org/talk/56648a6b9251412d7b008e62
    They refer to this generic article : http://www.jackleitch.net/2010/10/better-rate-limiting-with-dot-net/
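
The idea behind the linked article is to track when recent requests were made instead of sleeping blindly. Below is a sketch of a sliding-window log in that spirit; it is not a port of the article's code, and SlidingWindowLimiter is a hypothetical name. It remembers the start times of recent requests and, when 40 have started within the last 10 seconds, waits only until the oldest one leaves the window. DelayBeforeNext takes the current time as a parameter so the logic can be tested with a virtual clock.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

' Hypothetical sliding-window limiter: at most maxRequests request starts
' in any span of length "window" (for TMDB: 40 per 10 seconds).
Public Class SlidingWindowLimiter
    Private ReadOnly _maxRequests As Integer
    Private ReadOnly _window As TimeSpan
    Private ReadOnly _starts As New Queue(Of TimeSpan)()   ' recent start times, oldest first
    Private ReadOnly _clock As Stopwatch = Stopwatch.StartNew()
    Private ReadOnly _lock As New Object()

    Public Sub New(maxRequests As Integer, window As TimeSpan)
        _maxRequests = maxRequests
        _window = window
    End Sub

    ' Returns how long to wait before a request may start at currentTime
    ' (TimeSpan.Zero if a slot is free). Drops starts that left the window.
    Public Function DelayBeforeNext(currentTime As TimeSpan) As TimeSpan
        While _starts.Count > 0 AndAlso currentTime - _starts.Peek() >= _window
            _starts.Dequeue()
        End While
        If _starts.Count < _maxRequests Then Return TimeSpan.Zero
        ' Window is full: wait until the oldest start is a full window old.
        Return _starts.Peek() + _window - currentTime
    End Function

    ' Records that a request started at startTime.
    Public Sub Record(startTime As TimeSpan)
        _starts.Enqueue(startTime)
    End Sub

    ' Call before each API request: blocks until a slot is free, then records it.
    ' The lock is held while sleeping, so concurrent callers queue up in turn.
    Public Sub WaitForSlot()
        SyncLock _lock
            Do
                Dim delay As TimeSpan = DelayBeforeNext(_clock.Elapsed)
                If delay <= TimeSpan.Zero Then Exit Do
                Thread.Sleep(CInt(Math.Ceiling(delay.TotalMilliseconds)))
            Loop
            Record(_clock.Elapsed)
        End SyncLock
    End Sub
End Class
```

With one shared instance, New SlidingWindowLimiter(40, TimeSpan.FromSeconds(10.0)), and WaitForSlot() called before every TMDB request, a burst of up to 40 requests goes out immediately and only the 41st waits.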

     
  • Rob

    Rob - 2019-07-10

    Thank you, I could find that article about better rate limiting again.

    We use Visual Studio (specifically, I use Visual Studio 2015, and also include Visual Basic and C# when installing).

    As for the code, it is all over the place, but TMDb scraping goes through the WATTmdb module.

    But I think you are right: the ID value has not been assigned, and that is something separate from TMDb.

    I'll look into that over the next couple of days.

     
  • Rob

    Rob - 2019-07-13

    OK, looking at my previous code, it was flawed.

    I have since reworked it to catch the 429 (limit exceeded) response we get when we make more than 40 calls in 10 seconds.
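
For reference, catching the 429 and retrying could look like the sketch below. It assumes the requests are made with System.Net.Http.HttpClient (I don't know whether WATTmdb uses it) and that the server may send a standard Retry-After header with a delay in seconds; SendWithRetry and its parameters are hypothetical. The sleep action is passed in so the retry logic can be tested without real waiting.

```vb
Imports System
Imports System.Net
Imports System.Net.Http
Imports System.Net.Http.Headers

Public Module TmdbRetry
    ' 429 Too Many Requests (no named HttpStatusCode member on older .NET Framework).
    Private ReadOnly TooManyRequests As HttpStatusCode = CType(429, HttpStatusCode)

    ' Hypothetical helper: calls "send" and, while the server answers 429,
    ' waits and retries, making at most maxAttempts calls in total. The wait
    ' comes from a Retry-After delta header when present, else fallbackDelay.
    Public Function SendWithRetry(send As Func(Of HttpResponseMessage),
                                  maxAttempts As Integer,
                                  fallbackDelay As TimeSpan,
                                  sleep As Action(Of TimeSpan)) As HttpResponseMessage
        If maxAttempts < 1 Then Throw New ArgumentOutOfRangeException(NameOf(maxAttempts))
        For attempt As Integer = 1 To maxAttempts
            Dim response As HttpResponseMessage = send()
            ' Anything other than 429, or the last attempt: return it to the caller.
            If response.StatusCode <> TooManyRequests OrElse attempt = maxAttempts Then
                Return response
            End If
            Dim delay As TimeSpan = fallbackDelay
            Dim retryAfter As RetryConditionHeaderValue = response.Headers.RetryAfter
            If retryAfter IsNot Nothing AndAlso retryAfter.Delta.HasValue Then
                delay = retryAfter.Delta.Value
            End If
            response.Dispose()   ' this 429 response is discarded
            sleep(delay)
        Next
        Throw New InvalidOperationException("unreachable")
    End Function
End Module
```

With an HttpClient, a call might look like SendWithRetry(Function() client.GetAsync(url).Result, 3, TimeSpan.FromSeconds(1.0), Sub(d) Thread.Sleep(d)), where client and url are placeholders. A retry like this is a safety net; a client-side limiter still avoids most 429s in the first place.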

    Can you give this test build a try, please?

    As for your other issue regarding the IDs, I have also tweaked this code so that if the series is scraped from TMDb, the TMDb ID is the default.
    And yes, we keep all the IDs that are present for the nfo, as Kodi does.
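
The rule described here (keep every ID that is present, mark the scraped site's ID as default) could be expressed as the sketch below. It uses System.Xml.Linq rather than Media Companion's own NFO classes, and BuildUniqueIds is a hypothetical name. It also skips empty IDs, which would avoid the empty default <uniqueid type="tvdb"> line reported earlier in this thread.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml.Linq

Public Module UniqueIdWriter
    ' Hypothetical sketch: builds Kodi-style <uniqueid> elements from the IDs
    ' known for an item (key = site, value = ID). Empty IDs are skipped, and the
    ' ID from the site the item was scraped from gets default="true".
    Public Function BuildUniqueIds(ids As IDictionary(Of String, String),
                                   scrapeSource As String) As List(Of XElement)
        Dim result As New List(Of XElement)
        For Each pair As KeyValuePair(Of String, String) In ids
            If String.IsNullOrWhiteSpace(pair.Value) Then Continue For
            Dim element As New XElement("uniqueid",
                                        New XAttribute("type", pair.Key),
                                        pair.Value.Trim())
            If String.Equals(pair.Key, scrapeSource, StringComparison.OrdinalIgnoreCase) Then
                element.SetAttributeValue("default", "true")
            End If
            result.Add(element)
        Next
        Return result
    End Function
End Module
```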

    Let me know if this test build improves things scraping from TMDb.

     
  • Jean-Michel KIRSCH

    There are 2 small improvements with this test build:

    • The "Check Roots for New TV Shows" is now adding them all in one run.
    • All the episode titles are now found (with the previous build, at least the title of the 1st episode of "La Noiraude" was missing).

    The "unique id" of type TMDB is now correctly set as the default, which confirms that I am running the test version.

    However, there is no improvement regarding the "showid" value and the left pane.

    I have the Visual Studio Community edition. I will install it over the weekend and, if possible, see whether I can spot anything.
    EDIT: Visual Studio Community 2019 is now installed on my computer and can open your project. I will look into it later today.

     

    Last edit: Jean-Michel KIRSCH 2019-07-13
  • Jean-Michel KIRSCH

    First part of the problem:
    I thought that replacing line 1771 of Classes/TVScraper.vb, which reads:

                                singleepisode.ShowId.Value = tvdbid
    

    by :

                                If scrapersource = "tmdb" Then
                                    singleepisode.ShowId.Value = tmdbid
                                Else
                                    singleepisode.ShowId.Value = tvdbid
                                End If
    

    would be enough to save the "showid" correctly.
    With this change, the "showid" is correctly stored in the episode NFO files, but not in tvcache.xml.
    Perhaps, by the time this code runs, tvcache.xml has already been written during the execution of ep_Get.

    I'm now trying to update the field before calling ep_Get, by replacing the lines:

                        If scrapersource = "tmdb" Then
                            actorsource = "tmdb"
                            ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
                        Else
                            ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
                        End If
    

    by :

                        If scrapersource = "tmdb" Then
                            actorsource = "tmdb"
                            singleepisode.ShowId.Value = tmdbid
                            ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
                        Else
                            singleepisode.ShowId.Value = tvdbid
                            ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
                        End If
    

    The result is exactly the same. "showid" updated in NFO files but not in tvcache.xml.

    Where is this tvcache.xml file written?

    There may be another place to change for files with multiple episodes; I haven't checked.

    The problem is not completely solved by this. I still need to check how the IDs are compared when the tree is constructed.

     

    Last edit: Jean-Michel KIRSCH 2019-07-13
  • Jean-Michel KIRSCH

    I think that the remainder of the problem is in the AttachEpisodes function of NfoLibrary/TV/TvCache.vb.
    The code reads:

        Private Sub AttachEpisodes()
            For Each Show As TvShow In Shows
                Dim showID = Show.TvdbId.Value
                Dim EpisodeList = Episodes.Where(Function(ele As TvEpisode) ele.ShowId.Value = showID)
                For Each episode In EpisodeList
                    Show.AddEpisode(episode)
                Next
            Next
        End Sub
    

    I think that, to build the EpisodeList, we need to compare the episode's ShowId with either the TvdbId or the TmdbId of the show, not always with the TvdbId.
    "ScrapeFrom" is available in the "Show" structure, but I don't know its format.

     

    Last edit: Jean-Michel KIRSCH 2019-07-13
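
The comparison suggested above could be sketched as follows. ShowInfo and EpisodeInfo are hypothetical stand-ins for TvShow and TvEpisode, and the sketch assumes ScrapeFrom holds "tmdb" for shows scraped from TMDb (its real format is not known here). It also indexes the shows in a dictionary once, instead of filtering the whole episode list for every show, and returns the episodes that matched no show, which makes such mismatches easy to spot.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical stand-ins for Media Companion's TvShow and TvEpisode classes.
Public Class ShowInfo
    Public Property Title As String
    Public Property ScrapeFrom As String   ' assumed "tmdb" for shows scraped from TMDb
    Public Property TvdbId As String
    Public Property TmdbId As String
    Public Property Episodes As New List(Of EpisodeInfo)

    ' The ID this show's episodes are expected to carry in ShowId.
    Public Function EffectiveId() As String
        If String.Equals(ScrapeFrom, "tmdb", StringComparison.OrdinalIgnoreCase) Then Return TmdbId
        Return TvdbId
    End Function
End Class

Public Class EpisodeInfo
    Public Property Title As String
    Public Property ShowId As String
End Class

Public Module EpisodeAttacher
    ' Indexes the shows by their effective ID once, then attaches each episode
    ' with one dictionary lookup. Returns the episodes that matched no show.
    Public Function AttachEpisodes(shows As IEnumerable(Of ShowInfo),
                                   episodes As IEnumerable(Of EpisodeInfo)) As List(Of EpisodeInfo)
        Dim showsById As New Dictionary(Of String, ShowInfo)
        For Each show As ShowInfo In shows
            Dim id As String = show.EffectiveId()
            ' Skip shows without an ID; keep the first show if two share an ID.
            If Not String.IsNullOrEmpty(id) AndAlso Not showsById.ContainsKey(id) Then
                showsById.Add(id, show)
            End If
        Next

        Dim unmatched As New List(Of EpisodeInfo)
        For Each episode As EpisodeInfo In episodes
            Dim owner As ShowInfo = Nothing
            If Not String.IsNullOrEmpty(episode.ShowId) AndAlso
               showsById.TryGetValue(episode.ShowId, owner) Then
                owner.Episodes.Add(episode)
            Else
                unmatched.Add(episode)
            End If
        Next
        Return unmatched
    End Function
End Module
```

One caveat: TVDB and TMDB number their shows independently, so a TVDB show and a TMDB show could in principle share the same numeric ID; keying only on the ID, as here and in the original code, cannot tell them apart.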
  • Rob

    Rob - 2019-07-14

    There is a lot of redundant code that I do not want to remove at this time, but to answer a couple of questions.
    Where you added

    If scrapersource = "tmdb" Then
        singleepisode.ShowId.Value = tmdbid
    Else
        singleepisode.ShowId.Value = tvdbid
    End If
    

    is sufficient to set the ShowId to the TMDb value for both single-episode and multi-episode files.

    The TV cache is saved in Classes/TvScraper, in Tv_CacheSaveRefresh.

    But I see there is a lot of work needed to clean up the IDs depending on which site is being scraped from.

    It's more involved than just the TvScraper class. This will take some time to fix.
    (These are the caveats of adding a different scraper whose ID values don't align.)

     
  • Jean-Michel KIRSCH

    I finally got it working.

    Reminder:
    With the latest official version, as well as with the test version you gave me in this thread, for shows scraped from TMDB, only the series that have a TvdbId in TMDB are displayed correctly in the left pane. All other episodes are stacked at the end of the list.
    It appears that the IDs used in different parts of the code don't match.

    Here is a summary of the minimal modifications needed:

    In file Classes/TVScraper.vb:

    Line 1771, which reads:

    singleepisode.ShowId.Value = tvdbid
    

    needs to be removed.

    The lines:

    If scrapersource = "tmdb" Then
        actorsource = "tmdb"
        ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    Else
        ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    End If
    

    need to be replaced with:

    If scrapersource = "tmdb" Then
        actorsource = "tmdb"
        singleepisode.ShowId.Value = tmdbid
        ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    Else
        singleepisode.ShowId.Value = tvdbid
        ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    End If
    

    In file NfoLibrary/Tasks/ScrapeEpisodeTask.vb:

    Line 438, which reads:

    singleepisode.ShowId.Value = tvdbid
    

    needs to be removed.

    The lines:

    If scrapersource = "tmdb" Then
        actorsource = "tmdb"
        'ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    Else
        'ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    End If
    

    need to be replaced with:

    If scrapersource = "tmdb" Then
        actorsource = "tmdb"
        singleepisode.ShowId.Value = tmdbid
        'ReportProgress(, "Using Settings: TMDBId: " & tmdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    Else
        singleepisode.ShowId.Value = tvdbid
        'ReportProgress(, "Using Settings: TVdbID: " & tvdbid & " SortOrder: " & sortorder & " Language: " & language & " Actor Source: " & actorsource & vbCrLf)
    End If
    

    In file NfoLibrary/TV/TvCache.vb:

    The AttachEpisodes function needs to be changed from:

    Private Sub AttachEpisodes()
        For Each Show As TvShow In Shows
            Dim showID = Show.TvdbId.Value
            Dim EpisodeList = Episodes.Where(Function(ele As TvEpisode) ele.ShowId.Value = showID)
            For Each episode In EpisodeList
                Show.AddEpisode(episode)
            Next
        Next
    End Sub
    

    to:

    Private Sub AttachEpisodes()
        For Each Show As TvShow In Shows
            Dim showID
            If Show.ScrapeFrom.Value = "tmdb" Then
                showID = Show.TmdbId.Value
            Else
                showID = Show.TvdbId.Value
            End If
            Dim EpisodeList = Episodes.Where(Function(ele As TvEpisode) ele.ShowId.Value = showID)
            For Each episode In EpisodeList
                Show.AddEpisode(episode)
            Next
        Next
    End Sub
    

    In file NfoLibrary/TV/TvEpisode.vb:

    The Set accessor of the ShowObj property, which reads:

    Set(ByVal value As TvShow)
        _showObj = value
        ShowId.Value = value.TvdbId.Value
        TvdbId.Value = value.TvdbId.Value
        'ImdbId.Value = value.ImdbId.Value   'don't associate series IMDBId with Episode IMDBId.  Bad boo boo from commit #662
    End Set
    

    needs to be changed to:

    Set(ByVal value As TvShow)
        _showObj = value
        If value.ScrapeFrom.Value = "tmdb" Then
            ShowId.Value = value.TmdbId.Value
        Else
            ShowId.Value = value.TvdbId.Value
        End If
        TvdbId.Value = value.TvdbId.Value
        'ImdbId.Value = value.ImdbId.Value   'don't associate series IMDBId with Episode IMDBId.  Bad boo boo from commit #662
    End Set
    

    Line 327, which reads:

    Me.UniqueId.Value = TMDBEpisode..externalids.tvdb_id
    

    needs to be replaced with:

    Me.UniqueId.Value = TMDBEpisode..externalids.id
    

    In file NfoLibrary/TV/TvShow.vb:

    Line 468, which reads:

    Episode.ShowId.Value = Me.TvdbId.Value
    

    needs to be removed
    (the ShowId value is already set by the previous line).

    Line 464, which reads:

    CurrentSeason.ShowId.Value = Me.TvdbId.Value
    

    needs to be changed to:

    If Me.ScrapeFrom.Value = "tmdb" Then
        CurrentSeason.ShowId.Value = Me.TmdbId.Value
    Else
        CurrentSeason.ShowId.Value = Me.TvdbId.Value
    End If
    

    I'm still running some tests and will confirm later today that it is working.

     

    Last edit: Jean-Michel KIRSCH 2019-07-15
  • Rob

    Rob - 2019-07-16

    Cool. Let me know if the above is all that is required, and I'll incorporate the changes and post another test build for you.

     
  • Jean-Michel KIRSCH

    After running tests with TMDB and TVDB, I can confirm that it works in both cases with these modifications.

     
  • Rob

    Rob - 2019-07-17

    OK, I did a quick test of this, and it broke episodes that had been successfully scraped from TMDb before these changes.
    I had scraped five seasons of the series '24', and on refresh no episodes were added to the series.
    I had to delete the nfo files manually (as MC couldn't see that there were any) and rescrape.
    I did this with one season to start with, and it scraped and added the whole season to the series '24'.

    Then I found that it doesn't survive a Refresh All: no episodes were attached to the series.

    There is another location where the series tvdbid overrides the changes:
    Classes/TvScraper/TVRebuildCaches...
    Line 3330
    From:

        For Each ep In episodelist
            ep.ShowId.Value = newtvshownfo.TvdbId.Value
            If Pref.displayMissingEpisodes Then
                Dim q = From x In fullepisodelist Where x.IsMissing = True AndAlso x.UniqueId.Value = ep.UniqueId.Value
                If Not q.Count = 0 Then fullepisodelist.Remove(q(0))
            End If
            fullepisodelist.Add(ep.Cachedata)
            If Cancelled Then Exit Sub
        Next
    

    change to:

        For Each ep In episodelist
            'ep.ShowId.Value = newtvshownfo.TvdbId.Value
            If newtvshownfo.ScrapeFrom.Value = "tmdb" Then
                ep.ShowId.Value = newtvshownfo.TmdbId.Value
            Else
                ep.ShowId.Value = newtvshownfo.TvdbId.Value
            End If
            If Pref.displayMissingEpisodes Then
                Dim q = From x In fullepisodelist Where x.IsMissing = True AndAlso x.UniqueId.Value = ep.UniqueId.Value
                If Not q.Count = 0 Then fullepisodelist.Remove(q(0))
            End If
            fullepisodelist.Add(ep.Cachedata)
            If Cancelled Then Exit Sub
        Next
    

    My concern remains that this must still support episodes that were already scraped with TVDb IDs. I don't want to break this for other users in any way.
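
One way to keep already-scraped episodes working could be to accept the show's TVDb ID as a fallback when an episode's ShowId does not match the show's TMDb ID. A minimal sketch of that matching rule, assuming ScrapeFrom is "tmdb" for TMDb shows; BelongsTo is a hypothetical name.

```vb
Imports System

Public Module ShowIdMatching
    ' Hypothetical matching rule: does an episode whose NFO carries
    ' episodeShowId belong to this show?
    Public Function BelongsTo(episodeShowId As String, scrapeFrom As String,
                              showTvdbId As String, showTmdbId As String) As Boolean
        If String.IsNullOrEmpty(episodeShowId) Then Return False
        Dim isTmdbShow As Boolean = String.Equals(scrapeFrom, "tmdb", StringComparison.OrdinalIgnoreCase)
        ' New behaviour: episodes of a TMDb show carry the show's TMDb ID.
        If isTmdbShow AndAlso episodeShowId = showTmdbId Then Return True
        ' TVDB shows, and TMDb episodes written by older builds, carry the TVDB ID.
        Return episodeShowId = showTvdbId
    End Function
End Module
```

A match on the fallback ID could also be the moment to rewrite that episode's showid to the TMDb ID, so each old episode only needs the fallback once.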

     
  • Rob

    Rob - 2019-07-17

    Test build attached with all above patches in place

     
  • Jean-Michel KIRSCH

    Sorry for missing an occurrence of the problem.

    For people who have already scraped series from TMDB, perhaps it would be possible to write a debug task that replaces the showid value in the NFO files and tvcache.xml.
    Personally, I was "lucky" enough to find the problem before scraping a big collection from TMDB, but I understand that this is not the case for everybody.

    The algorithm for this debug task could be:

    For all series source folders with ScrapeFrom = "tmdb"
        For all series in each folder
            Read the tmdbid from the tvshow.nfo file
            Rewrite the uniqueid values so that the value of type tmdb is set as default
            For all episodes in each series
                Override the current showid value with the series tmdbid
            End For
        End For
    End For
    Rebuild the tvcache.xml from the NFO files (to update showid and uniqueid)
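
The NFO-rewriting part of this algorithm could look like the sketch below, which works on System.Xml.Linq documents in memory. It assumes tvshow.nfo stores the TMDb ID in a <uniqueid type="tmdb"> element and that episode NFOs have a <showid> element directly under the root, as in the samples earlier in this thread. The module and function names are hypothetical, and rebuilding tvcache.xml is not covered.

```vb
Imports System
Imports System.Xml.Linq

Public Module TmdbNfoRepair
    ' True if the element's type="..." attribute equals siteType (case-insensitive).
    Private Function HasType(uid As XElement, siteType As String) As Boolean
        Dim typeAttr As XAttribute = uid.Attribute("type")
        Return typeAttr IsNot Nothing AndAlso
               String.Equals(typeAttr.Value, siteType, StringComparison.OrdinalIgnoreCase)
    End Function

    ' Series step: makes the non-empty TMDb <uniqueid> the only default one and
    ' returns its value, or returns Nothing and leaves the NFO untouched if none.
    Public Function SetTmdbAsDefault(showNfo As XDocument) As String
        Dim tmdbElement As XElement = Nothing
        For Each uid As XElement In showNfo.Descendants("uniqueid")
            If HasType(uid, "tmdb") AndAlso Not String.IsNullOrWhiteSpace(uid.Value) Then
                tmdbElement = uid
                Exit For
            End If
        Next
        If tmdbElement Is Nothing Then Return Nothing

        For Each uid As XElement In showNfo.Descendants("uniqueid")
            ' SetAttributeValue with Nothing removes the attribute.
            uid.SetAttributeValue("default", If(uid Is tmdbElement, "true", Nothing))
        Next
        Return tmdbElement.Value.Trim()
    End Function

    ' Episode step: overwrites <showid> with the series TMDb ID, adding it if missing.
    Public Sub SetShowId(episodeNfo As XDocument, seriesTmdbId As String)
        If String.IsNullOrEmpty(seriesTmdbId) Then Throw New ArgumentException("TMDb ID required", NameOf(seriesTmdbId))
        Dim showIdElement As XElement = episodeNfo.Root.Element("showid")
        If showIdElement Is Nothing Then
            episodeNfo.Root.Add(New XElement("showid", seriesTmdbId))
        Else
            showIdElement.Value = seriesTmdbId
        End If
    End Sub
End Module
```

Each NFO would be loaded with XDocument.Load(path) and written back with doc.Save(path); walking the folders and rebuilding tvcache.xml are not shown.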
    

    I will start testing your new build this evening.

     

    Last edit: Jean-Michel KIRSCH 2019-07-17