
#205 scraping anime series in folders?

1.0
closed
Rob
Suggestion (3)
2019-06-12
2019-06-09
No

Been fighting with this for a bit with no easy solution. Currently testing 3.7.3.1. I have backed up my anime collection to ISOs, with each series in its own folder, e.g.:

Legend of the Mystical Ninja (Ganbare Goemon) (1997-1998)

and files such as:
Legend of the Mystical Ninja (Ganbare Goemon) (1997-1998)-V1.iso
Legend of the Mystical Ninja (Ganbare Goemon) (1997-1998)-V2.iso

so the following issues are exposed:
1) Most of these (90%+) cannot be found on TMDb or IMDb. Would love to see if we could get the information from AnimeNewsNetwork (best information) or anidb.net.
2) Since these are not individual episodes but ISOs with multiple episodes on each, creating a .nfo for each media file is excessive (there is no way to have episodic data in this case). For example, for Bleach I have ~80 ISOs, which would mean 80 .nfo files all with the same data. I tried 'use folder names for scraping' and 'all movies are in folders' (and combinations thereof), but with no apparent effect.
3) Perhaps break out a tab for anime (separate from movie, home movie, and TV), since there are enough complexities that do not overlap in scraping. That way we could create a profile for just the anime directories and not waste time scraping against sites that have no information.
4) Is there a better way to do any of the above to aid MC in handling this?

Discussion

  • Steve Costaras

    Steve Costaras - 2019-06-09

    Thanks for the link on the file naming. Looking at that, I do not see an example of multiple episodes per file, however. Is there a reference or example of how that needs to be set up? Say I had 5-6 episodes per file, would it be something like:

    {name}-S01E01-S01E02-S01E03-S01E04-S01-E05.iso

    or
    {name}-S01E01-E05.iso

    for a range of episodes?

     
  • Jean-Michel KIRSCH

    It will be: {name}-S01E01-S01E02-S01E03-S01E04-S01E05.iso
    or {name}-S01E01E02E03E04E05.iso

    As for the folder, it should be named exactly as the series is named in either TheTVDB or TMDb, depending on which one you are using.
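To make the two accepted forms concrete, here is a minimal Python sketch that pulls (season, episode) pairs out of a file name written either way. It is only an illustration of the naming convention described above, not Media Companion's actual parser; the function name `parse_episodes` and the regular expressions are assumptions made for this example.

```python
import re

# One block: "S01E01" followed by any number of "E02" or "-S01E02" parts.
# Assumption: a bare range such as "S01E01-E05" is NOT treated as a list.
_BLOCK = re.compile(
    r"S\d{1,2}E\d{1,3}(?:-S\d{1,2}E\d{1,3}|E\d{1,3})*", re.IGNORECASE
)
_TOKEN = re.compile(r"S(\d{1,2})|E(\d{1,3})", re.IGNORECASE)


def parse_episodes(filename):
    """Return a list of (season, episode) tuples found in filename."""
    block = _BLOCK.search(filename)
    if block is None:
        return []
    season = None
    episodes = []
    for token in _TOKEN.finditer(block.group(0)):
        if token.group(1) is not None:
            # A season marker applies to the episode markers that follow it.
            season = int(token.group(1))
        else:
            episodes.append((season, int(token.group(2))))
    return episodes
```

Both `Show-S01E01-S01E02.iso` and `Show-S01E01E02.iso` yield the same pairs under this sketch, while a name with no season/episode block yields an empty list.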

     

    Last edit: Jean-Michel KIRSCH 2019-06-09
  • Steve Costaras

    Steve Costaras - 2019-06-09

    Thanks. I am doing some lookups from both as tests. This kind of helps, and I will see if I can get some usable data from it. I am noticing that the original air dates are not consistent between the platforms for some of the series I've been testing. There are also some scraper issues (failures/error logs), which /may/ be due to having the titles set as:

    {$english_name} ({$original_language_name}) ($year) [$dvd_distributor]-{$volume_disc_info}.iso

    However, the bigger item is handling show titles with characters that are 'illegal' from a filesystem point of view, for example ".hack//GU Trilogy"; names that are too long (beyond the folder limit); or the difference between the anime and an OVA for the same series.
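As a rough illustration of the illegal-character and length problem, the Python sketch below replaces the characters Windows does not allow in a file or folder name and trims the result. The helper name `safe_folder_name`, the replacement character, and the 255-character default are assumptions for this example. Whether Media Companion would still match a folder renamed this way against TheTVDB or TMDb is not confirmed anywhere in this thread.

```python
import re

# Characters Windows rejects in a file or folder name, plus control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_folder_name(title, replacement=" ", max_length=255):
    """Return title with illegal characters replaced and length capped."""
    cleaned = _ILLEGAL.sub(replacement, title)
    # Collapse runs of whitespace left behind by the replacements.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    return cleaned[:max_length].rstrip(" .")
```

With these assumptions, `.hack//GU Trilogy` becomes `.hack GU Trilogy`.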

    I did notice that AnimeNewsNetwork now does have an API interface for scraping:
    https://www.animenewsnetwork.com/encyclopedia/api.php

    example for .hack//GU above
    https://cdn.animenewsnetwork.com/encyclopedia/api.xml?anime=8719
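For anyone who wants to poke at that endpoint, a minimal Python sketch follows. It only builds the URL in the form shown above and parses whatever XML comes back; it makes no assumption about the element or attribute names in the response. The function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint taken from the example URL above.
ANN_API = "https://cdn.animenewsnetwork.com/encyclopedia/api.xml"


def ann_anime_url(anime_id):
    """Build the lookup URL for a single ANN encyclopedia anime id."""
    return f"{ANN_API}?{urlencode({'anime': anime_id})}"


def fetch_ann_anime(anime_id, timeout=10):
    """Download the entry and return the parsed XML root element."""
    with urlopen(ann_anime_url(anime_id), timeout=timeout) as response:
        return ET.fromstring(response.read())
```

Calling `ann_anime_url(8719)` reproduces the example URL for .hack//GU; `fetch_ann_anime` needs network access and should be checked against ANN's own API documentation before relying on it.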

    So far I've found that ANN seems to do a better job of finding matches for series (especially older or less popular ones, in my limited testing so far), as well as finding series or set links (sequel/prequel, etc.). How much work would it be to add this site as a scraper to pull data? Perhaps even to use multiple (ANN -> TMDb -> IMDb, etc.)?

     
  • Rob

    Rob - 2019-06-10

    For an ISO with multiple episodes I recommend
    S01E01E02E03E04 etc. .iso

    {$english_name} ({$original_language_name}) ($year) [$dvd_distributor]-{$volume_disc_info}.iso

    won't work, as it is not possible to determine which season or episodes are in the ISO.

    As for illegal characters, staying with just S01E01E02... without the series title in the name will work.
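A small Python sketch of the title-free naming recommended here, for renaming a batch of ISOs by script. The helper `multi_episode_name` is hypothetical, and which episodes sit on which disc still has to be supplied by hand.

```python
def multi_episode_name(season, episodes, extension=".iso"):
    """Build a title-free name such as S01E01E02E03E04.iso."""
    episodes = list(episodes)
    if not episodes:
        raise ValueError("at least one episode number is required")
    # Two-digit, zero-padded season and episode numbers.
    tags = "".join(f"E{number:02d}" for number in episodes)
    return f"S{season:02d}{tags}{extension}"
```

For a disc holding episodes 1 to 4 of season 1, `multi_episode_name(1, range(1, 5))` gives `S01E01E02E03E04.iso`.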

    No plans to add any other API for scraping.

     
  • Rob

    Rob - 2019-06-12
    • status: open --> closed
    • assigned_to: Rob
     
  • Rob

    Rob - 2019-06-12

    Closing as not an issue of Media Companion.

     
