Several questions about MediaInfo

Help
c1tru55
2014-04-03
2014-05-02
  • c1tru55
    2014-04-03

    Hi all.

    I use MI in my application and I have several questions about it. I would be very grateful if someone could answer all of them.

    1) When I save codecIDs from the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/CodecID- files to my database using this table structure:
    - codec
    - format
    - hint
    - description
    - url
    - profile
    - version
    - color_space
    - chroma_subsampling
    - bit_depth
    - compression_mode
    - kind_of_stream (taken as middle part of CodecID_%s_%s.csv file name, e.g. Audio, Text, Video)
    - format_type (taken as last part of CodecID_%s_%s.csv file name, e.g. Matroska, Mpeg4, Real, Riff)
    I found that there are a lot of duplicate records. I can't even define a unique key, because some records differ from each other only in the letter case of one field (e.g. avc1-Video-Riff, h263-Video-Mpeg4)! As a result I can't refer to this table in my database. Is it a bug or a feature?

    2) I noticed that the formats in the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Format.csv file have a unique name field, so I can save that data in my database. When I get a stream's format name using MI.dll, I can look up all the other format fields (e.g. Format/String, Format/Info, Format/Url, Format/Extensions) in my format table (generated from Format.csv), so I don't need to fetch them from MediaInfo. Is it possible to do the same trick with the CodecID field, i.e. to look up the other CodecID fields (e.g. CodecID/Hint, CodecID/Info, CodecID/String, CodecID/Url) using the CodecID name? This question is directly related to the first one.

    3) Do I understand correctly that the file mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Codec.csv is deprecated and should not be used?

    4) Since it is usually easy to convert a raw field value (e.g. BitRate) to a human-readable value (e.g. BitRate/String), but not always the other way around, I save only MI raw values in my application. But I have doubts about the following fields, because I don't know how to convert between the raw and the human-readable value, and as a result I need to store both:
    - CodecID (maybe related to the 1st and 2nd questions)
    - PixelAspectRatio (the string value isn't always the same as the float)
    - DisplayAspectRatio (the string value isn't always the same as the float)
    - ChannelPositions (it is not clear how to convert ChannelPositions to ChannelPositions/String2 and vice versa)
    So the question is: is it possible to store only one format of these fields and convert it to any other format (using my custom code) without losing information?

    5) Do files exist that contain more than one General stream? More than one Video stream?

    6) What does the KindOfFormat field with the value 'M' in the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Format.csv file mean? (Multi)Media containers? Does it mean that files with these extensions can contain video, audio, etc.?

    7) A question about the extensions field of mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Format.csv. Is it true that MediaInfo supports only files with these extensions (and no others)?

    Many thanks in advance!

     
    Last edit: c1tru55 2014-04-03
  • Ouch! Lots of questions. I'll answer quickly (I don't have the time to answer in detail with free support).

    1) When I save codecIDs from the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/CodecID- files to my database using this table structure:
    - codec

    Deprecated, don't use it in new applications

    • hint

    Not for applications; it is only a hint about a CodecID (e.g. "DivX" for MPEG-4 Visual streams with CodecID "DX50").

    • description
    • url

    Redundant, not to be included in a database (you can use the CSV in another table if you want to display it to your users).

    • format_type (taken as last part of CodecID_%s_%s.csv file name, e.g. Matroska, Mpeg4, Real, Riff)

    I don't see why you created such a field. Use the container format instead.

    I found that there are a lot of duplicate records. I can't even define a unique key, because some records differ from each other only in the letter case of one field (e.g. avc1-Video-Riff, h263-Video-Mpeg4)! As a result I can't refer to this table in my database. Is it a bug or a feature?

    Historical reasons.
    I need to do some cleanup.
    As a general rule, avoid:
    - deprecated fields (you can see deprecated fields in the CSV: "Deprecated" in the comments)
    - fields with "/String" at the end (only human-readable versions of the non-"/String" fields)
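    As an illustration of the "/String" rule above, here is a minimal sketch of filtering a dictionary of field names and values down to the raw ones. The function name and the sample values are hypothetical, and treating numbered variants such as "/String2" the same way as "/String" is an assumption. Deprecated fields are not handled here, since spotting them needs the comments column of the CSV.

```python
def raw_fields(fields):
    """Keep only raw values: drop every field whose name contains '/String'.

    Assumption: numbered variants such as '/String2' are also display-only.
    """
    return {name: value for name, value in fields.items() if "/String" not in name}


# Hypothetical sample of fields read from one audio stream.
sample = {
    "BitRate": "128000",
    "BitRate/String": "128 kb/s",
    "ChannelPositions": "Front: L R",
    "ChannelPositions/String2": "2/0/0",
}

print(raw_fields(sample))
```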

    2) (...) But is it possible to do the same trick with CodecID field?

    Yes

    3) Do I understand correctly, that file mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Codec.csv is deprecated and should not be used?

    Right. Used only for deprecated fields.

    4) (...) I save only MI raw values in my application.

    You do well

    • PixelAspectRatio (the string value isn't always the same as the float)

    Currently, PixelAspectRatio/String is empty (planned for known values like "10:11"...)

    • DisplayAspectRatio (the string value isn't always the same as the float)

    DisplayAspectRatio/String is only for displaying known values (e.g. "16:9" when the value is 1.770 or 1.780; the precise value is the non-"/String" version)
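    To make the DisplayAspectRatio point concrete, here is a minimal sketch (not MediaInfo's actual code) of deriving a display label from the stored float. The list of known ratios and the 0.01 tolerance are assumptions, chosen so that 1.770 and 1.780 both give "16:9" as described above; the function name is hypothetical.

```python
# Assumed list of "known" ratios; the real list used by MediaInfo may differ.
KNOWN_RATIOS = {"16:9": 16 / 9, "4:3": 4 / 3}
TOLERANCE = 0.01  # assumed tolerance, not taken from MediaInfo


def dar_label(value):
    """Return a known label such as '16:9', or the float formatted with 3 decimals."""
    for label, ratio in KNOWN_RATIOS.items():
        if abs(value - ratio) <= TOLERANCE:
            return label
    return f"{value:.3f}"


for raw in (1.770, 1.780, 1.850):
    print(raw, "->", dar_label(raw))
```

    The conversion only goes one way: several floats share one label, so storing the float and deriving the label on demand loses nothing, while storing only the label would.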

    • ChannelPositions (it is not clear how to convert ChannelPositions to ChannelPositions/String2 and vice versa)

    Keep ChannelPositions. The "/String" version is only a different way to display the same data (you can map ChannelPositions to ChannelPositions/String2, but this is not a bijection: e.g. in theory "Front: L C R" and "Front: LC C RC", which are a bit different, are both mapped to "3/0/0").
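    The non-bijection can be seen with a small sketch that counts channels per group. This is only an approximation under assumptions: it assumes groups are separated by commas and written as "Name: channels", it only knows the Front, Side and Back groups, and it ignores LFE. The function name is hypothetical and this is not how MediaInfo itself builds ChannelPositions/String2.

```python
def positions_to_counts(channel_positions):
    """Collapse e.g. 'Front: L C R' into a 'front/side/back' count string."""
    counts = {"Front": 0, "Side": 0, "Back": 0}
    for group in channel_positions.split(","):
        name, sep, channels = group.partition(":")
        name = name.strip()
        if sep and name in counts:
            # Only the number of channels survives; their names are dropped.
            counts[name] = len(channels.split())
    return "{Front}/{Side}/{Back}".format(**counts)


print(positions_to_counts("Front: L C R"))
print(positions_to_counts("Front: LC C RC"))
```

    Both inputs collapse to the same string, which is why the raw ChannelPositions value is the one worth storing.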

    So the question is: is it possible to store only one format of these fields and convert it to any other format (using my custom code) without losing information?

    Yes.

    5) Do files exist, that contain more than one General stream?

    Currently: no. It may change in the future if I need it

    More than one Video stream?

    Yes. Any container can handle more than 1 video stream in theory. In practice, you can have e.g. MPEG-TS with several programs (TV channels) in the same multiplex; there is 1 video stream per program, so several video streams.

    6) What does the KindOfFormat field with the value 'M' in the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Format.csv file mean? (Multi)Media containers?

    Multimedia or Multiple. M means you can have more than 1 stream kind with this format (e.g. video+audio).
    But V does not always mean that it is pure video, e.g. H264 can contain private ancillary data with captions. V means that the main purpose of this format is video.

    7) A question about the extensions field of mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/Format.csv. Is it true that MediaInfo supports only files with these extensions (and no others)?

    The file extensions list is only a list of known file extensions for this format. It is not exhaustive.
    I try to maintain an exhaustive list of known formats in this CSV, but there is no guarantee.


    I hope this helps a bit. I am fully aware that not everything is coherent. There is a lot of work to do on coherency and on a better output, but I currently lack time (and/or people interested in helping me, or money to pay people to review it). A better output is planned but is currently not the priority of my main sponsors.

     
  • c1tru55
    2014-04-04

    Hi Jerome,

    thanks for the quick reply, you helped me a lot! But I have a few clarifying questions:

    When I save codecIDs from the mediainfo_*_AllInclusive.7z/MediaInfoLib/Source/Resource/Text/DataBase/CodecID- files to my database using this table structure:
    - codec

    Deprecated, don't use it in new applications

    Maybe you misunderstood: I'm talking about the "InfoCodecID_Codec" field in the "infocodecid_t" enum, the first column of the "CodecID-...-....csv" files. Do you want to say that CodecID field is deprecated? I think you mean the Codec field from MediaInfo, which is deprecated, but I'm talking about the CodecID name from the CSV files.
    My goal: I want to save the complete lists of formats and codecIDs in separate tables and refer to them via ID. So, for example, we have this output from MediaInfo:
    - Format (string)
    - BitRate (integer)
    - CodecID (string)
    and I will save in my db:
    - format_id (integer id for Format name from my format table)
    - bitrate (integer)
    - codec_id (integer id for CodecID name from my codec_id table)

    I've done it for the format field, but I can't do it for the CodecID field, because there are a lot of duplicate records with the same name. As I said earlier: I can't even define a unique key, because some records differ from each other only in the letter case of one field (e.g. avc1-Video-Riff, h263-Video-Mpeg4)! Where can I get a complete list of all CodecIDs then?

    I found that there are a lot of duplicate records. I can't even define a unique key, because some records differ from each other only in the letter case of one field (e.g. avc1-Video-Riff, h263-Video-Mpeg4)! As a result I can't refer to this table in my database. Is it a bug or a feature?
    Historical reasons. I need to do some cleanup.

    If you need it, I can give you a list of all the duplicated codecIDs.

    2) (...) But is it possible to do the same trick with CodecID field?

    Yes

    But how (see question 1)?

    Thanks,

     
  • Do you want to say that CodecID field is deprecated?

    It is not.
    I misunderstood (I should change the field name...).
    Use CodecID.

    (e.g. avc1-Video-Riff, h263-Video-Mpeg4)!

    It is OK this way.

     
  • c1tru55
    2014-04-04

    Let me copy my question again:

    I want to save the complete lists of formats and codecIDs in separate tables and refer to them via ID. So, for example, we have this output from MediaInfo:
    - Format (string)
    - BitRate (integer)
    - CodecID (string)
    and I will save in my db:
    - format_id (integer id for Format name from my format table)
    - bitrate (integer)
    - codec_id (integer id for CodecID name from my codec_id table)

    I've done it for the format field, but I can't do it for the CodecID field, because there are a lot of duplicate records with the same name. As I said earlier: I can't even define a unique key, because some records differ from each other only in the letter case of one field (e.g. there are 4 avc1 records, 4 h263 records and so on)! How can I get other CodecID information (e.g. CodecID/Hint, CodecID/Info, CodecID/String, CodecID/Url) using the CodecID name, if it has duplicate names?

    Thanks,

     
  • How can I get other CodecID information (e.g. CodecID/Hint, CodecID/Info, CodecID/String, CodecID/Url) using CodecID name, if it has duplicate names?

    You save the string e.g. "avc1-Video-Riff"

    You know the kind of stream (video, audio...) (else save it somewhere, it is useful)
    You know the container format (the one in the General part)
    You can know the parser used for this container from Format.csv (e.g. "AVI" --> "Riff")
    So with these pieces of information, you know which table to use.

    Example: you get an AVI file with avc1 CodecID for the video stream:
    - Container format ("General" part) is "AVI" --> "Riff" parser (from Format.csv)
    - Video stream --> use CodecID_Video_Riff.csv to get other information from CodecID field.

    If you save codec_id in the form "avc1-Video-Riff", you have all the data needed to know which table to use.
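    The lookup described above can be sketched as follows. The dictionary is a hypothetical one-entry excerpt of the container-to-parser mapping (the real mapping would be loaded from Format.csv, whose column layout is not shown in this thread), and both function names are made up for illustration.

```python
# Hypothetical excerpt: container format (from the General part) -> parser name.
# In a real application this would be built from Format.csv.
PARSER_BY_FORMAT = {"AVI": "Riff"}


def codec_table_for(container_format, stream_kind):
    """Name of the CodecID CSV to consult for this container and stream kind."""
    parser = PARSER_BY_FORMAT[container_format]
    return f"CodecID_{stream_kind}_{parser}.csv"


def codec_key(codec_id, container_format, stream_kind):
    """Composite key in the 'CodecID-KindOfStream-FormatParser' form."""
    parser = PARSER_BY_FORMAT[container_format]
    return f"{codec_id}-{stream_kind}-{parser}"


print(codec_table_for("AVI", "Video"))
print(codec_key("avc1", "AVI", "Video"))
```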

     
  • c1tru55
    2014-04-04

    Hi Jerome,

    Thanks for staying in touch. Is it true that I should use "CodecID-KindOfStream-FormatParser" as a unique index? The problem is that there are, for example, 2 "avc1" records in the CodecID_Video_Riff.csv file, or 2 "h263" records in the CodecID_Video_Mpeg4.csv file. Should I remove duplicate entries of "CodecID-KindOfStream-FormatParser" field combinations?

    Thanks,

     
  • Is it true that I should use "CodecID-KindOfStream-FormatParser" as a unique index?

    I'm not saying you should; I'm saying you can, for your goal.
    Everybody may have another method. This one is not bad, that's all I'm saying.

    The problem is that there are, for example, 2 "avc1" records in CodecID_Video_Riff.csv file

    Wrong.
    I see "avc1" and "AVC1", which are different. CodecIDs are case sensitive.

    Should I remove duplicate entries of "CodecID-KindOfStream-FormatParser" field combinations?

    They are not duplicates.
    The other fields happen to be the same in this example, but you have no guarantee that avc1, AVC1 and aVc1 refer to the same format (OK, it seems weird, but it may happen). On my side, I treat e.g. aVc1 as unknown.
    They are 2 separate lines.

    If you decide not to care about case and remove "duplicate" entries, this is your choice and your risk for the future, and I don't support it.
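    For the database side, here is a minimal sketch of a case-sensitive unique index, using SQLite through Python's sqlite3 module. The table and column names are hypothetical. SQLite compares TEXT with its BINARY collation by default, so "avc1" and "AVC1" are distinct keys; other database engines may default to a case-insensitive collation, in which case a case-sensitive one would have to be requested explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE codec_id (
        id INTEGER PRIMARY KEY,
        codec_id TEXT NOT NULL,
        kind_of_stream TEXT NOT NULL,
        format_parser TEXT NOT NULL,
        UNIQUE (codec_id, kind_of_stream, format_parser)
    )
    """
)

# Both spellings are accepted because the comparison is case-sensitive.
conn.executemany(
    "INSERT INTO codec_id (codec_id, kind_of_stream, format_parser) VALUES (?, ?, ?)",
    [("avc1", "Video", "Riff"), ("AVC1", "Video", "Riff")],
)


def lookup(codec_id, kind, parser):
    """Return the row id for an exact (case-sensitive) match, or None if unknown."""
    row = conn.execute(
        "SELECT id FROM codec_id"
        " WHERE codec_id = ? AND kind_of_stream = ? AND format_parser = ?",
        (codec_id, kind, parser),
    ).fetchone()
    return row[0] if row else None


print(lookup("avc1", "Video", "Riff"), lookup("AVC1", "Video", "Riff"))
print(lookup("aVc1", "Video", "Riff"))
```

    With this layout an unlisted spelling such as "aVc1" simply has no row, which matches treating it as unknown.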

     
  • c1tru55
    2014-04-04

    Hi Jerome,

    I see "avc1" and "AVC1", which are different. CodecIDs are case sensitive.

    I think this is the key phrase, it makes things clear.

    Thank you very much.

     
  • c1tru55
    2014-05-02

    Hi Jerome,

    1) If we take "CodecID-KindOfStream-FormatParser" as the minimum set of fields identifying unique CodecID records (taking into account that CodecID is case-sensitive), there are still several duplicate entries:
    - 140-Audio-Riff (3 records)
    - 3gp6-General-Mpeg4 (3 records)
    - 42-Audio-Riff (3 records)
    - 135-Audio-Riff (2 records)
    - CAQV-General-Mpeg4 (2 records)

    Are these redundant records which need to be deleted, or do we need to expand the set of fields for the unique index?
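    For reference, a check of this kind can be sketched as below: count each (CodecID, KindOfStream, FormatParser) tuple with exact, case-sensitive comparison and report those seen more than once. The rows are a small hand-written sample, not the contents of the real CSV files, and the function name is hypothetical.

```python
from collections import Counter


def duplicate_keys(rows):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hand-written sample; tuples are (CodecID, KindOfStream, FormatParser).
rows = [
    ("avc1", "Video", "Riff"),
    ("AVC1", "Video", "Riff"),  # different case, so a different key
    ("42", "Audio", "Riff"),
    ("42", "Audio", "Riff"),
    ("42", "Audio", "Riff"),
]

print(duplicate_keys(rows))
```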

    2) What is the point of allowing both 2-letter ISO 639-1 and 3-letter ISO 639-2 codes in the one "Language" parameter? As far as I can see from "Iso639_1.csv" and "Iso639_2.csv", there is a bijection between them. It complicates things, because you can't move the language list into a separate table and refer to it using the "Language" field. You need to use the "Language/String2" or "Language/String3" field for that (each of which contains only one possible language code type).

    3) I also found several problems with translations (for example, for the Russian language):
    - there is no translation for the "Version" parameter name part (e.g. Encoded_Library/Version)
    - there is no translation for the "Default" parameter name
    - there is no translation for the "Forced" parameter name

    Best regards,

     
    Last edit: c1tru55 2014-05-02