When does MI read the whole file versus only the header?

Hans Ecke
2013-06-04
2014-03-29
  • Hans Ecke
    2013-06-04

    Hi there-

    I'm trying to debug a mediainfo CLI problem (bug#765). It appears to read the whole file, which makes it very very slow. Typically MI gives me answers after just a couple seconds, reading only the header.

    So here is the question: under what condition will mediainfo resort to parsing the whole file, as opposed to just reading the header? I'm happy to dig a bit in the sources...

    Thank you!

    Hans

     
  • under what condition will mediainfo resort to parsing the whole file

    By default, it never should. It should stop after parsing a few hundred frames. For some other container formats (e.g. LXF), I have also implemented a size limit (64 MB, for example); I have not (yet) done so for Matroska.
    I'll check the issue with your file soon.

     
  • Hans Ecke
    2013-06-04

    Thank you for your answer. Whenever it works for you, I really appreciate the tool and your (in this case unpaid) support.

    My analysis might also be totally wrong.

     
  • (in this case unpaid)

    Such an issue is important to me, so unpaid is OK, no worries.
    Maybe another one will be paid support later ;-).

     
  • Per
    2013-08-21

    Hi!

    Any update on this? I have the same problem:

    I have a QuickTime/MOV file that is 7825060 bytes long. I read from a stream into a buffer of 1316 bytes, which I pass to Open_Buffer_Continue repeatedly. Open_Buffer_Continue returns (binary) 101 all the time, so my loop continues until all 7825060 bytes of the file are read.

    Inform gives me correct information, so the media parsing works, but I expected MI to need a lot less data...

    Cheers,
    Per
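
For reference, the (binary) 101 return value mentioned above can be picked apart as a bitfield. A minimal sketch, assuming the bit layout described in the MediaInfo header comments (bit 0 accepted, bit 1 filled, bit 2 updated, bit 3 finalized); BufferStatus and decode_status are hypothetical names, not part of MediaInfo:

```cpp
#include <cassert>
#include <cstddef>

// Decoded view of the bitfield returned by Open_Buffer_Continue.
// ASSUMPTION: the bit meanings below follow the MediaInfo header comments
// as recalled here; check them against your MediaInfo version.
struct BufferStatus {
    bool accepted;   // bit 0: the format is recognised
    bool filled;     // bit 1: the main data has been collected
    bool updated;    // bit 2: some data has been updated
    bool finalized;  // bit 3: no more data is needed
};

// Hypothetical helper: split the raw status value into named flags.
inline BufferStatus decode_status(std::size_t status) {
    return BufferStatus{
        (status & 0x01) != 0,
        (status & 0x02) != 0,
        (status & 0x04) != 0,
        (status & 0x08) != 0,
    };
}
```

Under that assumption, binary 101 means "accepted" and "updated" but not "finalized", which would explain why a loop that waits for the finalized bit keeps feeding data until the end of the file.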

     
    • I have the same problem:

      I don't think you have the same problem; this issue is about big files (several GB).
      The default configuration of MediaInfo is to read a lot of frames in order to detect the GOP and captions hidden in the stream (~300 frames, if I remember correctly).
      Your file is small, so I think the end of the file is hit before the end of the tests.
      You can call MediaInfo::Option("ParseSpeed", "0") before running the scan; it will reduce the duration of the detection.

      Any update on this?

      I expect this issue to be fixed by now, but again, that is for big files. For small files like yours, MediaInfo is currently not optimized for limiting the size of the parsed content. It is on the to-do list, but not a priority (paid requests have priority). The option I provided is a first step, though.

      PS: I am very surprised that Open_Buffer_* is used by so many people. The interface is a bit hidden and I developed it for specific customers, so I am happy to see it is used! When I have some free time, I will try to write more documentation about this interface and optimize the size of the parsed content.

       
  • Per
    2013-08-22

    Thanks for your reply! Turns out that the "moov" atom in my test video is at the end of the file, so of course MediaInfo has to read all of the file. However, I have another file where the "moov" atom is at the beginning, but MediaInfo reads all of that file too...

    Tried ParseSpeed=0 with no effect.

    My use case is that I proxy an HTTP stream, so I would like to not read everything before figuring out the video and audio codecs. I saw in one example that MediaInfo could instruct me to seek in the stream, which could allow me to do an HTTP Range request. Would that be possible?
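
Whether the "moov" atom comes before or after "mdat" can be checked from the first bytes of a file. A minimal sketch, assuming the standard QuickTime/MP4 atom header (a 4-byte big-endian size followed by a 4-byte type, where size 1 means a 64-bit size follows and size 0 means the atom runs to the end of the file); Atom, list_top_level_atoms and moov_before_mdat are hypothetical helpers, not MediaInfo functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Atom {
    std::string type;      // four-character code, e.g. "moov"
    std::size_t offset;    // byte offset of the atom header in the buffer
    std::uint64_t size;    // total atom size, header included
};

// Read an n-byte big-endian unsigned integer starting at pos (n <= 8).
static std::uint64_t read_be(const std::vector<std::uint8_t>& d,
                             std::size_t pos, int n) {
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | d[pos + i];
    return v;
}

// List the top-level atoms whose headers are present in the buffer.
// The buffer may be only a prefix of the file: an atom that claims more
// bytes than are available is still reported, then the walk stops.
std::vector<Atom> list_top_level_atoms(const std::vector<std::uint8_t>& d) {
    std::vector<Atom> atoms;
    std::size_t pos = 0;
    while (pos + 8 <= d.size()) {
        std::uint64_t size = read_be(d, pos, 4);
        std::string type(reinterpret_cast<const char*>(&d[pos + 4]), 4);
        std::uint64_t header = 8;
        if (size == 1) {                    // 64-bit size follows the type
            if (pos + 16 > d.size()) break;
            size = read_be(d, pos + 8, 8);
            header = 16;
        } else if (size == 0) {             // atom extends to end of data
            size = d.size() - pos;
        }
        if (size < header) break;           // malformed header
        atoms.push_back(Atom{type, pos, size});
        if (size > d.size() - pos) break;   // rest of the atom is not buffered
        pos += static_cast<std::size_t>(size);
    }
    return atoms;
}

// True if "moov" is seen before "mdat" among the listed atoms.
bool moov_before_mdat(const std::vector<Atom>& atoms) {
    for (const Atom& a : atoms) {
        if (a.type == "moov") return true;
        if (a.type == "mdat") return false;
    }
    return false;
}
```

Fed with just the first block of a file, this reports "mdat" first when the "moov" atom sits at the end, which is the case where a seek is needed to reach the header data early.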

     
    • Turns out that the "moov" atom in my test video is at the end of the file, so of course MediaInfo has to read all of the file.

      Actually it does not have to.

      The example has the details about seeking. You need to obey seek requests from MI if you want to parse QuickTime files that have the moov atom at the end; otherwise you miss some information (the information in the raw streams).
      When the moov atom is at the end, a seek request with the byte offset to use is issued as soon as the mdat atom is encountered (so at the very beginning).

      BTW, if you install the libcurl development package and compile MI yourself, MI can do the HTTP stuff itself; see this discussion about the limited support for it.

      MediaInfo reads all of that file too... (...) Tried ParseSpeed=0 with no effect.

      In that case, I need the file in order to see if it is normal, please open a bug ticket about it.
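
A seek request maps naturally onto an HTTP Range request. A minimal sketch, assuming the server honours byte ranges; range_from is a hypothetical helper that only formats the value of the Range header for an open-ended range starting at the requested offset:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: value of an HTTP "Range" header that asks for
// everything from `offset` to the end of the resource.
std::string range_from(std::uint64_t offset) {
    return "bytes=" + std::to_string(offset) + "-";
}
```

For a seek request to offset 7818735, range_from(7818735) yields "bytes=7818735-"; the proxy would then feed the response body to MediaInfo from that offset onwards.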

       
  • Per
    2013-08-22

    Ok, I successfully implemented seeking according to the example. Here's what I found:

    File 1, 10318650 bytes, moov at the beginning:

    • With ParseSpeed=0, I have to feed 9140936 bytes into MediaInfo before it's finished.
    • Without setting ParseSpeed, the corresponding number is 10318650, so the entire file.

    File 2: 7825060 bytes, moov at the end:

    • ParseSpeed setting makes no difference.
    • After reading the first buffer, MediaInfo seeks to offset 7818735, where the moov atom is.
    • Then, it seeks to offset 36, where the mdat atom data begins.
    • When finished, I have fed a total of 7827313 bytes into MediaInfo.

    Unless I'm missing something, MediaInfo consumes more or less the entire file before giving an answer. Is there a way to read only the header data?

     
    • Please provide the files, so I can check. I can provide private FTP server access if you don't want a public share (email info@mediaarea.net).

      Is there a way to read only the header data?

      Not currently. The fastest option is parsing 2 frames. Reading only the header is a possible future feature, but not within free support (then again, it is free software, so you can edit and adapt the source code).
      But parsing so many bytes with the "ParseSpeed" option set is not normal, so before any improvement requests, let me check with your files whether consuming so many bytes is expected.

       
  • hunterdouglas.mov

    There was a bug in MI with unsupported formats like the ones in this file.
    Corrected; now 59220 bytes are read.

    test2.mov

    I added
    MI->Option(T("ParseSpeed"), T("0"));
    before MI.Open_Buffer_Init() in my example

    with 1316 byte-blocks:
    Read 0-1316 then request to go to offset 7818735 (moov atom)
    Read 1316-7641 then request to go to offset 36 (mdat atom)
    Read 7641-518249 (2 video frames and 2 audio frames)

    500 KB are read, as expected (2 video frames)
    If you don't obey the 2nd seek request, 7641 bytes are read and you have information (except information from the raw frame!)
    It could be made a configuration option, but there is no such development within free support.
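
The counters in the trace above are consistent with cumulative byte totals rather than file offsets. A small sketch that re-derives them, assuming the moov atom runs from offset 7818735 to the end of the 7825060-byte file (the file size is the one given earlier in the thread); the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Cumulative bytes fed after the first block plus the whole moov atom,
// ASSUMING the moov atom extends from moov_offset to the end of the file.
std::uint64_t total_after_moov(std::uint64_t file_size,
                               std::uint64_t moov_offset,
                               std::uint64_t first_block) {
    return first_block + (file_size - moov_offset);
}

// Number of blocks needed to feed `bytes` bytes; the last one may be partial.
std::uint64_t blocks_for(std::uint64_t bytes, std::uint64_t block) {
    return (bytes + block - 1) / block;
}
```

With the numbers from this thread, total_after_moov(7825060, 7818735, 1316) gives 7641, and the remaining 518249 - 7641 = 510608 bytes are exactly 388 blocks of 1316 bytes read from the mdat data starting at offset 36.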

     
  • Per
    2013-08-23

    Hm, something's odd here. I tried your C++ example code with ParseSpeed=0 on 64-bit Linux, and MediaInfo reads 500 KB, just as you wrote.

    However, on Windows 8 64-bit with corresponding Scala code that uses JNA, MediaInfo reads essentially the full file.

    I have to investigate further...

     
  • Per
    2013-08-24

    I posted my HowToUse_Dll.cpp here: https://gist.github.com/provegard/6326686

    It compiles under Linux with g++ and under Windows with cl.exe.

    On Linux (Linux devel 3.2.0-30-generic #48-Ubuntu SMP Fri Aug 24 16:52:48 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux), I get:

    ...
    Done, read 518249 bytes in total.
    MPEG-4
    

    On Windows (8, 64-bit), I get:

    ...
    Done, read 7323285 bytes in total.
    MPEG-4
    

    I'm using MediaInfo 0.7.64.

    Any idea why there is a difference?

     
  • I confirm there is a problem (Windows 32-bit is OK, 64-bit is not OK).
    Please open a bug ticket for better tracking of the issue.

     
  • rednoah
    2014-03-29

    I can confirm the issue on MediaInfo 0.7.67 (both MediaInfo GUI and libmediainfo). In this case it's a 700 MB mkv file and the GUI freezes until it's read the whole thing.

    Here is the mediainfo for the file that I can reproduce this bug with:
    http://pastebin.com/hGqwyjJw

    PM me if you want me to upload the whole file somewhere.

    Thanks for your hard work! Cheers, Reinhard