MediaInfo / Discussion / Help: When does MI read the whole file versus only the header?

Hans Ecke - 2013-06-04

Hi there-

I'm trying to debug a mediainfo CLI problem (bug#765). It appears to read the whole file, which makes it very slow. Typically MI gives me answers after just a couple of seconds, reading only the header.

So here is the question: under what condition will mediainfo resort to parsing the whole file, as opposed to just reading the header? I'm happy to dig a bit in the sources...

Thank you!

Hans

Jerome Martinez - 2013-06-04

under what condition will mediainfo resort to parsing the whole file

By default, it never should. It should stop after parsing a few hundred frames. For some other container formats (e.g. LXF), I have also implemented a size limit (64 MB, for example); I have not (yet) done so for Matroska.
I'll check the issue with your file soon.

Hans Ecke - 2013-06-04

Thank you for your answer. Whenever it works for you, I really appreciate the tool and your (in this case unpaid) support.

My analysis might also be totally wrong.

Jerome Martinez - 2013-06-04

(in this case unpaid)

Such an issue is important to me, so unpaid is OK, no worries.
Maybe another one will be paid support later ;-).

Per - 2013-08-21

Hi!

Any update on this? I have the same problem:

I have a QuickTime/MOV file that is 7825060 bytes long. I read from a stream into a buffer of 1316 bytes, which I pass to Open_Buffer_Continue repeatedly. Open_Buffer_Continue returns (binary) 101 all the time, so my loop continues until all 7825060 bytes of the file are read.

Inform gives me correct information, so the media parsing works, but I expected MI to need a lot less data...

Cheers,
Per

- Jerome Martinez - 2013-08-21
  
  I have the same problem:
  
  I think you don't have the same problem; this issue is about big files (several GB).
  By default, MediaInfo reads a lot of frames in order to detect the GOP structure and captions hidden in the stream (~300 frames, if I remember correctly).
  Your file is small, so I think the end of the file is hit before the tests finish.
  You can test with MediaInfo::Option("ParseSpeed", "0") before running the scan, it will reduce the duration of the detection.
  
  Any update on this?
  
  I expect that issue to be gone by now, but again, that applies to big files. For small files like yours, MediaInfo is currently not optimized to limit the amount of parsed content. It is on the to-do list, but not a priority (paid requests have priority). The option I provided is a first step.
  
  PS: I am very surprised that the Open_Buffer_* interface is used by so many people. The interface is a bit hidden and I developed it for specific customers, so I am happy to see it is used! When I have some free time, I will try to write more documentation about this interface and to optimize the amount of parsed content.
  
Per - 2013-08-22

Thanks for your reply! It turns out that the "moov" atom in my test video is at the end of the file, so of course MediaInfo has to read all of the file. However, I have another file where the "moov" atom is at the beginning, but MediaInfo reads all of that file too...

Tried ParseSpeed=0 with no effect.

My use case is that I proxy an HTTP stream, so I would like to avoid reading everything before figuring out the video and audio codecs. I saw in one example that MediaInfo could instruct me to seek in the stream, which could allow me to do an HTTP Range request. Would that be possible?

- Jerome Martinez - 2013-08-22
  
  Turns out that the "moov" atom in my test video is in the end of the file, so of course MediaInfo has to read all of the file.
  
  Actually it does not have to.
  
  The example has the details about seeking. You need to obey seek requests from MI if you want to parse QuickTime files that have the moov atom at the end; otherwise you miss some information (the information in the raw streams).
  When the moov atom is at the end, MI issues a seek request with the byte offset to use as soon as the mdat atom is met (so at the very beginning).
  
  BTW, if you install the libcurl development package and compile MI yourself, MI can do the HTTP part itself; see this discussion about the limited support for it.
  
  MediaInfo reads all of that file too... (...) Tried ParseSpeed=0 with no effect.
  
  In that case, I need the file in order to see if it is normal, please open a bug ticket about it.
  
Per - 2013-08-22

Ok, I successfully implemented seeking according to the example. Here's what I found:

File 1, 10318650 bytes, moov at the beginning:

With ParseSpeed=0, I have to feed 9140936 bytes into MediaInfo before it's finished.

Without setting ParseSpeed, the corresponding number is 10318650, so the entire file.

File 2: 7825060 bytes, moov at the end:

ParseSpeed setting makes no difference.

After reading the first buffer, MediaInfo seeks to offset 7818735, where the moov atom is.

Then, it seeks to offset 36, where the mdat atom data begins.

When finished, I have fed a total of 7827313 bytes into MediaInfo.

Unless I'm missing something, MediaInfo consumes more or less the entire file before giving an answer. Is there a way to read only the header data?

- Jerome Martinez - 2013-08-22
  
  Please provide the files, so I can check. I can provide private FTP server access if you don't want a public share (email info@mediaarea.net).
  
  Is there a way to read only the header data?
  
  Not currently. The fastest mode parses 2 frames. Header-only parsing is a possible future feature, but not within free support (it is free software, though, so you can edit and adapt the source code).
  But consuming so many bytes with the "ParseSpeed" option set is not normal, so before any improvement request, let me check with the files whether this behavior is expected.
  
Per - 2013-08-23

The bigger file with moov at the beginning is here: http://www.cybertechmedia.com/samples/hunterdouglas.mov

The iPhone video, with moov at the end is here:
http://rovegard.se/test2.mov

Jerome Martinez - 2013-08-23

hunterdouglas.mov

This was a bug in MI with unsupported formats like the ones in this file.
Corrected; now 59220 bytes are read.

test2.mov

I added
MI->Option(T("ParseSpeed"), T("0"));
before MI.Open_Buffer_Init() in my example

with 1316 byte-blocks:
Read 0-1316 then request to go to offset 7818735 (moov atom)
Read 1316-7641 then request to go to offset 36 (mdat atom)
Read 7641-518249 (2 video frames and 2 audio frames)

500 KB are read, as expected (2 video frames).
If you don't obey the 2nd seek request, only 7641 bytes are read and you still get information (except the information from the raw frames!).
Stopping there could be made a configuration option, but there is no such development within free support.

Per - 2013-08-23

Hm, something's odd here. I tried your C++ example code with ParseSpeed=0 on 64-bit Linux, and MediaInfo reads 500 KB, just as you wrote.

However, on Windows 8 64-bit with corresponding Scala code that uses JNA, MediaInfo reads essentially the full file.

I have to investigate further...

Per - 2013-08-24

I posted my HowToUse_Dll.cpp here: https://gist.github.com/provegard/6326686

It compiles under Linux with g++ and under Windows with cl.exe.

On Linux (Linux devel 3.2.0-30-generic #48-Ubuntu SMP Fri Aug 24 16:52:48 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux), I get:

... Done, read 518249 bytes in total. MPEG-4

On Windows (8, 64-bit), I get:

... Done, read 7323285 bytes in total. MPEG-4

I'm using MediaInfo 0.7.64.

Any idea why there is a difference?

Jerome Martinez - 2013-08-24

I confirm there is a problem (Windows 32-bit is OK, 64-bit is not).
Please open a bug ticket for a better tracking of the issue.

Per - 2013-08-24

Done: https://sourceforge.net/p/mediainfo/bugs/783/

rednoah - 2014-03-29

I can confirm the issue on MediaInfo 0.7.67 (both MediaInfo GUI and libmediainfo). In this case it's a 700 MB mkv file and the GUI freezes until it's read the whole thing.

Here is the mediainfo for the file that I can reproduce this bug with:
http://pastebin.com/hGqwyjJw

PM me if you want me to upload the whole file somewhere.

Thanks for your hard work! Cheers, Reinhard
