Option to parse whole file

Help
alpduhuez
2013-05-29
2013-05-31
  • alpduhuez
    2013-05-29

    Hello.
    We have some mpeg-2 .ts files that have closed captions that are not being found. They are 608 format but they do not appear until about 1-5 minutes into the movie. I have seen some references to parsing the whole file, what is that option? And how would I set it from the C# reference? If this is still under development, we are willing to accelerate.

    Thanks.
    -alp

     
  • The option is hidden because it is not yet fully tested and validated, but it already exists and a lot of people already use it.
    In C#, something like:
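    MediaInfo MI = new MediaInfo();
    MI.Option("ParseSpeed", "1.0"); // Parse 100% of the file; set this before opening the file.
    MI.Open("YourFile.ts"); // Placeholder file name.
    string Report = MI.Inform(); // Text report, including any detected Text streams.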

    Try it, and let me know if it does not fit your needs.

     
  • alpduhuez
    2013-05-29

    Great, thank you for the fast reply. I will try it out.

    Thanks!

     
    • alpduhuez
      2013-05-31

      So, I have been playing with "ParseSpeed". Are there any values besides "1.0" to use? I have tried "0.2", "0.5", "1.0", and "2.0". The CC data only shows up if the value is > 1.0. Is this expected?

      I am trying to extract the CC data from a 15 GB file. The processing takes about 3 minutes. Is there a way to speed this up? Is that a potential accelerated feature request?

      Thanks.
      -alp

       
      • More than 1.0 is the same as 1.0 (it is currently the maximum).
        0.5 is the default.
        Below 0.5 means quicker parsing (so less precise).
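The three rules above can be sketched as a small lookup. This is only an illustration of the behavior described in this reply, not MediaInfo code: the function names `effective_parse_speed` and `describe_parse_speed` are hypothetical, and the wording for values strictly between 0.5 and 1.0 is an assumption, since the thread does not describe that range.

```python
MAX_PARSE_SPEED = 1.0      # "currently the maximum" according to the reply
DEFAULT_PARSE_SPEED = 0.5  # default according to the reply


def effective_parse_speed(requested: float) -> float:
    """Hypothetical helper: values above 1.0 behave like 1.0."""
    return min(requested, MAX_PARSE_SPEED)


def describe_parse_speed(requested: float) -> str:
    """Hypothetical helper: summarize what the reply says about a value."""
    speed = effective_parse_speed(requested)
    if speed >= MAX_PARSE_SPEED:
        return "full parsing"
    if speed < DEFAULT_PARSE_SPEED:
        return "quicker parsing, less precise"
    if speed == DEFAULT_PARSE_SPEED:
        return "default"
    # Assumption: the thread says nothing specific about this range.
    return "between default and full parsing"


for value in (0.2, 0.5, 1.0, 2.0):
    print(value, "->", effective_parse_speed(value), describe_parse_speed(value))
```

With the four values tried earlier in the thread, "1.0" and "2.0" end up in the same category, which matches the statement that anything above 1.0 is the same as 1.0.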

        In your case, the problem is not in MediaInfo but in your container format: CEA-608/708 captions are intended for broadcast. They can appear anywhere in the stream. One stream can contain several services, and they are not declared in a header. Caption data may be present only at offset 0, or only at offset 10 000 000, or only at offset "end of your file minus 10 000 000". Currently, I see no other possibility than parsing the whole file.

        I think you will find no tool that fits your needs, because of your file, not the tools.

        So let's try to find some ideas:
        - In the case of 708 (not 608), it is possible to rely only on the caption service descriptor (in the PMT) instead of looking for caption data. But it would provide the list of services declared in the descriptor, not the real content, so you must trust the service descriptor. It requires some additional development. Interesting for you?
        - We can limit the parsing to x minutes or x MB, and you decide the value of x. But if your captions appear after this x, you still have your problem. If I remember well, it is already implemented. Interesting for you?
        - 15 GB in 3 minutes --> ~ 90 MB/s. Fair enough. What is the bottleneck? HDD? Buy a faster HDD and/or RAID ;-). CPU (MediaInfo is not multithreaded, so check the load on 1 core)? It is possible to study the code, and I am pretty sure there are tons of improvements to make, but it is long (= expensive) and without any promise on the result. Interesting for you?
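The throughput estimate in the last idea can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `throughput_mb_per_s` is hypothetical, and since the thread does not say whether "15 GB" means decimal gigabytes or binary gibibytes, both readings are shown.

```python
def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Average read throughput in MB/s (1 MB = 1 000 000 bytes)."""
    return size_bytes / seconds / 1_000_000


PARSE_SECONDS = 3 * 60           # "about 3 minutes" from the thread
SIZE_DECIMAL = 15 * 1000 ** 3    # 15 GB read as decimal gigabytes
SIZE_BINARY = 15 * 1024 ** 3     # 15 GB read as binary gibibytes

print(round(throughput_mb_per_s(SIZE_DECIMAL, PARSE_SECONDS), 1))  # 83.3
print(round(throughput_mb_per_s(SIZE_BINARY, PARSE_SECONDS), 1))   # 89.5
```

Either reading gives a sustained rate in the 80 to 90 MB/s range, which is consistent with the "~ 90 MB/s" figure and with the suggestion to check whether the disk or a single CPU core is the limit.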

         
        • alpduhuez
          2013-05-31

          Duh, I'm sorry, I did not do the math on the parsing speed.

          Thanks, that is good information. I do have one more question, please excuse me if it is a silly one. From what I can find, MediaInfo does not do CC or sub-title extraction, it will just report that they exist in a file as "Text Stream", along with the format of said CC or sub-title?

          thanks.
          -alp

           
          • please excuse me if it is a silly one.

            There is nothing to excuse, because it is not a silly one ;-).

            MediaInfo does not do CC or sub-title extraction

            Default output is about global stream information, so no subtitle extraction, correct.

            But a basic extractor is hidden inside MediaInfo, for specific needs from specific customers (specific contracts I had).
            Actually, I already have 2 subtitle projects:
            - 1 in C++, based on MediaInfo for the demux (from TS, MXF ancillary, GXF...), with basic output example.
            - 1 in C, using ffmpeg as the demuxer, with more raw input formats (SDI input) and output formats (SubRip, TTML, TTML tunneling, On screen display...)

            Both projects fit the needs of the customers who requested the feature, but they are not ready for public release (no presentation, no code example, no plan to provide support for them, basic decoding only, no time to create a specific website...). If you are interested in sponsoring something (clean up, example code creation and website? C# binding?), let me know. On my side, making both projects public is on my ToDo-list but I have no ETA.

             
            • alpduhuez
              2013-05-31

              Ah, okay that makes sense. I spoke w/ the customer and they are okay w/ the MediaInfo behavior. They are good with it reporting CC when it exists. So I lied, one more question. :-)

              This is the text stream information that I was able to get from .Inform after setting ParseSpeed="1.0" on the 15 GB file in question, where the CC show up about 1 minute into the video. My question is, why is a CC detected if the stream size is 0? Should we be able to rely on the fast if a Text Stream is present 608/708 exits?

              Thanks Much.
              -alp

              Text
              ID : 481 (0x1E1)-CC1
              Menu ID : 1 (0x1)
              Format : EIA-608
              Muxing mode : SCTE 128 / DTVCC Transport
              Muxing mode, more info : Muxed in Video #1
              Duration : 1h 52mn
              Bit rate mode : Constant
              Stream size : 0.00 Byte (0%)

               
              • So no need for CC extraction, OK.

                My question is, why is a CC detected if the stream size is 0?

                This comes from the difficulty of reporting information with such technology.
                The CC stream is actually injected inside the video stream instead of being in its own PID (unlike e.g. DVB Subtitles). So the CC stream size is currently counted inside the video stream size. FYI, the DTVCC Transport stream is always at 14400 bps, so it is easy to compute the stream size of the CC alone ;-). I plan to provide a better display when I have some time (but it is not a priority, you are the first one to talk about this weird display!)
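The "easy to compute" remark can be illustrated with the values from this thread. This is a rough sketch, not something MediaInfo outputs: the function name `cc_stream_size_bytes` is hypothetical, the 14400 bps figure is the one quoted in this reply, and the duration is the "1h 52mn" shown in the report, which is rounded to the minute.

```python
DTVCC_BITRATE_BPS = 14_400  # constant bit rate quoted in the reply


def cc_stream_size_bytes(duration_seconds: int,
                         bitrate_bps: int = DTVCC_BITRATE_BPS) -> int:
    """Approximate size of the caption transport: duration x bit rate, in bytes."""
    return duration_seconds * bitrate_bps // 8


duration_seconds = 1 * 3600 + 52 * 60  # "1h 52mn" from the Text stream report
size = cc_stream_size_bytes(duration_seconds)
print(size)                         # 12096000
print(round(size / 1_000_000, 1))   # 12.1 (MB)
```

So for this file the caption transport would account for roughly 12 MB that is currently reported as part of the video stream size.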

                Should we be able to rely on the fast if a Text Stream is present 608/708 exits?

                Not sure I understand the question.
                In any case:
                - if something is displayed, you are sure this service is present at least once, whether the parsing is fast or slow
                - if something is not displayed, you are not sure it is absent unless you do a full parsing

                In your example, if you have CC1 displayed after fast parsing, you are sure that CC1 exists in the file, but you are not sure that CC2 does not exist.
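The two rules and the CC1/CC2 example can be written as a small decision function. This is a sketch of the reasoning only: the names `Presence` and `service_presence` are hypothetical and nothing here calls MediaInfo. It assumes the default display, where a service is listed only if real caption data was found.

```python
from enum import Enum


class Presence(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNKNOWN = "unknown"


def service_presence(displayed: bool, full_parse: bool) -> Presence:
    """Interpret whether a caption service (e.g. CC1) exists in the file.

    displayed:  the service appears in the report.
    full_parse: the report was produced with ParseSpeed set to 1.0.
    """
    if displayed:
        # Seen at least once, regardless of parsing speed.
        return Presence.PRESENT
    if full_parse:
        # The whole file was read and the service never appeared.
        return Presence.ABSENT
    # Fast parsing may simply not have reached the caption data.
    return Presence.UNKNOWN


# CC1 displayed after fast parsing, CC2 not displayed:
print(service_presence(displayed=True, full_parse=False).value)   # present
print(service_presence(displayed=False, full_parse=False).value)  # unknown
```

Only the combination "not displayed after a full parse" allows the conclusion that a service is absent.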

                If your goal is to detect the existence of the CC stream (even if it contains only 0x00 padding), you can try Option("File_Eia608_DisplayEmptyStream", "1") and see if it fits your need. But be careful: it shows something even if there is no real CC content (in the classic case, the CC stream is injected inside the video but with only 0x00 padding bytes).

                 
                • alpduhuez
                  2013-05-31

                  Ah, a typo, my fingers were in the way. I meant to say something like this:

                  Should we be able to rely on the fact that if a Text Stream is present, then a 608/708 CC exists?

                  I believe you answered this question anyways! Yes, our goal was to be able to reliably detect if a 608/708 CC was present in a file. I will try the "File_Eia608_DisplayEmptyStream" option to see if that will work.

                  Thanks for the help!

                   
  • Jon Kar
    2013-05-30

    Is it possible to activate whole file parsing in GUI (Win)?

     
    • Currently, it is implemented only for the command line and the library.
      The GUI is more mainstream and would require more work (progress bar, answering questions about differences compared to the "normal" analysis...). It is on the ToDo-list but with no ETA (at least without sponsorship).

       
  • Jon Kar
    2013-05-31

    Can you please give an example of how to make it work in the CLI?

     
    • mediainfo --ParseSpeed=1.0 YourFileName

       
  • Jon Kar
    2013-05-31

    At least for me it does not read DTS-HD MA or TrueHD bitrates correctly.

    CLI --ParseSpeed=1.0 --Output=Audio;%BitRate/String%\n test.mv

     
    • Yes, this part (DTS-HD and TrueHD bitrate after full parsing) is doable but currently not implemented.
      It is on the ToDo-list but currently not a priority (no sponsor for accelerating it).