Option to parse whole file

Help
alpduhuez
2013-05-29
2013-05-31
  • alpduhuez

    alpduhuez - 2013-05-29

    Hello.
    We have some MPEG-2 .ts files whose closed captions are not being found. They are in 608 format, but they do not appear until about 1-5 minutes into the movie. I have seen some references to parsing the whole file; what is that option? And how would I set it from the C# reference? If this is still under development, we are willing to accelerate.

    Thanks.
    -alp

     
  • Jerome Martinez

    Jerome Martinez - 2013-05-29

    The option is hidden because it is not yet fully tested and validated, but it already exists and a lot of people already use it.
    In C#, something like:
    MediaInfo MI = new MediaInfo();
    MI.Option("ParseSpeed", "1.0"); // Parsing 100% of the file.

    Try it, and let me know if something is wrong compared to your needs.

     
  • alpduhuez

    alpduhuez - 2013-05-29

    Great, thank you for the fast reply. I will try it out.

    Thanks!

     
    • alpduhuez

      alpduhuez - 2013-05-31

      So, I have been playing with "ParseSpeed". Are there any values besides "1.0" to use? I have tried "0.2", "0.5", "1.0", and "2.0". The CC data only shows up if the value is > 1.0. Is this expected?

      I am trying to extract the CC data from a 15 GB file. The processing takes about 3 minutes. Is there a way to speed this up? Is that a potential accelerated feature request?

      Thanks.
      -alp

       
      • Jerome Martinez

        Jerome Martinez - 2013-05-31

        More than 1.0 is the same as 1.0 (it is currently the maximum).
        0.5 is the default.
        Below 0.5 is quicker parsing (so less precise).
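The three rules above can be written down as a tiny model. This is only a sketch of the behavior described in this post, not MediaInfo's actual code; the function names are hypothetical and chosen for illustration.

```python
# Model of the ParseSpeed rules described in the post above.
# Names are hypothetical; this is not MediaInfo's implementation.
DEFAULT_PARSE_SPEED = 0.5  # default value, per the post
MAX_PARSE_SPEED = 1.0      # current maximum, per the post


def effective_parse_speed(requested: float) -> float:
    """Values above the maximum behave like the maximum."""
    return min(requested, MAX_PARSE_SPEED)


def is_full_parse(requested: float) -> bool:
    """True when the requested value results in parsing the whole file."""
    return effective_parse_speed(requested) >= MAX_PARSE_SPEED


for value in (0.2, 0.5, 1.0, 2.0):
    print(value, effective_parse_speed(value), is_full_parse(value))
```

Under this model "2.0" and "1.0" are equivalent, and anything lower is a partial parse.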

        In your case, the problem is not in MediaInfo but in your container format: CEA-608/708 captions are intended for broadcast. They can appear anywhere. One stream can contain several services, and they are not indicated in a header. Caption data can be only at offset 0, or only at offset 10 000 000, or only at offset "end of your file minus 10 000 000". Currently, I see no other possibility than parsing the whole file.

        I think you will find no tool fitting your needs, due to your file, not the tools.

        So let's try to find some ideas:
        - In the case of 708 (not 608), it is possible to rely only on the caption service descriptor (in the PMT) instead of looking for caption data. But it will provide the list of services listed in the descriptor, not the real content, so you must trust the service descriptor. It requires some additional development. Interesting for you?
        - We can limit the parsing to x minutes or x MB; you decide the value of x. But if your caption appears after this x, you still have your problem. If I remember well, it is already implemented. Interesting for you?
        - 15 GB and 3 minutes --> ~ 90 MB/s. Fair enough. What is the bottleneck? HDD? Buy a faster HDD and/or RAID ;-). CPU (MediaInfo is not multithreaded, check the load of 1 core)? It is possible to study the code and I am pretty sure there are tons of improvements to make, but it is long (= expensive) and without promise on the result. Interesting for you?
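The throughput figure in the last item can be checked with a few lines. This sketch assumes the "15 GB" reported by the poster is either binary (GiB) or decimal (GB); the thread does not say which, so both are shown.

```python
# Back-of-the-envelope check of the "~ 90 MB/s" figure.
# Assumption: the file size is exactly 15 GiB or 15 GB, and parsing took 180 s.

def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return size_bytes / seconds / 1_000_000


elapsed_seconds = 3 * 60
binary_size = 15 * 1024**3   # 15 GiB
decimal_size = 15 * 1000**3  # 15 GB

print(round(throughput_mb_per_s(binary_size, elapsed_seconds), 1))   # 89.5
print(round(throughput_mb_per_s(decimal_size, elapsed_seconds), 1))  # 83.3
```

Either way the rate is in the range of a single hard disk's sequential read speed, which is why the disk is the first suspect.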

         
        • alpduhuez

          alpduhuez - 2013-05-31

          Duh, I'm sorry, I did not do the math on the parsing speed.

          Thanks, that is good information. I do have one more question, please excuse me if it is a silly one. From what I can find, MediaInfo does not do CC or sub-title extraction, it just will report that they exist in a file as "Text Stream" and the format of said CC or sub-title?

          thanks.
          -alp

           
          • Jerome Martinez

            Jerome Martinez - 2013-05-31

            > please excuse me if it is a silly one.

            I don't excuse because it is not a silly one ;-).

            > MediaInfo does not do CC or sub-title extraction

            Default output is about global stream information, so no subtitle extraction, correct.

            But a basic extractor is hidden inside MediaInfo, for specific needs from specific customers (specific contracts I had).
            Actually, I already have 2 subtitle projects:
            - 1 in C++, based on MediaInfo for the demux (from TS, MXF ancillary, GXF...), with basic output example.
            - 1 in C, using ffmpeg as the demuxer, with more raw input formats (SDI input) and output formats (SubRip, TTML, TTML tunneling, on-screen display...)

            Both projects fit the needs of the customers who requested the feature, but they are not ready for public release (no presentation, no code example, no plan to provide support for them, basic decoding only, no time to create a specific website...). If you are interested in sponsoring something (clean-up, example code creation and website? C# binding?), let me know. On my side, making both projects public is on my ToDo-list but I have no ETA.

             
            • alpduhuez

              alpduhuez - 2013-05-31

              Ah, okay that makes sense. I spoke w/ the customer and they are okay w/ the MediaInfo behavior. They are good with it reporting CC when it exists. So I lied, one more question. :-)

              This is the text stream information that I was able to get from .Inform after setting ParseSpeed="1.0" on the 15 GB file in question, where the CC shows up about 1 minute into the video. My question is, why is a CC detected if the stream size is 0? Should we be able to rely on the fast if a Text Stream is present 608/708 exits?

              Thanks Much.
              -alp

              Text
              ID : 481 (0x1E1)-CC1
              Menu ID : 1 (0x1)
              Format : EIA-608
              Muxing mode : SCTE 128 / DTVCC Transport
              Muxing mode, more info : Muxed in Video #1
              Duration : 1h 52mn
              Bit rate mode : Constant
              Stream size : 0.00 Byte (0%)
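For anyone who wants to test for the Text stream in code rather than by eye, a report section like the one above can be split into key/value pairs. This is a minimal sketch that assumes the "key : value" layout shown here; the function name is hypothetical and it is not part of MediaInfo.

```python
# Parse one section of a text report such as the one pasted above.
# Assumption: first line is the stream kind, remaining lines are "key : value".
SAMPLE = """\
Text
ID : 481 (0x1E1)-CC1
Menu ID : 1 (0x1)
Format : EIA-608
Muxing mode : SCTE 128 / DTVCC Transport
Muxing mode, more info : Muxed in Video #1
Duration : 1h 52mn
Bit rate mode : Constant
Stream size : 0.00 Byte (0%)
"""


def parse_inform_section(text: str) -> tuple[str, dict[str, str]]:
    """Return (stream kind, fields) for a single report section."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    kind = lines[0]
    fields = {}
    for line in lines[1:]:
        # Split on the first " : " only, so values may contain colons.
        key, separator, value = line.partition(" : ")
        if separator:
            fields[key.strip()] = value.strip()
    return kind, fields


kind, fields = parse_inform_section(SAMPLE)
has_608 = kind == "Text" and fields.get("Format") == "EIA-608"
print(kind, fields["Format"], has_608)
```

The check on `Format` rather than on `Stream size` matters here, since the reported size is 0 even though the captions exist.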

               
              • Jerome Martinez

                Jerome Martinez - 2013-05-31

                So no need of CC extract, ok.

                > My question is, why is a CC detected if the stream size is 0?

                This is the difficulty of reporting information with such technology.
                The CC stream is actually injected inside the video stream instead of being in its own PID (as DVB Subtitles are, for example). So the CC stream size is currently counted inside the video stream size. FYI, the DTVCC Transport stream is always at 14400 bps, so it is easy to compute the stream size of the CC alone ;-). I plan to provide a better display when I have some time (but it is not a priority, you are the first one to talk about this weird display!)
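The computation hinted at above is a one-liner. The sketch below takes the 14400 bps figure from this post and the "1h 52mn" duration from the report; since that duration is rounded to the minute, the result is only approximate.

```python
# Estimate the size of the caption transport alone from its constant bit rate.
DTVCC_TRANSPORT_BPS = 14_400  # bits per second, figure quoted in the post above


def cc_stream_size_bytes(duration_seconds: int, bps: int = DTVCC_TRANSPORT_BPS) -> int:
    """Size in bytes of a constant-bit-rate stream lasting duration_seconds."""
    return bps * duration_seconds // 8


duration = 1 * 3600 + 52 * 60  # "1h 52mn" from the report
size = cc_stream_size_bytes(duration)
print(f"{size} bytes, about {size / 1_000_000:.1f} MB")
```

For this file that gives roughly 12 MB, a tiny share of the 15 GB total.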

                > Should we be able to rely on the fast if a Text Stream is present 608/708 exits?

                Not sure I understand the question.
                In any case:
                - if something is displayed, you are sure this service is present at least once. Fast or slow
                - if something is not displayed, you are not sure it is absent unless you do a full parse

                In your example, if you have CC1 displayed after fast parsing, you are sure that CC1 exists in the file, but you are not sure that CC2 does not exist.

                If your goal is to detect the existence of the CC stream (even if it contains only 0x00 padding), you can try Option("File_Eia608_DisplayEmptyStream", "1") and see if it fits your need. But be careful: it shows something even if there is no real CC content (the classic case being a CC stream injected inside the video but containing only 0x00 padding bytes).
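The two rules above (a displayed service is certain, a missing one is only certain after a full parse) amount to a three-way result. A minimal sketch, with a hypothetical function name and return values chosen for illustration:

```python
# Three-way interpretation of a caption service lookup, following the rules above.
# Hypothetical helper; it does not call MediaInfo.

def interpret_detection(service_reported: bool, full_parse: bool) -> str:
    """Return "present", "absent" or "unknown" for one caption service."""
    if service_reported:
        # Seen at least once, regardless of parse speed.
        return "present"
    if full_parse:
        # The whole file was read and nothing was found.
        return "absent"
    # Partial parse: the service may appear later in the file.
    return "unknown"


print(interpret_detection(True, False))   # present
print(interpret_detection(False, True))   # absent
print(interpret_detection(False, False))  # unknown
```

Note that with the empty-stream option enabled, "present" would mean the stream exists, possibly with padding only.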

                 
                • alpduhuez

                  alpduhuez - 2013-05-31

                  Ah, a typo, my fingers were in the way. I meant to say something like this:

                  Should we be able to rely on the fact that if a Text Stream is present, then a 608/708 CC exists?

                  I believe you answered this question anyways! Yes, our goal was to be able to reliably detect if a 608/708 CC was present in a file. I will try the "File_Eia608_DisplayEmptyStream" option to see if that will work.

                  Thanks for the help!

                   
  • Jon Kar

    Jon Kar - 2013-05-30

    Is it possible to activate whole file parsing in GUI (Win)?

     
    • Jerome Martinez

      Jerome Martinez - 2013-05-30

      Currently, it is implemented only for command line and library.
      GUI is more mainstream, and would require more work (progress bar, answering questions about difference compared to the "normal" analysis...), on the ToDo-list but no ETA (at least without sponsorship).

       
  • Jon Kar

    Jon Kar - 2013-05-31

    Can you please give an example of how to make it work in the CLI?

     
    • Jerome Martinez

      Jerome Martinez - 2013-05-31

      mediainfo --ParseSpeed=1.0
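When driving the command line from a script, the same invocation can be assembled as an argument list. This sketch only builds and prints the command; the file name `capture.ts` and the helper name are placeholders, and the executable is assumed to be on the PATH as `mediainfo`.

```python
# Build the command line shown above for use from a script.
# Assumptions: executable name "mediainfo", placeholder file name "capture.ts".
import shlex


def build_mediainfo_command(path: str, parse_speed: float = 1.0,
                            executable: str = "mediainfo") -> list[str]:
    """Return the argument list for a full-file (or partial) parse."""
    return [executable, f"--ParseSpeed={parse_speed}", path]


command = build_mediainfo_command("capture.ts")
print(shlex.join(command))  # mediainfo --ParseSpeed=1.0 capture.ts
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)` on a machine where the CLI is installed.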

       
  • Jon Kar

    Jon Kar - 2013-05-31

    At least for me it does not read DTS-HD MA or TrueHD bitrates correctly.

    CLI --ParseSpeed=1.0 --Output=Audio;%BitRate/String%\n test.mv

     
    • Jerome Martinez

      Jerome Martinez - 2013-05-31

      Yes, this part (DTS-HD and TrueHD bitrate after full parsing) is doable but currently not implemented.
      It is on the ToDo-list but currently not a priority (no sponsor for accelerating it).

       
