
#128 WCS behaves badly or fails altogether when CPU is burdened

0.72
New
Tetsujin
None
Medium
Linux
Unspecified
Defect
2016-04-23
2015-06-17
Tetsujin
No

What steps will reproduce the problem?
1. Run Webcam Studio with at least one camera on
2. Put the machine under significant CPU load
3. After several minutes, the performance of the cameras degrades: USB webcams become extremely choppy, while DV cameras experience significant latency (on the order of several seconds) - ultimately, WCS simply stops producing video.

In my case, I'm running a DV camcorder and two USB webcams. If I run the DV camera together with one of the USB cameras while using Google Hangouts, the cameras quickly become unusable. CPU load in this configuration is around 80% or higher on my machine, with the Java process taking a full core pretty much to itself.

(My machine is a 2.5GHz dual core laptop. Hyperthreading is currently enabled, so there are four virtual cores.)

To test this when I'm not in a video chat with someone else, I can run WCS while playing a YouTube video - that will usually do it.

What is the expected output?
WCS should maintain reasonable output - dropping frames if necessary, but providing timely video data when it can.

What do you see instead?
After a few minutes of operation, the USB webcams get very choppy and the DV camera develops an increasingly large delay, until ultimately everything stops (due to a timeout-induced crash in MasterFrameBuilder).

The Operating system you are using (Linux, Windows etc)?
Linux

What version of WebcamStudio are you using?
0.73 (from current SVN)

What version of Java are you using?
Tested with OpenJDK and Oracle JDK
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

What is your Webcamera vendor, model and version?
Lenovo Thinkpad X220T integrated camera
Logitech C920
Sony DCR-TRV17 DV camcorder

From my attempts to diagnose and correct this, so far it seems like I'm hitting the limits of my CPU's abilities. I'm trying to find ways to tune my studio setup, the gstreamer invocations, and the Java code itself to hopefully improve the performance enough to make it work.

The big problem, as I said, is that WCS doesn't handle these heavy CPU load situations well. After a few minutes the output becomes borderline unusable, and eventually one of the threads launched by MasterFrameBuilder will time out and MasterFrameBuilder will crash with a CancellationException - which in turn means no video will work in WCS until it is restarted.

I did some profiling and it seems like the big hit within the Java code is scaling the images: it takes somewhere around 30% of the execution time as I recall. Outside of the Java code, the gstreamer process for decoding, deinterlacing, and scaling the DV camera feed takes a significant amount of CPU time as well.
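
For reference, a typical Java2D scaling path looks something like the sketch below (this is just an illustration, not the actual WCS code); the choice of interpolation hint alone makes a big difference in CPU cost - NEAREST_NEIGHBOR is the cheapest, BILINEAR/BICUBIC look better but cost more.

    // Sketch only - not the actual WCS scaling code.
    import java.awt.Graphics2D;
    import java.awt.RenderingHints;
    import java.awt.image.BufferedImage;

    class ScaleSketch {
        static BufferedImage scale(BufferedImage src, int w, int h) {
            BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = dst.createGraphics();
            // Cheapest hint; swap in VALUE_INTERPOLATION_BILINEAR for better quality at higher cost
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            g.drawImage(src, 0, 0, w, h, null);
            g.dispose();
            return dst;
        }
    }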

It may or may not be possible to optimize the code to make this setup work well on my hardware. (I am hopeful, however - perhaps the code could be made to take advantage of the GPU for image scaling...) But the MasterFrameBuilder situation does need to be addressed. I'm working on a code change which addresses the timeout-induced crash and which will hopefully yield better "frame-dropping" behavior as well. So far, however, the changes have not been completely successful. I think I may need to rethink the overall design of how sources are polled for new data, and how the main frame builder loop is organized and run.
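
Roughly, the kind of change I have in mind for the timeout handling looks like the sketch below (the names are made up for illustration, not the actual WCS classes): poll every source with a deadline, and when a source is too slow, reuse its previous frame instead of letting a CancellationException take down the whole mixer loop.

    // Sketch only - hypothetical names, not the actual MasterFrameBuilder code.
    import java.awt.image.BufferedImage;
    import java.util.List;
    import java.util.concurrent.*;

    class FrameBuilderSketch {
        private final ExecutorService pool = Executors.newCachedThreadPool();

        BufferedImage[] pollSources(List<Callable<BufferedImage>> sources,
                                    BufferedImage[] lastFrames) throws InterruptedException {
            // invokeAll cancels any task that hasn't finished within the timeout (~one frame at 30fps)
            List<Future<BufferedImage>> results =
                    pool.invokeAll(sources, 33, TimeUnit.MILLISECONDS);
            BufferedImage[] frames = new BufferedImage[results.size()];
            for (int i = 0; i < results.size(); i++) {
                try {
                    frames[i] = results.get(i).get();
                } catch (CancellationException | ExecutionException slowOrFailed) {
                    // Drop this source's frame instead of crashing: keep showing its last good image
                    frames[i] = lastFrames[i];
                }
            }
            return frames;
        }
    }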

Related

Tickets: #128

Discussion

  • Tetsujin

    Tetsujin - 2015-06-17

    Also, it may be that this problem is specific to DV cameras. The increased latency in the DV feed seems to go hand-in-hand with the increased choppiness in the USB webcam feeds. MasterFrameBuilder attempts to do all the I/O on all the sources in parallel, but then it'll actually block waiting for all those sources to finish before rendering the frame and moving on to the next one. In principle any slow source could probably trigger this, but in practice I think only the DV support is presently slow enough to do it.

    I am considering new designs for handling the source polling and frame builder loop: perhaps putting the frame polling and timing for each source in its own separate thread, and having the main frame builder loop draw from each source thread's data - whatever image a source has available when the frame builder builds the frame is what gets displayed.

    The problem is, I don't know if it'll perform better than what's there now. I won't know until I write it. :)
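
    Something along these lines is what I'm picturing (just a sketch - the class and method names are made up, not existing WCS code):

    // Sketch of the idea only.
    import java.awt.image.BufferedImage;
    import java.util.concurrent.atomic.AtomicReference;

    class PolledSource implements Runnable {
        private final AtomicReference<BufferedImage> latest = new AtomicReference<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            // Each source captures in its own thread, at its own natural rate (29.97, 30.0, whatever)
            while (running) {
                latest.set(captureNextFrame());   // newer frames simply overwrite older ones
            }
        }

        // The frame builder calls this and uses whatever image the source has right now
        BufferedImage currentFrame() {
            return latest.get();
        }

        void stop() { running = false; }

        private BufferedImage captureNextFrame() {
            // placeholder for the real capture (reading from the gstreamer/avconv stream)
            return new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
        }
    }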

     
    • Soylent TV

      Soylent TV - 2015-06-17

      Great,
      also, the output sinks like FMEs slow down the overall fps of WS when the remote server is not ready. Maybe your approach is the right one :)
      Thanks.

       
  • Soylent TV

    Soylent TV - 2015-06-21

    Hi George, I made some tests on the GPU_Accel branch.

    1) CPU i5 2500k - GPU Nvidia GT430 1GB - 4GB Ram - Mint 17.1 64bit
    Leaving "-Dsun.java2d.opengl=True".
    With a movie stream frames dropped to 10 fps. CPU usage is Doubled.

    2) CPU i5 2500k - GPU Nvidia GT430 1GB - 4GB Ram - Mint 17.1 64bit
    Removing "-Dsun.java2d.opengl=True".
    With a movie stream frames are kept right, CPU usage is quite the same.

    Don't know if it is useful in this case, but Patrick Balleux once suggested getting the stream data directly from the console instead of over TCP connections:

    << ... So I revisited some other ideas I had a while ago. Currently in 0.60, webcam is captured using "avconv" that is streaming raw video and raw audio over a local TCPIP connection to the Java code. This was done to keep some multi-OSes compatibility. But it slow, really slow. And a lot of "out-of-sync" issues do occur.

    Then I played around by piping the "avconv" output to the console. My guess is that since Java is able to read the console as an InputStream, raw video and raw audio could be capture this way. That should provide a better and lower CPU usage and probably a better control over audio/video sync.

    The deal is that with live content like a webcam and a microphone, you don't need to buffer, just use the latest frame as they are both "in-sync". This method would not work for a movie for example as you need to read all "frames" and make sure to stream them at the same time for each audio frame and video frame.

    For the webcam, the "avconv" command should be:
    video=avconv -v 0 -s @CWIDTHx@CHEIGHT -f video4linux2 -i @FILE -f rawvideo -pix_fmt rgb24 -r @RATE
    audio=avconv -v 0 -ar 44100 -f alsa -i pulse -ac 2 -ar 44100 -f s16be -

    Audio could also be captured using "parec" in the same way...

    Technically, ProcessRenderer should be modified to launch each existing process and then Capturer should read from the InputStream of the process instead of reading from a local TCPIP port. >>

    I tried this on TrucklistStudio and it works, but there doesn't seem to be any speed improvement ...
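
    For illustration, the idea boils down to something like this (the device path, frame size, and class name below are just placeholders, not the actual TrucklistStudio or WCS code):

    // Illustration only: read raw rgb24 frames from the child process's stdout instead of a TCP socket.
    import java.io.DataInputStream;
    import java.io.IOException;

    class PipeCaptureSketch {
        public static void main(String[] args) throws IOException {
            int width = 640, height = 480;
            int frameSize = width * height * 3; // rgb24: 3 bytes per pixel
            Process proc = new ProcessBuilder(
                    "avconv", "-v", "0",
                    "-s", width + "x" + height,
                    "-f", "video4linux2", "-i", "/dev/video0",
                    "-f", "rawvideo", "-pix_fmt", "rgb24", "-")
                    .start();
            byte[] frame = new byte[frameSize];
            try (DataInputStream video = new DataInputStream(proc.getInputStream())) {
                while (true) {
                    video.readFully(frame); // blocks until one complete frame has been read
                    // hand `frame` off to the mixer here
                }
            }
        }
    }
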
    karl.

     
    • Tetsujin

      Tetsujin - 2015-06-22

      On Sun, 2015-06-21 at 07:02 +0000, Soylent TV wrote:

      << ... So I revisited some other ideas I had a while ago. Currently in
      0.60, webcam is captured using "avconv" that is streaming raw video
      and raw audio over a local TCPIP connection to the Java code. This was
      done to keep some multi-OSes compatibility. But it slow, really slow.
      And a lot of "out-of-sync" issues do occur.

      I've been looking into this a bit. Based on these tools:

      https://github.com/rigtorp/ipc-bench

      I don't think there's a significant performance gain to be had by
      switching from TCP to pipes. I added a sleep() call to the loop of
      the tools and ran a test simulating the transfer of one minute of
      640x480 24bpp video at 30fps. (Note that the throughput numbers are
      pretty meaningless because of this throttling.)

      tetsujin@frontier:~/src/ipc-bench$ time ./pipe_thr 1000000 $((60 * 30))
      message size: 1000000 octets
      message count: 1800
      average throughput: 29 msg/s
      average throughput: 232 Mb/s

      real 1m1.136s
      user 0m0.012s
      sys 0m0.512s
      tetsujin@frontier:~/src/ipc-bench$ time ./tcp_thr 1000000 $((60 * 30))
      message size: 1000000 octets
      message count: 1800
      average throughput: 29 msg/s
      average throughput: 232 Mb/s

      real 1m1.783s
      user 0m0.012s
      sys 0m0.604s

      TCP took a bit more system CPU time, but the difference works out to a tiny fraction of a millisecond per frame on average.

      I may still make the change at some point; if nothing else I think it's
      a cleaner solution. I just don't think there's any indication at this
      point that it'd have a meaningful performance benefit.

       
  • Tetsujin

    Tetsujin - 2015-06-21

    Yeah, the GPU accel work was an experiment. Based on the results on my own machine I consider it a failure, but rather than throw away that work I stuck it in a branch. Thanks for giving it a try, though - it's nice to have some results from another machine. I think maybe Java's implementation of GPU acceleration is just broken; I have no idea why it would need to burn that much CPU just to pass data back and forth to the GPU. It's also possible it's having to do some pixel-format conversion to draw the images - even turning off alpha transparency in the images didn't seem to help.

    Right now I'm still trying to figure out why the DV feed starts lagging after a while. I feel like I may learn something important there that I don't want to sweep under the rug before I've diagnosed it. I'm thinking of trying an outside library for scaling the images (one way or another it feels like that's just taking more time than it should) and switching capturers to independent threads so they can synchronize with their own sources instead of everything being locked to MasterFrameBuilder's frame rate. (DV sources do seem to have a different frame rate than webcams - 29.97 fps vs 30.0 fps - despite WCS instructing gstreamer to convert the frame rate to match; I think that, combined with MasterFrameBuilder's "lock-step" polling of the sources, is somehow to blame, but I can't quite work out the scenario.) One way or another I think my laptop ought to be capable of running my three-camera setup. If nothing else, a better frame-dropping implementation should solve it.

    I had thought of reading video over a pipe rather than a TCP connection, but given that the TCP connection is on the local machine I was doubtful there would be any benefit. TCP on the local machine shouldn't need to deal with missing packets or out-of-order data; assuming the underlying implementation is reasonably optimized, it should behave pretty much like a pipe. I'll probably give a pipe-based implementation a try at some point anyway, though, because you never really know the answer until you experiment...

    ---GEC


  • abmoraz

    abmoraz - 2016-04-23

    I am seeing this issue as well. I ran WCS from the command line and it throws an exception repeatedly when this happens:

    Apr 23, 2016 6:36:07 PM webcamstudio.mixers.MasterFrameBuilder run
    SEVERE: null
    java.util.concurrent.CancellationException
            at java.util.concurrent.FutureTask.report(FutureTask.java:121)
            at java.util.concurrent.FutureTask.get(FutureTask.java:188)
            at webcamstudio.mixers.MasterFrameBuilder.run(MasterFrameBuilder.java:205)
            at java.lang.Thread.run(Thread.java:745)
    

    I have found that when performance degrades and I get that exception, stopping all the audio feeds for 1-2 seconds allows it to catch back up; then I can start them again. The command-line output when I do this is:

    CommandVideo: gst-launch-0.10 pulsesrc device="bluez_sink.08_DF_1F_36_E2_76.monitor" ! audioconvert ! wavescope style=color-lines ! ffmpegcolorspace ! videoscale ! video/x-raw-rgb,width=640,height=360,depth=24,bpp=24,blue_mask=255,green_mask=65280,red_mask=16711680 ! videorate ! video/x-raw-rgb,framerate=25/1 ! ffmpegcolorspace ! tcpclientsink port=45886 
    CommandAudio: ffmpeg -loglevel debug -f pulse -ar 22050 -ac 2 -probesize 32 -analyzeduration 0 -i bluez_sink.08_DF_1F_36_E2_76.monitor -f s16be tcp://127.0.0.1:41259 
    AudioSource Video accepted...
    AudioSource Audio accepted...
    Start Video ...
    Start Audio ...
    

    My WCS setup is:
    1. Desktop capture (using Gstreamer back end so I can do window capture)
    2. Audio channel (main channel for the game, using ffmpeg; avconv doesn't work and gstreamer is REALLY quiet)
    3. Audio channel (microphone for commentary, using ffmpeg; same reasons as above)
    4. Occasionally, but rarely, another source in small bursts (file, text, another window capture)
    5. OUT backend is Gstreamer (gives a higher quality image)
    6. 2x FME outs (video: 640x360 @ 1300 kbps, audio: 128 kbps)

    My casting rig:
    8-core 4.2 GHz AMD processor
    16 GB RAM
    nVidia GeForce 670 (using nVidia's proprietary drivers)
    Kubuntu 15.10 (using KDE 5)
    Bose USB audio
    Blue Yeti Microphone

     
