
Microphone Troubles With Linux

Help
2004-06-12
2012-09-22
  • Chad Metcalf

    Chad Metcalf - 2004-06-12

    I'm probably missing something easy but I can't get any version of sphinx (4, 3RC, 3, or 2) to work with a microphone.

    I've got two systems: a Dell workstation and a Dell laptop. The workstation has a generic Intel 810 sound card and is running Fedora Core 1 and Red Hat 9.0. The laptop has a SigmaTel sound card and is running Red Hat 9.0 and Windows XP Pro.

    I've tried running all the demos and tests, and I generally get the same result: "couldn't find suitable target audio format".

    I've tested each machine's ability to record from the mic with rec, record, and gnome-sound-recorder, all of which work.

    I can't get the AudioTool to record with jdk 1.4.2 or 1.5.0. For both I get "javax.sound.sampled.LineUnavailableException: Unsupported format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, big-endian"

    sphinx-simple (with the sphinx3 release candidate) doesn't bail, but it only records silence.

    I've made sure the mixer settings are up for all devices. I've tried setting line, igain, etc. as the recording device.

    Thanks in advance for any help...

     
    • Paul Lamere

      Paul Lamere - 2004-06-14

      Chad:

      Let's see what JavaSound sees for sound inputs on your system. Below is some Java code; try compiling and running it. It reports the entire set of audio inputs available on your system, which should help us understand what is going on.

      Cut and paste the following code into a file called SoundTest.java, compile and run it and see what it outputs. On my linux box I get output like so:

      $ javac SoundTest.java
      $ java SoundTest
      Mixers ...
      Mixers: Java Sound Audio Engine, version 1.0
      Mixers: Linux,dev/dsp,multi threaded, version Unknown Version
      Line info: interface TargetDataLine supporting 72 audio formats
         format ULAW, 8000.0 Hz, 8 bit, mono, audio data
         format ULAW, 11025.0 Hz, 8 bit, mono, audio data
         format ULAW, 16000.0 Hz, 8 bit, mono, audio data
         format ULAW, 22050.0 Hz, 8 bit, mono, audio data
         format ULAW, 32000.0 Hz, 8 bit, mono, audio data
      // .. lots of formats omitted

      // =========== SoundTest.java ===================
      import javax.sound.sampled.*;

      public class SoundTest {

          public static void main(String[] args) {

              System.out.println("Mixers ...");

              Mixer.Info[] mixers = AudioSystem.getMixerInfo();

              for (int i = 0; i < mixers.length; i++) {
                  System.out.println("Mixers: " + mixers[i]);
              }
              Line.Info[] lineInfos = AudioSystem.getTargetLineInfo
                  (new Line.Info(TargetDataLine.class));

              // find a usable target line
              for (int i = 0; i < lineInfos.length; i++) {
                 
                  System.out.println("Line info: " + lineInfos[i]);
                  AudioFormat[] formats =
                      ((TargetDataLine.Info)lineInfos[i]).getFormats();
            
                  for (int j = 0; j < formats.length; j++) {
                      System.out.println("   format " + formats[j]);
                  }
              }
              System.exit(0);
          }
      }
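
      If you only want to check one specific format rather than list everything, JavaSound can be asked directly. A minimal sketch, assuming the 16000 Hz, 16-bit, signed, big-endian mono format from the exception Chad reported (the class name FormatCheck is mine, not part of Sphinx-4):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

public class FormatCheck {
    public static void main(String[] args) {
        // The mono capture format from the reported exception.
        AudioFormat fmt = new AudioFormat(16000.0f, 16, 1, true, true);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, fmt);
        System.out.println(fmt + " supported: "
                           + AudioSystem.isLineSupported(info));
    }
}
```

      The result is machine-dependent; a card that can only capture stereo would be expected to print false for this mono format.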

       
      • Chad Metcalf

        Chad Metcalf - 2004-06-29

        Paul,
        Sorry for the delay, but here are my results; they appear to be the same as Mike's above.

        mrburns:~> java SoundTest
        Mixers ...
        Mixers: Java Sound Audio Engine, version 1.0
        Mixers: Linux,dev/dsp,multi threaded, version Unknown Version
        Line info: interface TargetDataLine supporting 36 audio formats
        format ULAW 8000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ULAW 11025.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ULAW 16000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ULAW 22050.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ULAW 32000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ULAW 44100.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 8000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 11025.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 16000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 22050.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 32000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format ALAW 44100.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 8000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 8000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 11025.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 11025.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 16000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 16000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 22050.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 22050.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 32000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 32000.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 44100.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_UNSIGNED 44100.0 Hz, 8 bit, stereo, 2 bytes/frame,
        format PCM_SIGNED 8000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 8000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
        format PCM_SIGNED 11025.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 11025.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
        format PCM_SIGNED 16000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 16000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
        format PCM_SIGNED 22050.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 22050.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
        format PCM_SIGNED 32000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 32000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
        format PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian
        format PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian

         
    • harsh

      harsh - 2004-06-25

      This is the output on my machine, where I face the same problem too. Hope you can help.
      Output:-
      Mixers ...
      Mixers: Java Sound Audio Engine, version 1.0
      Mixers: Linux,dev/dsp,multi threaded, version Unknown Version
      Line info: interface TargetDataLine

      bye

       
      • Paul Lamere

        Paul Lamere - 2004-06-25

        Harsh:

        We've seen this problem with JavaSound on certain Linux machines, especially those that are running ALSA sound with JDK 1.5. What version of the JDK are you using? Do you know if you have ALSA installed on your system? (Use the /sbin/lsmod command to list the installed kernel modules on your system.)
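
        To check the JDK version, you can run java -version, or print the relevant system properties from a small program (the class name here is just for illustration):

```java
public class JdkVersion {
    public static void main(String[] args) {
        // JavaSound behavior differs across JDKs and platforms,
        // so both are worth reporting.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("os.name      = " + System.getProperty("os.name"));
    }
}
```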

        Paul

         
    • Mike

      Mike - 2004-06-29

      I have been having the same problem, below is the output after I compiled and ran your Java program.

      I would appreciate any input you could offer.

      Mixers ...
      Mixers: Java Sound Audio Engine, version 1.0
      Mixers: Linux,dev/dsp,multi threaded, version Unknown Version
      Line info: interface TargetDataLine supporting 36 audio formats
      format ULAW, 8178.0 Hz, 8 bit, stereo, audio data
      format ULAW, 11025.0 Hz, 8 bit, stereo, audio data
      format ULAW, 16000.0 Hz, 8 bit, stereo, audio data
      format ULAW, 22050.0 Hz, 8 bit, stereo, audio data
      format ULAW, 32000.0 Hz, 8 bit, stereo, audio data
      format ULAW, 44100.0 Hz, 8 bit, stereo, audio data
      format ALAW, 8178.0 Hz, 8 bit, stereo, audio data
      format ALAW, 11025.0 Hz, 8 bit, stereo, audio data
      format ALAW, 16000.0 Hz, 8 bit, stereo, audio data
      format ALAW, 22050.0 Hz, 8 bit, stereo, audio data
      format ALAW, 32000.0 Hz, 8 bit, stereo, audio data
      format ALAW, 44100.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 8178.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 8178.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 11025.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 11025.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 16000.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 16000.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 22050.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 22050.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 32000.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 32000.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 44100.0 Hz, 8 bit, stereo, audio data
      format PCM_UNSIGNED, 44100.0 Hz, 8 bit, stereo, audio data
      format PCM_SIGNED, 8178.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 8178.0 Hz, 16 bit, stereo, little-endian, audio data
      format PCM_SIGNED, 11025.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 11025.0 Hz, 16 bit, stereo, little-endian, audio data
      format PCM_SIGNED, 16000.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 16000.0 Hz, 16 bit, stereo, little-endian, audio data
      format PCM_SIGNED, 22050.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 22050.0 Hz, 16 bit, stereo, little-endian, audio data
      format PCM_SIGNED, 32000.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 32000.0 Hz, 16 bit, stereo, little-endian, audio data
      format PCM_SIGNED, 44100.0 Hz, 16 bit, stereo, big-endian, audio data
      format PCM_SIGNED, 44100.0 Hz, 16 bit, stereo, little-endian, audio data

       
    • Philip Kwok

      Philip Kwok - 2004-06-30

      Hi Chad & Mike,

      Thank you for posting the test results. The problem is that your sound cards are stereo only, whereas S4 is expecting mono. When the audio is stereo, you get twice the amount of audio data, one sample from each of the two channels. I'm working on a fix in the Microphone class that will take care of it. The fix is simply to take the average of the samples from the two channels. It might take a little while, since I also need to make sure the time stamps are correct. I will post a message here once I've fixed it.
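
      A minimal sketch of the averaging idea (my illustration, not the actual Microphone fix; it assumes interleaved 16-bit little-endian stereo samples):

```java
// Averages interleaved 16-bit little-endian stereo samples into a
// mono buffer half the size, as Philip describes above.
public class StereoToMono {
    public static byte[] average(byte[] stereo) {
        byte[] mono = new byte[stereo.length / 2];
        for (int in = 0, out = 0; in < stereo.length; in += 4, out += 2) {
            // decode the left and right 16-bit little-endian samples
            int left  = (short) ((stereo[in + 1] << 8) | (stereo[in] & 0xff));
            int right = (short) ((stereo[in + 3] << 8) | (stereo[in + 2] & 0xff));
            int avg = (left + right) / 2;
            // re-encode the averaged sample little-endian
            mono[out] = (byte) (avg & 0xff);
            mono[out + 1] = (byte) ((avg >> 8) & 0xff);
        }
        return mono;
    }

    public static void main(String[] args) {
        byte[] stereo = {0x10, 0x00, 0x20, 0x00}; // left = 16, right = 32
        byte[] mono = average(stereo);
        System.out.println((mono[1] << 8) | (mono[0] & 0xff)); // prints 24
    }
}
```

      Averaging halves the data rate back to mono while keeping energy from both channels.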

      Thanks for your patience, and thank you for your interest in Sphinx-4.

      philip

       
      • Anonymous

        Anonymous - 2004-06-30

        Philip -- may I suggest that, instead of averaging the corresponding samples from the two stereo channels (which implies an assumption that they were sampled at the same instant), you instead select one or the other (perhaps allowing the user to select which one, if it should matter). That would involve no assumptions about the precise relationship between the channels.

        jerry wolf
        soliloquy learning, inc.

         
    • Philip Kwok

      Philip Kwok - 2004-06-30

      Jerry,

      Actually, that's a great idea. Someone on our team did suggest selecting audio from one channel only, so let me do this: I will make the default be to average the channels, but I will also allow the user to select other methods (via configuration), such as summing the channels or simply choosing one channel.
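
      The select-one-channel variant is even simpler to sketch (again my illustration, assuming interleaved 16-bit stereo; it just copies every other sample, so it works for either endianness):

```java
// Copies only the chosen channel (0 = left, 1 = right) out of
// interleaved 16-bit stereo data, halving the buffer size.
public class ChannelSelect {
    public static byte[] selectChannel(byte[] stereo, int channel) {
        byte[] mono = new byte[stereo.length / 2];
        for (int in = channel * 2, out = 0; in < stereo.length; in += 4, out += 2) {
            mono[out] = stereo[in];          // first byte of the sample
            mono[out + 1] = stereo[in + 1];  // second byte of the sample
        }
        return mono;
    }
}
```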

      Thanks for the suggestion!

      philip

       
    • Philip Kwok

      Philip Kwok - 2004-06-30

      Hi Chad & Mike,

      I've made the Microphone class capable of converting stereo to mono. However, I don't have a machine that does stereo only, so I wasn't able to fully test it from your perspective. I've checked the code into CVS at SourceForge, so you should get it if you do an update. Otherwise, I've posted it below. You shouldn't have to change any of the configs; ideally it should just pick up the stereo line (this is the part I want you to test). Please let me know how it works. (Keeping my fingers crossed...)

      philip

      PS: It does averaging by default. If you want to select a particular channel, set these Microphone properties:

      <property name="stereoToMono" value="selectChannel"/>
      <property name="selectChannel" value="0"/>

      ----- Microphone.java -----

      /*
      * Copyright 1999-2004 Carnegie Mellon University. 
      * Portions Copyright 2004 Sun Microsystems, Inc. 
      * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
      * All Rights Reserved.  Use is subject to license terms.
      *
      * See the file "license.terms" for information on usage and
      * redistribution of this file, and for a DISCLAIMER OF ALL
      * WARRANTIES.
      *
      */

      package edu.cmu.sphinx.frontend.util;

      import java.io.IOException;
      import java.util.LinkedList;
      import java.util.List;
      import java.util.logging.Logger;
      import java.util.logging.Level;

      import javax.sound.sampled.AudioFormat;
      import javax.sound.sampled.AudioInputStream;
      import javax.sound.sampled.AudioSystem;
      import javax.sound.sampled.DataLine;
      import javax.sound.sampled.Line;
      import javax.sound.sampled.LineUnavailableException;
      import javax.sound.sampled.TargetDataLine;
      import javax.sound.sampled.LineListener;
      import javax.sound.sampled.LineEvent;

      import edu.cmu.sphinx.frontend.BaseDataProcessor;
      import edu.cmu.sphinx.frontend.Data;
      import edu.cmu.sphinx.frontend.DataEndSignal;
      import edu.cmu.sphinx.frontend.DataProcessingException;
      import edu.cmu.sphinx.frontend.DataStartSignal;
      import edu.cmu.sphinx.frontend.DoubleData;
      import edu.cmu.sphinx.util.props.PropertyException;
      import edu.cmu.sphinx.util.props.PropertySheet;
      import edu.cmu.sphinx.util.props.PropertyType;
      import edu.cmu.sphinx.util.props.Registry;

      /**
      * <p>
      * A Microphone captures audio data from the system's underlying
      * audio input devices and converts it into Data
      * objects. When the method <code>startRecording()</code> is called,
      * a new thread is created and used to capture
      * audio, and it stops when <code>stopRecording()</code>
      * is called. Calling <code>getData()</code> returns the captured audio
      * data as Data objects.
      * </p>
      * <p>
      * This Microphone will attempt to obtain an audio device with the format
      * specified in the configuration. If such a device with that format
      * cannot be obtained, it will try to obtain a device with an audio format
      * that has a higher sample rate than the configured sample rate,
      * while the other parameters of the format (i.e., sample size, endianness,
      * sign, and channels) remain the same. If, again, no such device can be
      * obtained, it flags an error, and a call to <code>startRecording()</code>
      * returns false.
      * </p>
      */
      public class Microphone extends BaseDataProcessor {

          /**
           * SphinxProperty for the sample rate of the data.
           */
          public static final String PROP_SAMPLE_RATE = "sampleRate";

          /**
           * Default value for PROP_SAMPLE_RATE.
           */
          public static final int PROP_SAMPLE_RATE_DEFAULT = 16000;

          /**
           * Sphinx property that specifies whether or not the microphone
           * will release the audio between utterances.  On certain systems
           * (linux for one), closing and reopening the audio does not work
           * too well. The default is false for Linux systems, true for others.
           */
          public final static String PROP_CLOSE_BETWEEN_UTTERANCES =
          "closeBetweenUtterances";

          /**
           * Default value for PROP_CLOSE_BETWEEN_UTTERANCES.
           */
          public final static boolean PROP_CLOSE_BETWEEN_UTTERANCES_DEFAULT = true;

          /**
           * The Sphinx property that specifies the number of milliseconds of
           * audio data to read each time from the underlying Java Sound audio
           * device.
           */
          public final static String PROP_MSEC_PER_READ = "msecPerRead";

          /**
           * The default value of PROP_MSEC_PER_READ.
           */
          public final static int PROP_MSEC_PER_READ_DEFAULT = 10;

          /**
           * SphinxProperty for the number of bits per value.
           */
          public static final String PROP_BITS_PER_SAMPLE = "bitsPerSample";

          /**
           * Default value for PROP_BITS_PER_SAMPLE.
           */
          public static final int PROP_BITS_PER_SAMPLE_DEFAULT = 16;

          /**
           * Property specifying the number of channels.
           */
          public static final String PROP_CHANNELS = "channels";

          /**
           * Default value for PROP_CHANNELS.
           */
          public static final int PROP_CHANNELS_DEFAULT = 1;

          /**
           * Property specifying the endianness of the data.
           */
          public static final String PROP_BIG_ENDIAN = "bigEndian";

          /**
           * Default value for PROP_BIG_ENDIAN.
           */
          public static final boolean PROP_BIG_ENDIAN_DEFAULT = true;

          /**
           * Property specifying whether the data is signed.
           */
          public static final String PROP_SIGNED = "signed";

          /**
           * Default value for PROP_SIGNED.
           */
          public static final boolean PROP_SIGNED_DEFAULT = true;

          /**
           * The Sphinx property that specifies whether to keep the audio
           * data of an utterance around until the next utterance is recorded.
           */
          public final static String PROP_KEEP_LAST_AUDIO = "keepLastAudio";

          /**
           * The default value of PROP_KEEP_LAST_AUDIO.
           */
          public final static boolean PROP_KEEP_LAST_AUDIO_DEFAULT = false;

          /**
           * The Sphinx property that specifies how to convert stereo audio to mono.
           * Currently, the possible values are "average", which averages the
           * samples from each channel, or "selectChannel", which uses
           * audio from one channel only. If you choose "selectChannel",
           * you should also specify which channel to use with the "selectChannel"
           * property.
           */
          public final static String PROP_STEREO_TO_MONO = "stereoToMono";

          /**
           * The default value of PROP_STEREO_TO_MONO.
           */
          public final static String PROP_STEREO_TO_MONO_DEFAULT = "average";

          /**
           * The Sphinx property that specifies the channel to use if the audio
           * is stereo.
           */
          public final static String PROP_SELECT_CHANNEL = "selectChannel";

          /**
           * The default value of PROP_SELECT_CHANNEL.
           */
          public final static int PROP_SELECT_CHANNEL_DEFAULT = 0;

          private AudioFormat finalFormat;
          private AudioInputStream audioStream = null;
          private TargetDataLine audioLine = null;
          private DataList audioList;
          private Utterance currentUtterance;
          private boolean doConversion = false;
          private int audioBufferSize = 160000;
          private volatile boolean recording = false;
          private volatile boolean utteranceEndReached = true;

          // Configuration data

          private AudioFormat desiredFormat;
          private Logger logger;
          private boolean closeBetweenUtterances;
          private boolean keepDataReference;
          private boolean signed;
          private int frameSizeInBytes;
          private int msecPerRead;
          private int selectedChannel;
          private String stereoToMono;

         
          /*
           * (non-Javadoc)
           *
           * @see edu.cmu.sphinx.util.props.Configurable#register(java.lang.String,
           *      edu.cmu.sphinx.util.props.Registry)
           */
          public void register(String name, Registry registry)
              throws PropertyException {
              super.register(name, registry);
              registry.register(PROP_SAMPLE_RATE, PropertyType.INT);
              registry.register(PROP_CLOSE_BETWEEN_UTTERANCES,
                                PropertyType.BOOLEAN);
              registry.register(PROP_MSEC_PER_READ, PropertyType.INT);
              registry.register(PROP_BITS_PER_SAMPLE, PropertyType.INT);
              registry.register(PROP_CHANNELS, PropertyType.INT);
              registry.register(PROP_BIG_ENDIAN, PropertyType.BOOLEAN);
              registry.register(PROP_SIGNED, PropertyType.BOOLEAN);
              registry.register(PROP_KEEP_LAST_AUDIO, PropertyType.BOOLEAN);
              registry.register(PROP_STEREO_TO_MONO, PropertyType.STRING);
              registry.register(PROP_SELECT_CHANNEL, PropertyType.INT);
          }

          /*
           * (non-Javadoc)
           *
           * @see edu.cmu.sphinx.util.props.Configurable#newProperties(edu.cmu.sphinx.util.props.PropertySheet)
           */
          public void newProperties(PropertySheet ps) throws PropertyException {
              super.newProperties(ps);
              logger = ps.getLogger();

              int sampleRate = ps.getInt(PROP_SAMPLE_RATE, PROP_SAMPLE_RATE_DEFAULT);

              int sampleSizeInBits = ps.getInt
                  (PROP_BITS_PER_SAMPLE, PROP_BITS_PER_SAMPLE_DEFAULT);

              int channels = ps.getInt(PROP_CHANNELS, PROP_CHANNELS_DEFAULT);

              boolean bigEndian =
                  ps.getBoolean(PROP_BIG_ENDIAN, PROP_BIG_ENDIAN_DEFAULT);

              signed = ps.getBoolean(PROP_SIGNED, PROP_SIGNED_DEFAULT);

              desiredFormat = new AudioFormat
                  ((float)sampleRate, sampleSizeInBits, channels, signed, bigEndian);
             
              closeBetweenUtterances = ps.getBoolean
                  (PROP_CLOSE_BETWEEN_UTTERANCES,
                   PROP_CLOSE_BETWEEN_UTTERANCES_DEFAULT);
             
              msecPerRead = ps.getInt(PROP_MSEC_PER_READ,
                                      PROP_MSEC_PER_READ_DEFAULT);

              keepDataReference = ps.getBoolean
                  (PROP_KEEP_LAST_AUDIO, PROP_KEEP_LAST_AUDIO_DEFAULT);

              stereoToMono = ps.getString
                  (PROP_STEREO_TO_MONO, PROP_STEREO_TO_MONO_DEFAULT);

              selectedChannel = ps.getInt
                  (PROP_SELECT_CHANNEL, PROP_SELECT_CHANNEL_DEFAULT);
          }

          /**
           * Initializes this Microphone, obtaining the audio line from the
           * audio system and attaching a line listener.
           */
          public void initialize() {
              super.initialize();
              audioList = new DataList();

              DataLine.Info  info
                      = new DataLine.Info(TargetDataLine.class, desiredFormat);
            
              if (!AudioSystem.isLineSupported(info)) {
                  logger.info(desiredFormat + " not supported");
                  AudioFormat nativeFormat = getNativeAudioFormat(desiredFormat);
                  if (nativeFormat == null) {
                      logger.severe("couldn't find suitable target audio format");
                      return;
                  } else {
                      finalFormat = nativeFormat;
                     
                      /* convert from native to the desired format if supported */
                      doConversion = AudioSystem.isConversionSupported
                          (desiredFormat, nativeFormat);
                     
                      if (doConversion) {
                          logger.info
                              ("Converting from " + finalFormat.getSampleRate()
                               + "Hz to " + desiredFormat.getSampleRate() + "Hz");
                      } else {
                          logger.info
                              ("Using native format: Cannot convert from " +
                               finalFormat.getSampleRate() + "Hz to " +
                               desiredFormat.getSampleRate() + "Hz");
                      }
                  }
              } else {
                  logger.info("Desired format: " + desiredFormat + " supported.");
                  finalFormat = desiredFormat;
              }

              /* Obtain and open the line and stream. */
              try {
                  logger.info("Final format: " + finalFormat);
                  info = new DataLine.Info(TargetDataLine.class, finalFormat);
                  audioLine = (TargetDataLine) AudioSystem.getLine(info);

                  // add a line listener that just traces
                  // the line states
                  audioLine.addLineListener(new LineListener() {
                          public  void update(LineEvent event) {
                              logger.info("line listener " + event);
                          }
                  });
              } catch (LineUnavailableException e) {
                  logger.severe("microphone unavailable " + e.getMessage());
              }
          }

          /**
           * Opens the audio capturing device so that it will be ready
           * for capturing audio. Attempts to create a converter if the
           * requested audio format is not directly available.
           *
           * @return true if the audio capturing device is opened successfully;
           *     false otherwise
           */
          private boolean open() {
              if (audioLine != null) {
                  if (!audioLine.isOpen()) {

                      /* open the audio line */
                      logger.info("open");
                      try {
                          audioLine.open(finalFormat, audioBufferSize);
                      } catch (LineUnavailableException e) {
                          logger.severe("Can't open microphone " + e.getMessage());
                          return false;
                      }

                      audioStream = new AudioInputStream(audioLine);
                      if (doConversion) {
                          audioStream = AudioSystem.getAudioInputStream
                              (desiredFormat, audioStream);
                          assert (audioStream != null);
                      }

                      /* Set the frame size depending on the sample rate. */
                      float sec = ((float) msecPerRead) / 1000.f;
                      frameSizeInBytes =
                          (audioStream.getFormat().getSampleSizeInBits() / 8) *
                          (int) (sec * audioStream.getFormat().getSampleRate());

                      logger.info("Frame size: " + frameSizeInBytes + " bytes");
                  }
                  return true;
              } else {
                  logger.severe("Can't find microphone");
                  return false;
              }
          }

          /**
           * Returns the format of the audio recorded by this Microphone.
           * Note that this might be different from the configured format.
           *
           * @return the current AudioFormat
           */
          public AudioFormat getAudioFormat() {
              return finalFormat;
          }

          /**
           * Returns the current Utterance.
           *
           * @return the current Utterance
           */
          public Utterance getUtterance() {
              return currentUtterance;
          }

          /**
           * Returns true if this Microphone is recording.
           *
           * @return true if this Microphone is recording, false otherwise
           */
          public boolean isRecording() {
              return recording;
          }

          /**
           * Starts recording audio. This method will return only
           * when a START event is received, meaning that this Microphone
           * has started capturing audio.
           *
           * @return true if the recording started successfully; false otherwise
           */
          public synchronized boolean startRecording() {
              if (recording) {
                  return false;
              }
              if (!open()) {
                  return false;
              }
              utteranceEndReached = false;
              recording = true;
              if (audioLine.isRunning()) {
                  logger.severe("Whoops: audio line is running");
              }
              RecordingThread recorder = new RecordingThread("Microphone");
              recorder.start();
              return true;
          }

          /**
           * Stops recording audio.
           */
          public synchronized void stopRecording() {
              if (audioLine != null) {
                  recording = false;
                  audioLine.stop();
              }
          }

          /**
           * This Thread records audio and caches it in an audio buffer.
           */
          class RecordingThread extends Thread {

              private boolean endOfStream = false;
              private volatile boolean started = false;
              private long totalSamplesRead = 0;

              /**
               * Creates the thread with the given name
               *
               * @param name the name of the thread
               */
              public RecordingThread(String name) {
                  super(name);
              }

              /**
               * Starts the thread, and waits for recorder to be ready
               */
              public void start() {
                  started = false;
                  super.start();
                  waitForStart();
              }

              /**
               * Implements the run() method of the Thread class.
               * Records audio and caches it in the audio buffer.
               */
              public void run() {
                  totalSamplesRead = 0;
                  logger.info("started recording");

                  if (keepDataReference) {
                      currentUtterance = new Utterance
                          ("Microphone", audioStream.getFormat());
                  }

                  audioList.add(new DataStartSignal());
                  logger.info("DataStartSignal added");
                  try {
                      audioLine.start();
                      while (!endOfStream) {
                          Data data = readData(currentUtterance);
                          if (data == null) {
                              break;
                          }
                          audioList.add(data);
                      }
                      audioLine.flush();
                      if (closeBetweenUtterances) {
                          audioStream.close();
                      }
                  } catch (IOException ioe) {
                      logger.warning("IO Exception " + ioe.getMessage());
                  }
                  long duration = (long)
                      (((double) totalSamplesRead /
                        (double) audioStream.getFormat().getSampleRate()) * 1000.0);

                  audioList.add(new DataEndSignal(duration));
                  logger.info("DataEndSignal added");
                  logger.info("stopped recording");
              }

              /**
               * Waits for the recorder to start
               */
              private synchronized void waitForStart() {
                  // Note that in theory we could use a LineEvent START
                  // to tell us when the microphone is ready, but we have
                  // found that some JavaSound implementations do not always
                  // issue this event when a line is opened, so this is a
                  // workaround.

                  try {
                      while (!started) {
                          wait();
                      }
                  } catch (InterruptedException ie) {
                      logger.warning("wait was interrupted");
                  }
              }

            /**
             * Reads one frame of audio data and adds it to the given Utterance.
             *
             * @param utterance the Utterance to which the raw audio bytes are added
             *
             * @return a Data object containing the audio data
             *
             * @throws IOException if there is an error reading the audio data
             */
              private Data readData(Utterance utterance) throws IOException {

                  // Read the next chunk of data from the TargetDataLine.
                  byte[] data = new byte[frameSizeInBytes];

                  int channels = audioStream.getFormat().getChannels();
                  long collectTime = System.currentTimeMillis();
                  long firstSampleNumber = totalSamplesRead / channels;
                 
                  int numBytesRead = audioStream.read(data, 0, data.length);

                  //  notify the waiters upon start
                  if (!started) {
                      synchronized (this) {
                          started = true;
                          notifyAll();
                      }
                  }

                if (logger.isLoggable(Level.FINE)) {
                    logger.fine("Read " + numBytesRead
                                + " bytes from audio stream.");
                }
                  if (numBytesRead <= 0) {
                      endOfStream = true;
                      return null;
                  }
                  int sampleSizeInBytes =
                      audioStream.getFormat().getSampleSizeInBits() / 8;
                  totalSamplesRead += (numBytesRead / sampleSizeInBytes);
                 
                if (numBytesRead != frameSizeInBytes) {

                    if (numBytesRead % sampleSizeInBytes != 0) {
                        throw new Error("Incomplete sample read.");
                    }

                    byte[] shrunken = new byte[numBytesRead];
                    System.arraycopy(data, 0, shrunken, 0, numBytesRead);
                    data = shrunken;
                }
                 
                  if (keepDataReference) {
                      utterance.add(data);
                  }
                 
                  double[] samples = DataUtil.bytesToValues
                      (data, 0, data.length, sampleSizeInBytes, signed);

                  if (channels > 1) {
                      samples = convertStereoToMono(samples, channels);
                  }

                  return (new DoubleData
                          (samples, (int) audioStream.getFormat().getSampleRate(),
                           collectTime, firstSampleNumber));
              }
          }

          /**
           * Converts stereo audio to mono.
           *
           * @param samples the audio samples; each double in the array is one sample
           * @param channels the number of channels in the stereo audio
           *
           * @return the samples converted to mono
           */
          private double[] convertStereoToMono(double[] samples, int channels) {
              assert (samples.length % channels == 0);
              double[] finalSamples = new double[samples.length/channels];
              if (stereoToMono.equals("average")) {
                  for (int i = 0, j = 0; i < samples.length; j++) {
                      double sum = samples[i++];
                      for (int c = 1; c < channels; c++) {
                          sum += samples[i++];
                      }
                      finalSamples[j] = sum / channels;
                  }
              } else if (stereoToMono.equals("selectChannel")) {
                  for (int i = selectedChannel, j = 0; i < samples.length;
                       i += channels, j++) {
                      finalSamples[j] = samples[i];
                  }
              } else {
                  throw new Error("Unsupported stereo to mono conversion: " +
                                  stereoToMono);
              }
              return finalSamples;
          }       

          /**
           * Returns a native audio format that has the same encoding, number
           * of channels, endianness and sample size as the given format,
           * and a sample rate that is larger than the given sample rate.
           *
           * @param format the desired audio format
           *
           * @return a suitable native audio format, or null if none is found
           */
          private static AudioFormat getNativeAudioFormat(AudioFormat format) {
              // try to do sample rate conversion
              Line.Info[] lineInfos = AudioSystem.getTargetLineInfo
                  (new Line.Info(TargetDataLine.class));

              AudioFormat nativeFormat = null;

              // find a usable target line
              for (int i = 0; i < lineInfos.length; i++) {
                 
                  AudioFormat[] formats =
                      ((TargetDataLine.Info)lineInfos[i]).getFormats();
                 
                  for (int j = 0; j < formats.length; j++) {
                     
                      // for now, just accept downsampling, not checking frame
                      // size/rate (encoding assumed to be PCM)
                     
                      AudioFormat thisFormat = formats[j];
                      if (thisFormat.getEncoding() == format.getEncoding()
                          && thisFormat.isBigEndian() == format.isBigEndian()
                          && thisFormat.getSampleSizeInBits() ==
                          format.getSampleSizeInBits()
                          && thisFormat.getSampleRate() > format.getSampleRate()) {
                          nativeFormat = thisFormat;
                          break;
                      }
                  }
              if (nativeFormat != null) {
                  // no need to look through the remaining line infos
                  break;
              }
              }
              return nativeFormat;
          }

          /**
           * Clears all cached audio data.
           */
          public void clear() {
              audioList = new DataList();
          }

          /**
           * Reads and returns the next Data object from this
           * Microphone, returning null if there is no more audio data.
           * All audio data captured in-between <code>startRecording()</code>
           * and <code>stopRecording()</code> is cached in an Utterance
           * object. Calling this method basically returns the next
           * chunk of audio data cached in this Utterance.
           *
           * @return the next Data or <code>null</code> if none is
           *         available
           *
           * @throws DataProcessingException if there is a data processing error
           */
          public Data getData() throws DataProcessingException {

              getTimer().start();

              Data output = null;

              if (!utteranceEndReached) {
                  output = (Data) audioList.remove();
                  if (output instanceof DataEndSignal) {
                      utteranceEndReached = true;
                  }
              }

              getTimer().stop();

              // signalCheck(output);

              return output;
          }

          /**
           * Returns true if there is more data in the Microphone.
           * This happens either if getRecording() returns true, or if the
           * buffer in the Microphone has a size larger than zero.
           *
           * @return true if there is more data in the Microphone
           */
          public boolean hasMoreData() {
              boolean moreData;
              synchronized (audioList) {
                  moreData = (!utteranceEndReached || audioList.size() > 0);
              }
              return moreData;
          }
      }
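
The "average" branch of convertStereoToMono treats the samples array as interleaved frames (e.g. L, R, L, R for stereo) and emits the mean of each frame as one mono sample. A minimal standalone sketch of that logic, separate from the Sphinx class (the class name and test values here are my own, not from the source):

```java
// Standalone sketch of the "average" stereo-to-mono conversion:
// input samples are interleaved per frame, so each output sample is
// the mean of `channels` consecutive input samples.
public class StereoToMonoDemo {

    static double[] averageToMono(double[] samples, int channels) {
        double[] mono = new double[samples.length / channels];
        for (int i = 0, j = 0; i < samples.length; j++) {
            double sum = samples[i++];
            for (int c = 1; c < channels; c++) {
                sum += samples[i++];
            }
            mono[j] = sum / channels;
        }
        return mono;
    }

    public static void main(String[] args) {
        // Interleaved stereo frames (L, R): (1.0, 3.0) and (-2.0, 4.0)
        double[] stereo = {1.0, 3.0, -2.0, 4.0};
        double[] mono = averageToMono(stereo, 2);
        System.out.println(mono[0] + " " + mono[1]);  // prints "2.0 1.0"
    }
}
```

The "selectChannel" branch is simpler still: it starts at the selected channel's offset and steps through the array in strides of `channels`, discarding the other channels entirely.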

      /**
       * Manages the data as a FIFO queue.
       */
      class DataList {

          private List list;

          /**
           * Creates a new data list
           */
          public DataList() {
              list = new LinkedList();
          }

          /**
           * Adds a Data object to the queue.
           *
           * @param data the data to add
           */
          public synchronized void add(Data data) {
              list.add(data);
              notify();
          }

          /**
           * Returns the current size of the queue
           *
           * @return the size of the queue
           */
          public synchronized int size() {
              return list.size();
          }

          /**
           * Removes the oldest item on the queue
           *
           * @return the oldest item
           */
          public synchronized Data remove() {
              try {
                  while (list.size() == 0) {
                      // System.out.println("Waiting...");
                      wait();
                  }
              } catch (InterruptedException ie) {
                  ie.printStackTrace();
              }
              Data data = (Data) list.remove(0);
              if (data == null) {
                  System.out.println("DataList is returning null.");
              }
              return data;
          }
      }
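
DataList is the producer/consumer handoff between the recording thread (which calls add()) and the decoder (which calls remove()): remove() blocks in wait() until add() puts an item on the list and calls notify(). A minimal self-contained sketch of that pattern, using generics and my own class names rather than the Sphinx types:

```java
// Minimal sketch of the blocking FIFO handoff that DataList implements:
// remove() waits until a producer thread calls add(), which notifies.
import java.util.LinkedList;

public class BlockingFifoDemo {

    static class Fifo<T> {
        private final LinkedList<T> list = new LinkedList<T>();

        public synchronized void add(T item) {
            list.add(item);
            notify();          // wake a consumer blocked in remove()
        }

        public synchronized T remove() throws InterruptedException {
            while (list.isEmpty()) {
                wait();        // releases the lock until add() notifies
            }
            return list.removeFirst();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Fifo<String> fifo = new Fifo<String>();

        // Producer adds an item after a short delay, as the
        // recording thread does with captured audio frames.
        new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException ie) {
                    return;
                }
                fifo.add("hello");
            }
        }).start();

        // Consumer blocks in remove() until the producer's add() runs.
        System.out.println(fifo.remove());  // prints "hello"
    }
}
```

On JDK 5 and later, java.util.concurrent.LinkedBlockingQueue provides this behavior out of the box; the hand-rolled wait/notify version above matches the style of the original code, which predates those APIs.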

       
