From: David G. <gr...@un...> - 2003-01-30 02:20:54
You didn't mention what sort of cpu you're running (big-endian or
little?)

I'd like to point out that I created an alternative to w_decode,
which you might find useful:

  ftp://ftp.ldc.upenn.edu/pub/ldc/misc_sw/sph2pipe_v2.3.tar.gz

This reads any sphere file (shorten compressed or not, ulaw or pcm,
single or dual channel), and produces output of your choosing to
stdout -- and will always format 16-bit samples according to the
current machine's byte order. I'll copy the usage below to give an
idea of what it does.

As for your observations, I'd be interested to hear which speech
corpus you're dealing with (I should be able to locate my own copy
and check the files myself.)

ri...@MI... said:
> ... if I play (using the "play" front-end to sox) the .sph [ulaw]
> files, I hear the speaker, somewhat noisy.

Could be your DAC (or sox's use of it) doesn't really handle ulaw.

> If I play 1081.pcm (which is still a
> sphere file, with a 2048-byte NIST header), it's a much cleaner
> version of 1081.sph (I assume the conversion from 8-bit ulaw to 16-bit
> PCM somehow helps, although I'm not sure why, since it's going through
> play in both cases). If I play 3968.pcm, I hear high-energy static.

It would be worthwhile to peek into the two pcm files (and the ulaw
files too, for that matter) with "od" or an equivalent, to see what
things look like at the end of each header; maybe you could transfer
them to me somehow (I have reason to want to see for myself) -- and
again, indicate where these files came from. (Don't need to copy the
whole list on this.)

> w_decode -o pcm_10 3968.sph 3968_wd10.pcm
>
> Again, high energy static. This time, the sample_byte_format is 10
> (as expected), and if I change it to 01, it plays clean. But this
> time, if I just strip off the header with tail +2049c, I again get the
> high energy static.

This is consistent with the earlier observation about this file, if
you have a little-endian machine...
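
To make the "peek with od" suggestion and the byte-order point
concrete, here is a rough Python sketch. It is not part of sph2pipe,
sox or w_decode, and the function names are made up for illustration.
It assumes the usual sphere layout: the file starts with a "NIST_1A"
line, then a line giving the header size, then "name type value"
lines up to "end_head". It reports the header's sample_byte_format
and decodes the first few 16-bit samples both ways, so you can see
which reading looks like speech and which looks like static.

```python
import struct


def read_sphere_header(data):
    """Return (header_size, fields) for sphere file contents held in memory.

    Assumes the conventional layout: "NIST_1A\\n", then a 7-character
    header size plus newline, then "name type value" lines up to "end_head".
    """
    first = data[:16].decode("ascii", errors="replace").split("\n")
    if first[0] != "NIST_1A":
        raise ValueError("does not look like a sphere file")
    header_size = int(first[1])

    fields = {}
    body = data[16:header_size].decode("ascii", errors="replace")
    for line in body.split("\n"):
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)  # name, type, value
        if len(parts) == 3:
            fields[parts[0]] = parts[2]
    return header_size, fields


def first_samples(data, header_size, count=4):
    """Decode the first 16-bit samples after the header in both byte orders."""
    raw = data[header_size:header_size + 2 * count]
    n = len(raw) // 2
    raw = raw[:2 * n]
    little = struct.unpack("<%dh" % n, raw)  # "01": low byte first
    big = struct.unpack(">%dh" % n, raw)     # "10": high byte first
    return little, big


def report(path):
    """Print the declared byte format and both readings of the first samples."""
    with open(path, "rb") as f:
        data = f.read()
    header_size, fields = read_sphere_header(data)
    little, big = first_samples(data, header_size)
    print("header size:       ", header_size)
    print("sample_byte_format:", fields.get("sample_byte_format"))
    print("read as 01 (little):", little)
    print("read as 10 (big):   ", big)
```

Calling report("3968.pcm") on each file (only meaningful for the
16-bit pcm ones) should show small, slowly varying values in one
column and large jumpy ones in the other. If the smooth column
disagrees with what the header declares, the header and the data are
out of step.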
sox is noting the "10" (hi-byte first) for sample_byte_format in the
sphere header, and swapping the bytes as necessary before passing
them on to the DAC; by removing the header and taking away that
clue, sox sees no reason to swab. If there is already a
byte-alignment error in the file, the result is the opposite of what
it's "supposed" to be.

As promised, here is the usage for "sph2pipe" -- if you can, I'd like
to know how it does with your two files:

sph2pipe [-h hdr] [-t|-s bgn:end] [-c 1|2] [-p|-u] [-f sph|wav|raw|au|aif] infile

  default output conditions:
    * output full duration of input file
    * output all channels from input file
    * output same sample coding as input file
    * output format is WAV on Wintel machines, SPH on UN*X

  optional controls (items bracketed separately above can be combined):
    -h hdr     -- treat infile as headerless, read sphere info from file 'hdr'
    -t bgn:end -- output portion between bgn and end sec (floating point)
    -s bgn:end -- output portion between bgn and end samples (integer)
    -c 1       -- only output first channel
    -c 2       -- only output second channel
    -p         -- force conversion to 16-bit linear pcm
    -u         -- force conversion to 8-bit mu-law
    -f typ     -- select alternate output format 'typ'
                  ('mac' is accepted as a synonym for 'aif')

-----------
David Graff                  Linguistic Data Consortium
gr...@ld...                  3600 Market St., Suite 810
voice: (215) 898-0887        University of Pennsylvania
fax: (215) 573-2175          Philadelphia, PA 19104
                             http://www.ldc.upenn.edu