I am using Sphinx3Align because I need a phone segmentation of my input data.
For example, for 'hello' I need the phones h e l l o marked with their time
intervals.
So far this works, and I get .phseg and .lab files.
The problem is that Sphinx is poorly documented, so I have two questions:
.phseg files
I get output like this:
SFrm EFrm SegAScr Phone
0 3 -8237 SIL
4 16 -18026 SIL
17 22 -21172 n SIL aeA b
23 26 -13050 aeA n m e
Is it normal to have several suggestions for some phonemes? I noticed that the
first one is the correct one.
.lab files
What is this xlab format? Here is an example:
0.030000 125 SIL
0.160000 125 SIL
0.220000 125 n
The first value seems to be the end time of a phoneme. But what does 125
represent? Here too, I noticed that the chosen phoneme is the correct one.
Also, if any of you know where I can find general help about parameters and
so on, please tell me. Sphinx seems rather undocumented.
Thank you
Is it normal to have several suggestions for some phonemes? I noticed that the
first one is the correct one.
They are not multiple suggestions but the context used to match this segment.
If you use a context-dependent senone model for alignment (it's better to use a
context-independent one), it outputs the context in the following format:
central phone, left phone, right phone, senone type (b = begin, s = single,
i = middle, e = end). For example, "n SIL aeA b" is the phone n with SIL on its
left, aeA on its right, and type b.
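A minimal Python sketch of reading such a .phseg file, under stated assumptions: the names `PhoneSegment`, `parse_phseg` and `FRAME_RATE` are hypothetical, the context columns are read in the order given above, and a frame rate of 100 frames per second (the usual 10 ms Sphinx frame shift) is assumed for converting frames to seconds. With that rate, the .lab end times in the question (0.03, 0.16, 0.22) equal EFrm / 100 for the matching rows, so the sketch follows that convention.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: 100 frames per second (10 ms frame shift).
FRAME_RATE = 100


@dataclass
class PhoneSegment:
    start_frame: int
    end_frame: int
    score: int
    phone: str                      # central phone
    left: Optional[str] = None      # left context phone, if present
    right: Optional[str] = None     # right context phone, if present
    position: Optional[str] = None  # b, i, e or s, if present

    @property
    def start_time(self) -> float:
        return self.start_frame / FRAME_RATE

    @property
    def end_time(self) -> float:
        # Matches the .lab example above (EFrm 3 -> 0.03 s)
        return self.end_frame / FRAME_RATE


def parse_phseg(text: str) -> List[PhoneSegment]:
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "SFrm EFrm SegAScr Phone" header and any
        # other line that does not start with two frame numbers
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        seg = PhoneSegment(int(fields[0]), int(fields[1]), int(fields[2]), fields[3])
        context = fields[4:]
        if len(context) == 3:
            # Context-dependent output: left phone, right phone, type
            seg.left, seg.right, seg.position = context
        segments.append(seg)
    return segments


example = """SFrm EFrm SegAScr Phone
0 3 -8237 SIL
4 16 -18026 SIL
17 22 -21172 n SIL aeA b
23 26 -13050 aeA n m e
"""
for seg in parse_phseg(example):
    print(seg.phone, seg.start_time, seg.end_time, seg.left, seg.right, seg.position)
```

With a context-independent model the extra columns are absent and `left`, `right` and `position` simply stay `None`.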
.lab files
What is this xlab format? Here is an example:
0.030000 125 SIL
0.160000 125 SIL
0.220000 125 n
The first value seems to be the end time of a phoneme. But what does 125
represent?
125 indicates that it is a phone label. An xlab file can contain labels of
several types: word labels, phone labels, intonation labels. 125 encodes the
phone label type. You can ignore it if you don't need it.
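Since each xlab line stores only an end time, the time intervals the question asks for can be rebuilt by taking each segment's start as the previous segment's end. A minimal Python sketch of that idea, assuming whitespace-separated `end_time type label` lines and treating 125 as the phone-label code as described above; `parse_xlab` and `PHONE_LABEL` are hypothetical names:

```python
from typing import List, Tuple

# Type code for phone labels, as described in the answer above.
PHONE_LABEL = 125


def parse_xlab(text: str) -> List[Tuple[float, float, str]]:
    """Turn 'end_time type label' lines into (start, end, label) phone intervals.

    Each line only stores the end time; the start of a phone is taken to be
    the end of the previous phone (0.0 for the first one).
    """
    intervals = []
    previous_end = 0.0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines (e.g. a header, if present)
        try:
            end, label_type = float(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip any other non-label line
        if label_type != PHONE_LABEL:
            continue  # ignore word or intonation labels
        label = " ".join(fields[2:])
        intervals.append((previous_end, end, label))
        previous_end = end
    return intervals


example_lab = """0.030000 125 SIL
0.160000 125 SIL
0.220000 125 n
"""
for start, end, label in parse_xlab(example_lab):
    print(f"{start:.2f} {end:.2f} {label}")
```

Only phone lines advance `previous_end`, so labels of other types in the same file do not shift the phone intervals.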
Also, if any of you know where I can find general help about parameters
and so on, please tell me.
Extensive documentation is available on the wiki, in the forum archive, and in
the sources. You probably want to read more. If you have specific questions,
you are welcome to ask. We are also grateful for contributions that improve
the documentation.
Thank you for the quick and good answer :)