From: Guido Di M. <gui...@TI...> - 2003-03-20 10:37:24
Hi, is anybody there?

I'm evaluating whether I can use AGLIB to annotate and align my transcriptions. To do that, I would like to figure out how to write the parser for my transcription format.

This is the process I have to implement:

1) The source file is a transcription of dialogues in a format quite similar to CHAT/CHILDES (see the example at the bottom of this message). The transcription contains no time stamps of any kind. Hence, I have to parse the text and create the annotation graph WITHOUT temporal references.

2) The second step consists of aligning the AG to the sound track. This step should be done semi-automatically; to do so, I would have to develop a suitable tool. The alignment process is separate from step 1) because this kind of transcription is too time consuming.

3) A further step could be editing the AG, in order to modify existing annotation levels and to add new ones.

What do you think about the workflow I have just presented?

Yesterday I had a look at the AGLIB code, and I appreciate the extensibility of the file wrapper. So the problem in carrying out step 1) is to extend the agfio class.

The problem is the file parser. As you can see below, the transcription file does NOT have a predefined number of fields (records) per line (or per turn). Thus the Record class does not seem to fit my needs.

Do you have any suggestions? Is the best way to build such a parser to write my own by hand, or to use lex and yacc?

Thanks in advance.

Guido Di Maio

=============================================
*MIC: più bello ... no / no più bello // è più brutto rispetto +
*ANT: agli altri //
*MIC: agli altri film di [/] di Villaggio // più brutto xxx // cioè / si vede che lui cerca le battute / per far ridere // non gli vengono spontanee // quindi ... no / comunque / si ride lo stesso // tutto ...
*ANT: io / ho visto una scena in televisione // una + era / sai / di quelle fatte per / presentare i film // però era simpatico // <c' era> +
*MIC: [<] <mah> / # non lo so / è + cioè / mi sono divertito // però / non come altre volte // tipo / Fantozzi / poi / Fantozzi contro tutti / poi / Il secondo tragico Fantozzi / e tanti altri // insomma / tutto sommato / <piace> //
*ANT: [<] <io mi> [/] io mi ricordo di averne visto uno / che mi piaceva // a me Villaggio sta antipatico // però / insomma ... quando ho visto + non mi ricordo com' era il titolo // però / era simpatico // quando c' era / &he / il panettiere / che era l' amante della moglie ...
*MIC: ah // forse <xxx> +
=============================================
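
To make step 1) more concrete, here is a rough sketch in Python of what a hand-written parser might look like. It does not use AGLIB at all: the names Anchor, Annotation, parse_turns and build_graph are hypothetical and only illustrate the idea of turns with a variable number of tokens, linked by anchors whose time offset is left unset until the alignment step. It also assumes that a line not starting with "*" continues the previous turn, and it chains the turns one after another, which ignores overlapping speech (the [<] markers).

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A turn starts with "*SPEAKER:" (assumed convention, as in the sample).
TURN_RE = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")


@dataclass
class Anchor:
    """Hypothetical graph node; offset stays None until alignment."""
    id: int
    offset: Optional[float] = None


@dataclass
class Annotation:
    """Hypothetical arc between two anchors, carrying the turn data."""
    start: Anchor
    end: Anchor
    type: str
    features: Dict[str, object]


def parse_turns(text: str) -> List[Tuple[str, str]]:
    """Split the text into (speaker, utterance) pairs.

    Lines that do not start a new turn are appended to the previous one.
    """
    turns: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append([match.group(1), match.group(2)])
        elif turns:
            turns[-1][1] += " " + line
    return [(speaker, utterance) for speaker, utterance in turns]


def build_graph(turns: List[Tuple[str, str]]):
    """Chain the turns: each turn is an arc between two untimed anchors."""
    anchors = [Anchor(0)]
    annotations: List[Annotation] = []
    for speaker, utterance in turns:
        start = anchors[-1]
        end = Anchor(len(anchors))
        anchors.append(end)
        features = {"speaker": speaker, "tokens": utterance.split()}
        annotations.append(Annotation(start, end, "turn", features))
    return anchors, annotations
```

Because the tokens of each turn are kept in a plain list, the number of fields per turn does not need to be known in advance. A lex/yacc grammar would probably only pay off if the bracketed markers ([/], [<], <...>) have to be parsed into nested structures.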