Re: [Palmkit-users-jp] language model
From: Khan S. <upa...@ya...> - 2006-11-27 09:28:40
Dear Sir,

Thank you very much for your kind response and detailed explanation. In fact, I don't have much command of the Japanese language, and I hope your advice will be an immense help for me in distinguishing between content words and function words. I have also downloaded the paper you referred to and I'm going to consult it carefully.

I think I will be able to tag the words successfully following your advice. Now, after completing the tagging, is it possible to calculate the probabilities of my proposed LM using Palmkit? If so, can you tell me about the process (or the commands, in particular) for doing it?

Sorry for bothering you so many times. Your advice has really helped me a lot, and let me express my heartfelt gratitude to you for spending your valuable time on me. Thank you once again. I'm looking forward to your reply.

With Regards

--- Akinori Ito <ai...@fw...> wrote:

> Hello,
>
> I'm sorry for the late reply.
>
> I read your Word file. The model you are going to make is exactly the same one
> that was proposed by Isotani and Matsunaga in 1994 [1]. If you haven't read
> their paper, you should consult it.
>
> [1] R. Isotani and S. Matsunaga, "A Stochastic Language Model for Speech
>     Recognition Integrating Local and Global Constraints," Proc. ICASSP94,
>     vol. II, pp. 5-8, 1994.
>
> Now, we have a couple of ways to distinguish content words and function words.
> If you are using Chasen, the easiest way is to make lists of content and
> function words. You can get a list of all parts of speech with the "chasen -lp"
> command. Then, we can split the POS into the following classes.
> (The treatment of the "others" classes depends on the purpose of the LM.)
>
> Content words:
>    1 名詞
>    2 名詞-一般
>    3 名詞-固有名詞
>    4 名詞-固有名詞-一般
>    5 名詞-固有名詞-人名
>    6 名詞-固有名詞-人名-一般
>    7 名詞-固有名詞-人名-姓
>    8 名詞-固有名詞-人名-名
>    9 名詞-固有名詞-組織
>   10 名詞-固有名詞-地域
>   11 名詞-固有名詞-地域-一般
>   12 名詞-固有名詞-地域-国
>   13 名詞-代名詞
>   14 名詞-代名詞-一般
>   15 名詞-代名詞-縮約
>   16 名詞-副詞可能
>   17 名詞-サ変接続
>   18 名詞-形容動詞語幹
>   19 名詞-数
>   40 名詞-ナイ形容詞語幹
>   46 動詞
>   47 動詞-自立
>   50 形容詞
>   51 形容詞-自立
>   54 副詞
>   55 副詞-一般
>   56 副詞-助詞類接続
>   57 連体詞
>   58 接続詞
>   75 感動詞
>   81 記号-アルファベット
>
> Function words:
>   20 名詞-非自立
>   21 名詞-非自立-一般
>   22 名詞-非自立-副詞可能
>   23 名詞-非自立-助動詞語幹
>   24 名詞-非自立-形容動詞語幹
>   25 名詞-特殊
>   26 名詞-特殊-助動詞語幹
>   27 名詞-接尾
>   28 名詞-接尾-一般
>   29 名詞-接尾-人名
>   30 名詞-接尾-地域
>   31 名詞-接尾-サ変接続
>   32 名詞-接尾-助動詞語幹
>   33 名詞-接尾-形容動詞語幹
>   34 名詞-接尾-副詞可能
>   35 名詞-接尾-助数詞
>   36 名詞-接尾-特殊
>   37 名詞-接続詞的
>   38 名詞-動詞非自立的
>   41 接頭詞
>   42 接頭詞-名詞接続
>   43 接頭詞-動詞接続
>   44 接頭詞-形容詞接続
>   45 接頭詞-数接続
>   48 動詞-非自立
>   49 動詞-接尾
>   52 形容詞-非自立
>   53 形容詞-接尾
>   59 助詞
>   60 助詞-格助詞
>   61 助詞-格助詞-一般
>   62 助詞-格助詞-引用
>   63 助詞-格助詞-連語
>   64 助詞-接続助詞
>   65 助詞-係助詞
>   66 助詞-副助詞
>   67 助詞-間投助詞
>   68 助詞-並立助詞
>   69 助詞-終助詞
>   70 助詞-副助詞/並立助詞/終助詞
>   71 助詞-連体化
>   72 助詞-副詞化
>   73 助詞-特殊
>   74 助動詞
>
> Others (not words):
>   84 その他
>   85 その他-間投
>   86 フィラー
>   87 非言語音
>   88 語断片
>
> Others (no spoken form):
>    0 BOS/EOS
>   39 名詞-引用文字列
>   76 記号
>   77 記号-一般
>   78 記号-句点
>   79 記号-読点
>   80 記号-空白
>   82 記号-括弧開
>   83 記号-括弧閉
>
> Khan Sakeb wrote:
> > Dear Sir,
> >
> > Thank you very much for your kind and prompt response. Let me first
> > apologize for my late reply. I could successfully make a classlist using
> > the "ctext2class" command. But can you please tell me what order the words
> > follow when they appear in the classlist?
> >
> > Now, let me focus on my research topic. I'm attaching an MS Word file to
> > this mail which describes the equations of my proposed language models.
> > I'm confused about which approach to take for calculating the
> > probabilities. At first, I just want to determine the probability of the
> > next word being a 自立語 (content word, C_i = 1) or a 付属語 (function
> > word, C_i = 0) in a trigram model. I mean P(C_i = 1 | W_{i-2} W_{i-1}) or
> > P(C_i = 0 | W_{i-2} W_{i-1}).
> >
> > You mentioned in your mail using some kind of tagger to distinguish
> > between 自立語 (content words) and 付属語 (function words). Right now,
> > I'm using Chasen for morphological analysis. Can you please give me an
> > idea about using Chasen effectively to distinguish between 自立語 and
> > 付属語? I think I can then use Palmkit to generate more specific
> > classlists.
> >
> > Thank you very much once again. I will be very grateful if you kindly
> > reply to my mail at your convenience.
> >
> > With Regards
> > upal1660
> >
> > _______________________________________________
> > Palmkit-users-jp mailing list
> > Pal...@li...
> > https://lists.sourceforge.net/lists/listinfo/palmkit-users-jp
>
> --
> 伊藤 彰則 東北大学 大学院工学研究科
> Akinori Ito, Assoc. Prof.
> Graduate School of Engineering, Tohoku Univ.
> TEL: 022-795-7084 E-mail: ai...@fw...
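
The content/function split given in the quoted reply, and the probability of the next word's class given the two preceding words that the earlier message asks about, can be sketched roughly as follows. This is only an illustration under several assumptions: the Chasen output is assumed to be already parsed into (word, POS) pairs, the prefix table is a condensed form of the POS list in the reply, tokens in the "others" classes are simply dropped, and the estimate is a plain relative-frequency count with no smoothing. The names classify_pos and estimate_class_probs are hypothetical; they are not Palmkit or Chasen commands, and this is not Palmkit's own procedure.

```python
from collections import Counter, defaultdict

# Condensed form of the POS list in the quoted reply (an assumption of this
# sketch): the longest matching prefix decides the class of a POS tag.
POS_CLASS_PREFIXES = {
    # content words
    "名詞": "content", "動詞": "content", "形容詞": "content",
    "副詞": "content", "連体詞": "content", "接続詞": "content",
    "感動詞": "content", "記号-アルファベット": "content",
    # function words
    "名詞-非自立": "function", "名詞-特殊": "function",
    "名詞-接尾": "function", "名詞-接続詞的": "function",
    "名詞-動詞非自立的": "function", "接頭詞": "function",
    "動詞-非自立": "function", "動詞-接尾": "function",
    "形容詞-非自立": "function", "形容詞-接尾": "function",
    "助詞": "function", "助動詞": "function",
    # others (not words, or no spoken form)
    "名詞-引用文字列": "other", "記号": "other", "その他": "other",
    "フィラー": "other", "非言語音": "other", "語断片": "other",
    "BOS/EOS": "other",
}


def classify_pos(pos):
    """Return 'content', 'function', or 'other' for a POS tag.

    Prefixes are matched on whole hyphen-separated components, and the
    longest match wins. Tags that match nothing fall back to 'other'.
    """
    best_class, best_len = "other", -1
    for prefix, cls in POS_CLASS_PREFIXES.items():
        matches = pos == prefix or pos.startswith(prefix + "-")
        if matches and len(prefix) > best_len:
            best_class, best_len = cls, len(prefix)
    return best_class


def estimate_class_probs(sentences):
    """Relative-frequency estimate of P(C_i | w_{i-2}, w_{i-1}).

    `sentences` is a list of sentences, each a list of (word, pos) pairs.
    Tokens classified as 'other' are dropped before counting, which is
    one possible treatment and an assumption of this sketch.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [(w, classify_pos(p)) for w, p in sentence]
        tokens = [(w, c) for w, c in tokens if c != "other"]
        for i in range(2, len(tokens)):
            history = (tokens[i - 2][0], tokens[i - 1][0])
            counts[history][tokens[i][1]] += 1
    probs = {}
    for history, c in counts.items():
        total = sum(c.values())
        probs[history] = {
            "content": c["content"] / total,
            "function": c["function"] / total,
        }
    return probs


if __name__ == "__main__":
    # Toy, hand-tagged sentence for illustration only (not real corpus data).
    toy = [[("私", "名詞-代名詞-一般"), ("は", "助詞-係助詞"),
            ("本", "名詞-一般"), ("を", "助詞-格助詞-一般"),
            ("読む", "動詞-自立"), ("。", "記号-句点")]]
    for history, p in estimate_class_probs(toy).items():
        print(history, p)
```

With real data the counts for most word pairs will be sparse, so some form of smoothing or back-off would be needed before such estimates are usable in a language model.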