|
From: Abu Z. <za...@gm...> - 2009-08-08 10:52:06
|
Hi,

Which one is currently considered standard, ত্ or ৎ? I have seen plenty of websites with both spellings: বিদ্যুত্ and বিদ্যুৎ, চিত্কার and চিৎকার। I need to pick one as the standard for Apertium. For the Bengali-to-English direction we could accept both, but when generating from English to Bengali we need to generate exactly one.

Thanks once again, and in advance.

--
Regards
Abu Zaher Md. Faridee
http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals. |
|
From: Abu Z. <za...@gm...> - 2009-08-08 12:53:05
|
I just had a talk regarding this with Golam Mortuza Bhai, pasting it here for future reference :)

(05:52:23 PM) za...@gm.../HomeC8631CA7: I've mailed you regarding an issue between 'ত্' and 'ৎ', if you get the time, please feel free to answer
(05:52:25 PM) Golam Mortuza Hossain: I mean I got
(05:52:30 PM) za...@gm.../HomeC8631CA7: cool
(05:52:34 PM) Golam Mortuza Hossain: Please
(05:52:42 PM) Golam Mortuza Hossain: follow "ৎ"
(05:53:26 PM) Golam Mortuza Hossain: Khanda-Ta as a separate glyph is now Unicode standard
(05:54:03 PM) Golam Mortuza Hossain: which wasn't the case earlier
(05:54:41 PM) za...@gm.../HomeC8631CA7: I was following ৎ all this time, but came across some sites that have ত্, and in the Unicode character set ৎ has a comment like this: "a dead consonant form of ta, without implicit vowel, used in some sequences". That's why I thought I'd consult you
(05:55:48 PM) Golam Mortuza Hossain: the reason for this: earlier there was no glyph for "Khanda-Ta" in Unicode
(05:55:59 PM) za...@gm.../HomeC8631CA7: yeah I know
(05:57:03 PM) Golam Mortuza Hossain: If you want to make it backward compatible then
(05:57:23 PM) Golam Mortuza Hossain: you could consider mapping "ত্"
(05:57:31 PM) Golam Mortuza Hossain: to "ৎ"
(05:57:40 PM) Golam Mortuza Hossain: But it could be tricky
(05:58:57 PM) za...@gm.../HomeC8631CA7: yeah
(05:59:07 PM) za...@gm.../HomeC8631CA7: I know, I tried a bit
(05:59:36 PM) Golam Mortuza Hossain: :-)
(06:01:17 PM) za...@gm.../HomeC8631CA7: we might need to build a table for that, e.g. ত্ক → ৎক, it's always like that isn't it, but we can't map like that in উত্তর
(06:01:36 PM) za...@gm.../HomeC8631CA7: so we might need to check all these :(
(06:02:32 PM) Golam Mortuza Hossain: If I remember correctly then sometimes people also
(06:02:42 PM) Golam Mortuza Hossain: used ZWNJ after Halant
(06:02:51 PM) za...@gm.../HomeC8631CA7: yeah
(06:03:03 PM) za...@gm.../HomeC8631CA7: I've seen that too
(06:03:21 PM) Golam Mortuza Hossain: this case should be easy
(06:04:30 PM) Golam Mortuza Hossain: also when it appears just before ",", ":", "।", "?", " " etc.
(06:04:44 PM) za...@gm.../HomeC8631CA7: I am already running the source text through a normalizer right now, because ড় = ড + nukta: we sometimes get text in the complex form and the parser gets confused
(06:04:54 PM) za...@gm.../HomeC8631CA7: aha
(06:05:23 PM) Golam Mortuza Hossain: yeah I see
(06:06:50 PM) za...@gm.../HomeC8631CA7: so you think it's do-able, right?
(06:07:22 PM) Golam Mortuza Hossain: no
(06:07:52 PM) za...@gm.../HomeC8631CA7: btw, could I paste this conversation in the group just as a reference for the others?
(06:09:11 PM) Golam Mortuza Hossain: In some cases unambiguous mapping may not be possible
(06:09:16 PM) Golam Mortuza Hossain: Yeah, sure
(06:13:37 PM) Golam Mortuza Hossain: My suggestion would be to handle only "ৎ" in the engine.
(06:15:28 PM) Golam Mortuza Hossain: If needed then mapping should be done in the text pre-parser.
(06:16:21 PM) Golam Mortuza Hossain: In the long term the "ত্" appearance will go away!
(06:16:30 PM) za...@gm.../HomeC8631CA7: I agree

--
Regards
Abu Zaher Md. Faridee
http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals. |
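For reference, the pre-parser mapping discussed in the chat could look roughly like the sketch below. It is only an illustration in Python, not Apertium's actual normalizer: the function name `normalize_bengali`, the table `KHANDA_BEFORE` (holding just the ক case mentioned in the chat) and the choice of the single-code-point nukta letters as the target form are all assumptions. It rewrites ত + hasant to ৎ only in the cases the chat calls easy (followed by ZWNJ, or just before whitespace, punctuation or the end of the text) plus the table entries, and leaves everything else, such as উত্তর, untouched.

```python
import re

TA = "\u09a4"         # ত
HASANT = "\u09cd"     # ্  (virama / halant)
KHANDA_TA = "\u09ce"  # ৎ
ZWNJ = "\u200c"       # zero width non-joiner
NUKTA = "\u09bc"      # ়
DANDA = "\u0964"      # ।

# Base letter + nukta -> single code point (assumed target form).
NUKTA_LETTERS = {
    "\u09a1" + NUKTA: "\u09dc",  # ড + nukta -> ড়
    "\u09a2" + NUKTA: "\u09dd",  # ঢ + nukta -> ঢ়
    "\u09af" + NUKTA: "\u09df",  # য + nukta -> য়
}

# Hypothetical table: consonants before which ত + hasant is taken to mean ৎ.
# Only ক (the chat's ত্ক example) is listed; a real table needs checking.
KHANDA_BEFORE = "\u0995"

LEGACY_KHANDA_TA = re.compile(
    TA + HASANT
    + "(?:" + ZWNJ                         # explicit ZWNJ after hasant (consumed)
    + r"|(?=[\s,:;?!" + DANDA + r"]|$)"    # before space, punctuation or end
    + "|(?=[" + KHANDA_BEFORE + "]))"      # before a consonant from the table
)


def normalize_bengali(text: str) -> str:
    """Map legacy spellings to one canonical form before parsing."""
    for decomposed, composed in NUKTA_LETTERS.items():
        text = text.replace(decomposed, composed)
    return LEGACY_KHANDA_TA.sub(KHANDA_TA, text)


if __name__ == "__main__":
    for word in ["বিদ্যুত্", "চিত্কার", "উত্তর"]:
        print(word, "->", normalize_bengali(word))
```

Anything not covered by these rules stays as ত + hasant, which matches the remark in the chat that an unambiguous mapping may not be possible in every case.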
|
From: Jamil A. <its...@gm...> - 2009-08-08 18:57:00
|
On Sat, Aug 8, 2009 at 6:51 PM, Abu Zaher<za...@gm...> wrote:
> [...]
> (06:13:37 PM) Golam Mortuza Hossain: My suggestion would be handle only "ৎ"
> in the engine.
> (06:15:28 PM) Golam Mortuza Hossain: If needed then mapping should be done
> in text pre-parser.
> (06:16:21 PM) Golam Mortuza Hossain: In the long term "ত্" appearance will
> go away!
> (06:16:30 PM) za...@gm.../HomeC8631CA7: I agree

yes, keep working with "ৎ" |