From: Jamil A. <its...@gm...> - 2009-09-16 20:10:47
|
Friends,

It is our pleasure to inform you that Bangla OpenOffice.org 3.1.1 has been released. It is based on the original source code [0] of OpenOffice.org 3.1.x.

OpenOffice.org [1] is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more. It is available in many languages and works on all common computers. It stores all your data in an international open standard format and can also read and write files from other common office software packages. It can be downloaded and used completely free of charge for any purpose.

Some new features over the official OpenOffice.org 3.1.x [2]:

* Updated User Interface and Help content translation.
* Built-in Bangla dictionary for spell checking.
* Hyphenation feature introduced for Bangla, which is still in the development stage.
* Updated locale file for Bangla-Bangladesh (bn-BD) with improved collation rules as recommended by Bangla Academy [3].

Bangla OpenOffice.org 3.1.1 download links:

* Debian, Ubuntu and other DEB-based GNU/Linux [4]
* Fedora, Red Hat and other RPM-based GNU/Linux [5]
* Microsoft Windows [6]

Please send us your comments and suggestions.

Regards,
-Jamil

[0] http://download.openoffice.org/source/index.html
[1] http://www.openoffice.org/
[2] http://www.openoffice.org/dev_docs/features/3.1/
[3] http://www.banglaacademy.org.bd/english/index.php
[4] http://sourceforge.net/projects/bengalinux/files/openoffice-bangla/OpenOffice.org_3.1.1_Bangla_Full_Version/OOo_3.1.1_090910_LinuxIntel_install_bn_deb.tar.gz/download
[5] http://sourceforge.net/projects/bengalinux/files/openoffice-bangla/OpenOffice.org_3.1.1_Bangla_Full_Version/OOo_3.1.1_090904_LinuxIntel_install_bn_rpm.tar.gz/download
[6] http://sourceforge.net/projects/bengalinux/files/openoffice-bangla/OpenOffice.org_3.1.1_Bangla_Full_Version/OOo_3.1.1_090903_Win32Intel_install_bn.exe/download |
From: Golam M. H. <gmh...@gm...> - 2009-08-31 21:21:30
|
Hi,

> I'd be glad to help, Bhaiya :)

> I hope, I can help you, please send me a mail.

Thanks Zaher and Salahuddin. I will shortly send each of you a list of words along with brief instructions.

Best,
Golam |
From: Salahuddin P. <sal...@gm...> - 2009-08-31 17:14:05
|
On Aug 31, 2009, at 7:43 AM, Golam Mortuza Hossain wrote:

> Hi all,
>
> I am delighted to share the news that
> Mr Sharfuz Zaman (sharfuz at gmail dot com) has donated
> more than 16 thousand meanings to the Ankur E2B dictionary
> project. This is more than the number of edited entries that
> we currently have.

We are progressing with the efforts of some great human ...

> I have started pushing them into our MySQL database.
> With a bit of checking and minor editing, this could take weeks
> to complete unless some of you want to help me out.

I hope I can help you; please send me a mail.

> Cheers,
> Golam

Regards
Salahuddin
salahuddin66.deviantart.com |
From: Golam M. H. <gmh...@gm...> - 2009-08-31 00:50:32
|
Hi all,

I am delighted to share the news that Mr Sharfuz Zaman (sharfuz at gmail dot com) has donated more than 16 thousand meanings to the Ankur E2B dictionary project. This is more than the number of edited entries that we currently have.

I have started pushing them into our MySQL database. With a bit of checking and minor editing, this could take weeks to complete unless some of you want to help me out.

Cheers,
Golam |
From: Jamil A. <its...@gm...> - 2009-08-08 18:57:00
|
On Sat, Aug 8, 2009 at 6:51 PM, Abu Zaher<za...@gm...> wrote:

> I just had a talk regarding this with Golam Mortuza Bhai, pasting it here for
> future reference :)
>
> [...]
> (06:13:37 PM) Golam Mortuza Hossain: My suggestion would be handle only "ৎ"
> in the engine.
> (06:15:28 PM) Golam Mortuza Hossain: If needed then mapping should be done
> in text pre-parser.
> (06:16:21 PM) Golam Mortuza Hossain: In the long term "ত্" appearance will
> go away!
> (06:16:30 PM) za...@gm.../HomeC8631CA7: I agree

yes, keep working with "ৎ" |
From: Abu Z. <za...@gm...> - 2009-08-08 12:53:05
|
I just had a talk regarding this with Golam Mortuza Bhai, pasting it here for future reference :)

(05:52:23 PM) za...@gm.../HomeC8631CA7: I've mailed you regarding an issue between 'ত্' and 'ৎ', if you get the time, please feel free to answer
(05:52:25 PM) Golam Mortuza Hossain: I mean I got
(05:52:30 PM) za...@gm.../HomeC8631CA7: cool
(05:52:34 PM) Golam Mortuza Hossain: Please
(05:52:42 PM) Golam Mortuza Hossain: follow "ৎ"
(05:53:26 PM) Golam Mortuza Hossain: Khanda-Ta as a separate glyph is now Unicode standard
(05:54:03 PM) Golam Mortuza Hossain: which wasn't the case earlier
(05:54:41 PM) za...@gm.../HomeC8631CA7: I was following ৎ all this time, but came across some sites that have ত্ and the fact that in unicode character set ৎ has a comment like this "a dead consonant form of ta, without implicit vowel, used in some sequences", that's why I thought I'd consult you
(05:55:48 PM) Golam Mortuza Hossain: the reason for this, earlier there was no glyph for "Khanda-Ta" in Unicode
(05:55:59 PM) za...@gm.../HomeC8631CA7: yeah I know
(05:57:03 PM) Golam Mortuza Hossain: If you want to make it backward compatible then
(05:57:23 PM) Golam Mortuza Hossain: you could consider mapping "ত্"
(05:57:31 PM) Golam Mortuza Hossain: to "ৎ"
(05:57:40 PM) Golam Mortuza Hossain: But it could be tricky
(05:58:57 PM) za...@gm.../HomeC8631CA7: yeah
(05:59:07 PM) za...@gm.../HomeC8631CA7: I know, I tried a bit
(05:59:36 PM) Golam Mortuza Hossain: :-)
(06:01:17 PM) za...@gm.../HomeC8631CA7: we might need to build a table for that, for e.g. ত্ক - ৎক, it's always like that isn't it, but we can't map like that in উত্তর
(06:01:36 PM) za...@gm.../HomeC8631CA7: so we might need to check all these :(
(06:02:32 PM) Golam Mortuza Hossain: If I remember correctly then sometimes people also
(06:02:42 PM) Golam Mortuza Hossain: used ZWNJ after Halant
(06:02:51 PM) za...@gm.../HomeC8631CA7: yeah
(06:03:03 PM) za...@gm.../HomeC8631CA7: I've seen that too
(06:03:21 PM) Golam Mortuza Hossain: this case should be easy
(06:04:30 PM) Golam Mortuza Hossain: also when it appears just before "," , ":", "।", "?", " " etc.
(06:04:44 PM) za...@gm.../HomeC8631CA7: am already running the source text through a normalizer right now, because ড় - ড + nukta, we sometimes get text in the complex form and the parser gets confused
(06:04:54 PM) za...@gm.../HomeC8631CA7: aha
(06:05:23 PM) Golam Mortuza Hossain: yeah I see
(06:06:50 PM) za...@gm.../HomeC8631CA7: so you think it's do-able right?
(06:07:22 PM) Golam Mortuza Hossain: no
(06:07:52 PM) za...@gm.../HomeC8631CA7: btw, could I paste this conversation in the group just as a reference for the others?
(06:09:11 PM) Golam Mortuza Hossain: In some cases unambiguous mapping may not be possible
(06:09:16 PM) Golam Mortuza Hossain: Yeah, sure
(06:13:37 PM) Golam Mortuza Hossain: My suggestion would be handle only "ৎ" in the engine.
(06:15:28 PM) Golam Mortuza Hossain: If needed then mapping should be done in text pre-parser.
(06:16:21 PM) Golam Mortuza Hossain: In the long term "ত্" appearance will go away!
(06:16:30 PM) za...@gm.../HomeC8631CA7: I agree

--
Regards
Abu Zaher Md. Faridee

http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals. |
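The pre-parser mapping suggested in the chat could look roughly like the sketch below. It is only an illustration under stated assumptions, not the Apertium code: the function name and regular expressions are hypothetical, and it rewrites ত + hasanta to ৎ only in the two "easy" cases mentioned in the chat (an explicit joiner after the hasanta, or the sequence standing right before whitespace, punctuation, or the end of the text). Treating ZWJ the same way as ZWNJ is an extra assumption. Ambiguous cases such as উত্তর or চিত্কার are deliberately left untouched. The NFC step reflects the ড় remark: under Unicode normalization the precomposed letters ড়, ঢ় and য় are turned into base letter + nukta, so the parser sees a single spelling.

```python
import re
import unicodedata

# Code points involved (names per the Unicode Bengali block).
KHANDA_TA = "\u09ce"  # BENGALI LETTER KHANDA TA

# TA + HASANTA followed by an explicit joiner control (ZWNJ or ZWJ).
# Handling ZWJ here is an assumption; the chat only mentions ZWNJ.
_WITH_JOINER = re.compile(r"\u09a4\u09cd[\u200c\u200d]")

# TA + HASANTA at the end of the text, or right before whitespace or
# punctuation (including the danda, U+0964). The hasanta cannot be
# forming a conjunct in these positions.
_AT_BOUNDARY = re.compile(r"\u09a4\u09cd(?=$|[\s,:;?!\u0964])")


def normalize_khanda_ta(text):
    """Hypothetical pre-parser step: map unambiguous TA+HASANTA to KHANDA TA."""
    # NFC gives one canonical spelling for the nukta letters as well.
    text = unicodedata.normalize("NFC", text)
    text = _WITH_JOINER.sub(KHANDA_TA, text)
    return _AT_BOUNDARY.sub(KHANDA_TA, text)
```

Anything that is not covered by these two rules would still need the lookup table discussed above, since in some cases an unambiguous mapping may not be possible.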
From: Abu Z. <za...@gm...> - 2009-08-08 10:52:06
|
Hi,

Right now, which one is considered standard: ত্ or ৎ? I mean, I have seen plenty of websites with both বিদ্যুত্ and বিদ্যুৎ, চিত্কার and চিৎকার. I need to pick one as the standard for Apertium. For the Bengali to English part we could accept both, but when generating from English to Bengali we need to generate one.

Once again, thanks in advance.

--
Regards
Abu Zaher Md. Faridee

http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals. |
From: Abu Z. <za...@gm...> - 2009-06-09 11:05:29
|
Dear Golam Bhai,

For my GSoC project, I'll also need to work on an English to Bengali dictionary (POS-tagged), which will only hold the references of the lemmata. Currently I'm busy with the Bengali morphological generation part, but after I finish that I can move on to the dictionary part. I'll contact you and Jamil bhai soon.

On Tue, Jun 9, 2009 at 3:03 PM, Jamil Ahmed <its...@gm...> wrote:

> Dear Golam bhai,
>
> I will check and let you know soon. :)
>
> Regards,
> -Jamil
>
> 2009/6/7 Golam Mortuza Hossain <gmh...@gm...>
>
> > Hi All,
> >
> > Ankur English to Bengali dictionary project [1] has been serving
> > more and more users for quite some time. According to Google
> > Analytics, the Ankur E2B dictionary project served more than
> > sixty thousand requests in the last month alone [2].
> >
> > It has also led to increased contributions. Unfortunately,
> > the project lacks the manpower to keep up with the increased
> > demand. Also, I have been unable to give enough time to the
> > project lately and I don't see my situation changing anytime
> > soon. Consequently, a large number of contributed entries
> > remain unedited [3].
> >
> > So I am now seeking opinions from Ankur members on how to
> > sustain the project meaningfully.
> >
> > I would be happy to make a personal request to anyone who
> > might be interested in helping the project by any means.
> > If you know of someone, either from Ankur or outside, who
> > could help in this regard, please let me know. It may also
> > be helpful to forward this request to other interested groups.
> >
> > [1] http://www.bengalinux.org/english-to-bengali-dictionary/
> > [2] http://www.bengalinux.org/english-to-bengali-dictionary/VisitorsOverviewReport.pdf
> > [3] http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl
> >
> > Cheers,
> > Golam

--
Regards
Abu Zaher Md. Faridee

http://zaher14.blogspot.com/
---
Time heals every wound, but time itself is a wound that never heals. |
From: Jamil A. <its...@gm...> - 2009-06-09 09:10:21
|
Dear Golam bhai,

I will check and let you know soon. :)

Regards,
-Jamil

2009/6/7 Golam Mortuza Hossain <gmh...@gm...>

> Hi All,
>
> Ankur English to Bengali dictionary project [1] has been serving
> more and more users for quite some time. According to Google
> Analytics, the Ankur E2B dictionary project served more than
> sixty thousand requests in the last month alone [2].
>
> It has also led to increased contributions. Unfortunately,
> the project lacks the manpower to keep up with the increased
> demand. Also, I have been unable to give enough time to the
> project lately and I don't see my situation changing anytime
> soon. Consequently, a large number of contributed entries
> remain unedited [3].
>
> So I am now seeking opinions from Ankur members on how to
> sustain the project meaningfully.
>
> I would be happy to make a personal request to anyone who
> might be interested in helping the project by any means.
> If you know of someone, either from Ankur or outside, who
> could help in this regard, please let me know. It may also
> be helpful to forward this request to other interested groups.
>
> [1] http://www.bengalinux.org/english-to-bengali-dictionary/
> [2] http://www.bengalinux.org/english-to-bengali-dictionary/VisitorsOverviewReport.pdf
> [3] http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl
>
> Cheers,
> Golam |
From: Golam M. H. <gmh...@gm...> - 2009-06-08 19:12:41
|
Hi All,

Ankur English to Bengali dictionary project [1] has been serving more and more users for quite some time. According to Google Analytics, the Ankur E2B dictionary project served more than sixty thousand requests in the last month alone [2].

It has also led to increased contributions. Unfortunately, the project lacks the manpower to keep up with the increased demand. Also, I have been unable to give enough time to the project lately and I don't see my situation changing anytime soon. Consequently, a large number of contributed entries remain unedited [3].

So I am now seeking opinions from Ankur members on how to sustain the project meaningfully.

I would be happy to make a personal request to anyone who might be interested in helping the project by any means. If you know of someone, either from Ankur or outside, who could help in this regard, please let me know. It may also be helpful to forward this request to other interested groups.

[1] http://www.bengalinux.org/english-to-bengali-dictionary/
[2] http://www.bengalinux.org/english-to-bengali-dictionary/VisitorsOverviewReport.pdf
[3] http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl

Cheers,
Golam |
From: Salahuddin P. <sal...@gm...> - 2009-05-14 16:11:59
|
On May 13, 2009, at 10:57 PM, Deepayan Sarkar wrote:

> On 5/12/09, Salahuddin Pasha <sal...@gm...> wrote:
>> Dear all,
>>
>> I was working on অভিধান - Abhidhan for XML support. To
>> enable various application and tools to utilize our dictionary.
>>
>> Basic work is already done, but we need to define a standard XML (XML
>> DTD or XML Schema).
>>
>> Any suggestion or comments ?
>
> Back in 2003, the bengalinux dictionary list had a discussion on this.
> Nothing ever came out of it, and when Golam first started on anubadok,
> his emphasis was more specialized. In any case, that discussion may
> provide some suggestions.
>
> You can get it from the list archives, and I'm also attaching a
> cleaned up and edited version of the thread here:
>
> .......................
> ----
>
> From: Kaushik Ghose <kghose@wa...> - 2003-05-16 15:07
>
> <?xml version="1.0"?>
> <!ELEMENT dictionary (entry*)>
> <!ELEMENT entry (word, info*) >
> <!ELEMENT word (#CDATA)>
> <!ELEMENT info (refer?,pron?, synonym?,antonym?,meaning?,grammar?)>
> <!ATTLIST info pos (n|adj|v|adv) "n" plural (true|false) "false" origin CDATA #DEFAULT "????????????" date CDATA>
> <!ELEMENT refer (#CDATA)>
> <!ELEMENT pron (#CDATA)>
> <!ELEMENT synonym (#CDATA)>
> <!ATTLIST synonym lang CDATA #DEFAULT "bn">
> <!ELEMENT antonym (#CDATA)>
> <!ATTLIST antonym lang CDATA #DEFAULT "bn">
> <!ELEMENT meaning (#CDATA)>
> <!ATTLIST meaning lang CDATA #DEFAULT "bn">
> <!ELEMENT grammar (derivative?)>
> <!ELEMENT derivative (#CDATA)>
> <!ATTLIST derivative form (the|of) "the" num (singular|plural) "singular">
>
> also, to answer Deepayan's question by date I was thinking of date of
> origin, first use etc.
>
> Will potter with QT
>
> right now, I'm going to hardcode the DTD structure, I can't think of a
> simple way of creating an editor that will parse the DTD and configure the
> GUI on the fly - fixed boxes for all the elements will be quicker for this
> size DTD
>
> PS. try the perl tool at
> http://www.sagehill.net/livedtd/download.html
>
> -kg
>
> </thread>

Dear Deepayan bhai,

Thank you for your mail.

Here is an example of the present, updated format:

<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <search_results>
    <dict_entry>
      <bdict_id>68218</bdict_id>
      <en_word>apple</en_word>
      <pos_tag>Proper noun, singular</pos_tag>
      <penn_tag>NP</penn_tag>
      <bn_pronunciation></bn_pronunciation>
      <en_leema></en_leema>
      <bn_word>অ্যাপল</bn_word>
      <explanation></explanation>
      <example>উদাঃ</example>
      <status>EDITED</status>
    </dict_entry>
  </search_results>
</dictionary>

From Deepayan bhai's mail, I think we still need to add these fields. We will add them in a later version, as we do not have enough information for them now.

origin="deshi"
<synonyms>...</synonyms>
<antonyms>...</antonyms>

<entry>
  <info pos="noun" plural="false" origin="deshi">
    <synonyms>...</synonyms>
    <antonyms>...</antonyms>
  </info>
</entry>

<grammar>
  <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
  <derivative form="of" num="singular">chhaanaaTir</derivative>
  <derivative form="of" num="plural">chhaanaader</derivative>
</grammar>

Another question is which would be better for us: using a <grammar> tag and storing the information in nested tags, or the plain layout of the present updated one?

regards
salahuddin |
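As an illustration of how an application or tool might consume the search-result XML shown in this message, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names are taken from the example above; the function name `read_entries` and the choice to return plain dictionaries are assumptions made for this sketch, not part of Abhidhan.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the message above.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <search_results>
    <dict_entry>
      <bdict_id>68218</bdict_id>
      <en_word>apple</en_word>
      <pos_tag>Proper noun, singular</pos_tag>
      <penn_tag>NP</penn_tag>
      <bn_pronunciation></bn_pronunciation>
      <en_leema></en_leema>
      <bn_word>অ্যাপল</bn_word>
      <explanation></explanation>
      <example>উদাঃ</example>
      <status>EDITED</status>
    </dict_entry>
  </search_results>
</dictionary>"""


def read_entries(xml_bytes):
    """Return one dict per <dict_entry>, mapping child tag name to its text."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for node in root.iter("dict_entry"):
        # Empty elements have text None; store them as empty strings.
        entries.append({child.tag: (child.text or "").strip() for child in node})
    return entries


entries = read_entries(SAMPLE.encode("utf-8"))
for entry in entries:
    print(entry["en_word"], "->", entry["bn_word"], "(" + entry["status"] + ")")
```

With the flat layout every field is a direct child of `<dict_entry>`, so a reader like this needs no knowledge of nesting; a nested `<grammar>` block would need one extra loop over its `<derivative>` children.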
From: Deepayan S. <dee...@gm...> - 2009-05-13 17:06:12
|
On 5/12/09, Salahuddin Pasha <sal...@gm...> wrote:

> Dear all,
>
> I was working on অভিধান - Abhidhan for XML support. To
> enable various application and tools to utilize our dictionary.
>
> Basic work is already done, but we need to define a standard XML (XML
> DTD or XML Schema).
>
> Any suggestion or comments ?

Back in 2003, the bengalinux dictionary list had a discussion on this. Nothing ever came out of it, and when Golam first started on anubadok, his emphasis was more specialized. In any case, that discussion may provide some suggestions.

You can get it from the list archives, and I'm also attaching a cleaned up and edited version of the thread here:

<thread from May 2003>

----

[Ankur-dictionary] dictionary.dtd

From: Kaushik Ghose <kghose@wa...> - 2003-05-14 04:17

Hi,
here is the descriptor file.
I'm new to XML and DTDs so please go over the semantics as well as the syntax and see if this serves our purpose...

<?xml version="1.0"?>
<!ELEMENT entry*(word_bn, info_bn*)>
<!ELEMENT word_bn (#CDATA)>
<!ELEMENT info_bn (english, pronounciation_bn,meaning_bn)>
<!ELEMENT english (#CDATA)>
<!ELEMENT pronounciation_bn (#CDATA)>
<!ELEMENT meaning_bn (#CDATA)>

thanks
-kg

----

From: Kaushik Ghose <kghose@wa...> - 2003-05-14 05:12

Ok, small correction, QTs DOM class seems to parse this correctly

dictionary.dtd

<?xml version="1.0"?>
<!ELEMENT dictionary (entry*)>
<!ELEMENT entry (word_bn, info_bn*) >
<!ELEMENT word_bn (#CDATA)>
<!ELEMENT info_bn (english?, pronounciation_bn?,meaning_bn?)>
<!ELEMENT english (#CDATA)>
<!ELEMENT pronounciation_bn (#CDATA)>
<!ELEMENT meaning_bn (#CDATA)>

test.xml

<?xml version="1.0"?>
<!DOCTYPE entry SYSTEM "dictionary.dtd">
<dictionary>
  <entry>
    <word_bn>????????????????????? ???????????????</word_bn>
    <info_bn>
      <english>seedling</english>
      <pronounciation_bn>ankur</pronounciation_bn>
      <meaning_bn>??????????????????? ??????????? ???????????????????????? ?????????????????? ??????????????????</meaning_bn>
    </info_bn>
  </entry>
  <entry>
    <word_bn>????????????????????? ?????????</word_bn>
    <info_bn>
      <english>bangla</english>
      <pronounciation_bn>bangla</pronounciation_bn>
      <meaning_bn>??????????????????? ????????????????? ????????????????????????, ????????????????????????? ??????????? ????????????????????????? ?????</meaning_bn>
    </info_bn>
    <info_bn>
      <english>bengali</english>
    </info_bn>
  </entry>
</dictionary>

thanks
-kg

----

From: Deepayan Sarkar <deepayan@st...> - 2003-05-14 07:03

Ha! A friend of mine once corrected me on this, now I can correct someone else :) 'pronounciation' should be spelled 'pronunciation'.

I'm not an expert on DTDs (though I know someone who knows much more, whom I can ask after we make some progress). I find it very difficult to understand DTDs, and much easier to understand examples of what the final thing would look like. Let's work that way, and we can write out the DTD once we decide on the 'look'.

I don't know if you know this, but there's something called attributes which might be useful. For instance, with multiple meanings as different parts of speech. Here's an example (I'm using slightly different tags) --- 'pos' is part of speech, 'plural' is whether the word has a plural form, etc.:

<entry>
  <word>chhaanaa</word>
  <info pos="noun" plural="false" origin="deshi">
    <meaning>dudh theke toiri ek dhoroner ...</meaning>
    <synonyms>...</synonyms>
    <antonyms>...</antonyms> ## ???
    <translation lang="en">cottage cheese (?)</translation>
    <pronunciation>chhaanaa</pronunciation>
  </info>
  <info pos="noun" origin="tatbhabo"> #it's probably not, but...
    <meaning>shishu, bachchaa</meaning>
    <translation lang="en">child, young</translation> # comma separated
    <translation lang="hn">bachcha</translation> #hindi is hn ? not sure
    <pronunciation>chhaanaa</pronunciation>
    <derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>
    <derivative form="of" num="singular">chhaanaaTir</derivative>
    <derivative form="of" num="plural">chhaanaader</derivative>
  </info>
</entry>

(I've used romanized bengali in place of what should be bengali, but you get the idea.)

I think we should handle derivative words here (and not have separate entries for them. They can be generated from this). Sanskrit has very systematic rules for 'shabdarup'. Bengali isn't as systematic, but there are still quite general rules. We can formulate some rules and list down only derivative words that are exceptions to that rule. We have the standard forms:

to, by, for, from, of and in

plus maybe plurals, the, a --- anything else ?

Also, Bengali (unlike English) often has many words which mean exactly the same thing. We might try to think of a way to have a single entry for all of them.

Can anyone (preferably with a dictionary at hand) think of anything else ?

This is not very important right now, but what's a good format to store pronunciation ?

----

From: Taneem Ahmed <taneem@ey...> - 2003-05-14 08:33

On Wed, 14 May 2003, Kaushik Ghose wrote:

> Hi,
> here is the descriptor file.
> I'm new to XML and DTDs so please go over the semantics as well as the
> syntax and see if this serves our purpose...
>
> <?xml version="1.0"?>
> <!ELEMENT entry*(word_bn, info_bn*)>
> <!ELEMENT word_bn (#CDATA)>
> <!ELEMENT info_bn (english, pronounciation_bn,meaning_bn)>
> <!ELEMENT english (#CDATA)>
> <!ELEMENT pronounciation_bn (#CDATA)>
> <!ELEMENT meaning_bn (#CDATA)>

I remember someone mentioned something about multiple language support. Is it possible to have a general element instead of "english" so that it'll be easier to expand for other languages?

Taneem

----

From: Taneem Ahmed <taneem@ey...> - 2003-05-14 08:37

Sorry I didn't see Deepayan's mail when I sent my previous e-mail.
His example is what I was talking about :)

Taneem

On Wed, 14 May 2003, Deepayan Sarkar wrote:

----

From: Kaushik Ghose <kghose@wa...> - 2003-05-14 20:54

hi,

On Wed, 14 May 2003, Deepayan Sarkar wrote:

> Ha! A friend of mine once corrected me on this, now I can correct someone else
> :) 'pronounciation' should be spelled 'pronunciation'.

Okay :), so the new tag for this is <pron> >:D

> I'm not an expert on DTDs (though I know someone who knows much more, whom I
> can ask after we make some progress). I find it very difficult to
> understand DTDs, and much easier to understand examples of what the final
> thing would look like. Let's work that way, and we can write out the DTD once
> we decide on the 'look'.

Sure, I think I've got the hang of elementary DTD (i.e. at the level I set out, so I can handle that - QT's happy, so am I...)

> I don't know if you know this, but there's something called attributes which
> might be useful. For instance, with multiple meanings as different parts of
> speech. Here's an example (I'm using slightly different tags) --- 'pos' is
> part of speech, 'plural' is whether the word has a plural form, etc.:
>
> <entry>
>   <word>chhaanaa</word>
>   <info pos="noun" plural="false" origin="deshi">
>     <meaning>dudh theke toiri ek dhoroner ...</meaning>
>     <synonyms>...</synonyms>
>     <antonyms>...</antonyms> ## ???
>     <translation lang="en">cottage cheese (?)</translation>
>     <pronunciation>chhaanaa</pronunciation>
>   </info>
>   <info pos="noun" origin="tatbhabo"> #it's probably not, but...
>     <meaning>shishu, bachchaa</meaning>
>     <translation lang="en">child, young</translation> # comma separated
>     <translation lang="hn">bachcha</translation> #hindi is hn ? not sure
>     <pronunciation>chhaanaa</pronunciation>
>     <derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>
>     <derivative form="of" num="singular">chhaanaaTir</derivative>
>     <derivative form="of" num="plural">chhaanaader</derivative>
>   </info>
> </entry>

I would suggest only putting in the english synonym, or closest word - this is a question of size and interfacing. If we have a set of english synonyms we can then use that to link to an English-German dict say, or an English-Thai dict to have a bangla-thai dict for ex. If we start to put in translations for additional languages I think the file will become very large and slow to load.

As it is, with the bangla word, bangla synonyms, antonyms, meanings and english synonyms I think we are going to deal with pretty large files for each bangla alphabet.

Another issue to deal with is what we do with words that have no direct one word english equivalent.

I couldn't get what "origin" means ?

By plural="false" do you mean it doesn't have a plural form ?

> I think we should handle derivative words here (and not have separate entries
> for them. They can be generated from this). Sanskrit has very systematic
> rules for 'shabdarup'. Bengali isn't as systematic, but there are still quite
> general rules. We can formulate some rules and list down only derivative
> words that are exceptions to that rule. We have the standard forms:
>
> to, by, for, from, of and in
>
> plus maybe plurals, the, a --- anything else ?

This is fine,

> Also, Bengali (unlike English) often has many words which mean exactly the
> same thing. We might try to think of a way to have a single entry for all of
> them.

I would rather not. I'd say link it to the required word by putting that in the synonym, and in the <meaning> tag put in something like "see blah"

> Can anyone (preferably with a dictionary at hand) think of anything else ?
>
> This is not very important right now, but what's a good format to store
> pronunciation ?
>

unicode should do fine, there's a provision for the international phonetic alphabet
http://www.unicode.org/charts/PDF/U0250.pdf

so the next draft layout...

<dictionary>
  <entry>
    <word_bn> chanaa </word_bn>
    <info pos="noun" plural="true" origin="??">
      <pron>....</pron>
      <meaning_bn> baccha </meaning_bn>
      <synonym_bn>...</synonym_bn>
      <synonym_bn>...</synonym_bn>
      <antonym_bn>...</antonym_bn>
      <synonym_en>...</synonym_en>
      <synonym_en>...</synonym_en>
      <grammar>
        <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
        <derivative form="of" num="singular">chhaanaaTir</derivative>
        <derivative form="of" num="plural">chhaanaader</derivative>
      </grammar>
    </info>
    <info pos="noun" plural="false" origin="??">
      <pron>...</pron>
      <meaning_bn> khabar... </meaning_bn>
    </info>
  </entry>
</dictionary>

-kg

----

From: Deepayan Sarkar <deepayan@st...> - 2003-05-14 23:25

On Wednesday 14 May 2003 15:53, Kaushik Ghose wrote:

> I would suggest only putting in the english synonym, or closest word -
> this is a question of size and interfacing. If we have a set of english
> synonyms we can then use that to link to an English-German dict say, or
> an English-Thai dict to have a bangla-thai dict for ex.
> If we start to put in translations for additional languages I think the
> file will become very large and slow to load.

Before we go any further, we need to decide how we are eventually planning to use the XML files. I don't think XML is a good format for use in any real application. For example, for a spell-checker to load the XML files directly would be very inefficient. Instead, the XML could be a repository of all possible information we might ever want to have. For a spell checker we could generate something that would contain only the words and nothing else (that could be a plain text file, or a database, could be in various different encodings and formats). Generating this from the XML may take a while, but if we do this once every two months or so, it shouldn't matter.
Similarly for speech synthesis, we could extract only the actual word and its pronunciation, and leave everything else out. From that perspective, I don't think it should matter if the XML files become large. And of course we don't need to have a single file for each alphabet, we could split them as much as we want (maybe the first 3 letters identify each file) as long as given a word it's possible to identify which file that word belongs to. As for the translation, I'm not saying that we have to list translations in to all possible languages. But there's no harm in keeping the option. In fact, initially we won't even have english translations for the words that we already have. And as you point out, not all words will even have an English translation. All this wouldn't matter if we allow an arbitrary number (including 0) of instances of the <translation> tag for each word. The English->other language idea may not always be the best because there might be some words which have no proper english version, but could have, say, hindi versions. We could make it policy to include a non-english translation only when this is the case. But explicitly ruling out that opti on is not a good idea, I think. > As it is, with the bangla word, bangla synonyms, antonyms, meanings and > english synonyms I think we are going to deal with pretty large files for > each bangla alphabet. > > Another issue to deal with is what we do with words that have no direct > one word english equivalent. > > I couldn't get what "origin" means ? Basically tot-somo, tot-bhobo, dishi, bideshi, that sort of stuff. > By plural="false" do you mean it doesn't have a plural form ? Yes. > > I think we should handle derivative words here (and not have separate > > entries for them. They can be generated from this). Sanskrit has very > > systematic rules for 'shabdarup'. Bengali isn't as systematic, but there > > are still quite general rules. 
We can formulate some rules and list down > > only derivative words that are exceptions to that rule. We have the > > standard forms: > > > > to, by, for, from, of and in > > > > plus maybe plurals, the, a --- anything else ? > > This is fine, > > > Also, Bengali (unlike English) often has many words which mean exactly > > the same thing. We might try to think of a way to have a single entry f or > > all of them. > > I would rather not. I'd say link it to the required word by putting that > in the synonym, and in the <meaning> tag put in somethig like "see blah" Yes, that should be good enough. Maybe in those cases <word_bn>gabAkSha</word_bn> <info ...> <meaning_bn type="refer">jAnalA</meaning_bn> </info> > > Can anyone (preferably with a dictionary at hand) think of anything else > > ? > > > > > > This is not very important right now, but what's a good format to store > > pronunciation ? > > unicode should do fine, there's a provision for the international phonetic > alphabet > http://www.unicode.org/charts/PDF/U0250.pdf Cool. Does there exist a speech synthesizer which can work from this ? That way we could confirm that we enter the correct pronunciation. > so the next draft layout... > > > <dictionary> > <entry> > <word_bn> chanaa </word_bn> > <info pos="noun" plural="true" origin="??"> Since most words would have plural="true", we could omit that (the default would be "true"). > <pron>....</pron> > <meaning_bn> baccha </meaning_bn> > <synonym_bn>...</synonym_bn> > <synonym_bn>...</synonym_bn> Any problem with giving multiple synonyms comma separated ? > <antonym_bn>...</antonym_bn> > <synonym_en>...</synonym_en> > <synonym_en>...</synonym_en> I still think a translation tag with a language attribute would be more appropriate. 
> <grammar> > <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative> > <derivative form="of" > num="singular">chhaanaaTir</derivative> > <derivative form="of" > num="plural">chhaanaader</derivative> > </grammar> > </info> > <info pos="noun" plural="false" origin="??"> > <pron>...</pron> > <meaning_bn> khabar... </meaning_bn> > </info> > </entry> > </dictionary> Otherwise looks OK (maybe an optional comment tag for each word), unless someone else can think of something. BTW, what's the use of the extra _bn for the tags (not that it matters) ? Deepayan ---- From: Kaushik Ghose <kghose@wa...> - 2003-05-15 02:57 Hiya, On Wed, 14 May 2003, Deepayan Sarkar wrote: > Before we go any further, we need to decide how we are eventually planning to > use the XML files. > > I don't think XML is a good format for use in any real application. For > example, for a spell-checker to load the XML files directly would be very > inefficient. > > Instead, the XML could be a repository of all possible information we might > ever want to have. For a spell checker we could generate something that would > contain only the words and nothing else (that could be a plain text file, or > a database, could be in various different encodings and formats). Generating > this from the XML may take a while, but if we do this once every two months > or so, it shouldn't matter. Similarly for speech synthesis, we could extract > only the actual word and its pronunciation, and leave everything else out. > > >From that perspective, I don't think it should matter if the XML files become > large. And of course we don't need to have a single file for each alphabet, > we could split them as much as we want (maybe the first 3 letters identify > each file) as long as given a word it's possible to identify which file that > word belongs to. > > As for the translation, I'm not saying that we have to list translations into > all possible languages. But there's no harm in keeping the option. 
In fact, > initially we won't even have english translations for the words that we > already have. And as you point out, not all words will even have an English > translation. All this wouldn't matter if we allow an arbitrary number > (including 0) of instances of the <translation> tag for each word. > Ok, that seems fine. The size of the files will matter for the GUI that does the dicto editing and any online collaboration tool we come up with for creating the dicto, but yes, we'll have automated tools to create (like you, may be on the first of every two months) separate file clusters for spell checkers, theasauri etc. which can be more compacted. Now, for the translation. Are we looking to put in one word that can link this bangla word to a word in some other dicto ? Or are we looking to give a translation of it ? For that we can probably end up with two sets of tags. <synonym lang ="">...</synonym> <meaning lang ="">...</meaning> where synonym is the one word thingy, meaning is well a paragraph or so. > Yes, that should be good enough. Maybe in those cases > > <word_bn>gabAkSha</word_bn> > <info ...> > <meaning_bn type="refer">jAnalA</meaning_bn> > </info> Yes, good idea, I'd prefer a separate tag <refer> which would do this job. we could do it via synonyms too, may be everything... > Cool. Does there exist a speech synthesizer which can work from this ? That > way we could confirm that we enter the correct pronunciation. Didn't go much through it but here's a promising site http://www.vorde.org/prodVordeTech/documents/vorde/split/node28.html > > so the next draft layout... > > > > > > <dictionary> > > <entry> > > <word_bn> chanaa </word_bn> > > <info pos="noun" plural="true" origin="??"> > > Since most words would have plural="true", we could omit that (the default > would be "true"). 
> > > <pron>....</pron> > > <meaning_bn> baccha </meaning_bn> > > <synonym_bn>...</synonym_bn> > > <synonym_bn>...</synonym_bn> > > Any problem with giving multiple synonyms comma separated ? > > > <antonym_bn>...</antonym_bn> > > <synonym_en>...</synonym_en> > > <synonym_en>...</synonym_en> Yeah, I couldn't figure out if commas would tell the parser these are separate instances, or just one big glob of text, so I played it safe... > I still think a translation tag with a language attribute would be more > appropriate. Yes. > > <grammar> > > <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative> > > <derivative form="of" > > num="singular">chhaanaaTir</derivative> > > <derivative form="of" > > num="plural">chhaanaader</derivative> > > </grammar> > > </info> > > <info pos="noun" plural="false" origin="??"> > > <pron>...</pron> > > <meaning_bn> khabar... </meaning_bn> > > </info> > > </entry> > > </dictionary> > > Otherwise looks OK (maybe an optional comment tag fr each word), unless > someone else can think of something. > > BTW, what's the use of the extra _bn for the tags (not that it matters) ? Yeah, that should get replaced by the lang tag. so here it is (hopefully I remembered everything) <dictionary> <entry> <word>...</word> <info pos="noun" plural="false" orign="." date="."> <pron>...</pron> <synonym lang="bn">...</synonym> <synonym lang="bn">...</synonym> <antonym lang="bn">...</antonym> <synonym lang="en">...</synonym> <meaning lang="bn">...</meaning> <meaning lang="en">...</meaning> <grammar> <derivative form="the" num="singular">...</derivative> </grammar> </info> </entry> </dictionary> I'll make a DTD and see if I can make a GUI for it... -kg ---- From: Deepayan Sarkar <deepayan@st...> - 2003-05-15 04:13 On Wednesday 14 May 2003 21:56, Kaushik Ghose wrote: > Ok, that seems fine. 
The size of the files will matter for the GUI that > does the dicto editing and any online collaboration tool we come up with > for creating the dicto, but yes, we'll have automated tools to create > (like you, may be on the first of every two months) separate file clusters > for spell checkers, theasauri etc. which can be more compacted. Yes, we do need to plan ahead so that individual files don't get very big. Since the main purpose of the GUI is to enter new words and edit existing words, the only requirement is that given a word we should be able figure out which file it should be in. That way, if the file doesn't exist, the program could create a blank instance of the XML document object, and if it does exist, parse it and read it into memory. As for the file structure, we could consider a separate directory for each starting character, then one file for each combination of first 3 letters (I'm not sure what the best way to name these files would be). But we may need to adjust this depending on how many files per directory and how many words per file this would make. Could you run through the existing words and get an estimate (basically count combinations of first 3 characters) ? > Now, for the translation. Are we looking to put in one word that can link > this bangla word to a word in some other dicto ? Or are we looking to give > a translation of it ? For that we can probably end up with two sets of > tags. > > <synonym lang ="">...</synonym> > <meaning lang ="">...</meaning> > > where synonym is the one word thingy, meaning is well a paragraph or so. Again, no harm in keeping the option (that way, we could potentially have a bengali to english dictionary as well as a bengali to bengali). > > Yes, that should be good enough. Maybe in those cases > > > > <word_bn>gabAkSha</word_bn> > > <info ...> > > <meaning_bn type="refer">jAnalA</meaning_bn> > > </info> > > Yes, good idea, I'd prefer a separate tag <refer> which would do this job. 
> we could do it via synonyms too, may be everything... OK. > > Any problem with giving multiple synonyms comma separated ? > > > > > <antonym_bn>...</antonym_bn> > > > <synonym_en>...</synonym_en> > > > <synonym_en>...</synonym_en> > > Yeah, I couldn't figure out if commas would tell the parser these are > separate instances, or just one big glob of text, so I played it safe... The comma is not special in XML, so it would be interpreted as a single long string. But we could always interpret them correctly inside applications. Anyway, it's not that important. > so here it is (hopefully I remembered everything) > > <dictionary> > <entry> > <word>...</word> > <info pos="noun" plural="false" orign="." date="."> What's date ? The last modification time ? > <pron>...</pron> > <synonym lang="bn">...</synonym> > <synonym lang="bn">...</synonym> > <antonym lang="bn">...</antonym> > <synonym lang="en">...</synonym> > <meaning lang="bn">...</meaning> > <meaning lang="en">...</meaning> > <grammar> > <derivative form="the" > num="singular">...</derivative> > </grammar> > </info> > </entry> > </dictionary> > > I'll make a DTD and see if I can make a GUI for it... Great. I have done this sort of programming in Python, but not C++. I might be able to help once you get something going. I think it might be useful to start by writing a class to represent a single XML file, with methods to add and modify tags (rather than directly accessing the XML document object all the time). That way, if there are minor changes in the DTD, we just need to modify this class. Deepayan ---- From: Kaushik Ghose <kghose@wa...> - 2003-05-16 15:07 <?xml version="1.0" encoding="UTF-8"?> <!ELEMENT dictionary (entry*)> <!ELEMENT entry (word, info*)> <!ELEMENT word (#PCDATA)> <!ELEMENT info (refer?, pron?, synonym?, antonym?, meaning?, grammar?)> <!ATTLIST info pos (n|adj|v|adv) "n" plural (true|false) "false" origin CDATA "????????????"
date CDATA #IMPLIED> <!ELEMENT refer (#PCDATA)> <!ELEMENT pron (#PCDATA)> <!ELEMENT synonym (#PCDATA)> <!ATTLIST synonym lang CDATA "bn"> <!ELEMENT antonym (#PCDATA)> <!ATTLIST antonym lang CDATA "bn"> <!ELEMENT meaning (#PCDATA)> <!ATTLIST meaning lang CDATA "bn"> <!ELEMENT grammar (derivative?)> <!ELEMENT derivative (#PCDATA)> <!ATTLIST derivative form (the|of) "the" num (singular|plural) "singular"> Also, to answer Deepayan's question: by date I was thinking of date of origin, first use, etc. Will potter with Qt right now. I'm going to hardcode the DTD structure; I can't think of a simple way of creating an editor that will parse the DTD and configure the GUI on the fly. Fixed boxes for all the elements will be quicker for a DTD of this size. PS. Try the Perl tool at http://www.sagehill.net/livedtd/download.html -kg </thread> |
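The file layout and the "generate a plain word list from the XML" idea discussed in the thread above can be sketched roughly as follows. This is only an illustration, not what the project settled on: the `dict/<first character>/<first three characters>.xml` naming, the function names `bucket_path` and `extract_words`, and the sample entries are all assumptions made for the example. It uses the `<entry>` and `<word>` element names from the draft layout.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def bucket_path(word: str, root: str = "dict") -> PurePosixPath:
    """Map a word to root/<first char>/<first three chars>.xml.

    Hypothetical naming scheme. Slicing works on Unicode code points,
    so for Bengali script a vowel sign counts as its own "letter".
    """
    word = word.strip()
    if not word:
        raise ValueError("empty word")
    return PurePosixPath(root) / word[0] / (word[:3] + ".xml")


def extract_words(xml_text: str) -> list[str]:
    """Return only the headwords of a <dictionary>, e.g. for a spell checker."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("word", default="").strip()
            for entry in root.iter("entry")]


# Made-up sample data in the shape of the draft layout.
SAMPLE = """<dictionary>
  <entry><word> chhaanaa </word><info pos="n"><pron>...</pron></info></entry>
  <entry><word>gabAkSha</word><info pos="n"><refer>jAnalA</refer></info></entry>
</dictionary>"""

if __name__ == "__main__":
    for w in extract_words(SAMPLE):
        print(w, "->", bucket_path(w))
```

Because a word always maps to exactly one file, an editor only has to load (or create) that one file, which is the requirement stated in the thread.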
From: Golam M. H. <gmh...@gm...> - 2009-05-13 14:19:51
|
Hi Salahuddin, On Wed, May 13, 2009 at 2:26 AM, salahuddin66 <sal...@gm...> wrote: > Yes, you are right. Personally I also think getxml.pl would be a > standard name. Now BDictXML import BDictSQL and use the existing db > function. > > File attached. Please feel free to let me know if you need any changes. :) First of all, it is coming out very nicely. I have one more suggestion. > <dict_entry_unedited id="7"> I would prefer not to have the above element, for the following reason: the client-side parsing would need separate code for reading unedited entries. My suggestion would be to have <status>UNEDITED</status> for each entry. Currently, we have four separate statuses: (1) EDITED (2) UNEDITED* (3) OBSOLETE (4) DELETED (this makes it easy to undo a delete by simply changing the status). Also, there is a unique "id" field for each dictionary entry. If you want to have editing capability in your client app, then you need to send back this id. So it would be good to have something like <bdict_id>23232</bdict_id> for each entry. Given that you are almost finished with the spec, I would urge you to write down the final specification so that we can put it up on the Abhidhan page. This will help those who want to write their own XML-based app using the Ankur dictionary. Here are the current dict_table columns: `id` int(11) NOT NULL auto_increment, `en_word` text, `pos_tag` text, `en_lemma` text, `bn_pronunciation` text, `bn_word` text, `explanation` text, `example` text, `status` text, Finally, please make sure to put your name in the license headers and in the final XML specs. Best, Golam |
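A client reading the format suggested above (one `<dict_entry>` element for every entry, with `<status>` and `<bdict_id>` children) might look roughly like this. It is a sketch under assumptions: the sample document is invented, the function name `read_entries` is hypothetical, and treating a missing `<status>` as UNEDITED is a guess, not something the message states.

```python
import xml.etree.ElementTree as ET

# The four statuses listed in the message.
STATUSES = ("EDITED", "UNEDITED", "OBSOLETE", "DELETED")

# Invented sample in the suggested shape; not real gateway output.
SAMPLE = """<dictionary>
  <search_results>
    <dict_entry>
      <bdict_id>23232</bdict_id>
      <en_word>read</en_word>
      <bn_word>পড়া</bn_word>
      <status>EDITED</status>
    </dict_entry>
    <dict_entry>
      <bdict_id>7</bdict_id>
      <en_word>read</en_word>
      <bn_word>পাঠ করা</bn_word>
      <status>UNEDITED</status>
    </dict_entry>
  </search_results>
</dictionary>"""


def read_entries(xml_text):
    """Parse every <dict_entry> with one code path, whatever its status."""
    entries = []
    for node in ET.fromstring(xml_text).iter("dict_entry"):
        # Assumption: an absent <status> is treated as UNEDITED.
        status = node.findtext("status", default="UNEDITED").strip()
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        entries.append({
            "bdict_id": int(node.findtext("bdict_id")),  # sent back when editing
            "en_word": node.findtext("en_word"),
            "bn_word": node.findtext("bn_word"),
            "status": status,
        })
    return entries


if __name__ == "__main__":
    for entry in read_entries(SAMPLE):
        print(entry["bdict_id"], entry["status"], entry["bn_word"])
```

The point of the single element name is visible in the loop: filtering out unedited or deleted entries becomes a comparison on `status` rather than a second parser branch.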
From: Abu Z. <za...@gm...> - 2009-05-13 05:36:32
|
You might also find it helpful to look at the Apertium dictionary format, which is also standard XML. Here is the link to svn for Nepalese Language (it's the closest language to Bengali in Apertium we have so far, and the Bengali pair is far from finished :( ) http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-bn-en/. I have been trying to find a standard tag set for the Bengali language. So far I am also making do with the Penn Treebank tagset, but in the future I might need to extend it to meet my project requirements. *However, I believe the Penn Treebank tagset to be sufficient for a general purpose dictionary format.* The attached file contains the Penn Treebank tagset and also the bilingual dictionary format from Apertium. What I'd like to propose is that, instead of using <pos_tag>Verb, non-3rd person singular present</pos_tag>, you could create some definitions like verb, person, number, tense and then use them as properties of the specific entry. It'd be easier to parse in the future. On Wed, May 13, 2009 at 8:02 AM, Golam Mortuza Hossain <gmh...@gm...> wrote: > Hi, > > On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha > <sal...@gm...> wrote: > > Basic work is already done, but we need to define a standard XML (XML > > DTD or XML Schema). > > Example: test XML output. > > > > <?xml version="1.0" encoding="utf-8"?> > > <dictionary> > > <search_results> > > <dict_entry id="1"> > > <en_word>read</en_word> > > <pos_tag>Noun, singular or mass</pos_tag> > > > Thanks a lot for your work. > > I should suggest that you also try to have an entry for PennTag > for Parts-of-Speech (pos) like "NN", "VV" etc. So something like > > <penn_tag>NN</penn_tag> > > This would be needed if Anubadok Online interface needs to update its > database using your XML gateway of Ankur dictionary database. > > Cheers, > Golam -- Regards Abu Zaher Md. Faridee http://zaher14.blogspot.com/ --- Time heals every wound, but time itself is a wound that never heals. |
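The proposal above, replacing a free-text `<pos_tag>` description with separate properties, could be sketched like this. The attribute names (`penn`, `pos`, `person`, `number`, `tense`, `form`) and the helper `pos_element` are hypothetical choices for illustration; only the Penn Treebank tag meanings themselves (NN, NNS, VB, VBP, VBZ) are standard.

```python
import xml.etree.ElementTree as ET

# A few Penn Treebank tags broken into separate features.
# The feature names and values are assumptions made for this sketch.
PENN_FEATURES = {
    "NN":  {"pos": "noun", "number": "singular_or_mass"},
    "NNS": {"pos": "noun", "number": "plural"},
    "VB":  {"pos": "verb", "form": "base"},
    "VBP": {"pos": "verb", "person": "non-3rd",
            "number": "singular", "tense": "present"},
    "VBZ": {"pos": "verb", "person": "3rd",
            "number": "singular", "tense": "present"},
}


def pos_element(penn_tag):
    """Build a <pos_tag> element whose attributes carry the features."""
    features = PENN_FEATURES[penn_tag]  # KeyError for tags not in the table
    return ET.Element("pos_tag", {"penn": penn_tag, **features})


if __name__ == "__main__":
    print(ET.tostring(pos_element("VBP"), encoding="unicode"))
```

A client can then test `element.get("tense") == "present"` directly instead of matching on the English description string.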
From: Golam M. H. <gmh...@gm...> - 2009-05-13 02:11:23
|
Hi, On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha <sal...@gm...> wrote: > Basic work is already done, but we need to define a standard XML (XML > DTD or XML Schema). > Example: test XML output. > > <?xml version="1.0" encoding="utf-8"?> > <dictionary> > <search_results> > <dict_entry id="1"> > <en_word>read</en_word> > <pos_tag>Noun, singular or mass</pos_tag> Thanks a lot for your work. I would suggest that you also try to have an entry for the PennTag for Parts-of-Speech (pos) like "NN", "VV" etc. So something like <penn_tag>NN</penn_tag> This would be needed if the Anubadok Online interface needs to update its database using your XML gateway to the Ankur dictionary database. Cheers, Golam |
From: Salahuddin P. <sal...@gm...> - 2009-05-12 21:02:02
|
Dear all, I have been working on XML support for অভিধান - Abhidhan, to enable various applications and tools to use our dictionary. Basic work is already done, but we need to define a standard XML (XML DTD or XML Schema). Any suggestions or comments? Example: test XML output. <?xml version="1.0" encoding="utf-8"?> <dictionary> <search_results> <dict_entry id="1"> <en_word>read</en_word> <pos_tag>Noun, singular or mass</pos_tag> <bn_word>পড়া</bn_word> </dict_entry> <dict_entry id="2"> <en_word>read</en_word> <pos_tag>Verb, base form</pos_tag> <bn_word>পড়া</bn_word> </dict_entry> <dict_entry id="3"> <en_word>read</en_word> <bn_pronunciation> উচ্চাঃ রীড</bn_pronunciation> <pos_tag>Verb, non-3rd person singular present</pos_tag> <bn_word>পাঠ করা</bn_word> </dict_entry> </search_results> </dictionary> regards salahuddin |
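For anyone who wants to try consuming this output, here is a rough sketch of a client that groups the Bangla translations by part of speech. It is not part of Abhidhan: the function name `translations_by_pos` is hypothetical and the sample is a shortened copy of the test output above. Note that an XML parser is strict about well-formedness, so a closing tag must be written without a space after `</`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the test output; encoded to bytes so the
# encoding declaration is honoured by the parser.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <search_results>
    <dict_entry id="1">
      <en_word>read</en_word>
      <pos_tag>Noun, singular or mass</pos_tag>
      <bn_word>পড়া</bn_word>
    </dict_entry>
    <dict_entry id="3">
      <en_word>read</en_word>
      <bn_pronunciation>উচ্চাঃ রীড</bn_pronunciation>
      <pos_tag>Verb, non-3rd person singular present</pos_tag>
      <bn_word>পাঠ করা</bn_word>
    </dict_entry>
  </search_results>
</dictionary>
""".encode("utf-8")


def translations_by_pos(xml_bytes):
    """Return {pos_tag text: [bn_word, ...]} for all entries in the result."""
    grouped = defaultdict(list)
    for entry in ET.fromstring(xml_bytes).iter("dict_entry"):
        # Optional children such as <bn_pronunciation> are simply ignored here.
        pos = entry.findtext("pos_tag", default="").strip()
        grouped[pos].append(entry.findtext("bn_word", default="").strip())
    return dict(grouped)


if __name__ == "__main__":
    for pos, words in translations_by_pos(SAMPLE).items():
        print(pos, "->", ", ".join(words))
```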
From: Progga <pro...@gm...> - 2009-05-12 13:09:23
|
On Mon, May 11, 2009 at 9:56 PM, Golam Mortuza Hossain <gmh...@gm...> wrote: > For Ankurian: Abu Zaher is an accepted candidate for Google summer of > Code [1] for porting Anubadok system to Apertium project [2], another > open-source MT project which has several other language pairs. Congratulations to Abu Zaher :-) |
From: Golam M. H. <gmh...@gm...> - 2009-05-11 20:58:10
|
Hi, On Mon, May 11, 2009 at 10:15 AM, mak <mah...@gm...> wrote: > > On Mon, May 11, 2009 at 7:06 AM, Abu Zaher <za...@gm...> wrote: >> >> There is one issue that needs to be addressed. My mentor asked me if I >> could create a new group for this project so that we all can discuss the >> crucial matters about Bengali translation openly. This also includes the >> people from CRBLP. I was however, wondering if this could be done by >> inviting my mentor Francis Tyres and Kevin Donnelly to our Ankur core >> mailing list. Is it possible? Also note that we might need to be able to >> talk with people from CRBLP(Brac University), so we might need to invite >> them too (Dr Mumit Khan Sir and his associates). Would it be possible? >> My personal note is creating a new group is very cumbersome while we have >> such an active group with prominent members from Ankur already subscribed. > > it's a delicate issue. Mumit Khan and his accomplice (not associates) are > pro-M$. I don't want any of them to be subscribed in Ankur Core. Rather we > can have a separate mailing list from Ankur likewise our various l10n > projects on redhat/Gnome/KDE etc. > Personally, I don't have any problem with having the discussions on your project in the Ankur-core list. However, as MAK said, other members may have their own opinions. I am forwarding your request to the core mailing list. BTW, if you want, I will be happy to host the discussion on Anubadok's mailing list. I can invite the people whom you want to join the list. For Ankurians: Abu Zaher is an accepted candidate for Google Summer of Code [1] for porting the Anubadok system to the Apertium project [2], another open-source MT project which has several other language pairs. Cheers, Golam [1] http://socghop.appspot.com/org/home/google/gsoc2009/apertium [2] http://www.apertium.org/ |
From: Golam M. H. <gmh...@gm...> - 2009-05-10 11:35:42
|
On Sun, May 10, 2009 at 12:33 AM, Shabab Mustafa <sha...@gm...> wrote: > The rendering of Padma cannot show the 'কার' and sometimes the 'য-ফলা' > properly. Is it a drawback of Padma? or the poor typeset of Anandabazar and > Bartaman? It's likely that Padma is at fault. Could you please send me a link to the page where you see those issues? Cheers, |
From: Shabab M. <sha...@gm...> - 2009-05-10 04:01:32
|
Padma's rendering cannot show the 'কার' and sometimes the 'য-ফলা' properly. Is it a drawback of Padma, or poor typesetting by Anandabazar and Bartaman? On Sat, May 9, 2009 at 6:25 PM, Golam Mortuza Hossain <gmh...@gm...> wrote: > Hi, > > Some of you might find this update on the Padma rendering bug for ABP > to be useful > > > http://methopath.wordpress.com/2009/05/09/improved-firefox-padma-for-reading-anandabazar-bartaman/ > > Cheers, > Golam -- Shabab Mustafa |
From: Deepayan S. <dee...@gm...> - 2009-05-09 17:26:32
|
On 5/9/09, Debayan Banerjee <deb...@gm...> wrote: > 2009/5/9 Deepayan Sarkar <dee...@gm...>: > > > Debayan, > > > > I have been meaning to ask you: is your character segmentation > > algorithm in a form that could be easily separated out? > > The segmentation algorithm can be found here > (http://tesseractindic.googlecode.com/files/clipmatra_pseudocode.pdf) But this is your original algorithm which segmented গ etc (at least for some fonts), isn't it? I thought you had an improved algorithm which works around some of those problems (or maybe I misunderstood your mail). > > If it could be > > easily done, I would like to try it out in BOCRA. Unfortunately, I > > don't think I will have enough time in the near future to figure out > > how ocropus/tesseract does things. > > > Kindly read the paragraph in this > > (http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html) > > post regarding reducing number of character classes to be trained. I > want to know if this is possible using BOCRA. No, it's not. From the beginning, my design for BOCRA was based on the idea of on-the-fly training, because that's the only approach I thought was feasible given the combination of non-standard fonts and so many potential conjuncts. In most realistic examples, the number of conjuncts is actually quite limited. After accounting for the most common ones, the frequency of the rest is probably lower than the normal OCR error rate anyway. -Deepayan |
From: Debayan B. <deb...@gm...> - 2009-05-09 15:20:06
|
2009/5/9 Deepayan Sarkar <dee...@gm...>: > Debayan, > > I have been meaning to ask you: is your character segmentation > algorithm in a form that could be easily separated out? The segmentation algorithm can be found here (http://tesseractindic.googlecode.com/files/clipmatra_pseudocode.pdf) > If it could be > easily done, I would like to try it out in BOCRA. Unfortunately, I > don't think I will have enough time in the near future to figure out > how ocropus/tesseract does things. Kindly read the paragraph in this (http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html) post regarding reducing number of character classes to be trained. I want to know if this is possible using BOCRA. > > -Deepayan > -- Regards, Debayan Banerjee Support Free Software http://deeproot.in |
From: Golam M. H. <gmh...@gm...> - 2009-05-09 12:50:32
|
Hi, Some of you might find this update on the Padma rendering bug for ABP to be useful: http://methopath.wordpress.com/2009/05/09/improved-firefox-padma-for-reading-anandabazar-bartaman/ Cheers, Golam |
From: Deepayan S. <dee...@gm...> - 2009-05-09 02:20:44
|
Debayan, I have been meaning to ask you: is your character segmentation algorithm in a form that could be easily separated out? If it could be easily done, I would like to try it out in BOCRA. Unfortunately, I don't think I will have enough time in the near future to figure out how ocropus/tesseract does things. -Deepayan |
From: srhaque <sr...@th...> - 2009-05-08 21:39:20
|
On Friday 08 May 2009, Debayan Banerjee wrote: > 2009/4/20 srhaque <sr...@th...>: > > BTW, if you still need my test file with conjunct samples, here it is... > > Thank you very much. They have proved *very helpful* :) > I prepared this > (http://hacking-tesseract.blogspot.com/2009/05/bengali-stats.html) > post with the help of your document. Cool. If it is of any use, then note that my Raga font also has glyphs for all the conjuncts (though I've not done anything with the advanced tables to refine the font generally). I've been thinking about OCR for a little while too, and am doing some little experiments here and there based on trying to apply brute force to simple algorithms for deskewing/text-block extraction/segmentation. However, I'm a bit stuck for inspiration on that front for now, so if there is anything I can do to help *you*, please let me know. Thanks, Shaheed |