|
From: Abu Z. <za...@gm...> - 2009-05-13 05:36:32
|
You might also find it helpful to look at the Apertium dictionary format, which is also standard XML. Here is the SVN link for the Nepalese language (it's the closest language to Bengali we have in Apertium so far, and the Bengali pair is far from finished :( ):
http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-bn-en/

I have been trying to find a standard tagset for the Bengali language. So far I'm also making do with the Penn Treebank tagset, but in the future I might need to extend it to meet my project's requirements. *However, I believe the Penn Treebank tagset is sufficient for a general-purpose dictionary format.* The attached file contains the Penn Treebank tagset and also the bilingual dictionary format from Apertium.

What I'd like to propose is that, instead of using <pos_tag>Verb, non-3rd person singular present</pos_tag>, you create some definitions like verb, person, number and tense, and then use them as properties of the specific entry. It'd be easier to parse in the future.

On Wed, May 13, 2009 at 8:02 AM, Golam Mortuza Hossain <gmh...@gm...> wrote:
> Hi,
>
> On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha
> <sal...@gm...> wrote:
> > Basic work is already done, but we need to define a standard XML (XML
> > DTD or XML Schema).
> > Example: test XML output.
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <dictionary>
> > <search_results>
> > <dict_entry id="1">
> > <en_word>read</en_word>
> > <pos_tag>Noun, singular or mass</pos_tag>
>
> Thanks a lot for your work.
>
> I should suggest that you also try to have an entry for PennTag
> for Parts-of-Speech (pos) like "NN", "VV" etc. So something like
>
> <penn_tag>NN</penn_tag>
>
> This would be needed if Anubadok Online interface needs to update its
> database using your XML gateway of Ankur dictionary database.
>
> Cheers,
> Golam
>
> _______________________________________________
> Bengalinux-core mailing list
> Ben...@li...
> https://lists.sourceforge.net/lists/listinfo/bengalinux-core
>

--
Regards
Abu Zaher Md. Faridee
http://zaher14.blogspot.com/
---
Time heals every wound, but time itself is a wound that never heals.
|
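P.S. To make the proposal about separate properties a bit more concrete, here is a rough sketch of how an entry could be generated. Everything beyond <dict_entry>, <en_word> and <penn_tag> is an assumption on my part: the <pos> element, its attribute names (category, person, number, tense) and the small mapping table are hypothetical and are not taken from Apertium or from the Ankur gateway.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few Penn Treebank tags to separate properties.
# Only a handful of tags are listed; a real table would cover the full tagset.
PENN_FEATURES = {
    "NN":  {"category": "noun", "number": "singular"},  # Penn: singular or mass
    "NNS": {"category": "noun", "number": "plural"},
    "VBP": {"category": "verb", "person": "non-3rd",
            "number": "singular", "tense": "present"},
    "VBZ": {"category": "verb", "person": "3rd",
            "number": "singular", "tense": "present"},
    "VBD": {"category": "verb", "tense": "past"},
}


def build_entry(entry_id, word, penn_tag):
    """Build a <dict_entry> whose part of speech is split into attributes."""
    entry = ET.Element("dict_entry", {"id": str(entry_id)})
    ET.SubElement(entry, "en_word").text = word
    ET.SubElement(entry, "penn_tag").text = penn_tag
    # Unknown tags yield an empty <pos/> element instead of raising an error.
    ET.SubElement(entry, "pos", dict(PENN_FEATURES.get(penn_tag, {})))
    return entry


if __name__ == "__main__":
    print(ET.tostring(build_entry(1, "read", "VBP"), encoding="unicode"))
```

A consumer can then read entry.find("pos").get("tense") directly instead of parsing a phrase such as "Verb, non-3rd person singular present".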