From: Mikel L. F. <ml...@dl...> - 2010-04-06 20:06:43
Dear Andries,

thank you very much for your proposal and for your interest in Apertium. Here are some comments on your Apertium GSoC proposal. I am writing these comments offline on a plane, and they may overlap with other comments you might have received or with comments I might have made to other students. Also, some of the lines in your proposal got lost when I printed it. My apologies if my comments are not appropriate.

You consider that "Apertium uses a simplified type of machine-translation", probably because it does not make explicit use of any kind of semantic representation, which seems to be one of your interests. This task will involve dealing with dictionary entries in a rather mechanical way: will you still be motivated to work on it?

I like the fact that you plan to build a web-based interface. I don't see why you should need different dialog windows for each language (pair?), because you can easily parametrize and use the same user interface.

Your proposal says that "The system will allow concurrent access to the dictionaries, locking only the words that are in course of being edited. The timestamps of the last edit for every entry will be kept in a separate file." However, consider the possibility of not locking (and solving conflicts later, à la Subversion for instance).

Only a certain subset of dictionary entries may easily be managed in a database way, but your proposal fails to identify which ones; in particular, I would go for single-word entries having easy bilingual correspondences. Your proposal should give more detail about the structure of dictionaries, about the ways you are going to parse them, and about how you are going to extract the part of the dictionary that will be managed using the database. Also, it is not clear to me how you plan to extract and present paradigm information (you cannot present the whole paradigm!) to the dictionary developers so that they can easily choose the appropriate paradigms.
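
As an illustration of the extraction step discussed above, here is a minimal sketch in Python of how single-word entries with a simple one-to-one correspondence might be pulled out of a bilingual dictionary. Everything in it is an assumption made for illustration: the sample entries are invented (written in the style of an Apertium bilingual .dix file), the names parse_side and extract_simple_entries are hypothetical, and the rule used to decide what counts as a "simple" entry (only a lemma followed by symbol tags on each side) is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of an Apertium bilingual dictionary (.dix).
SAMPLE_BIDIX = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e r="LR"><p><l>home<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>take<b/>away<s n="vblex"/></l><r>quitar<s n="vblex"/></r></p></e>
  </section>
</dictionary>
"""


def parse_side(elem):
    """Return (lemma, tags) for an <l> or <r> element.

    Returns None when the side is not a plain single word, for example
    when it contains a <b/> blank (multiword) or any child other than <s>.
    """
    if any(child.tag != "s" for child in elem):
        return None
    lemma = (elem.text or "").strip()
    if not lemma or " " in lemma:
        return None
    return lemma, [s.get("n") for s in elem.findall("s")]


def extract_simple_entries(xml_text):
    """Collect the entries that a database-backed editor could manage."""
    root = ET.fromstring(xml_text)
    entries = []
    for e in root.iter("e"):
        pair = e.find("p")
        if pair is None:
            continue  # not a plain left/right pair, so leave it alone
        left_elem, right_elem = pair.find("l"), pair.find("r")
        if left_elem is None or right_elem is None:
            continue
        left, right = parse_side(left_elem), parse_side(right_elem)
        if left is None or right is None:
            continue  # multiword or otherwise complex entry
        entries.append({
            "left": left[0],
            "left_tags": left[1],
            "right": right[0],
            "right_tags": right[1],
            # r="LR" / r="RL" restricts the entry to one direction;
            # "both" is this sketch's label for an unrestricted entry.
            "direction": e.get("r", "both"),
        })
    return entries


if __name__ == "__main__":
    for entry in extract_simple_entries(SAMPLE_BIDIX):
        print(entry)
```

In this sketch the multiword entry is skipped rather than rejected with an error, so the complex entries stay in the dictionary file and are simply not offered for editing. The direction restriction is kept as an ordinary field, which would let an interface show it next to each word pair.
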
How would you deal with directionality in bilingual dictionary entries and with non-one-to-one correspondences, if any?

There is already some code (in Java) that parses and extracts certain types of entries from existing language pairs to generate new language pairs, in apertium-dixtools, written mainly by Enrique Benimeli. Your proposal would be more credible if it made reference to it and gave a bit of detail on whether you will be able to recycle code.

In summary, your proposal is a bit too vague to convince us that you will be able to tackle the problem with a fair chance of success. It should make more contact with the existing information (in papers, in the wiki) about dictionary structure. Please consider that a dictionary management application that only allows managing a certain restricted type of dictionary entry may still be welcome.

I hope you have time to improve your proposal before the deadline!

Best,

Mikel

ANDRIES Mihai wrote:
> *Name:* ANDRIES Mihai
> *E-mail address:* mih...@gm...
> *Other information that may be useful to contact you:*
> skype: frost_13
>
> *Why is it you are interested in machine translation?*
> I plan to do research in Artificial Intelligence during my career.
> My final goal being the "strong AI", I have spent the last 6 months
> developing the Input/Output unit.
> I work with the Deep-sense meaning of phrases. During this time, I
> have developed a dictionary adapted for machine-translation, only to
> find one month later that I had recreated the WordNet, without having
> any prior knowledge about it.
>
> *Why is it that they are interested in the Apertium project?*
> Although Apertium uses a simplified type of machine-translation,
> it is still a source for knowledge-through-practice enrichment. I
> hope that this project will propel me forward in my quest for a strong
> AI.
>
> *Which of the published tasks are you interested in?*
> I am interested in 3 tasks:
>
> * Easy dictionary maintenance
> * Discontiguous multiwords
> * Complex multiwords
>
> Spectie proposed that I should start a new language pair
> (Russian-Romanian).
>
> *What do you plan to do?*
> I can realize any of these tasks or even two (if speaking about
> the discontiguous and complex multiwords).
> Meanwhile, I prefer to concentrate my efforts on the "easy
> dictionary maintenance" project.
>
> *Include a proposal, including*
> * *a title,*
> "Easy dictionary maintenance"
>
> * *reasons why Google and Apertium should sponsor it,*
> This project is worth sponsoring because it will contribute to the
> project's development.
> Classically, open-source projects lack a comfortable interface.
> It is often difficult for the future members of the community to
> contribute because of the time it takes to learn how to handle the
> instruments used in the work, as they are usually non-intuitive. The
> absence of a high-quality IDE and documentation for the existing one
> is a major problem for a large number of open-source projects.
> My contribution, if you agree to sponsor it, will fix this
> situation for Apertium, at least in the area of dictionary building.
>
> * *a description of how and who it will benefit in society,*
> The design of a web application for the maintenance of
> dictionaries will catalyze the development of dictionaries, allowing
> even non-trained personnel to contribute to the project. It is one of
> the most productive tasks available in terms of obtained results.
>
> * *and a detailed work plan (including, if possible, a brief
> schedule with milestones and deliverables).*
> I intend to gather as much information as possible about all the
> dialog windows that have to be implemented in order to edit the
> dictionary.
> The dialog windows will differ from one language to another, as
> different features are present in different languages.
>
> Users will have to log in before being able to make any edits.
>
> The system will allow concurrent access to the dictionaries, locking
> only the words that are in course of being edited.
> The timestamps of the last edit for every entry will be kept in a
> separate file.
>
> The server will use a cache to limit the number of requests to the
> hard disk.
> It will also keep a list of words that are being edited, in order to
> allow or to block the access to a specific word.
>
> The users will keep a local version of the list of words available for
> edit. This list will be updated (and not totally re-downloaded) on
> user demand.
>
> The first results should be already present in 3 weeks, with further
> development continuing afterwards.
>
> *Include time needed to think, to program, to document and to
> disseminate.*
> Creating the concept of the application, clarifying user needs:
> 10 days
> Program the application and constant user feedback: 45 days
> Document the application: 3 days
> Disseminate: 7 days to inform the public through announcements on
> the project's main page, IRC discussions and contact of companies
> specialized in translations.
>
> *List your skills and give evidence of your qualifications.*
> I am a 3rd year student at the University of Strasbourg, France.
> I am graduating this year.
> I have extensive experience working with Java (including Java RMI
> and CORBA).
> I also took a course on "Human-Machine Interfaces" (content in
> French: http://mathinfo.unistra.fr/mod_ens/ue.php?sem=39&ue=428 ).
>
> I am familiar with PHP, MySQL, CSS, XML and HTML, in case
> any work on the website is necessary.
>
> I am also familiar with SVN, Git and Mercurial.
>
> My native languages are Russian and Romanian.
> I am fluent in English and French.
>
> *Tell us what is your current field of study, major, etc.*
> I am majoring in Computer Science.
>
> *Convince us that you can do the work.*
> I can offer you a list of some of my previous works:
> - a website tied with a MySQL database for keeping a
> dictionary of librarian terms
> http://swarm.altf4.ru/biblio/visitor.php
> - a website for a bank (school project)
> http://codd.u-strasbg.fr/~1mandries/
> - a Pacman done in Java
> http://dsatlanta.org.ru/java/pacman/launch.html
>
> *In particular we would like to know whether you have programmed
> before in open-source projects.*
> I have never programmed in open-source projects before.
>
> *List any non-Summer-of-Code plans you have for the Summer, especially
> employment and class-taking.*
> There is a repeat exam session scheduled between 14 and 18
> June, in case I fail to pass some of the exams in May.
> I also plan a nose surgery for the 21st of June. I will have to
> stay in hospital for 2 days afterwards.
> I am at your full disposal for all the rest of the summer.
>
> *Be specific about schedules and time commitments. We would like to be
> sure you have at least 30 free hours a week to develop for our
> project.*
> I am able to dedicate 45 hours or more (~8 hours per day, 6 days
> a week) to the project.
> I am usually available online for around 12-14 hours a day.