Thread: [Presage-user] Standard location of presage data
the intelligent predictive text entry platform
From: Weng X. <we...@gm...> - 2012-05-10 05:21:01
Hi,

I'm currently using presage to implement word completion in my input method. I found that presage itself provides several languages, but it doesn't provide an API, like enchant does, to list the supported languages. I could do a manual search from my input method, but I'm not sure whether the path will change in the future.

So could presage provide a standard "supported languages" API for this? Thanks!

Regards
From: Matteo V. <mat...@ya...> - 2012-05-16 21:15:55
Hi Weng,

Presage is designed to provide support for any natural language. One core design feature of presage is the ability to make use of different sources of information and predictive algorithms to generate predictions. The sources of information and predictive algorithms used to generate a prediction are controlled by the presage configuration.

The default build of presage, for example, is configured to use three predictors:
- the smoothed n-gram predictive algorithm, with a language model generated from a single training text (the novel "The Picture of Dorian Gray" by Oscar Wilde)
- the abbreviation expansion predictor
- the recency predictor

The Presage.PredictorRegistry.PREDICTORS configuration variable controls which predictors are instantiated. The elements nested within the Presage.Predictors element control the configuration of each predictor instance.

By default, presage currently builds language models for English, Italian, and Spanish. These language models are generated during the build from source and are installed in the share/presage/ directory of your presage installation.

To switch to a different language, you need to modify the configuration used by presage. You can do that by editing your presage.xml configuration file. The configuration file is read from the following locations:
1. /etc/presage.xml if it exists
2. <installation_directory>/etc/presage.xml if it exists
3. ~/.presage.xml if it exists ($HOME/.presage.xml on Unix, %USERPROFILE%/.presage.xml on Windows) [1]

Configuration values in 3. take priority over configuration values in 2. and 1.

You can also modify the configuration programmatically through the Presage::config(...) calls or a variant of the Presage class constructor.
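As an illustration of the lookup order and precedence described above, here is a minimal Python sketch. It is not presage's own implementation: the helper names (candidate_config_paths, flatten, load_merged_config) are hypothetical, and it assumes that 2. also overrides 1., which the message does not state (it only says that 3. wins over both).

```python
import os
import xml.etree.ElementTree as ET


def candidate_config_paths(install_dir, home=None):
    """Return the three documented locations, lowest priority first."""
    home = home or os.path.expanduser("~")
    return [
        "/etc/presage.xml",
        os.path.join(install_dir, "etc", "presage.xml"),
        os.path.join(home, ".presage.xml"),
    ]


def flatten(element, prefix=""):
    """Flatten an XML tree into {"A.B.C": "text"} entries for leaf elements."""
    name = prefix + element.tag
    children = list(element)
    if not children:
        return {name: (element.text or "").strip()}
    values = {}
    for child in children:
        values.update(flatten(child, name + "."))
    return values


def load_merged_config(paths):
    """Merge every file that exists; later paths override earlier ones.

    Assumption: a plain "last file wins" merge per variable.
    """
    merged = {}
    for path in paths:
        if os.path.isfile(path):
            merged.update(flatten(ET.parse(path).getroot()))
    return merged
```

Calling load_merged_config(candidate_config_paths("/usr/local")) returns a dictionary keyed by dotted variable names, so a DBFILENAME set in ~/.presage.xml would replace one set in /etc/presage.xml while variables defined only in the system file are kept.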
The Predictor.SmoothedNgramPredictor.DBFILENAME config variable controls the language model database used by the Smoothed Ngram predictive plugin, i.e.:

    <Predictors>
      <SmoothedNgramPredictor>
        <DBFILENAME>/home/matt/presage_local_install/share/presage/database_en.db</DBFILENAME>

For example, suppose that your goal is to enable support for German. To achieve that, you will want to modify this variable to point to a German language model.

Presage provides the tools to generate the predictive resources required by the predictors to suit any natural language or specific users' needs. You can generate a German language model database using the text2ngram tool, which generates n-gram language models from a given corpus of text. Ideally, you would collect a representative set of text (text that the user has produced, or text that matches the writing style and context of the user) and then feed that to text2ngram to generate an n-gram database (1-gram, 2-gram and 3-gram tables, but you can also generate higher-order n-grams and set the SmoothedNgram DELTAS values accordingly). The higher the n-gram order, the higher the risk of overtraining the model and overfitting the training corpus (Wikipedia has good articles about machine learning and overfitting).

Assuming you have a German training text corpus in a file named training_de.txt, you can generate a 3-gram language model database for use with the smoothed n-gram predictor by running the following command:

    for i in 1 2 3; do text2ngram -n $i -l -f sqlite -o database_de.db training_de.txt; done

So, to sum up, presage does not provide an API to query which language models are available. This is controlled through configuration by the application invoking presage.

Hope this helps,
- Matteo

[1] It is likely that future presage releases (starting from presage 0.8.8) will use ~/.presage/presage.xml instead of ~/.presage.xml.
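Since there is no API for listing language models, the manual search mentioned in the question could look roughly like the Python sketch below. It assumes the databases follow the database_<code>.db naming seen in this thread (database_en.db, database_de.db) and live in a single directory such as share/presage/. Neither the naming scheme nor the location is a guaranteed interface, and list_language_models is a hypothetical helper, not part of presage.

```python
import glob
import os
import re

# Assumed naming scheme, inferred from database_en.db / database_de.db.
_DB_NAME = re.compile(r"^database_([A-Za-z_]+)\.db$")


def list_language_models(share_dir):
    """Map language code -> database path for files named database_<code>.db."""
    models = {}
    for path in sorted(glob.glob(os.path.join(share_dir, "database_*.db"))):
        match = _DB_NAME.match(os.path.basename(path))
        if match:
            models[match.group(1)] = path
    return models
```

An application could offer the returned codes to the user and then point the DBFILENAME variable at the chosen path. Because the directory layout may change between releases, it is safer to make share_dir configurable than to hard-code it.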