presage-devel Mailing List for Presage
the intelligent predictive text entry platform
Status: Beta
Brought to you by:
matteovescovi
From: rinigus <rin...@gm...> - 2018-03-14 07:13:01
Dear Matteo and Presage developers,

As part of the incorporation of Presage into the Sailfish OS keyboard started by @martonmiklos, I've got engaged in Presage development as well. To make it easier for us, I forked the Presage repo on GitHub. The forked version is available at https://github.com/sailfish-keyboard/presage and has a few new features:

* On the basis of the SQLite predictor, I wrote a predictor that uses a MARISA database together with a raw counts file to represent n-grams. This is a read-only implementation, but it is much faster than the one using SQLite (ballpark is about 10x faster) and significantly smaller for the same number of stored n-grams
* On the basis of the dictionary predictor, a Hunspell predictor has been written
* Ability to forget learned words
* Some packaging scripts for Sailfish added in packaging

All changes can be viewed at https://github.com/rinigus/presage/compare/upstream...master .

We are planning to add Unicode support to Presage in the future, so string normalization would be done via Unicode normalization, not lower-casing as supported by Presage right now. This should also help with the tokenization.

It looks to me that Presage development has slowed down. However, I wonder whether our changes would be of interest upstream and whether you would like to incorporate them?

Best wishes,
Rinigus
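The difference between plain lower-casing and Unicode normalization mentioned above can be illustrated with Python's standard library (a generic sketch, not Presage code): `str.lower()` neither unifies composed/decomposed spellings of the same word nor applies Unicode case folds, while `unicodedata.normalize` plus `casefold` handles both.

```python
import unicodedata

def normalize(token: str) -> str:
    """Normalize a token for language-model lookup.

    NFKC maps canonically equivalent sequences to one form;
    casefold() is a more aggressive, Unicode-aware lower().
    Illustrative only -- not Presage's actual API.
    """
    return unicodedata.normalize("NFKC", token).casefold()

# Two spellings of "café" that lower() leaves distinct:
composed = "Caf\u00e9"      # 'é' as a single code point
decomposed = "Cafe\u0301"   # 'e' + combining acute accent
assert composed.lower() != decomposed.lower()
assert normalize(composed) == normalize(decomposed)

# casefold() also applies folds lower() misses, e.g. German sharp s:
assert normalize("STRASSE") == normalize("Straße")
```

A predictor that normalizes both its stored n-grams and the current prefix this way would treat all of these variants as the same token.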
From: HAYASHI K. <ke...@gm...> - 2016-07-07 01:31:37
Hi,

On the Debian BTS, an FTBFS (fails to build from source) with GCC 6 caused by a narrowing conversion was reported by Martin Michlmayr. Here is the issue:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811758

A patch file that solves the above issue is attached there:

https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=811758;filename=fix-bug-811758-gcc6.patch;msg=12

I'm not familiar with presage internals, but please consider merging the above patch, or refining it before merging.

Regards,
-- Kentaro Hayashi <ke...@gm...>
From: Matteo V. <mat...@ya...> - 2015-12-15 14:49:29
Hi Moshe,

presage can be used in traditional Windows .NET applications. Take a look at the bindings/csharp/ directory in the source repository, which contains the .NET bindings that allow presage to be accessed from any CLR language, including C#. This directory also contains the implementation of a presage WCF service and a simple presage_csharp_demo application that uses the .NET binding.

I must admit I don't know much about Windows Universal Applications. How did you build presage? presage is written in C++ and provides a number of language bindings, including the .NET binding that you would use to call into presage from a C# application. I'm guessing that to get this to work in a UWP application you would need a native build of presage for the ARM architecture you are targeting.

presage has been built and is known to work on ARM. However, the only ARM builds I am aware of were on Debian using the GCC C++ compiler, not on Windows.

Cheers,
- Matteo

On Tuesday, 15 December 2015, 14:28, Moshe Hoori <moshe.hoori@algo.team> wrote:

> Hi, My name is Moshe, and I'm taking part in a social initiative for ALS patients. I'm trying to use presage on the Windows 10 (phone/tablet) platform (x86/ARM), and I have some questions:
>
> 1. When calling the DLL from a Universal Windows C# App, I get rc=7 (thrown exception) from the presage_new function in the DLL. What does this mean? I read the C++ code and I think it has something to do with the predictor initialization.
> 2. Was presage ever compiled for ARM? Do you expect to build for ARM?
>
> Thank you very very much,
> Moshe Hoori
From: Moshe H. <mos...@al...> - 2015-11-30 22:03:18
Hi,

My name is Moshe, and I'm taking part in a social initiative for ALS patients. I'm trying to use presage on the Windows 10 (phone/tablet) platform (x86/ARM), and I have some questions:

1. When calling the DLL from a *Universal Windows* C# App, I get rc=7 (thrown exception) from the presage_new function in the DLL. What does this mean? I read the C++ code and I think it has something to do with the predictor initialization.
2. Was presage ever compiled for ARM? Do you expect to build for ARM?

Thank you *very very much*,
Moshe Hoori
From: Matteo V. <mat...@ya...> - 2010-05-19 09:43:54
Hi,

marmuta wrote:

>> I think there is scope to join forces between presage and onboard.
>>
>> presage is architected to merge predictions generated by a set of predictors. Each predictor uses a different language model/predictive algorithm to generate predictions.
>>
>> Currently presage provides the following predictors:
>>
>> - ARPA predictor: statistical language modelling data in the ARPA N-gram format
>> - generalized smoothed n-gram statistical predictor: can work with n-grams of arbitrary cardinality
>> - recency predictor: based on the recency promotion principle
>> - dictionary predictor: generates a prediction by returning tokens that are a completion of the current prefix, in alphabetical order
>> - abbreviation expansion predictor: maps the current prefix to a token and returns the token in a prediction with a 1.0 probability
>> - dejavu predictor: learns and then later reproduces previously seen text sequences
>>
>> A bit more information on how these predictors work is available here: http://presage.sourceforge.net/?q=node/15
>>
>> It sounds like the language model and predictive algorithm used in the onboard word-prediction branch is an ideal candidate to be integrated into presage and become a new presage predictor class.
>
> Pretty interesting stuff, but from looking over its feature list I'm wondering what presage would gain. There doesn't seem to be much onboard's prediction could add that isn't implemented already.
>
> Roughly compared, gpredict (name is subject to change) covers these presage components:
>
> - generalized smoothed n-gram statistical predictor
> - recency predictor (with exponential falloff)
> - dictionary predictor (word completion)
> - dejavu predictor? (if it does continuous on-line learning)
>
> The main difference, apart from the general architecture, may be that gpredict uses dynamically updatable language models, handy for on-line learning.
> I'm not completely sure, but it seems presage's three n-gram predictors are based on immutable models and the dejavu predictor keeps a separate adaptable model of unigrams.

The generalized smoothed n-gram predictor does continuous on-line learning (learning can be turned on or off at runtime or via configuration). When learning is turned on, the language model is updated on the fly with new n-gram counts.

The dejavu predictor is just a toy predictor, really. I wrote it to try things out when I started implementing continuous online learning functionality, and it now serves as a simple example of how to implement a learning predictor class. Similarly, the smoothed count predictor and the 3-gram smoothed predictor are remnants from a time when I was experimenting with language models; they really are building steps towards the generalized smoothed n-gram predictor, which is currently the main statistical predictor (along with the ARPA predictor).

>> presage could then be the engine used to power the d-bus prediction service, offering the predictive capabilities of the onboard language model/predictor, plus all the predictors currently provided by presage (all of which can be turned on/off and configured to suit individual needs).
>
> The modularity could be helpful, even though I'm not sure if I could really make use of it.
>
> We were very concerned about memory usage and had initially thought about using static ARPA-compatible structures for large immutable language models, and dynamically updatable models only for on-line learning. However, later the dynamic models turned out to be almost as efficient as the ARPA implementation, and so now there are (flavors of) dynamic models for everything.
>
> Similar consolidation happened with recency caching. It was originally planned as a separate modular component. However, that would have meant redundant storage of n-grams and a forced limit to some arbitrarily small number of recent n-grams.
> So I had it integrated more closely with the generic dynamic models, gaining recency tracking across all known n-grams but sacrificing some modularity (there is still variability through inheritance, though).

If onboard's current predictive functionality was merged into presage and encapsulated in a (say, for lack of a better name) OnboardPredictor class, then presage's modularity would be useful because it would allow us to:

- replicate exactly the same predictive functionality of the current gpredict service, by switching on OnboardPredictor and turning off the other predictors
- augment OnboardPredictor's predictive functionality with other predictors currently provided by presage, as desired by onboard or the user, simply by modifying a config variable.

Presage would definitely benefit from having a new and high-quality predictor in its core.

>> The presage core library itself has minimal dependencies: it pretty much only needs a C++ runtime and sqlite, which is used as the backing store for n-gram based language models (this ensures fast access, minimum memory footprint and no delays while loading the language model in memory).
>
> That is definitely an advantage, as gpredict currently takes around 5s (@3GHz) to load the English base model with ~1.4 million n-grams. Memory usage may or may not be an issue; the D-Bus service with only English as the resident language takes around 30MB.

I trained presage's smoothed n-gram predictor language model on the text corpora currently used by gpredict to yield a language model with ~1.2 million n-grams, compared to presage's default language model, which is trained on a single text (namely The Picture of Dorian Gray), totaling about ~75000 n-grams.
The increase in prediction time and resident memory required on a control text is very small compared to the increase in n-grams:

- ~75 thousand n-grams -- prediction time: ~7 seconds, resident memory size: ~3MB
- ~1.2 million n-grams -- prediction time: ~17 seconds, resident memory size: ~5MB

This preliminary testing shows that prediction time and memory consumption do not grow linearly with the number of n-grams.

> That said, when I first saw presage, I wasn't too happy about its sqlite dependency. Sqlite often means frequent hard drive accesses and a choice between general slowness due to generous fsync'ing, or all bets off concerning data security. That may be unfounded prejudice in this case, and perhaps presage has overcome all that. I didn't do any real-world testing with it.

Yes, that's the trade-off of having the language model on disk rather than in memory. There are advantages and disadvantages to having the lm reside in memory or on disk. The great thing about it is that, strictly speaking, it's not presage that has a dependency on sqlite, but rather the individual predictors that store their language model in an sqlite database. In other words, the dependency on sqlite could be removed from the presage library itself and moved to the smoothed n-gram predictor. This would be very little work (a 10-minute job, I believe). In practice, I found sqlite very fast and reliable. Presage's database connector layer encloses all writes to the database (and reads too, for that matter) in transactions, which guarantees atomicity of updates to the language model.

>>> For details about the word prediction service, please contact marmuta, who did nearly all the work on the word prediction service.
>>
>> I'll follow up with marmuta to discuss the feasibility of making this happen and work out the technical details, in case there is consensus to go ahead with this.
>
> I'm happy to further discuss this, even though I'm a bit torn currently.
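The point about transactional writes can be demonstrated with Python's stdlib sqlite3 module (a generic illustration of sqlite transaction semantics, not presage's connector code): when a batch of n-gram count updates is wrapped in a transaction, either all of them become visible or none do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ngram (tokens TEXT PRIMARY KEY, count INT)")
con.execute("INSERT INTO ngram VALUES ('hello world', 1)")
con.commit()

# Atomic batch update: the duplicate-key error rolls the whole
# transaction back, so the first, valid insert never lands either.
try:
    with con:  # opens a transaction; commits, or rolls back on error
        con.execute("INSERT INTO ngram VALUES ('foo bar', 1)")
        con.execute("INSERT INTO ngram VALUES ('hello world', 1)")  # PK clash
except sqlite3.IntegrityError:
    pass

rows = con.execute("SELECT tokens FROM ngram").fetchall()
print(rows)  # → [('hello world',)] -- 'foo bar' was rolled back too
```

The same guarantee is what keeps an on-disk language model consistent if the process is interrupted mid-update.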
> I can see the appeal of having presage (or other candidates like nltk) be the central repository for all kinds of prediction needs. On the other hand, the advantages of merging gpredict into presage don't seem to be that obvious. Most of the functionality does exist already in presage, and from onboard's point of view using presage appears to currently gain it little except for new dependencies.

I need to look at gpredict's language model and predictive algorithm in more detail, but I currently believe that presage will benefit from having a new predictor available, which can be turned on and combined with the existing predictors. onboard would benefit from having access to presage's other predictors, which can be configured on or off and customized by the user (e.g. the abbreviation expansion predictor).

> Also, onboard's prediction service was already meant to be a full-featured standalone word predictor. It is largely working as planned, and we were going to split it off from onboard as a ready-to-use D-Bus service soon. Rebasing on presage at this point would probably delay things considerably for onboard. Not sure yet if this is the right thing to do, but I'm open to pro-arguments.

Well, I understand the concerns about delaying things for onboard, but I think there are significant benefits in integrating gpredict and presage together and building a prediction D-Bus service on presage. Perhaps we could start by trying onboard with the presage D-Bus service that David has created, while we integrate gpredict into presage (basically, it would mean moving the C++ code into a class implementing a Predictor interface). I'm willing to help with this.

Cheers,
- Matteo
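The architecture discussed throughout this thread — several predictors each producing scored suggestions, with presage merging them into one prediction — can be sketched in a few lines of Python. This is a toy illustration of the idea only; the function names, word lists, and scoring are invented for the example and do not reflect presage's actual C++ Predictor interface.

```python
from collections import defaultdict

# Each "predictor" maps a prefix to {suggestion: score}.
# Toy stand-ins for presage's predictor classes (names invented here).
def dictionary_predictor(prefix: str) -> dict[str, float]:
    words = ["prediction", "predictor", "presage", "prefix"]
    hits = [w for w in words if w.startswith(prefix)]
    return {w: 1.0 / len(hits) for w in hits} if hits else {}

def recency_predictor(prefix: str) -> dict[str, float]:
    recent = ["presage", "predictor"]  # most recently seen first
    return {w: 0.5 ** i for i, w in enumerate(recent)
            if w.startswith(prefix)}   # exponential falloff by recency

def merge(predictors, prefix: str, n: int = 3) -> list[str]:
    """Combine per-predictor scores and return the top-n suggestions."""
    combined: defaultdict[str, float] = defaultdict(float)
    for predict in predictors:
        for word, score in predict(prefix).items():
            combined[word] += score
    return sorted(combined, key=combined.get, reverse=True)[:n]

print(merge([dictionary_predictor, recency_predictor], "pre"))
# → ['presage', 'predictor', 'prediction']
```

Turning a predictor on or off amounts to adding it to or removing it from the list passed to `merge` — which is the modularity argument made above: an OnboardPredictor would simply be one more entry in that list.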
From: Vescovi, M. <mat...@pr...> - 2010-05-18 13:40:47
Welcome to the presage-devel mailing list! - Matteo Vescovi |
From: Matteo V. <mat...@ya...> - 2010-05-18 13:26:26
Welcome to the presage-devel mailing list! - Matteo Vescovi