From: W. Trevor King <wking@tr...> - 2013-03-16 20:41:36
Attachments:
signature.asc

I'm using MDP to do factor analysis on some survey data, but some
results are missing (i.e. the responder skipped a question).  I load
the data into a masked array with:

    scores = numpy.genfromtxt(path, delimiter='\t')
    masked_scores = numpy.ma.array(scores, mask=numpy.isnan(scores))

however, the node's covariance accumulator
(mdp.utils.CovarianceMatrix) doesn't seem to handle the masked array
very well, returning a matrix containing quite a few NaNs.  NumPy
*can* compute these covariances with:

    numpy.ma.cov(masked_scores, rowvar=False, allow_masked=True, ddof=1)

but I haven't looked into how to work support for such masked arrays
into CovarianceMatrix.  As an MDP newbie, I thought I'd ask for
feedback from some more experienced folks before diving in, so:

* Am I just missing something obvious?
* Would converting to masked array handling be acceptable, or should
  we follow NumPy in maintaining duplicate functionality in a
  sub-namespace? :(

Thanks,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
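For reference, the load-and-mask pipeline above can be run end-to-end as a self-contained sketch (an inline `io.StringIO` dataset stands in for the real tab-separated survey file at `path`):

```python
import io

import numpy

# Stand-in for the tab-separated survey file; the empty field in the
# last row becomes NaN, i.e. a skipped question.
data = u"1\t2\n2\t1\n1\t\n"
scores = numpy.genfromtxt(io.StringIO(data), delimiter='\t')

# Mask the NaNs so downstream code can tell "missing" apart from a value.
masked_scores = numpy.ma.array(scores, mask=numpy.isnan(scores))

# numpy.ma.cov skips masked entries pairwise; rowvar=False puts
# variables in columns, ddof=1 gives the unbiased estimate.
cov = numpy.ma.cov(masked_scores, rowvar=False, allow_masked=True, ddof=1)
```

The resulting `cov` is itself a masked array, which is why a `fix`-style method would want to inspect `cov.mask` before handing the matrix on to LAPACK.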
From: Pietro Berkes <berkes@ga...> - 2013-03-17 22:07:55
Attachments:
Message as HTML

Hi Trevor,

On Sat, Mar 16, 2013 at 8:41 PM, W. Trevor King <wking@...> wrote:
> I'm using MDP to do factor analysis on some survey data, but some
> results are missing (i.e. the responder skipped a question).  I load
> the data into a masked array with:
>
>     scores = numpy.genfromtxt(path, delimiter='\t')
>     masked_scores = numpy.ma.array(scores, mask=numpy.isnan(scores))
>
> however, the node's covariance accumulator
> (mdp.utils.CovarianceMatrix) doesn't seem to handle the masked array
> very well, returning a matrix containing quite a few NaNs.  NumPy
> *can* compute these covariances with:
>
>     numpy.ma.cov(masked_scores, rowvar=False, allow_masked=True, ddof=1)
>
> but I haven't looked into how to work support for such masked arrays
> into CovarianceMatrix.  As an MDP newbie, I thought I'd ask for
> feedback from some more experienced folks before diving in, so:
>
> * Am I just missing something obvious?

No, you are right: MDP does not support masked arrays at the moment.

> * Would converting to masked array handling be acceptable, or should
>   we follow NumPy in maintaining duplicate functionality in a
>   sub-namespace? :(

For this specific case, we cannot use numpy.ma.cov directly, as it is
important that we keep the core feature of CovarianceMatrix, namely
the fact that it is possible to compute it with batches of data
instead of all the data at once.  That's useful, for example, in the
case where the input data does not fit into memory at once.

Of course, with a bit of work it should be possible to do this, and
we'd be happy to consider a patch that allows using masked arrays in a
transparent way...

Best,
Pietro

_______________________________________________
mdp-toolkit-users mailing list
mdp-toolkit-users@...
https://lists.sourceforge.net/lists/listinfo/mdp-toolkit-users
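The batch feature Pietro describes can be illustrated with a minimal accumulator in the spirit of `mdp.utils.CovarianceMatrix` (a sketch with a hypothetical class name, not MDP's actual implementation): only the running sum of `x^T . x`, the per-column totals, and the sample count are kept, so each batch can be discarded after `update`.

```python
import numpy

class BatchCovariance(object):
    """Minimal batch-wise covariance accumulator (sketch)."""
    def __init__(self, dim):
        self._xtx = numpy.zeros((dim, dim))  # running sum of x^T . x
        self._sum = numpy.zeros(dim)         # running per-column sums
        self._tlen = 0                       # total samples seen so far

    def update(self, x):
        # Only O(dim^2) state is kept; the batch itself can be freed.
        self._xtx += numpy.dot(x.T, x)
        self._sum += x.sum(axis=0)
        self._tlen += x.shape[0]

    def fix(self, ddof=1):
        avg = self._sum / self._tlen
        # sum((x - avg)^T . (x - avg)) == xtx - tlen * outer(avg, avg)
        cov = self._xtx - self._tlen * numpy.outer(avg, avg)
        cov /= self._tlen - ddof
        return cov, avg, self._tlen

# Feeding two batches gives the same covariance as the full array at once.
x = numpy.arange(12.0).reshape(6, 2)
acc = BatchCovariance(2)
acc.update(x[:3])
acc.update(x[3:])
cov, avg, tlen = acc.fix()
```

This centering trick (subtracting the squared mean at `fix` time) is exactly the step that breaks down for masked arrays, as the next message shows.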
From: W. Trevor King <wking@tr...> - 2013-03-18 21:14:34
Attachments:
signature.asc

On Sun, Mar 17, 2013 at 10:07:27PM +0000, Pietro Berkes wrote:
> For this specific case, we cannot use numpy.ma.cov directly, as it is
> important that we keep the core feature of CovarianceMatrix, namely
> the fact that it is possible to compute it with batches of data
> instead of all the data at once.  That's useful for example in the
> case where the input data does not fit into memory at once.
>
> Of course, with a bit of work it should be possible to do this, and
> we'd be happy to consider a patch that allows using masked arrays in
> a transparent way...

I've been working up a patch, but I'm currently stuck on the centering
component (removing the squared mean).  Usually you'd have:

    cov(X) = E[ (X - E[X])^T ⋅ (X - E[X]) ]
           = E[X^T ⋅ X] - E[X^T] E[X] - E[X]^T E[X] + E[X]^T E[X]
           = E[X^T ⋅ X] - E[X]^T E[X]

However, with NumPy's masked arrays, (X - E[X]) effectively copies the
mask from X onto E[X] (masking previously valid entries in E[X]), so
the usually equivalent terms containing E[X] are all slightly
different:

    >>> import numpy
    >>> X = numpy.ma.array([[1,2],[2,1],[1,2]], dtype=float, mask=False)
    >>> X.mask[2,1] = True
    >>> print(X)
    [[1.0 2.0]
     [2.0 1.0]
     [1.0 --]]
    >>> E = numpy.ma.ones(X.shape, dtype=float)
    >>> E.mask = X.mask
    >>> E *= X.mean(axis=0)
    >>> print(E.round(3))
    [[1.333 1.5]
     [1.333 1.5]
     [1.333 --]]
    >>> XtX = numpy.ma.dot(X.T, X)
    >>> XtE = numpy.ma.dot(X.T, E)
    >>> EtX = numpy.ma.dot(E.T, X)
    >>> EtE = numpy.ma.dot(E.T, E)
    >>> print(XtX.round(3))
    [[6.0 4.0]
     [4.0 5.0]]
    >>> print(XtE.round(3))
    [[5.333 4.5]
     [4.0 4.5]]
    >>> print(EtX.round(3))
    [[5.333 4.0]
     [4.5 4.5]]
    >>> print(EtE.round(3))
    [[5.333 4.0]
     [4.0 4.5]]

Just for sanity, I can reproduce NumPy's masked covariance from these
terms (with an additional scaling term that depends on the masking
history [1]):

    >>> ddof = 1
    >>> cov = numpy.ma.cov(X, rowvar=False, ddof=ddof)
    >>> print(cov.round(3))
    [[0.333 -0.5]
     [-0.5 0.5]]
    >>> xnotmask = numpy.logical_not(X.mask).astype(int)
    >>> fact = numpy.dot(xnotmask.T, xnotmask) * 1. - ddof
    >>> my_cov = (XtX - XtE - EtX + EtE) / fact
    >>> print(my_cov.round(3))
    [[0.333 -0.5]
     [-0.5 0.5]]
    >>> print((my_cov - cov).max())  # doctest: +ELLIPSIS
    1.665...e-16

The currently accumulated `_cov_mtx` is `XtX`, and I can reconstruct
`EtE` and `fact`, but I'm having trouble figuring out a good way to
get `XtE` and `EtX`.  The problem with accumulating them is that you
need the means, which you obviously don't have until you get to the
"fix" stage.  When you get a masked value, you'd have to reach back in
time and zero accumulated terms that need to be masked for the
eventual `XtE` and `EtX` calculations.

I can spend some more time with a pencil and my thinking cap, but I
wondered if someone else could bail me out ;).

Thoughts?
Trevor

[1]: https://github.com/numpy/numpy/blob/master/numpy/ma/extras.py#L1319
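Condensed into a script, the session above boils down to checking that the four-term reconstruction matches `numpy.ma.cov`, while the usual two-term shortcut `XtX - EtE` does not (same toy data as the doctest):

```python
import numpy

X = numpy.ma.array([[1, 2], [2, 1], [1, 2]], dtype=float, mask=False)
X.mask[2, 1] = True

# Broadcast the column means onto X's mask, as in the post above.
E = numpy.ma.ones(X.shape, dtype=float)
E.mask = X.mask
E *= X.mean(axis=0)

XtX = numpy.ma.dot(X.T, X)
XtE = numpy.ma.dot(X.T, E)
EtX = numpy.ma.dot(E.T, X)
EtE = numpy.ma.dot(E.T, E)

ddof = 1
xnotmask = numpy.logical_not(X.mask).astype(int)
fact = numpy.dot(xnotmask.T, xnotmask) * 1. - ddof

# The four-term reconstruction matches numpy.ma.cov ...
four_term = (XtX - XtE - EtX + EtE) / fact
# ... while the unmasked two-term shortcut does not.
two_term = (XtX - EtE) / fact
cov = numpy.ma.cov(X, rowvar=False, ddof=ddof)
```

The disagreement shows up exactly in the entries whose mask history differs, which is why the accumulated `XtE` and `EtX` terms cannot simply be dropped.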
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-20 11:36:41

Hi Trevor,

if I understand you correctly, you are trying to implement a
CovarianceMatrix which is able to deal with masked arrays *and* that
can be learned in batches.  While I fully appreciate this effort and
would *extremely* welcome a patch in this direction, I was wondering
why you don't simply write a subclass of CovarianceMatrix that just
accumulates the input while `update` is called, and then internally
calls numpy.ma.cov when `fix` is called.

You could then subclass FANode.  I think it would be sufficient to
overwrite the `__init__` method to use your enhanced CovarianceMatrix
instead of the standard one.  Yes, this would mean that if your input
data does not fit in memory you can not use your node, but maybe this
is not really a problem in your case?

On the other hand, if you manage to get a proper solution out of your
pencil, I would appreciate it even more ;)

Ciao,
Tiziano

On Mon 18 Mar, 17:14, W. Trevor King wrote:
> I've been working up a patch, but I'm currently stuck on the centering
> component (removing the squared mean).
>
> [...]
>
> The currently accumulated `_cov_mtx` is `XtX`, and I can reconstruct
> `EtE` and `fact`, but I'm having trouble figuring out a good way to
> get `XtE` and `EtX`.  The problem with accumulating them is that you
> need the means, which you obviously don't have until you get to the
> "fix" stage.
>
> [1]: https://github.com/numpy/numpy/blob/master/numpy/ma/extras.py#L1319
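Tiziano's store-and-finalize idea can be sketched as a stand-alone class (the name `StoredMaskedCovariance` and its `update`/`fix` interface mirror MDP's convention but are hypothetical, not an actual `CovarianceMatrix` subclass):

```python
import numpy

class StoredMaskedCovariance(object):
    """Store masked batches; defer everything to numpy.ma.cov in fix().

    Memory grows with the data, but the batch-wise update() API survives.
    """
    def __init__(self, ddof=1):
        self._batches = []
        self._ddof = ddof

    def update(self, x):
        # Keep the whole batch (mask included) until fix() is called.
        self._batches.append(numpy.ma.asarray(x))

    def fix(self):
        x = numpy.ma.concatenate(self._batches)
        cov = numpy.ma.cov(x, rowvar=False, allow_masked=True,
                           ddof=self._ddof)
        avg = x.mean(axis=0)
        return cov, avg, x.shape[0]

# Same toy data as above, fed in two batches.
x = numpy.ma.array([[1., 2.], [2., 1.], [1., 2.]],
                   mask=[[0, 0], [0, 0], [0, 1]])
c = StoredMaskedCovariance()
c.update(x[:2])
c.update(x[2:])
cov, avg, tlen = c.fix()
```

The trade-off is exactly the one Tiziano names: all batches stay in memory, so the out-of-core use case is lost.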
From: W. Trevor King <wking@tr...> - 2013-03-20 16:43:44
Attachments:
signature.asc

On Wed, Mar 20, 2013 at 12:36:31PM +0100, Tiziano Zito wrote:
> if I understand you correctly, you are trying to implement a
> CovarianceMatrix which is able to deal with masked arrays *and* that
> can be learned in batches.

Yes, but I'm not having much luck ;).

> While I fully appreciate this effort and would *extremely* welcome a
> patch in this direction, I was wondering why you don't simply write
> a subclass of CovarianceMatrix that just accumulates the input while
> `update` is called, and then internally calls numpy.ma.cov when
> `fix` is called.
>
> You could then subclass FANode.  I think it would be sufficient to
> overwrite the `__init__` method to use your enhanced
> CovarianceMatrix instead of the standard one.

It seemed like a better idea to try and work up a more general
approach.  For example, there are a number of other nodes (git grep
CovarianceMatrix mdp/nodes/) that also use CovarianceMatrix and might
run into similar problems.

How about a non-accumulating CovarianceMatrix (called
MaskedCovarianceMatrix?) that you can plug in instead?  Ideally, the
class inheritance would be something like:

    class FANode (CovarianceAccumulatorNode):
        def __init__(self, …, **kwargs):
            …
            super(FANode, self).__init__(**kwargs)
            …

    class CovarianceAccumulatorNode (Node):
        def __init__(self, covariance_class=CovarianceMatrix,
                     dtype=None, ddof=None, bias=True, **kwargs):
            super(CovarianceAccumulatorNode, self).__init__(
                dtype=dtype, **kwargs)
            self._covariance_class = covariance_class
            self._cov_mtx = covariance_class(dtype=dtype, bias=bias)

        def _train(self, x):
            "update the covariance matrix"
            self._cov_mtx.update(x)

        def _fix_cov_mtx(self):
            "request the covariance matrix and clean up"
            cov_mtx, mu, tlen = self._cov_mtx.fix()
            del self._cov_mtx
            return (cov_mtx, mu, tlen)

This would work for FANode, PCANode, …, but I'm not sure what the most
elegant handling would be for GaussianClassifier and others that don't
accumulate into the covariance directly.  If they also used
CovarianceAccumulatorNode, they'd get some extra stuff that they don't
need.  Maybe:

    class CovarianceAccumulatorNode (CovarianceNode):
        def __init__(self, **kwargs):
            super(CovarianceAccumulatorNode, self).__init__(**kwargs)
            self._cov_mtx = self._new_covariance()

        def _train(self, x):
            …

        def _fix_cov_mtx(self):
            …

    class CovarianceNode (Node):
        def __init__(self, covariance_class=CovarianceMatrix,
                     dtype=None, bias=True, **kwargs):
            super(CovarianceNode, self).__init__(dtype=dtype, **kwargs)
            self._covariance_class = covariance_class
            self._covariance_class_kwargs = {'dtype': dtype, 'bias': bias}

        def _new_covariance(self):
            return self._covariance_class(**self._covariance_class_kwargs)

Then GaussianClassifier and other oddballs could subclass the simpler
CovarianceNode.  Maybe this is overly engineered and I should just
monkey-patch over CovarianceMatrix when I have masked data (before
initializing my FANode)?

Thoughts?
Trevor

p.s. Since v1.5, NumPy prefers ddof to bias [1].  Should MDP follow
along?

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-20 19:59:16

> It seemed like a better idea to try and work up a more general
> approach.  For example, there are a number of other nodes (git grep
> CovarianceMatrix mdp/nodes/) that also use CovarianceMatrix and might
> run into similar problems.
>
> [...]
>
> How about a non-accumulating CovarianceMatrix (called
> MaskedCovarianceMatrix?) that you can plug in instead?  Ideally, the
> class inheritance would be something like:
>
> [...]
>
> This would work for FANode, PCANode, …, but I'm not sure what the most
> elegant handling would be for GaussianClassifier and others that don't
> accumulate into the covariance directly.  If they also used
> CovarianceAccumulatorNode, they'd get some extra stuff that they don't
> need.  Maybe:
>
> [...]
>
> Then GaussianClassifier and other oddballs could subclass the simpler
> CovarianceNode.  Maybe this is overly engineered and I should just
> monkey-patch over CovarianceMatrix when I have masked data (before
> initializing my FANode)?
>
> Thoughts?

For this kind of "enhancements" we typically do not use inheritance.
Look at the extension mechanism [1] in the MDP tutorial.  That is
arguably a more elegant way to solve the problem.

But, I fear it is not so easy.  Even if PCANode, for example, would
use a CovarianceMatrix class able to deal with masked arrays, it would
break immediately as soon as PCANode calls the LAPACK bindings, right?
As far as I remember, LAPACK does not support masked arrays, nor does
it support sparse matrices, which is another thing MDP could in
principle support.

Maybe the real solution is to push for numpy to finally take a
decision about the missing-data/masked-array issues and implement
there a transparent handling of these special arrays...  What do you
think?

Tiziano

[1] http://mdp-toolkit.sourceforge.net/tutorial/extensions.html
From: W. Trevor King <wking@tr...> - 2013-03-24 17:33:58
Attachments:
signature.asc

Sorry for the delayed response, it's been a busy week ;).

On Wed, Mar 20, 2013 at 08:58:58PM +0100, Tiziano Zito wrote:
> For this kind of "enhancements" we typically do not use inheritance.
> Look at the extension mechanism [1] in the MDP tutorial.  That is
> arguably a more elegant way to solve the problem.

I hadn't noticed those, thanks for the pointer :).

> But, I fear it is not so easy.  Even if PCANode, for example, would
> use a CovarianceMatrix class able to deal with masked arrays, it
> would break immediately as soon as PCANode calls the LAPACK
> bindings, right?  As far as I remember, LAPACK does not support
> masked arrays, nor does it support sparse matrices, which is another
> thing MDP could in principle support.

As far as I can tell, the only interaction PCANode has with the
CovarianceMatrix instance is through:

    self._cov_mtx = CovarianceMatrix(dtype)
    self._cov_mtx.update(x)
    self.cov_mtx, avg, self.tlen = self._cov_mtx.fix()

A MaskedCovarianceMatrix implementation which stored updates in memory
and used:

    def fix(self):
        ddof = 0 if self.bias else 1
        x = numpy.ma.concatenate(self.list_of_x_from_updates)
        cov = numpy.ma.cov(x, rowvar=False, ddof=ddof)
        avg = x.mean(axis=0)
        if cov.mask.any() or avg.mask.any():
            raise ValueError('excessively masked training data')
        return (cov.data, avg.data, x.shape[0])

to extract the covariance, mean, and number of samples would return an
unmasked covariance (computed from masked input).  I don't see a
problem with PCANode there, but I may be missing something…

Thanks for helping to hash this out :)
Trevor
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-25 08:27:33

> As far as I can tell, the only interaction PCANode has with the
> CovarianceMatrix instance is through:
>
>     self._cov_mtx = CovarianceMatrix(dtype)
>     self._cov_mtx.update(x)
>     self.cov_mtx, avg, self.tlen = self._cov_mtx.fix()
>
> A MaskedCovarianceMatrix implementation which stored updates in memory
> and used:
>
>     def fix(self):
>         ddof = 0 if self.bias else 1
>         x = numpy.ma.concatenate(self.list_of_x_from_updates)
>         cov = numpy.ma.cov(x, rowvar=False, ddof=ddof)
>         avg = x.mean(axis=0)
>         if cov.mask.any() or avg.mask.any():
>             raise ValueError('excessively masked training data')
>         return (cov.data, avg.data, x.shape[0])
>
> to extract the covariance, mean, and number of samples would return an
> unmasked covariance (computed from masked input).  I don't see a
> problem with PCANode there, but I may be missing something…

OK, I definitely need to explore masked arrays :)  I just tried, and
indeed you can pass a masked array to {scipy,numpy}.linalg.eig.  The
conversion to standard arrays is done internally.  So, yes, I think it
can work and you were right!

Would it be possible to auto-detect when you need to use the masked
covariance matrix instead of the standard one?  In other words, would
it be possible to always use the standard covariance matrix class, and
then switch on the fly to the masked one as soon as an input data
batch is a masked array?  The Node class, from which all other nodes
are derived, already has a _check_input method that could be used for
that.

What do you think?
Tiziano
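Tiziano's observation is easy to check: `numpy.linalg.eig` converts its argument to a plain array internally, so a fully unmasked masked array (such as a covariance matrix that survived the mask check in the proposed `fix`) passes through without special handling:

```python
import numpy

# A fully unmasked masked array, standing in for a covariance matrix
# that survived the mask check in fix().
cov = numpy.ma.array([[2., 0.], [0., 1.]], mask=False)

# numpy.linalg.eig accepts the masked array; the conversion to a plain
# ndarray (using the underlying data) happens internally.
w, v = numpy.linalg.eig(cov)
```

Note the conversion uses the raw underlying data, so any entries that *were* masked would silently leak whatever values they happen to hold; that is why raising on a masked result in `fix` matters.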
From: W. Trevor King <wking@tr...> - 2013-03-25 11:33:44
Attachments:
signature.asc

On Mon, Mar 25, 2013 at 09:27:25AM +0100, Tiziano Zito wrote:
> Would it be possible to auto-detect when you need to use the masked
> covariance matrix instead of the standard one?  In other words, would
> it be possible to always use the standard covariance matrix class,
> and then switch on the fly to the masked one as soon as an input
> data batch is a masked array?

I don't think so, because:

On Mon, Mar 18, 2013 at 05:14:21PM -0400, W. Trevor King wrote:
> The problem with accumulating them is that you need the means, which
> you obviously don't have until you get to the "fix" stage.  When you
> get a masked value, you'd have to reach back in time and zero
> accumulated terms that need to be masked for the eventual `XtE` and
> `EtX` calculations.

Still, selecting a MaskedCovarianceMatrix when:

a) you know you're not going to be crunching gigs of data, and
b) you know you might have masked values

doesn't seem too onerous (and works for my case ;).  We just need to
figure out how to add this option elegantly.  Besides the FANode and
PCANode already mentioned, the NormalizeNode has the same
CovarianceMatrix usage.  Other nodes that use CovarianceMatrix have
slightly different update and analysis procedures, so I'm not sure a
MaskedCovarianceMatrix would work for them.

Cheers,
Trevor
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-25 11:45:50

> > The problem with accumulating them is that you need the means, which
> > you obviously don't have until you get to the "fix" stage.  When you
> > get a masked value, you'd have to reach back in time and zero
> > accumulated terms that need to be masked for the eventual `XtE` and
> > `EtX` calculations.
>
> Still, selecting a MaskedCovarianceMatrix when:
>
> a) you know you're not going to be crunching gigs of data, and
> b) you know you might have masked values
>
> doesn't seem too onerous (and works for my case ;).  We just need to
> figure out how to add this option elegantly.

Look at how the 'parallel' extension is implemented in
mdp/parallel/parallelnodes.py.  There is a base class which implements
the public methods, and then specialized derived classes for those
node classes you know how to handle.  Another example is the 'caching'
extension: mdp/caching/caching_extension.py.  Here you see how to
overwrite methods.

For the masked covariance, I think it should be possible to just
overwrite the _train method.  The goal is that at the end users just
need to do:

    with mdp.extension('masked_arrays'):
        pcanode.train(x)
    y = pcanode.execute(x)

in order to use the masked covariance version of PCA.

If you have time, please give it a try.  If you fork MDP on github we
can follow your work and try to give a hand ;)

Ciao,
Tiziano
From: W. Trevor King <wking@tr...> - 2013-03-25 11:58:46
Attachments:
signature.asc

On Mon, Mar 25, 2013 at 12:45:41PM +0100, Tiziano Zito wrote:
> For the masked covariance, I think it should be possible to just
> overwrite the _train method.

Shouldn't I just override (amend?) __init__ to set up ._cov_mtx with a
new MaskedCovarianceMatrix?  Then _train() and .fix() calls wouldn't
have to change.

I'll need somewhere to stash the new code, and it seems like a
standalone MaskedCovarianceMatrix would be easy to test.  Should my
new additions live in:

    mdp-toolkit/mdp/masked/__init__.py
    mdp-toolkit/mdp/masked/masked_extension.py

as a home for the new code (mirroring the caching extension)?  Should
MaskedCovarianceMatrix live in mdp.utils.covariance or
mdp.masked.masked_extension?

> If you have time, please give it a try.  If you fork MDP on github
> we can follow your work and try to give a hand ;)

Already cloned, but none of my earlier attempts worked out well enough
to be worth publishing ;).  I'll definitely submit a pull request (or
send patches to the list, whichever you like best) once I get things
working :).

Cheers,
Trevor
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-25 12:56:04

On Mon 25 Mar, 07:58, W. Trevor King wrote:
> On Mon, Mar 25, 2013 at 12:45:41PM +0100, Tiziano Zito wrote:
> > For the masked covariance, I think it should be possible to just
> > overwrite the _train method.
>
> Shouldn't I just override (amend?) __init__ to set up ._cov_mtx with a
> new MaskedCovarianceMatrix?  Then _train() and .fix() calls wouldn't
> have to change.

Yes, you are right.  I was still in my trip of automatic detection and
monkey-patching the covmat instance on the fly...  The problem of
overriding __init__ is that the extension is not going to work for
already existing instances:

    pca = PCANode()
    # now activate extension, i.e. monkey-patch __init__
    with mdp.extension("masked"):
        pca.train(x)  # <- this will only work if you move the
                      #    instantiation within the with block...

so, I would maybe override _check_input, which gets called before
_train -- see Node.train in mdp/signal_node.py.  Within _check_input
you can check that self._cov_mtx is a MaskedCovariance.  If not, you
overwrite self._cov_mtx with an instance thereof.  If it is,
everything is fine ;)

Actually, it may even be that it is enough to check for
self._train_phase_started == False within the _check_input method.
This may even be better, so that you are sure to override the
_cov_mtx only at the beginning of the training.  If the standard
_cov_mtx has already been updated at least once with normal data, you
should throw a meaningful error, I suppose, suggesting to activate the
extension before starting the training.  Again, see Node.train in
mdp/signal_node.py for more details.  Remember to carry over the
original dtype of the cov_mtx.

> I'll need somewhere to stash the new code, and it seems like a
> standalone MaskedCovarianceMatrix would be easy to test.
>
> Should my new additions live in:
>
>     mdp-toolkit/mdp/masked/__init__.py
>     mdp-toolkit/mdp/masked/masked_extension.py
>
> as a home for the new code (mirroring the caching extension)?

This is a good plan.  Put all you need in mdp/masked.  We use py.test
for testing, so you should be able to just put the tests in
mdp/test/masked_extension.py and run them with

    py.test mdp/test/masked_extension.py

> Already cloned, but none of my earlier attempts worked out well enough
> to be worth publishing ;).  I'll definitely submit a pull request (or
> send patches to the list, whichever you like best) once I get things
> working :).

PRs are easier to review and easier to integrate, so please do that.
Thanks!

Tiziano
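The `_check_input` swap Tiziano outlines can be prototyped with toy classes (`ToyNode` and the two covariance classes below are hypothetical stand-ins for MDP's `Node`, `CovarianceMatrix`, and the proposed `MaskedCovarianceMatrix`, not MDP code):

```python
import numpy

class CovarianceMatrix(object):
    """Toy stand-in for mdp.utils.CovarianceMatrix."""
    def __init__(self, dtype=None):
        self.dtype = dtype

class MaskedCovarianceMatrix(CovarianceMatrix):
    """Toy stand-in for the proposed masked accumulator."""
    pass

class ToyNode(object):
    """Mimics Node.train, which runs _check_input before _train."""
    def __init__(self):
        self._cov_mtx = CovarianceMatrix(dtype='float64')
        self._train_phase_started = False

    def _check_input(self, x):
        needs_mask = isinstance(x, numpy.ma.MaskedArray)
        is_masked = isinstance(self._cov_mtx, MaskedCovarianceMatrix)
        if needs_mask and not is_masked:
            if self._train_phase_started:
                # The plain accumulator has already seen data; too late.
                raise ValueError(
                    'activate the masked extension before training')
            # Swap in the masked accumulator, carrying over the dtype.
            self._cov_mtx = MaskedCovarianceMatrix(
                dtype=self._cov_mtx.dtype)

    def train(self, x):
        self._check_input(x)
        self._train_phase_started = True

node = ToyNode()
node.train(numpy.ma.array([[1., 2.]], mask=[[0, 1]]))
```

After the first masked batch, `node._cov_mtx` is a `MaskedCovarianceMatrix` with the original dtype; a masked batch arriving mid-training raises instead of silently corrupting the accumulator.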
From: W. Trevor King <wking@tr...> - 2013-03-27 21:12:07
Attachments:
signature.asc

On Mon, Mar 25, 2013 at 07:58:37AM -0400, W. Trevor King wrote:
> I'll definitely submit a pull request (or send patches to the list,
> whichever you like best) once I get things working :).

Well, it's not quite working, but here's the PR:

    https://github.com/mdp-toolkit/mdp-toolkit/pull/5

I tried to mimic the 'cache_execute' extension, but the
MaskedCovarianceMatrix substitution doesn't seem to be catching ;).
I'd appreciate feedback, if anyone has time to poke around.

Cheers,
Trevor
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-28 17:10:57

Hi Trevor,

great and impressive work, thank you!  Please have a look at the
branch masked_support.  I fixed the problems you were having and I
tried to explain what happened in the commit message.  If something is
not clear just let me know.

There was a misconception about how extensions work.  It was probably
not wise to point you to the 'caching' extension, as that works
globally for every node derived from mdp.Node, i.e. all nodes.  Your
extension instead just needs to work for nodes that have a
_new_covariance_matrix method.

By reducing the scope of the extension, I also removed the possibility
of adding masking support for some instances or classes only.  I don't
see how that could be useful.  Either all nodes that can do it support
masked arrays or none.  Where is the point of only enhancing some
classes or instances?

I am not at all sure about how to test it, though.  Yes, we currently
test that the node uses a masked covariance matrix.  But what are the
expected differences in the results when the extension is active?
What was not working before that now is working?  We should test for
that, right?

If you get a chance to rewrite your commit messages so that they match
the three-letters-prefix standard [1], I would really appreciate it.
As on feature branches history rewriting is explicitly allowed [2],
you could just rebase masked_support and submit a new PR.  Of course
you could add to the new PR the new tests, if you come up with some
good ones!

Thank you again!
Tiziano

[1] http://mdp-toolkit.sourceforge.net/development.html#git-commit-messages
[2] http://mdp-toolkit.sourceforge.net/development.html#history-rewriting

On Wed 27 Mar, 17:11, W. Trevor King wrote:
> Well, it's not quite working, but here's the PR:
>
>     https://github.com/mdp-toolkit/mdp-toolkit/pull/5
>
> I tried to mimic the 'cache_execute' extension, but the
> MaskedCovarianceMatrix substitution doesn't seem to be catching ;).
> I'd appreciate feedback, if anyone has time to poke around.
From: W. Trevor King <wking@tr...> - 2013-03-28 17:33:38
Attachments:
signature.asc

On Thu, Mar 28, 2013 at 06:10:48PM +0100, Tiziano Zito wrote:
> Please have a look at the branch masked_support.  I fixed the
> problems you were having and I tried to explain what happened in the
> commit message.

Ah, your explanation makes sense.  I'll squash those changes into my
initial work and touch up the test suite.

> By reducing the scope of the extension, I also removed the
> possibility of adding masking support for some instances or classes
> only.  I don't see how that could be useful.  Either all nodes that
> can do it support masked arrays or none.  Where is the point of only
> enhancing some classes or instances?

Maybe if you have a node graph where only a few entry nodes need to
handle masked data?  I'm happy with a single global switch if you are,
though ;).

> What was not working before that now is working?  We should test
> for that, right?

Yeah, I'll add this.  I hadn't before because the extension wasn't
working at all ;).

> If you get a chance to rewrite your commit messages so that they
> match the three-letters-prefix standard [1], I would really
> appreciate it.

I'm not clear about the distinction between `ERF`'s enhancement and
`NEW`.  Should I use something like:

    NEW: Add a 'masked' extension to handle masked input
    NEW: Add MaskedCovarianceMatrix class
    ERF: Add ._new_covariance_matrix() for easy extension

> As on feature branches history rewriting is explicitly allowed [2],
> you could just rebase masked_support and submit a new PR.  Of course
> you could add to the new PR the new tests, if you come up with some
> good ones!

Once I rebase things, I'll force-push back onto my original PR.  Is
there a formal way to credit you for a mutual commit?

Thanks,
Trevor
From: Tiziano Zito <tiziano.zito@bc...> - 2013-03-28 17:38:54

> > If you get a chance to rewrite your commit messages so that they
> > match the three-letters-prefix standard [1], I would really
> > appreciate it.
>
> I'm not clear about the distinction between `ERF`'s enhancement and
> `NEW`. Should I use something like:
>
>   NEW: Add a 'masked' extension to handle masked input
>   NEW: Add MaskedCovarianceMatrix class
>   ERF: Add ._new_covariance_matrix() for easy extension

That sounds good, thanks.

> > As on feature branches history rewriting is explicitly allowed [2],
> > you could just rebase masked_support and submit a new PR. Of course
> > you could add to the new PR the new tests, if you come up with some
> > good ones!
>
> Once I rebase things, I'll force-push back onto my original PR. Is
> there a formal way to credit you for a mutual commit?

Well, if you just pull from masked_support on top of the master of your
clone and then reword those three commits of yours, I assume git is
going to preserve the authorship of my commits, no? This would be
enough credit ;)

Ciao,
Tiziano
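[Tiziano's point — that rewording commits during a rebase preserves each commit's original author while the committer changes — can be checked in a throwaway repository. Everything below (names, emails, messages, the temporary repo) is made up for the demo; `sed -i` as used here assumes GNU sed.]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.name "Trevor" && git config user.email "trevor@example.com"

# First commit authored by Trevor, second attributed to a collaborator.
echo a > file && git add file
git commit -q -m "NEW: Add a 'masked' extension to handle masked input"
echo b >> file && git add file
git commit -q -m "FIX: activate the extension" \
    --author="Tiziano <tiziano@example.com>"

# Reword every commit non-interactively: GIT_SEQUENCE_EDITOR rewrites the
# rebase todo list, GIT_EDITOR=true leaves each message unchanged.
GIT_SEQUENCE_EDITOR="sed -i s/^pick/reword/" GIT_EDITOR=true \
    git rebase -q -i --root

# Author is preserved; committer becomes whoever ran the rebase.
git log --format='author=%an committer=%cn'
```

The log (newest first) shows `author=Tiziano committer=Trevor`, then `author=Trevor committer=Trevor`: the collaborator's authorship survives the rewrite, which is the "enough credit" Tiziano refers to.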
From: W. Trevor King <wking@tr...> - 2013-03-28 17:49:26
Attachments:
signature.asc

On Thu, Mar 28, 2013 at 06:38:46PM +0100, Tiziano Zito wrote:
> > > As on feature branches history rewriting is explicitly allowed [2],
> > > you could just rebase masked_support and submit a new PR. Of course
> > > you could add to the new PR the new tests, if you come up with some
> > > good ones!
> >
> > Once I rebase things, I'll force-push back onto my original PR. Is
> > there a formal way to credit you for a mutual commit?
>
> Well, if you just pull from masked_support on top of the master of your
> clone and then reword those three commits of yours, I assume git is
> going to preserve the authorship of my commits, no? This would be
> enough credit ;)

Ah, no squashing then. Works for me :).

Cheers,
Trevor
From: Pietro Berkes <berkes@ga...> - 2013-03-28 20:29:55
Attachments:
Message as HTML

Hi Trevor,

thanks for your contribution! I'm +1 for merging this when the tests
are in place. It would also be great to have an example in the
Tutorial (after the section "Caching execution results"?).

Best,
Pietro

On Thu, Mar 28, 2013 at 5:49 PM, W. Trevor King <wking@...> wrote:
> On Thu, Mar 28, 2013 at 06:38:46PM +0100, Tiziano Zito wrote:
> > > > As on feature branches history rewriting is explicitly allowed [2],
> > > > you could just rebase masked_support and submit a new PR. Of course
> > > > you could add to the new PR the new tests, if you come up with some
> > > > good ones!
> > >
> > > Once I rebase things, I'll force-push back onto my original PR. Is
> > > there a formal way to credit you for a mutual commit?
> >
> > Well, if you just pull from masked_support on top of the master of your
> > clone and then reword those three commits of yours, I assume git is
> > going to preserve the authorship of my commits, no? This would be
> > enough credit ;)
>
> Ah, no squashing then. Works for me :).
>
> Cheers,
> Trevor