Hi Stefan,

Thanks for the code, this makes sense.

Sorry for the late reply, I have been traveling. Once I have some free time I'll send you some more comments.


On Friday, October 18, 2013, Stefan Richthofer wrote:
Hi Niko,

I designed a new kind of layer for my purpose. I leave it to you to decide whether it is worth
a pull request. It is very simple. I use a specialized "merger" node that features an abstract
merge method. This method is similar to a train method, but it takes other nodes as input rather
than data (the merger itself would usually not even be trainable).
The merger must know the data structure of the given nodes so that it can retrieve the relevant
information from them directly.
The point is to use different methods for combining time-sequential chunks and for horizontal combining.
First, the ordinary nodes in the layer do the time-sequential combining, each node restricted to its
own area. The chunks are NOT stored, but processed into covariance and autocorrelation matrices.
Then the merger and its merge method are responsible for the horizontal combining. Since the merger
gets the other nodes as input, it can directly combine their covariance and autocorrelation matrices
additively.
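
To illustrate the two kinds of combining described above, here is a minimal NumPy sketch. It is not part of MDP and not the attached code: the class `LagAccumulator` and its `update`, `merge` and `tail` members are hypothetical names, and it assumes uncentered sums of outer products as the statistic. `update` does the time-sequential combining and keeps the last `lag` samples so that pairs spanning a chunk border are not lost; `merge` does the horizontal combining by simple addition.

```python
import numpy as np


class LagAccumulator:
    """Hypothetical helper: accumulates sum_t x[t] x[t+lag]^T over chunks."""

    def __init__(self, dim, lag):
        self.lag = lag
        self.acc = np.zeros((dim, dim))   # running sum of outer products
        self.count = 0                    # number of (t, t+lag) pairs seen
        self.tail = np.empty((0, dim))    # trailing samples of earlier chunks

    def update(self, chunk):
        """Time-sequential combining: add one chunk of shape (T, dim)."""
        data = np.vstack([self.tail, chunk])
        n_pairs = max(len(data) - self.lag, 0)
        if n_pairs > 0:
            self.acc += data[:n_pairs].T @ data[self.lag:]
            self.count += n_pairs
        # Keep at most `lag` trailing samples for pairs that span the border.
        self.tail = data[max(len(data) - self.lag, 0):]

    def merge(self, other):
        """Horizontal combining: add the statistics of another field."""
        self.acc += other.acc
        self.count += other.count


def accumulate(signal, lag, chunk_size):
    """Feed `signal` to a fresh accumulator in chunks of `chunk_size` rows."""
    acc = LagAccumulator(signal.shape[1], lag)
    for start in range(0, len(signal), chunk_size):
        acc.update(signal[start:start + chunk_size])
    return acc


rng = np.random.default_rng(0)
field_a = rng.standard_normal((100, 3))   # data seen by one field
field_b = rng.standard_normal((100, 3))   # data seen by a neighbouring field

acc_a = accumulate(field_a, lag=2, chunk_size=30)
acc_b = accumulate(field_b, lag=2, chunk_size=30)

# Horizontal merge: no pairs are formed across fields, the sums just add up.
merged = LagAccumulator(3, lag=2)
merged.merge(acc_a)
merged.merge(acc_b)
```

With this scheme the chunked result for one field matches the result of a single pass over the whole signal, and the merged statistic is the plain sum over fields.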

The MergeLayer is initialized with ordinary nodes, which are passed on to the parent Layer class.
Additionally, it gets a merger.
When training is stopped, it proceeds as follows:
- depending on a flag, it calls the ordinary nodes' stop-training methods (i.e. its parent's stop-training method)
- it calls the merger's merge method with each node in the layer
- it calls stop_merging, so the merger can do some postprocessing
- it transforms itself into a kind of CloneLayer by replacing all nodes with the merger (which is a subclass of Node)

See the source code below or in the attached file.



"""
Created on Oct 16, 2013

@author: Stefan Richthofer
"""

from mdp.hinet.layer import Layer
from mdp import Node

class Merger(Node):
        def is_trainable(self):
                """By default, a merger is not trainable, since it would only be
                trained if it appeared multiple times in the network.
                """
                return False

        ### Methods to be implemented by the user

        # These are the methods the user has to override.

        def _merge(self, node, *args, **kwargs):
                pass

        def _stop_merging(self, *args, **kwargs):
                pass

        ### User interface to the overridden methods

        def merge(self, node, *args, **kwargs):
                """Update the internal structures according to the input node `node`.

                `node` is an MDP node that has been trained as part of a layer. The
                merger subclass is expected to know the node type and the node's
                internal structure, so that it can retrieve the relevant data from it.

                By default, subclasses should override `_merge` to implement their
                merging phase. The docstring of the `_merge` method overwrites this
                docstring.
                """
                self._merge(node, *args, **kwargs)

        def stop_merging(self, *args, **kwargs):
                """Stop the merging phase.

                By default, subclasses should override `_stop_merging` to implement
                this functionality. The docstring of the `_stop_merging` method
                overwrites this docstring.
                """
                self._stop_merging(*args, **kwargs)

class MergeLayer(Layer):
        """Layer with an additional merging phase that merges all nodes in the layer
        after training. Merging is done by a given merger, which is itself a node.
        After merging, the merger is used for execution in a CloneLayer-like
        fashion.

        The idea behind MergeLayer is a hybrid of an ordinary Layer and a CloneLayer.
        The goal of this design is to use separate nodes in the training phase,
        while using only a single node for execution. The difference to CloneLayer
        is that in MergeLayer, a different algorithm can be used for combining
        horizontally parallel data chunks than for combining time-sequential data
        chunks. The latter are combined by the nodes in the usual training phase.
        In contrast to CloneLayer, MergeLayer allows control over how the horizontal
        merging of the data works. While CloneLayer would push this data into the
        very same train method as the time-sequential chunks, MergeLayer uses a
        merger to combine horizontal data.
        """

        def __init__(self, merger, nodes, call_stop_training=False, dtype=None):
                """Set up the layer with the given list of nodes.

                Keyword arguments:
                merger -- Merger to be used.
                nodes -- List of the nodes to be used.
                call_stop_training -- If True, stop the training of the
                    individual nodes before merging (default False).
                """
                super(MergeLayer, self).__init__(nodes, dtype=dtype)
                self.merger = merger
                self.call_stop_training = call_stop_training

        def _stop_training(self, *args, **kwargs):
                """Stop training of the internal nodes and merge them."""
                if self.call_stop_training:
                        # Call the parent's `_stop_training` hook; the public
                        # `stop_training` would dispatch back to this method.
                        super(MergeLayer, self)._stop_training(*args, **kwargs)
                # Horizontal combining: feed every trained node to the merger.
                for node in self.nodes:
                        self.merger.merge(node)
                self.merger.stop_merging()
                # From now on, execute with the merger in a CloneLayer-like fashion.
                self.trained_nodes = self.nodes
                self.nodes = (self.merger,) * len(self.trained_nodes)
                if self.output_dim is None:
                        self.output_dim = self._get_output_dim_from_nodes()
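
For readers without MDP at hand, the stop-training sequence can be traced with plain Python stand-ins. This is only a sketch under assumptions: `FieldNode`, `SumMerger` and `stop_training_and_merge` are hypothetical names that mimic the four steps listed earlier, they are not MDP classes, and a single number stands in for the covariance and autocorrelation matrices.

```python
class FieldNode:
    """Stand-in for a trained per-field node; holds a single statistic."""

    def __init__(self, total):
        self.total = total
        self.training = True

    def stop_training(self):
        self.training = False


class SumMerger:
    """Stand-in merger: adds up the statistics of all nodes it is given."""

    def __init__(self):
        self.total = 0.0
        self.merged = 0
        self.mean = None

    def merge(self, node):
        # Read the relevant data directly from the node's internals.
        self.total += node.total
        self.merged += 1

    def stop_merging(self):
        # Postprocessing once all nodes have been merged.
        self.mean = self.total / self.merged


def stop_training_and_merge(nodes, merger, call_stop_training=False):
    """Mimic the four steps; return the node tuple used for execution."""
    if call_stop_training:
        for node in nodes:
            node.stop_training()
    for node in nodes:
        merger.merge(node)
    merger.stop_merging()
    # Every position in the layer now refers to the same merger instance.
    return (merger,) * len(nodes)


nodes = [FieldNode(2.0), FieldNode(4.0), FieldNode(6.0)]
merger = SumMerger()
exec_nodes = stop_training_and_merge(nodes, merger, call_stop_training=True)
```

After the call, all three entries of `exec_nodes` are the one merger, which is what makes the layer behave like a clone layer during execution.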

> Sent: Wednesday, October 16, 2013 at 21:42
> From: "Niko Wilbert" <mail@nikowilbert.de>
> To: "MDP users mailing list" <mdp-toolkit-users@lists.sourceforge.net>
> Subject: Re: [mdp-toolkit-users] Fw: Clonelayer versus training phase in time-sequent chunks
> Hi Stefan,
> you are right that a CloneLayer can't do what you want in this case. I
> don't see how to circumvent this in a general way. If you want to
> reorder training data from different chunks then you basically have to
> store all the chunks. This would only make sense if the preprocessing
> before that node has reduced the dimensionality by a lot (which is
> certainly true for many use cases, but even then it is often too much
> data to keep in memory). If you want to follow this route then it
> should be possible to write a special CloneLayer class that organizes
> the data chunk storage and final training of the node.
> Note that processing chunks like a single data set can be a little
> complicated, even without using a CloneLayer. The SFA node does
> provide this capability (see the include_last_sample argument).
> If you want to check that splitting the data into reasonably large
> chunks does not have a significant effect on the results, then of
> course the easiest way would be to use a test scenario where you can
> train with a single chunk. Another option would be to not use "weight
> sharing" and instead use individual node instances in a normal layer.
> If you increase the number of data chunks accordingly (assuming that
> they are sufficiently homogeneous) then this should be a good control
> experiment.
> If you implement a nice solution for this problem then you are of
> course very welcome to make a pull request on Github (and of course
> you can always ask for help on the mailing list).
> Cheers,
> Niko
> On Wed, Oct 16, 2013 at 2:36 PM, Stefan Richthofer
> <Stefan.Richthofer@gmx.de> wrote:
> > After looking into the source, I am convinced that there is no way to do what I want using CloneLayer.
> > However, I think the solution is to use an ordinary layer with several wrapper nodes that track their indices
> > and have a backend reference to the node I would have used for the clone layer. It would be a kind of
> > hand-made clone layer.
> >
> > Cheers
> >
> > Stefan
> >
> >
> >> Sent: Wednesday, October 16, 2013 at 13:30
> >> From: "Stefan Richthofer" <Stefan.Richthofer@gmx.de>
> >> To: mdp-toolkit-users@lists.sourceforge.net
> >> Subject: [mdp-toolkit-users] Clonelayer versus training phase in time-sequent chunks
> >>
> >> Hello MDP community,
> >>
> >> I am currently implementing Predictable Feature Analysis for MDP, a time-based analysis node that heavily depends on autocorrelation matrices (i.e. with lag 1, 2, ..., p).
> >> I wonder how to get this right with clone layer functionality. I plan to use it like SFA in a hierarchical setup with Rectangular2dSwitchboard.
> >> In clone layer mode, how is the data from the several fields presented to the node? By several calls to the train method?
> >> In that case, how can I tell which time and field a given chunk belongs to? Since I will deal with long training phases, I will already use multiple train calls to process the long training phase in chunks.
> >> The point is that for time-sequent chunks on the same field I must calculate inter-chunk autocorrelation matrices. Otherwise I would lose at least p data points per chunk. Maybe this would not matter significantly, but I need to control this behavior to find that out.
> >>
> >> What is the best way to determine which given chunks are subsequent (in the same field) and which are (time-)independent?
> >> Can I do something with get_current_train_phase from Node or with _get_train_seq from CloneLayer?
> >> Or do I have to subclass CloneLayer and related classes to build an architecture that tells the node about the field index?
> >>