I had a bit of an epiphany while going over the whitepaper, and realized that the segments could be considered somewhat equivalent to weights in a standard neural network. If you turned a neural network sideways, you'd have something analagous to an HTM column. If you were using a standard 3 layer feedforward network, the number of hidden neurons would equate to the number of cells per column. I would find it useful to incorporate the ideas of HTM in a traditional neural network system, since neural networks are much better understood, more generally efficient, and easier to implement.
I've verified for myself that for simple non-forking contexts, neural networks can be used to implement a simple memory prediction framework. I use a simple autoencoder that reduces the dimensionality of the original input (similar to sparsification) and then I use a time-series predictor network that accepts the encoded output of T-1 to predict T. This works very well, in fact, and can be used on inputs between 0 and 1, which imho is desirable over HTM's binary input matrix. For contexts requiring relationships beyond the scope of T-1 to T, this system doesn't work.
With regular neurons that perform summation of weighted inputs from previous layers, it's hard to incorporate temporal contexts, even for normal problems. You could of course associate T-2 to T, and T-3 to T, and so forth, and create a probabilistic model that utilizes all of the previous predictions to create a better prediction, but as soon as your input falls outside the scope of explicit associations, you immediately lose semantic continuity. Plus, it's just not good to have to create a new network for each and every time and space context you're wanting to learn.
So that led me to search for a solution, and I think it may lie in Long Short Term Memory. LSTM neurons are a relatively new model of artificial neuron that utilize logic gates that modulate a memory cell, input, output, and retention. This model of neuron retains memory for arbitrarily long periods of time, and is very capable of handling noisy environments as well as arbitrary cyclic signals. There are some very good papers referenced from Jurgen Schmidhuber's site.
I'm polishing up my implementation of LSTM with the idea that an autoencoder LSTM network is going to be able to encode temporal contexts alongside the spatial context.
I'm looking for a learning algorithm, and I think genetic selection with a drift minimization mechanism would be ideal. Backprop wouldn't necessarily work, since I don't want to retain arbitrary prior sequences as a training set. My idea for genetic algorithm would find many candidate solutions for the input/output pair and select the one with the most success over the least difference from the current weight matrix.
This deviates from HTM's idea of segments and instead encodes memory as weights to and from the hidden layer of cells. What can be done with LSTM networks has to do with the way matrices can be correlated. By training the networks to learn Eigenfunctions, you can extract useful information from arbitrary configurations of the input space. What this means is that you can train a network to predict what should be in its neighboring spaces, and use the output of that network to populate a semantic space. An Eigenfunction is a function that transforms one matrix to another. By training a network to learn an Eigenfunction, it's being trained to transform inputs to outputs in a particular way. In a 2d grid of cells, networks could learn the Eigenfunctions that transform the input into each of the 9 adjacent cells. Obviously a larger set of cells will have more powers to generalize, and in this scenario, you'd have at least 10 networks whose output associates the center input with each of its neighbors. Create a hierarchy with this sort of system, and you have something similar to HTM columns and regions.
The semantic space would be larger than the input space, similar to the way the number of segments and cells per column define the capacity of an HTM region. Sparse encoding networks would be useful to reduce the dimensionality of the input space, but wouldn't be necessary, because there are ways of combining all the Eigenfunction transforms into a summed Eigenfunction that produces a new input vector to pass to the next level. The performance limitations become attached to the cost of running a single component network vs the exponential cost of running them all synchronously.
LSTM networks arranged in this sort of system could correlate sequences and patterns. Sequences could be spatial, as in corresponding grid samples from a larger input space, or they could be temporal, utilizing the memory cell of an LSTM neuron. They can also be any mixture of spatial or temporal - it depends on the percept sequence experienced by the networks.
Has anyone else tried something like this? Is there a pitfall to this approach that I'm not seeing that would make this more complex than HTM/CLA? I'm partial to this approach because it uses components that are well understood and much more easy to implement.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I had a bit of an epiphany while going over the whitepaper, and realized that the segments could be considered somewhat equivalent to weights in a standard neural network. If you turned a neural network sideways, you'd have something analagous to an HTM column. If you were using a standard 3 layer feedforward network, the number of hidden neurons would equate to the number of cells per column. I would find it useful to incorporate the ideas of HTM in a traditional neural network system, since neural networks are much better understood, more generally efficient, and easier to implement.
I've verified for myself that for simple non-forking contexts, neural networks can be used to implement a simple memory prediction framework. I use a simple autoencoder that reduces the dimensionality of the original input (similar to sparsification) and then I use a time-series predictor network that accepts the encoded output of T-1 to predict T. This works very well, in fact, and can be used on inputs between 0 and 1, which imho is desirable over HTM's binary input matrix. For contexts requiring relationships beyond the scope of T-1 to T, this system doesn't work.
With regular neurons that perform summation of weighted inputs from previous layers, it's hard to incorporate temporal contexts, even for normal problems. You could of course associate T-2 to T, and T-3 to T, and so forth, and create a probabilistic model that utilizes all of the previous predictions to create a better prediction, but as soon as your input falls outside the scope of explicit associations, you immediately lose semantic continuity. Plus, it's just not good to have to create a new network for each and every time and space context you're wanting to learn.
So that led me to search for a solution, and I think it may lie in Long Short Term Memory. LSTM neurons are a relatively new model of artificial neuron that utilize logic gates that modulate a memory cell, input, output, and retention. This model of neuron retains memory for arbitrarily long periods of time, and is very capable of handling noisy environments as well as arbitrary cyclic signals. There are some very good papers referenced from Jurgen Schmidhuber's site.
I'm polishing up my implementation of LSTM with the idea that an autoencoder LSTM network is going to be able to encode temporal contexts alongside the spatial context.
I'm looking for a learning algorithm, and I think genetic selection with a drift minimization mechanism would be ideal. Backprop wouldn't necessarily work, since I don't want to retain arbitrary prior sequences as a training set. My idea for genetic algorithm would find many candidate solutions for the input/output pair and select the one with the most success over the least difference from the current weight matrix.
This deviates from HTM's idea of segments and instead encodes memory as weights to and from the hidden layer of cells. What can be done with LSTM networks has to do with the way matrices can be correlated. By training the networks to learn Eigenfunctions, you can extract useful information from arbitrary configurations of the input space. What this means is that you can train a network to predict what should be in its neighboring spaces, and use the output of that network to populate a semantic space. An Eigenfunction is a function that transforms one matrix to another. By training a network to learn an Eigenfunction, it's being trained to transform inputs to outputs in a particular way. In a 2d grid of cells, networks could learn the Eigenfunctions that transform the input into each of the 9 adjacent cells. Obviously a larger set of cells will have more powers to generalize, and in this scenario, you'd have at least 10 networks whose output associates the center input with each of its neighbors. Create a hierarchy with this sort of system, and you have something similar to HTM columns and regions.
The semantic space would be larger than the input space, similar to the way the number of segments and cells per column define the capacity of an HTM region. Sparse encoding networks would be useful to reduce the dimensionality of the input space, but wouldn't be necessary, because there are ways of combining all the Eigenfunction transforms into a summed Eigenfunction that produces a new input vector to pass to the next level. The performance limitations become attached to the cost of running a single component network vs the exponential cost of running them all synchronously.
LSTM networks arranged in this sort of system could correlate sequences and patterns. Sequences could be spatial, as in corresponding grid samples from a larger input space, or they could be temporal, utilizing the memory cell of an LSTM neuron. They can also be any mixture of spatial or temporal - it depends on the percept sequence experienced by the networks.
Has anyone else tried something like this? Is there a pitfall to this approach that I'm not seeing that would make this more complex than HTM/CLA? I'm partial to this approach because it uses components that are well understood and much more easy to implement.