(Hi guys, this is an interesting discussion that was started in the old Numenta forum. In it, Barry and others clearly explained some things that aren't very clear in the Numenta CLA paper.)
someuser88 wrote:
I've read most of the 'cortical algorithms learning and concepts' paper, and although somewhat permeable there were still a few strings of text that felt a little esoteric. Anyway, of primary concern is how to implement a HTM constrained to PHP/Javascript (unfortunately, they're the only two languages I know at present).
So, how would I develop a HTM (conceptually) to form an invariant for basic images like those featured in academic papers describing the efficacy of HTMs - ?
Oh, and the areas of the paper which eluded me are as follows:
- Distal dendrite segments: does this mean that they would also connect to nodes on other rows too, or only other columns of the same row?
- Prediction mode enabled by lateral input of distal dendrite segments exceeding soma threshold for node activation: why not from feed-forward input?
- "Boosting" columns which are below a certain threshold: is this to ensure that all columns are used? I'm not entirely sure of why neighboring columns inhibit the activation of weaker columns yet if they fall below a threshold of some kind, they're "boosted" to compete with a winning set of columns?
Apologies if this is somewhat "noobish", but the paper didn't seem to be clear with what it was describing. Thanks!
someuser88 wrote: "Boosting" columns which are below a certain threshold: is this to ensure that all columns are used? I'm not entirely sure of why neighboring columns inhibit the activation of weaker columns yet if they fall below a threshold of some kind, they're "boosted" to compete with a winning set of columns?
I am no expert but here is an attempted answer to the question above:
The same region receives a stream of input. At input A Column 7 might inhibit column 8 - but at input B, Column 8 might inhibit column 7. This helps create the sparse distributed representation. Yet if a column is never active for any data input - it isn't really much use. So it has to find an input dataset for which it can become active, thus the boosting process.
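To make the boosting idea a bit more concrete, here is a minimal Python sketch. The function names, the 0.01 duty-cycle floor and the 1.5 boost multiplier are assumptions chosen for this example; they are not taken from the paper.

```python
def update_boosts(boosts, active_counts, steps, min_duty=0.01, boost_step=1.5):
    """Return new boost factors: columns that rarely win get a larger boost.

    boosts        -- current boost factor per column
    active_counts -- how many times each column has been active so far
    steps         -- number of inputs seen so far
    min_duty and boost_step are illustrative values, not from the paper.
    """
    new_boosts = []
    for boost, count in zip(boosts, active_counts):
        duty_cycle = count / steps
        if duty_cycle < min_duty:
            new_boosts.append(boost * boost_step)  # starved column: raise its boost
        else:
            new_boosts.append(1.0)                 # active often enough: no boost
    return new_boosts


def pick_winners(overlaps, boosts, k):
    """Inhibition: keep the k columns with the highest boosted overlap."""
    scores = [o * b for o, b in zip(overlaps, boosts)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


overlaps = [5, 4, 3]          # column 0 always wins on raw overlap
boosts = [1.0, 1.0, 1.0]
winners_before = pick_winners(overlaps, boosts, k=1)

# Columns 1 and 2 were never active in 100 steps, so they get boosted.
boosts = update_boosts(boosts, active_counts=[100, 0, 0], steps=100)
winners_after = pick_winners(overlaps, boosts, k=1)
```

With these made-up numbers, column 0 wins before boosting and column 1 wins afterwards, which is the "find an input for which it can become active" effect described above.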
Hope that answers this part of the question.
Last edit: David Ragazzi 2013-02-18
someuser88 wrote: - Prediction mode enabled by lateral input of distal dendrite segments exceeding soma threshold for node activation: why not from feed-forward input?
Once again I am no expert but here is an attempt to answer part of your question.
Each column has a number of cells. Once a sparse distributed pattern has arrived (i.e. the columns that represent the pattern are active), it triggers lateral input for the next predicted pattern. This prediction is compared with the next pattern received (feed-forward). A predicted column that then receives the data becomes active only at the predicted cell, and that cell marks the context in which the pattern arrived. So the column says: I am active from feed-forward input, and I was also predicted by lateral input triggered at a certain time. The two inputs, feed-forward and lateral, together give rise to this information. This is why there is more than one cell in a column.
If the column was not predicted by lateral input, every cell in the column becomes active.
(read page 22 and 23 again.) The HTM learning algorithms paper is extremely well written - meaning it has a huge amount of information in a small amount of text.
Hope this answers that part of the question
someuser88 wrote: - Distal dendrite segments: does this mean that they would also connect to nodes on other rows too, or only other columns of the same row?
The answer to this question should now be apparent from the answer to the last.
A region with columns of 7 rows can store context data for seven time periods.
Here is an example
Consider a region of 100 columns of 7 rows.
Any given input pattern results in 10 winning columns.
Let's name the patterns A through G.
For argument's sake, say two sequences of these patterns are learnt over 7 time periods:
time:        1 2 3 4 5 6 7
sequence 1:  A B C D E F G
sequence 2:  A C E G B D F
Once pattern A arrives at time 1 - lateral input is sent for
B and C at time 2
C and E at time 3
D and G at time 4
E and B at time 5
and so on.
So a given pattern of sparse distributed columns should be connected to nodes on all the different rows, with synapse permanences adjusted as the system learns.
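The example above can be reproduced with a few lines of Python. This is only a bookkeeping sketch of which patterns are expected at each time step; the function name `predictions_after` is made up for illustration and nothing here models columns or cells.

```python
sequences = ["ABCDEFG", "ACEGBDF"]


def predictions_after(first_pattern, sequences):
    """For each later time step, list the patterns expected by any learned
    sequence that starts with first_pattern."""
    matching = [s for s in sequences if s[0] == first_pattern]
    if not matching:
        return {}
    steps = {}
    for t in range(1, max(len(s) for s in matching)):
        # Time steps are 1-based in the example, so index t is time t + 1.
        steps[t + 1] = [s[t] for s in matching if t < len(s)]
    return steps


expected = predictions_after("A", sequences)
for time, patterns in expected.items():
    print(f"time {time}: {' and '.join(patterns)}")
```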
Hope this helps to answer the question.
Last edit: David Ragazzi 2013-02-18
someuser88 wrote: Anyway, of primary concern is how to implement a HTM constrained to PHP/Javascript (unfortunately, they're the only two languages I know at present).
So, how would I develop a HTM (conceptually) to form an invariant for basic images like those featured in academic papers describing the efficacy of HTMs - ?
I agree, PHP and JavaScript are really interesting languages for developing an HTM.
To run these systems massively in parallel (where I believe their true power will become apparent), a system of servers is going to be needed.
What better language to run on a server than PHP, with MySQL databases for the columns and cells?
I am waiting for the beta release of the HTM software, but it would be fun to develop a model in PHP and JavaScript.
someuser88 wrote: "Boosting" columns which are below a certain threshold: is this to ensure that all columns are used? I'm not entirely sure of why neighboring columns inhibit the activation of weaker columns yet if they fall below a threshold of some kind, they're "boosted" to compete with a winning set of columns?
I am no expert but here is an attempted answer to the question above:
The same region receives a stream of input. At input A Column 7 might inhibit column 8 - but at input B, Column 8 might inhibit column 7. This helps create the sparse distributed representation. Yet if a column is never active for any data input - it isn't really much use. So it has to find an input dataset for which it can become active, thus the boosting process.
Hope that answers this part of the question.
So, ultimately this is to ensure that all cells represent some unique sub-set of input?
jleyden wrote: Once again I am no expert but here is an attempt to answer part of your question.
Each column has a number of cells. Once a sparse distributed pattern has arrived (i.e. the columns that represent the pattern are active), it triggers lateral input for the next predicted pattern. This prediction is compared with the next pattern received (feed-forward). A predicted column that then receives the data becomes active only at the predicted cell, and that cell marks the context in which the pattern arrived. So the column says: I am active from feed-forward input, and I was also predicted by lateral input triggered at a certain time. The two inputs, feed-forward and lateral, together give rise to this information. This is why there is more than one cell in a column.
If the column was not predicted by lateral input, every cell in the column becomes active.
(read page 22 and 23 again.) The HTM learning algorithms paper is extremely well written - meaning it has a huge amount of information in a small amount of text.
Hope this answers that part of the question
Does "lateral" refer to inputs from other columns or from other cells of the same column, or both? Does "feed-forward" refer to input from a preceding cell/active sensory stream? And how does the cell predict the next pattern? Is it merely by just forming a representation of its input, or does it use some means of probability? Or is it just by lateral and feed-forward input (I think that's what you said)?
Apologies once again if these questions are a bit "noobish". From what I understand, the spatial pooler initializes the HTM for a "winning" set of columns whilst the temporal pooler finds a set of predictive and learning cells for each column, activates an entire column if no cells were found in a predictive state and leaves the learning cells in their previous state and also applies the synaptic learning formula for adjusting the system to new inputs. I still don't know why feed-forward input determines whether a cell goes into a predictive state and I don't know what the predictive state does. How does the cell predict? Does it anticipate the next sequence by the most positive representation from its temporal index (e.g. if ABCDE for 5 time periods, but not F for the 6th, and if C appeared most of the time, then predict C - repeat and continuously adjust to find the best possible value to predict)?
I sincerely appreciate your responses anyway!
Last edit: David Ragazzi 2013-02-18
someuser88 wrote: Does "lateral" refer to inputs from other columns or from other cells of the same column, or both?
Lateral means cells from other columns. Cells do not make connections to other cells within their own column. I suppose they could be allowed to, and the model would probably still be consistent; however, for simplicity the algorithm does not do this.
someuser88 wrote: Does "feed-forward" refer to input from a preceding cell/active sensory stream?
"feed-forward" means the input to a region. Feed-forward is what determines which columns are to be active. The only interaction with temporal data is that when a column is chosen if it contains predicted cells then only those are active (instead of all column cells).
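Here is a rough Python sketch of that interaction, assuming the active columns have already been chosen from feed-forward input. The function name and the `(column, cell)` tuple representation are assumptions made for this example.

```python
def activate_cells(active_columns, predictive_cells, cells_per_column):
    """Return the set of (column, cell) pairs that become active.

    active_columns   -- columns chosen from feed-forward input
    predictive_cells -- set of (column, cell) pairs predicted on the previous step
    """
    active = set()
    for col in active_columns:
        predicted = {(c, i) for (c, i) in predictive_cells if c == col}
        if predicted:
            # The column was predicted: only the predicted cells become active.
            active |= predicted
        else:
            # The column was not predicted: every cell in it becomes active.
            active |= {(col, i) for i in range(cells_per_column)}
    return active


# Column 3 had a predicted cell; column 8 did not; column 5 was predicted
# but is not among the active columns, so it contributes nothing.
active = activate_cells([3, 8], {(3, 1), (5, 0)}, cells_per_column=4)
```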
someuser88 wrote: And how does the cell predict the next pattern? Is it merely by just forming a representation of its input, or does it use some means of probability? Or is it just by lateral and feed-forward input (I think that's what you said)?
Cells have connections (segments with synapses to other cells) that were created using prior data sequences. On a given time step, let's say some cells are active. Those that are active want to create connections (or strengthen existing ones) to cells that were active last time step. We know which cells those were because they were marked as 'learning cells' last time. So the current active cells make connections to those previous cells. Thus next time those 'previous' cells happen to become active again they can use their segments to predict the next set of cells likely to follow.
someuser88 wrote: From what I understand, the spatial pooler initializes the HTM for a "winning" set of columns whilst the temporal pooler finds a set of predictive and learning cells for each column, activates an entire column if no cells were found in a predictive state and leaves the learning cells in their previous state and also applies the synaptic learning formula for adjusting the system to new inputs.
Yes, that sounds correct. The spatial pooler primarily determines which columns become active. The temporal pooler determines the predicted/learning cells and uses this information to update the cell connections so our predictions hopefully get more accurate/extensive over time.
someuser88 wrote: I still don't know why feed-forward input determines whether a cell goes into a predictive state and I don't know what the predictive state does. How does the cell predict? Does it anticipate the next sequence by the most positive representation from its temporal index (e.g. if ABCDE for 5 time periods, but not F for the 6th, and if C appeared most of the time, then predict C - repeat and continuously adjust to find the best possible value to predict)?
Again, feed-forward only really determines which columns are active (and therefore the cells within those columns). For a cell to go into the predicted state means several other cells across some of the active columns all have connections to that particular cell. Meaning, several of the currently active cells happened previously and they formed connections to the cells that came next. Now that they are active once again they are 'predicting' those that came next last time.
For ABCDE then, the set of cells representing 'A' will have connections to the set of cells representing 'B'. So when 'A' cells are active, the 'B' cells will be put into a predictive state.
It can become more complex than that over time. 'A' can predict 'B' is coming immediately next, but also may know that cells representing CDE are also coming 'eventually'. This is because cells can form connections several time steps into the past. They are not limited to the immediate past.
Also it doesn't necessarily matter if for example 'C' "appeared most of the time" in general. What matters is if 'C' appeared most of the time 'after B' instead of after something else. It matters what 'sequence' happens most of the time, since sequences between things are what we are primarily learning.
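A toy Python sketch of this "connect to what was active last time step, then predict" idea follows. It is heavily simplified: each cell has a single segment, there are no permanences and no multiple cells per column, and the class name, the threshold of 2 and the cell ids are all invented for the example.

```python
from collections import defaultdict


class TransitionMemory:
    """Toy model: each cell keeps one segment, a set of cells it listens to."""

    def __init__(self, activation_threshold=2):
        self.segments = defaultdict(set)   # cell -> cells it has synapses to
        self.activation_threshold = activation_threshold  # illustrative value
        self.previous_active = set()

    def learn(self, active_cells):
        """Connect every currently active cell to the cells active last step."""
        for cell in active_cells:
            self.segments[cell] |= self.previous_active
        self.previous_active = set(active_cells)

    def predict(self, active_cells):
        """Cells whose segment sees enough currently active cells."""
        active = set(active_cells)
        return {cell for cell, sources in self.segments.items()
                if len(sources & active) >= self.activation_threshold}

    def reset(self):
        """Forget the previous step, e.g. between separate sequences."""
        self.previous_active = set()


# Hypothetical cell sets standing in for the sparse representations.
patterns = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {7, 8, 9}, "X": {10, 11, 12}}
memory = TransitionMemory()
for sequence in ["ABC", "XBC"]:
    memory.reset()
    for letter in sequence:
        memory.learn(patterns[letter])

predicted_after_a = memory.predict(patterns["A"])   # the 'B' cells
predicted_after_c = memory.predict(patterns["C"])   # nothing followed 'C'
```

Because this sketch has only one cell per "column", it cannot tell "AB" from "XB"; that is the job of the multiple cells per column in the real algorithm.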
I know this is a lot to take in. But you sound like you're doing quite well for claiming to be 'noobish' =)
Barry
binarybarry wrote: Cells have connections (segments with synapses to other cells) that were created using prior data sequences. On a given time step, let's say some cells are active. Those that are active want to create connections (or strengthen existing ones) to cells that were active last time step. We know which cells those were because they were marked as 'learning cells' last time. So the current active cells make connections to those previous cells. Thus next time those 'previous' cells happen to become active again they can use their segments to predict the next set of cells likely to follow.
Apologies for the delayed response, but a few things still remain shrouded (unfortunately - this has taken a while to learn, and I wonder if other frameworks would take a comparable time to learn (since I'm not particularly adept with math, I can't read conventional papers on Bayesian nets, neural nets, etc.)).
So, for the temporal pooler, are cells in a 'learning' state those which are active from feed-forward input/have found false predictions? And is prediction merely a cell looking to see what cells activated it in t-n such that it'll eventually realize the definitive pattern of synapses - a segment - that activated the state, and thereby construct a set of invariants for input sets (e.g. such that the letter 'a' manifests in many ways)? Lastly, I'm unsure of where 'getBestMatchingCell()' fits in with all this - does that mean that the function finds which cell functioned indefinitely whenever the sensory sequence arose? (e.g. the cell which always fired when a sensory sequence representing the letter 'a' occurred)
Thanks! I sincerely appreciate your input!
Last edit: David Ragazzi 2013-02-18
someuser88 wrote: So, for the temporal pooler, are cells in a 'learning' state those which are active from feed-forward input/have found false predictions?
Learning cells are designated from amongst the set of active cells (which become active from feed-forward). If a column has multiple active cells, only one will be chosen as a learning cell to be updated. In that case the learning cell is picked based on which cell has the most currently active synapse connections. The more active synapses a cell has, the more it is already part of a sequence or close to it, and thus it is given priority to strengthen these existing connections.
someuser88 wrote: I'm unsure of where 'getBestMatchingCell()' fits in with all this - does that mean that the function finds which cell functioned indefinitely whenever the sensory sequence arose? (e.g. the cell which always fired when a sensory sequence representing the letter 'a' occurred)
getBestMatchingCell() does what I just described. If a column has multiple active cells it needs to select just one to be a learning cell for the time step. The getBestMatchingCell() function does this by finding the 'best' cell to designate as learning (find the cell that is most closely already part of a sequence to make that sequence even stronger).
someuser88 wrote: And is prediction merely a cell looking to see what cells activated it in t-n such that it'll eventually realize the definitive pattern of synapses - a segment - that activated the state, and thereby construct a set of invariants for input sets (e.g. such that the letter 'a' manifests in many ways)?
Prediction at the cell level is a bit simpler than that. An individual cell becomes predicted if one of its segments has enough active synapses at any given time step. Keep in mind an individual cell doesn't really mean much by itself. A pattern (or a single time step within a pattern) is defined by a collection of cells that all occur together (this cell collection is the 'sparse representation' the documentation talks about).
So you might reword your description by saying a prediction is a collection of many particular cells that are activated by a different collection of particular cells from t-n that can ultimately represent some larger pattern. The invariant property is contributed to by the fact that a collection of cells is a sparse representation, meaning similar inputs should result in a very similar collection of cells (leading to similar sequences).
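The cell-level rule ("one segment with enough active synapses") might be sketched in Python like this. The two constants and the dict-of-permanences representation are assumptions for the example, not values or structures from the paper.

```python
CONNECTED_PERMANENCE = 0.2   # illustrative value
ACTIVATION_THRESHOLD = 2     # illustrative value


def segment_is_active(segment, active_cells):
    """A segment is active when enough of its connected synapses
    point at currently active cells."""
    count = sum(1 for source, permanence in segment.items()
                if permanence >= CONNECTED_PERMANENCE and source in active_cells)
    return count >= ACTIVATION_THRESHOLD


def cell_is_predictive(segments, active_cells):
    """One active segment is enough to put the cell into the predictive state."""
    return any(segment_is_active(seg, active_cells) for seg in segments)


# A cell with two segments; keys are source cell ids, values are permanences.
segments = [
    {1: 0.5, 2: 0.3, 3: 0.1},   # synapse to cell 3 is below the connected threshold
    {7: 0.4, 8: 0.4},
]
```

Here the cell is predicted when cells 1 and 2 are active, or when 7 and 8 are active, but not when the active cells are split across the two segments.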
Barry
binaryBarry wrote: Prediction at the cell level is a bit simpler than that. An individual cell becomes predicted if one of its segments has enough active synapses at any given time step. Keep in mind an individual cell doesn't really mean much by itself. A pattern (or a single time step within a pattern) is defined by a collection of cells that all occur together (this cell collection is the 'sparse representation' the documentation talks about).
So you might reword your description by saying a prediction is a collection of many particular cells that are activated by a different collection of particular cells from t-n that can ultimately represent some larger pattern. The invariant property is contributed to by the fact that a collection of cells is a sparse representation, meaning similar inputs should result in a very similar collection of cells (leading to similar sequences).
Barry
Ah, that's what I meant - apologies. Perhaps the most intrinsic component of this model: where do synapses arise from? I.e. how is the model initiated? Does one randomize inputs and synaptic connections among the columns, which then configure themselves automatically over time to construct invariants of an input stream? The pseudo-code to me doesn't appear to describe how synapses are implemented, and proceeds on the assumption that they're already configured (with all cellular connections appropriated). Another problem is how are thresholds configured? This problem is the same as the synaptic trouble - do we just randomize thresholds, or are they automatically configured by the model?
I sincerely appreciate your responses!
Last edit: David Ragazzi 2013-02-18
someuser88 wrote: where do synapses arise from? I.e. how is the model initiated? Does one randomize inputs and synaptic connections among the columns, which then configure themselves automatically over time to construct invariants of an input stream? The pseudo-code to me doesn't appear to describe how synapses are implemented, and proceeds on the assumption that they're already configured (with all cellular connections appropriated). Another problem is how are thresholds configured? This problem is the same as the synaptic trouble - do we just randomize thresholds, or are they automatically configured by the model?
I sincerely appreciate your responses!
First let me remind you there is a distinction between spatial pooling synapses ('proximal') and temporal pooling synapses ('distal'). Both have different rules for initialization and connection learning.
For spatial pooling you are right, there is no pseudo-code in the doc for initializing the connections. There is however a paragraph on page 34 that describes one way the initialization can work for the spatial pooler. I won't repost the text here, but to summarize: each column is initially connected to a random subset of the input bits. The permanence values of these synapses are then randomized with a Gaussian bias centered on both the connection threshold and how close the input bit is to the column (if we assume column locations map roughly to input space). This results in default synapses that are near the connected/disconnected threshold and are generally stronger the closer they are to the column.
When the spatial pooler is learning, the synapse connections will then change over time with some getting stronger and some weaker becoming hopefully more stable as they adjust to the more common types of inputs that come in.
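One possible reading of the spatial pooler initialization summarized above, as a Python sketch. The one-dimensional input, the function name, the threshold of 0.2, the spread of 0.05 and the exact way distance shifts the mean are all assumptions for illustration; the paper only describes the idea in prose.

```python
import random


def init_proximal_synapses(column_center, num_inputs, num_synapses,
                           connected_perm=0.2, spread=0.05, rng=None):
    """Return {input_index: permanence} for one column.

    Each column connects to a random subset of the input bits. Permanences
    are drawn from a Gaussian near the connected threshold, nudged upward
    for input bits close to the column (1-D distance is an assumption).
    """
    rng = rng or random.Random()
    inputs = rng.sample(range(num_inputs), num_synapses)
    synapses = {}
    for i in inputs:
        # Distance of the input bit from the column's centre, scaled to 0..1.
        distance = abs(i - column_center) / max(num_inputs - 1, 1)
        # Closer inputs get a slightly higher mean permanence.
        mean = connected_perm + spread * (0.5 - distance)
        perm = rng.gauss(mean, spread)
        synapses[i] = min(1.0, max(0.0, perm))   # clamp to the valid range
    return synapses


synapses = init_proximal_synapses(column_center=10, num_inputs=100,
                                  num_synapses=20, rng=random.Random(42))
connected = [i for i, p in synapses.items() if p >= 0.2]
```

Passing a seeded `random.Random` makes the initialization repeatable, which is handy when comparing runs.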
For the temporal pooler on the other hand we do not have any initial synapse connections. Instead we create new segments and new synapses based on sequences as they occur. As synapses are added the temporal pooler tries to reuse existing segments if similar patterns occur over time. This again should hopefully lead to stability once enough segments/synapses have been added.
Last edit: David Ragazzi 2013-02-18
someuser88 wrote:
Apologies for the delayed response, a slight distraction had assumed interest and had ultimately created negligence of this topic - HTM.
Are learning cells essential in order for a HTM to predict invariants? Can I rely solely on active/inactive/predicting cells to find invariants of input vectors?
I still don't understand why we need learning cells. From what I've read, for each column, learning cells are chosen as the most active cell of that column, but only if there are no other learning cells chosen -- so, at least one cell must be set to a learning cell for each column, for each t. Subsequently, if a cell receives lateral input from other learning cells, then that cell too is chosen as a learning cell. What I don't understand is what the learning cell is actually doing -- i.e. it merely appears to be a flag and nothing more other than to infect other cells with the flag (i.e. what's the difference between the learning and predicting cell?).
Edit: okay, correct me if I'm wrong. Are learning cells chosen as the cells which best match the current state of input, as the definitives; such that if no cells exist in a predictive state (i.e. such that prediction is a means of describing states of input) then the set of learning cells are subsequently selected to best represent the state of input - ?
Thanks for all the input thus far, you've been a great help with clarifying the inconsistencies of my understanding of this framework.
someuser88 wrote:
Are learning cells essential in order for a HTM to predict invariants? Can I rely solely on active/inactive/predicting cells to find invariants of input vectors?
"Learning" cell is only a temporary designation that is decided each time step. It simply means only the cells marked as learning will have their synapse connections updated/changed. As for the question of whether they are essential to predict: no, they are not needed for inference (detecting a temporal sequence). However, not allowing learning means the HTM network never changes, so the invariants found will be based on how the cells are initially connected, which is likely to be less than ideal if you don't know what data you'll be seeing.
someuser88 wrote:
I still don't understand why we need learning cells. From what I've read, for each column, learning cells are chosen as the most active cell of that column, but only if there are no other learning cells chosen -- so, at least one cell must be set to a learning cell for each column, for each t. Subsequently, if a cell receives lateral input from other learning cells, then that cell too is chosen as a learning cell. What I don't understand is what the learning cell is actually doing -- i.e. it merely appears to be a flag and nothing more other than to infect other cells with the flag (i.e. what's the difference between the learning and predicting cell?).
For every active column we want to select a learning cell. The learning cell is the one that will have its segments/synapses modified (in Phase 3 of the code). In other words it's the cell we want to learn the current sequence that is happening right now. We only want one cell to learn (per active column) at a time because the purpose of the cells is to distinguish different temporal contexts. For example if we see a "B" we want to learn to distinguish sequences "AB", "CB", "DB", etc. The multiple cells helps us keep all these sequences separate. Seeing the first cell active might indicate "AB" while the second might indicate "CB" etc.
When choosing a learning cell we want to pick the one that best matches the current input. If it's active from lateral input it means the cell is already part of a learned sequence and so if active we are likely in that same sequence once again. Thus that is our best case, we definitely want to learn on that cell to strengthen the connection further since now we've seen the same thing again.
If there is no lateral input, we only know our column is active so instead try to pick the cell that has any active synapse connections at all and learn on that one. Otherwise just pick any cell to start learning a new sequence.
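The three-step preference described above could be sketched in Python as follows. The function name and arguments are made up for this example, and "pick any cell" is implemented here as "take the first one", which is only one of several reasonable choices.

```python
def choose_learning_cell(column_cells, predicted_cells, active_synapse_counts):
    """Pick one learning cell for an active column.

    column_cells          -- cell ids belonging to the column
    predicted_cells       -- cells that were active from lateral input
    active_synapse_counts -- {cell: number of currently active synapses}
    """
    # Best case: a cell already predicted by lateral input.
    candidates = [c for c in column_cells if c in predicted_cells]
    if candidates:
        return max(candidates, key=lambda c: active_synapse_counts.get(c, 0))
    # Next best: the cell with the most active synapses, if any cell has some.
    best = max(column_cells, key=lambda c: active_synapse_counts.get(c, 0))
    if active_synapse_counts.get(best, 0) > 0:
        return best
    # Otherwise any cell will do; this sketch simply takes the first one.
    return column_cells[0]


cells = [0, 1, 2, 3]
from_prediction = choose_learning_cell(cells, {2}, {1: 5, 2: 1})
from_synapses = choose_learning_cell(cells, set(), {1: 5, 2: 1})
fresh_start = choose_learning_cell(cells, set(), {})
```

Note that a predicted cell wins even when another cell has more active synapses, since being predicted means the cell is already part of a learned sequence.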
someuser88 wrote:
Are learning cells chosen as the cells which best match the current state of input, as the definitives; such that if no cells exist in a predictive state (i.e. such that prediction is a means of describing states of input) then the set of learning cells are subsequently selected to best represent the state of input - ?
Overall, yes, that sounds about right. The learning cells (the ones we want to modify) are the cells that best match the current state of input. The 'current state of input' is the set of active columns. Each column was either predicted (had at least one predicting cell in it) or not. If a column was predicted then only those cells are 'active' and thus allowed to continue learning (to strengthen that existing sequence). If a column was not predicted then all cells are active and we want to try to learn a new sequence, so we pick one of the cells to learn (usually one that may already have a few synapses connected, but otherwise any cell) so that next time perhaps we will have predicting cells in that column.
Last edit: David Ragazzi 2013-02-18
Thanks. I sincerely appreciate all your input -- I believe that this was all that's needed. Will certainly report back with any future discrepancies of algorithmic implementation, but once again I do sincerely appreciate all the feedback you've provided!
Distal dendrite segments: does this mean that they would also connect to nodes on other rows too, or only other columns of the same row?
Distal segments make up 90% of your computer's RAM usage. :) They can connect any cell of any column to any cell of any other column.
Prediction mode enabled by lateral input of distal dendrite segments exceeding soma threshold for node activation: why not from feed-forward input?
Feed-forward lets you know whether your predictions are true or not; if they aren't, then the synapse is invalid and you decrease it.
A cell is pushed into the predictive state by the distal segment, once connected.
"Boosting" columns which are below a certain threshold: is this to ensure that all columns are used? I'm not entirely sure of why neighboring columns inhibit the activation of weaker columns yet if they fall below a threshold of some kind, they're "boosted" to compete with a winning set of columns?
If you didn't boost, you wouldn't even be able to get started learning; you have to learn a lot of falsities before you come upon a true repeating pattern.
The boosting mechanism does a lot; it even helps stop overwriting old data that's still good.
Last edit: Magnus Wootton 2013-05-17