
Solving the temporal pooler context forking problem

Itay
2013-04-03
2013-05-30
  • Itay

    Itay - 2013-04-03

Hi guys, as you remember, I showed in a post in the HTM theory forum that there is a problem when you try to teach the temporal pooler a sequence with the pattern "ABBCBBA", such that when this pattern repeats, "ABBCBBA ABBCBBA ABBCBBA.." and so forth, the temporal pooler can recall it successfully.
Barry looked at the videos and code, and agreed that the temporal pooler in his C implementation (probably one of the last forks, since I downloaded it half a year ago) seems to be malfunctioning. However, I still need an answer from him on whether the malfunction still occurs after he finishes converting his latest C implementation to C#.
This is the old post address : http://sourceforge.net/p/openhtm/discussion/htm/thread/6de01744/

I also uploaded two videos showing how the HTM region gets confused using Barry's latest C implementation.
http://youtu.be/d8LITtJd3OY - a video showing my C# code passing 100x100 images to your library.
The output from your region was filtered to the 500 most active columns.

http://youtu.be/oKTnmHRRt24 - a video showing my C# code passing 20x20 images to your library.
The output from your region was filtered to the 30 most active columns.

Basically, what you see in the videos:
on the left side, the original input that I send to the region;
on the right side, the prediction of your region, filtered to the most active columns.

I have also uploaded videos of mine showing a prototype algorithm that correctly predicts the ABBCBBA sequence.

http://www.youtube.com/watch?v=osi4P9oyz3Y - input forking test (the same as before, just showing "ABBCBBA" repeatedly), but this time using my prototype algorithm. You can now see how it's supposed to be predicted: no confusion. 20x20 pixels, showing the best 30 columns.

http://www.youtube.com/watch?v=XUX1X4tMeEk - jumping ball test, this time testing Barry's C implementation. The jumping ball test lets us see whether the temporal pooler can learn complex real-world sequences, and this time the context forking gets even more intense. You can see that the temporal pooler gets confused, and the result is not a very clear ball prediction, with a lot of noise. The temporal pooler can't decide whether an "A" comes after the "B", or a "C" comes after the "B", or whether it's just another "B". I used 30 cells per column; image and region size are 40x40 pixels, showing the 50 most intense columns.

http://www.youtube.com/watch?v=aLRxzeH5Uk0 - showing my prototype algorithm on the same jumping ball scene. See 0:10-0:15 for a really clear ball prediction. Afterwards the algorithm stops working well for various complicated reasons, and the video capture gets choppy. Image and region size are 40x40 pixels, showing the 50 most intense columns.

Anyway, yesterday and today I've been looking at Barry's C implementation and studying the temporal pooler learning for an hour or two. I didn't debug the algorithm; I only noted things by looking at the code in a rough way, and I would like to note a few things about the learning algorithm. What I'm about to say is not formal or expert material, just some rough thoughts about the algorithm. I really think this portion of the algorithm did not receive enough attention. I don't even know what kind of expertise you guys have with it, but I know that when I read the Numenta papers, it was a subject that was unclear to me, and it is complicated enough to talk about at length.

One of the most underrated things I have encountered is the formation of synapses from the current (active) cells towards the cells that were learning cells in the last step. There is no mention of it in the Numenta papers, and I suspect that if Barry had not thought of doing it, even simple context learning wouldn't exist. I think this kind of thing is really important, and in my "prototype" algorithm I also rely heavily on learning from the last learning cells. The reason is that if you look at snapshots of the active cells at different timesteps like frames of photographic film, then forming synapses between the currently active cells and the cells that were active and selected as learning in the last step is the way to record a navigation path between cells as the timesteps, the frames of the film, advance.

Selecting a cell from a column as "a learning cell" is also a good thing that unfortunately was not discussed much in the Numenta papers. In my mind, selecting a cell as a learning cell means that this cell is going to represent a specific context in a column for the input received from preceding cells. This is a really important matter, and selecting only one per column makes some sense.
I'm not sure, however, that the other cells, which are not selected as learning cells, shouldn't learn by other rules such as increasing and decreasing permanences in parallel.

In this post, I will go over the things I've seen in the temporal pooler learning in Barry's C implementation, and note difficulties I think might happen.

First, I'm not really sure how much I believe in all the "learn into future timesteps" and "learn future predictions larger than t > 1" business. For simplicity, I will pretend we learn for t = +1 only.
The statement "if a cell is predicted, and was also made active by a cell that was chosen as a learning cell, then make this cell a learning cell" is a statement to be doubted. I still don't know the details of the current implementation, but I can foresee problems. I don't think we would always like to learn just because we were predicted by a cell selected as learning, because we might want another cell in the same column to be activated as the learning cell, in order to learn a segment towards the synapses that activated it. Imagine a sequence that looks like "ABABABABABAB….ABABABC", and then this sequence shows up again: "ABABABABABAB….ABABABC". Would all the "AB"s belong to the same cells in their columns, selected as learning cells because they were predicted by learning cells, with no other cell in the column selected as a learning cell? If yes, then this is a problem, because the first "AB" should belong to different cells in the column than the second "AB", or the third "AB".

If no learning cell was chosen, then we choose as the learning cell the one activated by the existing connections. If no connections are found, then the chosen cell is the one with the least number of segments.
So far so good: the learning cell is the one that was most activated. But I can see how problematic scenarios start to happen. Imagine you have a scenario like the previous one, "ABABABABABAB….ABABABC", and at t = 4 you are learning the third "A". You are predicting an "A", and the learning cell selected is the same cell that gets the most active input in the column, which is the cell that belongs to the first "A". Still, we would like to force some other cell in the column to form synapses, because this is a different "A" from the first "A" and should be represented by a different cell.
Now imagine a different scenario: a partially correlated input arrives, and the same cell is chosen as the learning cell since it is the best "choice" among all the available cells in the column. However, that partially correlated input is basically a different input context altogether, and when the chosen cell fires it should be sent to a different group of cells, belonging to a different context.
These questions are hard to answer and might be answered only by detailed debugging and analysis.

All in all, a learning cell is selected, and the best segment is selected for its connection weights to be updated later. The synapses that were active and connected will be updated, and a few new synapses will be allocated from the previous learning cells, replacing existing synapses.

In phase 2 of the learning, if some segment caused any cell to enter the predictive state, we reward that connection. I have no comments about this operation; I think it's a wonderful one.
On the other hand, there is the reinforcement of segments that previously predicted the cell, and even learning and replacing synapses from these segments. I didn't really get the importance of this rule and I don't know what it is good for.

There are a few more learning rules which I didn't note and which I think are perfectly fine. This is only a rough sketch of the learning rules of the temporal pooler, only a scratch on the surface, and plenty of debugging needs to be done in order to understand the problems and deal with them. If we want to deal with this problem, then complicated enough tests need to be made to challenge the temporal pooler, focusing on context forking, such as "ABBCBBA, ABABABBCABABABBC..". Another option is to think of acceptable limits that the temporal pooler should be able to perform within, and agree on them despite it not performing perfectly.

In my next post, I will show a different strategy for learning sequences that focuses on learning from the last learning cells. You know, maybe the strategy in the current implementation works even better than the strategy I will show next, but I really won't know until we debug it extensively. Thanks guys, and please comment on what you think and feel free to suggest things :)

     
  • Doug King

    Doug King - 2013-04-03

    http://www.youtube.com/watch?v=aLRxzeH5Uk0 - showing my prototype algorithm on the same jumping ball scene. See 0:10-0:15 seconds for a real clear ball prediction. Afterwards the algorithm stops working good because of various complicated reasons and the video capturing is getting choppy. Image and region size are 40x40 pixels showing the most 50 intense columns.

    Very impressive. That is the best example of a working HTM that I have seen so far.

We need to follow this line of questioning. Part of the problem we are having is the lack of tools for testing and for breaking the functions of the CLA apart into testable units. We also need monitoring tools that will give us a better idea of how the CLA is performing over time, connection metrics, etc. I will see if I can add the monitoring stuff that I used to have in my HTM.

     
    • Itay

      Itay - 2013-04-05

Hi Doug,

Can I add something similar to the visualizer in my videos when learning only with the temporal pooler? (I can add more features which can be helpful.) Using the 3D visualizer for visualizing the output region of a test case is rather uncomfortable.
Also, I tried running the "Hello world" example in the last snapshot, and it seems like the prediction rate is ~45%? Is that correct? When I look at the bottom, I see the output of the region as some kind of average over all the input pictures.
I also think that unit tests are not the way to go until we reach a state where we can feel comfortable with the performance; instead we should run manual simulations with precise statistics and a complete debugging suite. In my eyes, unit tests are a sort of scaffolding to make sure stuff won't break at a later point and to catch small bugs, rather than a way of proving stuff works before we see acceptable performance. I've also worked with unit tests extensively at the software company I work for.
Currently I'm not sure what the state of matters is, and I'm waiting to see what stage things are at. When are we going to deal with the inner workings of HTM and the temporal pooler? If you guys want to research my technique for learning sequences further, I will also provide details regarding the problems.

       

      Last edit: Itay 2013-04-05
  • Itay

    Itay - 2013-04-04

    Hi guys :) In this post I will explain a strategy for learning sequences.

In order to learn sequences with context, one approach is to remember a navigation path between the cells that are active at successive timesteps.
Imagine we have a really small neocortex. Its width is 4, its height is 1, and it has 3 cells per column. We can use this kind of small neocortex to visualize and experiment with different learning algorithms. Let's now imagine we are looking at the columns from the side, so we can see the activity inside. Like this : http://i.imgflip.com/yom4.gif

Let's try to learn the following sequence: "ABBCBBA". There are now 4 cells per column, so that this sequence can be learned.
    http://i.imgflip.com/yxc8.gif
    In the animation above, there is a simple algorithm :
    0. Start with a blank region without any connections.
    1. If a column is active, select a random cell in the column which doesn't have any segments as the learning cell.
2. If a learning cell is selected, learn a new segment and connect to the previous learning cells from the last timestep.

This is really simple, and this way you can record the longest, most complicated sequence.
It basically all comes down to having a closed graph, or a closed rubber band if you look at it in 3D.
But it's not really that simple.. is it.. :(
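The three rules above can be sketched in a few lines (a hypothetical Python sketch with an invented cell/segment representation, not code from the OpenHTM project):

```python
import random

class Cell:
    """A single cell; each segment is a list of presynaptic cells."""
    def __init__(self):
        self.segments = []

def learn_step(active_columns, prev_learning_cells):
    """One timestep of rules 1-2: per active column, pick a random
    segment-free cell as the learning cell and connect it back to the
    previous timestep's learning cells."""
    learning_cells = []
    for column in active_columns:
        free = [c for c in column if not c.segments]
        if not free:
            continue  # column exhausted; the later posts discuss this case
        cell = random.choice(free)
        if prev_learning_cells:
            # record the navigation path between timesteps
            cell.segments.append(list(prev_learning_cells))
        learning_cells.append(cell)
    return learning_cells
```

Run over the timesteps of "ABBCBBA", each step's learning cells chain back to the previous step's, recording the sequence as a path through the cells.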

    Let's try simulate the learning of the sequence "ABBA" when it is repeated endlessly.
    For simplicity, there are two columns in the region.
    http://imageshack.us/f/542/learningabab.png/
At t = 0, there is "A" input; all the cells in the A column are active, and one cell without any segments is selected as the learning cell.
At t = 1, there is "B" input; all the cells in the B column are active, and one cell without any segments is selected as the learning cell. Because there are previous learning cells, it forms a segment towards the previous learning cell. It is shown as "LP".
At t = 2, all the cells in the B column are active, and we select a different cell as the learning cell, which forms a connection to the last learning cell.
At t = 3, all the cells in the A column are active, and we assign a learning cell in the column and form a connection to the last learning cell; but notice that the B column is predicted falsely.
At t = 4, all the cells in the A column are active, and we assign a learning cell in the column and form a connection to the last learning cell. Notice that the B column is predicted again, but this time it is predicted because we start the sequence over: we finished one "ABBA" and we begin the next "ABBA". This prediction is true.
At t = 5, a cell is predicted in the B column successfully, so there is no need for all the cells in the column to become active. But notice something strange here: in this step we form two segments from two different cells. One is a segment from the successfully predicted cells towards the last learning cells. When there is a situation like at t = 4, where the column did not predict successfully, and at t = 5 another column is predicted successfully, then one of the cells in the t = 5 column forms a segment towards the last learning cells of the column that did not predict successfully at t = 4, in order to form a closed loop. (You will see later that this is the main difficulty of this algorithm and its Achilles heel: in more complicated situations you basically don't know which cells to connect to the last learning cells in order to form the closed loops, and you would have to create an impossible number of connections and learning cells.) In addition to this segment formation, we also create another learning cell for the column as usual and connect it to the previous learning cell.
At t = 6, the second "B" in "ABBA" is predicted successfully. Note that I don't create more learning cells although I should have; this is because of my laziness ;)
At t = 7, the last "A" in "ABBA" is predicted successfully.
At t = 8, the first "A" in "ABBA" is predicted successfully.
At t = 9, the first "B" in "ABBA" is predicted successfully. Note that without forming the closed loop this wouldn't work.
At t = 10, the second "B" in "ABBA" is predicted successfully.
At t = 11, the last "A" in "ABBA" is predicted successfully.
At t = 12, the first "A" in "ABBA" is predicted successfully, and so forth.
Note that there are redundant connections that don't get used, belonging to cells that stop predicting, or connections that simply never get used at all. Segments that don't get used should slowly disappear through permanence reduction, in order to free up space in the columns for future learning. The permanence of segments that are used often is increased.
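The pruning described above could look something like this (a sketch with made-up constants and a made-up segment representation; the real permanence values and structures in OpenHTM differ):

```python
# Illustrative constants, not values from the OpenHTM code.
PERM_INC = 0.05   # reward for a segment used this step
PERM_DEC = 0.01   # decay for an idle segment
PERM_DEAD = 0.0   # at or below this, the segment is freed

def update_permanences(segments, used_segments):
    """Strengthen segments used this step, decay the rest, and drop any
    segment whose permanence decays to zero, freeing space in the column."""
    used_ids = {id(s) for s in used_segments}
    survivors = []
    for seg in segments:
        if id(seg) in used_ids:
            seg["permanence"] = min(1.0, seg["permanence"] + PERM_INC)
        else:
            seg["permanence"] -= PERM_DEC
        if seg["permanence"] > PERM_DEAD:
            survivors.append(seg)
    return survivors
```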

Here's an example of my algorithm (captured directly from the debug screen; we've got to have such a debug screen, otherwise problems will be really difficult to debug) working on the repeating input "ABBCBBA" :
    http://i.imgflip.com/yxel.gif
    Some signs in the animation :
    Yellow = cell is active
    Green = cell is predicting
    White = cell was predicted successfully
    LP = learning previous cell (the learning cell in the previous step)
    L = current learning cell
Note that some permanence-modifying rules are in effect; I don't even remember exactly what they are, because I programmed this algorithm half a year ago.

    So basically, in general :
• Whatever input comes through, you learn in parallel with the current activity in the region, since you never know how long and complicated the sequence will get. For example, the sequence "ABABABABC" can be repeated plenty of times and could be a valid sequence which needs to be predicted.
• Segments that aren't in use are deleted, to free up space for more used segments.
• Segments that are in use often strengthen their permanence so they don't get deleted easily.

In detail (I really don't remember all the rules I implemented some time ago, but here are the basic rules for the basic version of the algorithm) :
• Start with a blank region without any kind of connectivity.
• If a column is active, select a random learning cell, which is a cell in the column that has no segments assigned. Have the selected learning cell form a segment to the last learning cells.
• *** If in the current timestep one of the cells in the column is both predicting and active, then check whether the column owning the previous learning cells predicted successfully. If it did not, then form a segment from one (or more, I'm not sure about that) of the active and predicted cells to the previous learning cells, in order to form a closed loop. This step is problematic and I will try to talk about it in the next post.
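The *** loop-closing rule could be sketched roughly like this (a hypothetical Python sketch with an invented cell representation and function name, continuing the description above rather than actual project code):

```python
class Cell:
    def __init__(self):
        self.segments = []  # each segment: list of presynaptic cells

def close_loop(predicted_active_cells, prev_learning_cells,
               prev_column_predicted_ok):
    """If the column owning the previous learning cells failed to predict,
    connect one currently predicted-and-active cell back to those learning
    cells so the sequence graph forms a closed loop."""
    if prev_column_predicted_ok or not predicted_active_cells:
        return None  # nothing to repair
    cell = predicted_active_cells[0]  # "one (or more)" per the rule above
    cell.segments.append(list(prev_learning_cells))
    return cell
```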

I will try to review the problems in the algorithm and tell you in the next post. But note that I have not researched them much.
    Thanks guys :) and hopefully tell me what you think :)

     
  • Barry Matt

    Barry Matt - 2013-04-09

    I have good news for you Itay. I finally got around to setting up a full unit test in OpenHTM for the ABBCBBA sequence. I set it up to directly read from your original 100x100 bitmap image files into a spatial-hardcoded region. The test runs through the sequence a full 10 times and writes the prediction for each step as another bitmap file of the same format.

    First off, I want to point out that your code was not examining the predictions in the same way that I would have. Your code was trying to find the subset of "strongest" predicted columns and only use that. In my test I kept things much simpler, I want to see all columns being predicted for t+1.

    When I ran the code from earlier today the results I got were pretty close to what I expected, but not quite right in the end. Things were a lot better than what you were seeing, however the region was never quite able to get the temporal contexts straight so seeing "B" would often result in multiple predictions (since all of A, B, or C could potentially come after a B). Here are the predictions in a graph:
    http://imgur.com/29QPOiT

    In the picture you see the original input sequence on the first row. Then each subsequent row represents the region predictions during the first pass, second pass, third pass, etc as we repeat the sequence to the region with learning enabled. You can read the columns straight down, so in column 1 you see at top the input image at time 1 and each row represents what the region expected to happen at that time during that iteration. So 100% accuracy would mean the row should look identical to the top row. (The gray is simply a way to distinguish that this is a predicted image, so pretend gray==black). Finally, all iterations after the last row pictured were identical so this is where the region learning stabilized.

    As you can see the predictions were pretty good but the contexts were a bit confused so several times you see it predicting multiple items since it wasn't sure which would happen (it thought it was possible for either to come next).

    Now this context confusion bothered me a bit as I thought the context cells should be able to handle that over time. One thing I remembered in the current code is that the GetBestMatchingCell in Column was simply choosing the first cell it found in the case where multiple cells have equal segment counts. This is a bit wrong, it really should have been selecting a random cell among the set. What this means is that if the same cell is chosen across the region we are significantly limiting our ability to track unique contexts. So I made a small change to select randomly instead.

    Turns out I was right. Here are the results after making this fix to learning cell selection:
    http://imgur.com/btWs1fF

    Now after 7 iterations I get perfect 100% predictions for the full sequence. The region stabilizes to this perfection from that point on. You also see that the predictions before that point are much better than the previous results.

    I now feel a lot more confident than I did before. Thanks for the test. You can run it for yourself in the CLA.Tests package if you want to see my exact code.

    Your discussion on this topic so far has been helpful. Some of your ideas make a lot of sense. I do not doubt we could make further improvements to the learning algorithm to make things even better over time!

     

    Last edit: Nick 2013-04-09
    • Nick

      Nick - 2013-04-09

In my update I forgot to cancel the number changes, so you need to set fewer iterations (7 and 10) to save your time :)

       

      Last edit: Nick 2013-04-09
    • Itay

      Itay - 2013-04-09

Hi, very good news :)
I had some doubts this would work, and I do remember Numenta having a discussion in one of their patents about which cell in the column to turn into a learning one: whether it's a random one, or the one with the least number of segments.
I have not yet tested your work; I will now test the region thoroughly on complex sequences. I will also try to learn how the algorithm manages to do this and gain some insights, because it's really amazing that it does learn.
The reason I chose to collect information about the strongest columns (and not simply all of them) is that eventually the sparse pooler is going to keep only the strongest predictions, and the weaker ones mean less. Also, in my algorithm I used this property extensively, because incorrect predictions happen all the time in it; the way I solved this problem is to assume that the weaker predictions are lost to sparse pooling, and that predicting incorrect things is OK as long as they are not the strongest things.

       
      • Itay

        Itay - 2013-04-09

I have tested more complex sequences, and unfortunately it seems that the context forking enabled by the current CLA implementation is limited to two steps.
Tests like "AAAAX" are not working well, even when I filter the most intense columns. The test is available in the IDE\benchmarks folder.
I have also included the jumping ball test (an extreme example of context forking) in the Benchmarks folder, and it also seems to be too noisy.
Still, it's impressive that you managed to fix the "ABBCBBA" test, and I'm going to study how the algorithm actually manages that.
We still have work to do..

         

        Last edit: Itay 2013-04-09
        • Doug King

          Doug King - 2013-04-09

          We still have work to do..
          Yes, once we have a proven stable CLA that we can benchmark we can start to optimize.

There is no end to optimization. Further down the road I see simple little algorithms that adjust some of the key parameters of the CLA on the fly, depending on the type of data, the speed of required learning vs. stability, etc. So you could imagine a tuning algorithm that adjusts permanence on the fly, along with how many synapses we want to initialize, depending on how accurate we want the output vs. how fast we want it to react, etc.

Once again, a plug-in API (my solution to everything :-) with hooks into the CLA settings would allow a plugin to get feedback on how the CLA is performing, under what conditions, etc., and then make adjustments to the CLA settings. Then the community would come up with their own solutions for different problems.

           
        • David Ragazzi

          David Ragazzi - 2013-04-10

          Hi guys,

Yesterday, when I looked again at the code for the learning cell, I thought of some things that could be useful for testing whether context forking is related only to the chosen cell.

          Barry said:

          Now this context confusion bothered me a bit as I thought the context cells should be able to handle that over time. One thing I remembered in the current code is that the GetBestMatchingCell in Column was simply choosing the first cell it found in the case where multiple cells have equal segment counts. This is a bit wrong, it really should have been selecting a random cell among the set. What this means is that if the same cell is chosen across the region we are significantly limiting our ability to track unique contexts. So I made a small change to select randomly instead.

Instead of using random cells to decide which cell is fittest, why don't we pick the least active cell (over duty cycles), similar to what happens in the column competition (of course, taking care not to repeat the last cell used for learning)?

Alternatively, if we continue to use the same approach (random cells), it would be interesting to test using more cells per column, in order to increase the range of random choices, and check whether this affects the context or not.

          David

           
          • Barry Matt

            Barry Matt - 2013-04-10

            I should clarify. The algorithm has barely changed. The best matching cell is still picking the cell with the best segment, which falls back to the cell with the fewest segments. The only thing that changed is what happens if there is more than one cell with the same number of fewest segments? Before we would always just take the "first" one, now I take a random one among those that have the same number of fewest segments and not among all cells.

This is particularly important when a region first starts learning, because at that point there are no segments at all. So in that situation all cells have 0 segments, and so all cells have the fewest. Using a random pick at this point helps keep the context (which is represented by the collection of cells among the columns) a more unique set. This then, theoretically, helps with keeping track of different temporal contexts, making it less likely for cells/columns to get confused.

So your suggestion of the "less active cell" is actually what we are already doing as the primary means of making the selection. The random pick only comes in when, after all else fails, we need a final tie-breaker to choose among the cells that remain. This situation is mostly relevant right at the start of learning (such as the short ABBCBBA). When a region has done a lot of learning, it should be rare to fall back to random.

             
            • David Ragazzi

              David Ragazzi - 2013-04-11

Oh, OK. That's true; I was missing this point.

So for this to be successful, the spatial pooler should work very well, in order to avoid a column being used more than once during learning, i.e. to avoid the same cell in that column being chosen randomly again for the same sequence being learned.

If I understand well, maybe the problem is that we are testing with a hardcoded configuration, which makes the same column be used several times by 2 or more inputs sharing common points. If we have a sequence like I-T-I-T-I-T, the possibility of a cell being randomly chosen twice increases, because at least all the columns in the middle of the X axis are always activated.

Another thing: even if a cell already has segments, maybe it avoids creating new ones because the context is similar to previous inputs (i.e. the cells get confused). I mean the spatial pooler might decrease this possibility.

I'm not affirming this; it's just a hypothesis. I should look at the code later to observe the current temporal behavior better. But if this is true, maybe it explains some context problems with repeated or similar inputs. And again, please correct me if I'm wrong or confuse some concept. :-)

               

              Last edit: David Ragazzi 2013-04-11
              • Itay

                Itay - 2013-04-11

As far as I know, with the spatial pooler working, an input X will be mapped to Y, and if you show another X it will be mapped to Y again; it will not randomly activate random columns. Yes, there is spatial learning, but it shouldn't learn so fast that it affects the sequence learning, as far as I know.

I don't know how columns work in the brain, and I don't know whether the brain's original algorithm can deal with columns being active twice or more, and learn sequences with the same column being active. I don't know if the brain does this randomly or in a precise way. But it's really hard to work on a general example and come up with a general algorithm using complex, fuzzy, but repeating input. We still need the temporal pooler to work correctly, so I think that right now making the temporal pooler work with the most basic inputs seems to be the right way.

I am having some problems with my computer at the moment and couldn't make progress in the last two days, but my thoughts haven't changed.

I also spent a couple of hours studying my own learning algorithm and finally got reminded of some problems. Essentially, my post at the top of the thread is a base algorithm for learning, which must be expanded to get a working temporal pooler. There are problems, because when you learn and form loops between sequences, and connections that switch between sequences, the moment some neurons form the wrong connections, a sequence path might get broken and won't be able to get strengthened.

To solve this problem, I believe there are a few possible solutions. The "no brainer" solution is to exponentially copy all the cells and their connections, and create them lower in the column each time the problematic loop-closing step is performed (see my second post in the thread). This costs a lot of memory and limits the length of the sequences that can be learned.

I have also thought of other approaches. One is to mark the connection from the previous learning cell to the current learning cell as a "strong" connection, and any other connections as "weak" connections. The weak connections are the ones that form loops and switch between sequences; we would modify the weak connections without hurting the strong ones, though I have not thought through how to do that.

Another solution might be to record sequences as usual (previous learning cell to current learning cell), and in order to close loops, look at the whole column that is lit and see all the possibilities: light up all the cells in the column, check which other columns light up, and check which cells in them are lit. As time passes, some cells in correctly predicted columns are going to repeat multiple times in sync with the sequence, and some cells will light up more often than others. The cells that light up more often than others can identify which cells need their permanence increased, and how to close the sequence loop.

                 
                • Itay

                  Itay - 2013-04-12

                  Alright,
                  I couldn't work at the computer over the last few days, but I found some time to run through examples of my prototype algorithm, which is described in post #2 of this thread.
                  In this version of the algorithm, I simulated the sequences "AABAB", "ABABC", "ABABABC", and "AAABAAB", and I believe the learning was 100% correct.
                  That means implementing this algorithm in software would probably let it learn even the most complicated sequences. However, as I said, there are problems, which I describe below.
                  When I simulated these examples, I used segments whose formation is delayed: they begin constructing themselves at the current timestep but only finish at T+2. I'm fairly sure learning would also occur with normal segment formation.
                  I also noticed that the longer you prolong segment formation, the more it seems to reduce the number of sequence-loop closings (described below).
                  Clearly, the most difficult step in my algorithm is closing the sequence loop, switching between sequences, and forming a closed graph.
                  In this post I will clarify my prototype algorithm and describe it in more detail, with special emphasis on the loop-closing step.

                  First of all, you need to read posts #1 and #2 in this thread in order to follow along.
                  This algorithm assumes a segment threshold of 1 and few simultaneously active columns, but it might be extensible to larger configurations.

                  1. Start with a blank region without any kind of connectivity.

                  2. Pass over all the columns:
                    2a. If the column is active, then select a learning cell, which is a cell in the column that has no segments assigned.
                    If all the cells already have segments, then choose the cell with the worst segments as the learning cell and clear its segments.
                    2b. Assign a segment to the learning cell that includes the last learning cell from the previous timestep.
                    2c. Loop-closing operation: if one of the cells in the selected column is predicted and active, then for every column that was active in the last timestep:
                    2c-1. if that last active column was not predicted, then
                    2c-2. for every predicted cell in the current column:
                    2c-2a. choose a learning cell in that last active column, i.e. a cell that has no segments, or else the cell with the worst segments.
                    2c-2b. copy the segments of the last learning cells in the column to the new learning cell.
                    2c-2c. assign a new segment to the predicted cell that includes the new learning cell.

                  3. Pass over all the columns:
                    3a. if the column is active and predicted, then
                    3a-1. increase the permanence of the predicted cells' segments.
                    3b. if the column was predicting in the last timestep but is not active now, then
                    3b-1. decrease the permanence of the predicted cells' segments.

                  That should suffice for now.
                  If there is interest, I can provide more pictures to illustrate the loop-closing operation.
                  Basically, this operation has an exponential nature, because each time you close a loop you assign new learning cells, and their number equals the number of predicted cells in the predicted column.
                  Over time, the number of predicted cells can grow exponentially, because the learning cells can become predicted too.
                  I don't know at what sequence length this becomes a problem...
                  But I do know this algorithm works for simple cases with a small number of columns; I'm not sure about complicated cases.
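As a concreteness check, here is a rough, hedged Python sketch of steps 1-3 above. The loop-closing step (2c) is omitted for brevity; all class and field names (`Cell`, `Column`, the segment dictionaries) are hypothetical, a segment threshold of 1 is assumed as stated, and essentially one active column per timestep is assumed:

```python
class Cell:
    def __init__(self):
        # each segment: {"cells": set of source cells, "permanence": float}
        self.segments = []
        self.predicted = False

class Column:
    def __init__(self, n_cells=4):
        self.cells = [Cell() for _ in range(n_cells)]
        self.active = False

def pick_learning_cell(column):
    # Step 2a: prefer a cell with no segments; otherwise reuse the cell
    # with the weakest segments and clear them.
    free = [c for c in column.cells if not c.segments]
    if free:
        return free[0]
    worst = min(column.cells,
                key=lambda c: sum(s["permanence"] for s in c.segments))
    worst.segments.clear()
    return worst

def learn_step(columns, prev_learning_cell):
    # Step 2: for every active column, pick a learning cell and (2b) give
    # it a segment pointing at the previous timestep's learning cell.
    # With one active column per timestep this threads a chain of
    # learning cells through time.
    new_learning_cell = None
    for col in columns:
        if not col.active:
            continue
        cell = pick_learning_cell(col)
        if prev_learning_cell is not None:
            cell.segments.append({"cells": {prev_learning_cell},
                                  "permanence": 0.2})
        new_learning_cell = cell
    return new_learning_cell

def reinforce(columns):
    # Step 3: reward segments of predicted cells in active columns,
    # punish segments of predictions that did not come true.
    for col in columns:
        for cell in col.cells:
            if not cell.predicted:
                continue
            delta = 0.1 if col.active else -0.1
            for seg in cell.segments:
                seg["permanence"] = min(1.0, max(0.0, seg["permanence"] + delta))
```

The permanence increments (0.2 initial, +/-0.1 updates) are placeholder values chosen only to make the sketch runnable.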

                   

                  Last edit: Itay 2013-04-12
          • David Ragazzi

            David Ragazzi - 2013-04-16

            David wrote:
            If we continue to use the same approach (random cells), it would be interesting to test with more cells per column in order to increase the range of random choices and check whether this affects context or not.

            It seems that Numenta already did this. In the CLA white paper they illustrate context with 4 cells per column. However, in the white paper on their new site, they prefer 10 cells per column in order to make the number of contexts exponential:

            "A one-step ("first order") prediction system is not enough to learn complex patterns, however. For example, given the word "like," it is difficult to predict the next word. But if you heard "time flies like...," the extra context could help you predict the next words you hear will be "an arrow." Or if you heard, "try it, you'll like...," you might predict "it" instead. We can create a "variable order" memory system to learn longer sequences by adding cells to form a column. When an input is detected, one of the cells in a column is activated. If that same input is subsequently seen as part of a different sequence, a different cell in the same column is activated. This allows us to expand exponentially the representations of a given input in different contexts. When you hear "like," the same column activates, but you can distinguish the different meanings of "like" because different cells in the column fire, allowing you to make different predictions for what will follow.

            The diagram below illustrates how this works. For a given SDR, instead of one cell, each bit is represented by one cell in a column of ten. The second representation shows a variation of the same input with different cell activations. If you have 40 active columns and 10 cells per column, this means there are 10^40 ways to represent the same input in different contexts."

            https://www.groksolutions.com/technology.html#cla-whitepaper

            Of course, this doesn't mean the problem is only about cells per column, but the chance of a cell being chosen twice is much lower.
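The capacity claim in the quoted passage is easy to check numerically; this is just arithmetic on the figures given in the quote, not project code:

```python
# With 40 active columns and 10 cells per column, each column independently
# picks 1 of its 10 cells to represent context, so the same set of active
# columns admits 10**40 distinct cell-level representations.
active_columns = 40
cells_per_column = 10
contexts = cells_per_column ** active_columns
print(contexts == 10 ** 40)  # True

# With 4 cells per column (the CLA white paper illustration) it is 4**40,
# still astronomically large, so accidental collisions come from the
# learning dynamics, not from raw representational capacity.
```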

             

            Last edit: David Ragazzi 2013-04-16
        • Barry Matt

          Barry Matt - 2013-04-10

          Nick pointed out to me that I was actually testing ABBCBB; I guess my mind treated the final A as just a loop back to the beginning rather than a second distinct A. By the way, I updated my ABBCBB test to use 20x20 images instead of 100x100. The test and predictions should be exactly the same; it's just that the 100x100 run was taking a few minutes while 20x20 takes <1 sec, so there's no need to waste all that test time for no additional gain.

          With the full ABBCBBA, the learning does not do quite as well when dealing with both the repeated B's and the repeated A's. You are also right that AAAAX is a bit of a struggle; it's the same issue. It seems the HTM is not great at handling inputs that are immediately repeated. This is definitely something we are going to have to keep working to improve.

          With that said however I hope my testing so far has at least given you a bit more confidence/hope that we might be able to get something useful out of this project. I am certainly open to suggestions for ways to change the algorithm to improve results.

          Now that we have at least a minimal baseline of tests (more will come over time, of course), it should be a lot easier to try out changes to the learning rules and see what effect they have on results. You claimed to have written your own variant of the HTM sequence learning which resulted in better predictions (like your ball-bouncing video). Can you be more specific (in code) about what exactly you would change in the HTM to produce a similar result?

           
          • Itay

            Itay - 2013-04-10

            Hi Barry
            I like it when there is a fast discussion
            The original test is indeed "ABBCBB" and not "ABBCBBA"; the extra "A" at the end is actually the "A" from the beginning. That's fine, because it still involves some context forking.

            "You claimed to have written your own variant of the HTM sequence learning which resulted in better predictions (like your ball bouncing video). Can you be more specific (in code) for what exactly you would change in the HTM to produce a similar result?"
            Yes, look, when I did that I tried to ignore many of the concepts the current temporal pooler uses and instead went off with my own vision.
            The result is an algorithm that can learn context to some extent and handle some degree of complexity, as long as you follow one rule: always filter by the most intense columns when reading out the predictions. This is because the spatial pooler takes the strongest predictions from the region anyway, so bad predictions don't really matter as long as they are not the strongest ones.
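A minimal sketch of that "filter by the most intense columns" rule, matching the top-500-of-100x100 and top-30-of-20x20 filtering used in the videos. The function name and the dict-based representation are illustrative, not Itay's actual code:

```python
import heapq

def top_k_columns(activations, k):
    """Keep only the k most strongly predicted columns.

    activations: dict mapping column id -> prediction strength.
    Weak predictions are dropped, since the readout only ever looks
    at the strongest columns anyway."""
    return set(heapq.nlargest(k, activations, key=activations.get))

preds = {"c1": 0.9, "c2": 0.1, "c3": 0.7, "c4": 0.3}
print(sorted(top_k_columns(preds, 2)))  # ['c1', 'c3']
```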

            It also has problems of its own, and plenty of them; it is currently not in a state where we can use it and expect good results before those problems are solved. It also doesn't learn the jumping-ball example fully, because it starts to break after a few iterations; if we use it, we will have to figure out how to solve the problems somehow. This algorithm is also vastly different from the way the current CLA temporal pooler works.

            The algorithm is basically described at the top of this thread; I have already posted it along with its concepts. It is very simple relative to the current CLA implementation, and it is the same algorithm that runs the jumping-ball videos.

             

            Last edit: Itay 2013-04-10
    • Doug King

      Doug King - 2013-04-09

      Soooooo cool to see both of you guys tackling this and challenging assumptions. This is exactly what I was hoping for with Itay joining, and Barry giving another critical look at the implementation. Unit tests are the best way to isolate issues. Thanks Barry, Itay and rest of team for moving the whole project forward. You guys rock !!!

       
  • David Ragazzi

    David Ragazzi - 2013-04-09

    Very nice work, Barry and Itay! This discussion is undoubtedly generating good results. I have also started reviewing the CLA algorithms again to contribute more on these crucial core issues. In fact, I recommend everyone do the same, i.e. re-read the white paper and compare it with our current implementation. The more minds thinking about it, the better the result.

    Obs: I will add more comments in the code, since some parts take a long time to understand. Segment update is one of them.

     
  • David Ragazzi

    David Ragazzi - 2013-04-18

    Hi guys,

    In order to help us analyse the temporal pooler, I created a very simple input file which consists of a single bit walking column by column in sequential order.

    The synapse parameters I used were the same as in the Core test, which was reported as a success. The region parameters were minimal (size: 3x3, 1 cell per column (no context), new synapse count: 1, activationSegThreshold: 1).

    To my surprise, I got the results in the attached image.

    Note that up to Step=18, the CLA predicts the next cell correctly (green = correctly predicted cell, yellow = predicting cell). However, when the cycle starts again (Step=19), it adds another segment to column[0,0] connected to the previous column at T-1, and then the mess begins. At each cycle, all cells add segments to their neighbours without any criteria. If you look at the information grid for the selected cell, you will see that on each pass through the file, the number of segments added to each cell increases, for the same simple sequence!

    Test also with cells per column = 2, and you will see that segment addition works well in the second cycle but afterwards starts creating several segments for the same sequence.

    I tested several configurations of segment permanences, but without success. Maybe I misused something, but up to the first cycle the predictions work like a charm.

    Could this really be a bug that is hindering the temporal pooler, or is it just bad parameters? Please analyse the Quick test code and watch this happen.

    David

     
    • David Ragazzi

      David Ragazzi - 2013-04-18

      OK... it's not just "without criteria"... I think I got it. At each cycle it adds a segment to the cell at T-1, then at T-2, and so on until a cell has segments to all its neighbours, so that it can predict several time steps ahead. Even so, it doesn't stop creating segments, with a progressive increase per cell (3, 6, 10, 15, etc.).

      Furthermore, with cells per column >= 2, the CLA still creates segments from the other cells in the same column instead of using a single learning cell for the same sequence.

      Since the context doesn't change, it shouldn't choose another learning cell for the same sequence and add segments from that second cell (in the column) to its neighbours.

       

      Last edit: David Ragazzi 2013-04-18
      • Barry Matt

        Barry Matt - 2013-04-18

        David, I believe I know why segments are constantly being added. It is due to the (hidden) parameter MinSynapsesPerSegmentThreshold in the Cell class. Currently this is set to 1, which should mean that a segment must have at least one active synapse to be considered for reuse (otherwise a new segment gets created).

        However, there was a slight bug: the comparison was checking for >1 rather than >=1. This meant that, by mistake, a segment needed at least 2 active synapses to be considered for reuse. In your test you add only 1 synapse per segment, so on the next iteration the best-segment search returns nothing, because it cannot find a segment with >1 active synapses. I just committed the fix so it requires >=1, which means your test case should reuse existing segments instead of always adding new ones. Please try running it again and see if this fixes it.
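For illustration only, the off-by-one Barry describes can be sketched like this, in Python rather than the project's C#, with hypothetical function names and a dict standing in for the Segment class:

```python
MIN_SYNAPSES_PER_SEGMENT_THRESHOLD = 1

def best_segment_buggy(segments):
    # Buggy comparison: '>' silently requires threshold + 1 active synapses.
    candidates = [s for s in segments
                  if s["active_synapses"] > MIN_SYNAPSES_PER_SEGMENT_THRESHOLD]
    return max(candidates, key=lambda s: s["active_synapses"], default=None)

def best_segment_fixed(segments):
    # Fixed comparison: '>=' honors the threshold as documented.
    candidates = [s for s in segments
                  if s["active_synapses"] >= MIN_SYNAPSES_PER_SEGMENT_THRESHOLD]
    return max(candidates, key=lambda s: s["active_synapses"], default=None)

segs = [{"active_synapses": 1}]       # one synapse per segment, as in David's test
print(best_segment_buggy(segs))       # None -> a brand-new segment gets created
print(best_segment_fixed(segs))       # {'active_synapses': 1} -> segment reused
```

With the buggy comparison, every iteration fails to find a reusable segment and adds a new one, which matches the unbounded segment growth David observed.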

         
        • David Ragazzi

          David Ragazzi - 2013-04-18

          Oh... that really makes sense. When I get home I'll test it again.

          The advantage of very simple tests with the 3D view is that it's easier to spot these subtle things in real time. If everything works, I'll go forward by increasing the complexity of the tests.

           
          • David Ragazzi

            David Ragazzi - 2013-04-18

            I tested with 1 and with 2 cells per column.

            Now it's perfect!

            See the predictions in the attachment. Independent of the time step, it predicts correctly and in a stable way.

             

            Last edit: David Ragazzi 2013-04-18
      • Barry Matt

        Barry Matt - 2013-04-18

        Ah, regarding your other concern about progressively increasing time steps: I have a parameter in Segment called MaxTimeSteps that defaults to 10. This refers to the maximum number of time steps ahead we should make predictions for. However, upon looking at it again, apparently I only limit the value itself but do not prevent the segments from being added. In other words, the "limit" is simply saying "if the time step is >10 then just pretend it is 10". That is wrong; it should prevent such segments from being added altogether past 10. I just committed what should be the fix for this.
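The difference between clamping the value and guarding the insertion can be sketched like this; hypothetical Python names, not the project's C# code:

```python
MAX_TIME_STEPS = 10  # maximum prediction horizon, per Barry's description

def add_segment_clamped(cell_segments, time_step):
    # Old behavior: the value is clamped, but the segment is still added,
    # so segments keep accumulating past the horizon.
    cell_segments.append({"time_step": min(time_step, MAX_TIME_STEPS)})

def add_segment_guarded(cell_segments, time_step):
    # Fixed behavior: refuse to add segments beyond the prediction horizon.
    if time_step > MAX_TIME_STEPS:
        return False
    cell_segments.append({"time_step": time_step})
    return True

segs = []
add_segment_clamped(segs, 12)                     # segment added, value squashed to 10
print(len(segs))                                  # 1
segs = []
print(add_segment_guarded(segs, 12), len(segs))   # False 0
```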

         
