Hi Lauri,
I really do not understand why you want to implement your own version of HTM when we are already working on the same thing.
Can you explain it?
BR Binh
----- Original Message -----
From: Lauri
Sent: 14.03.13 10:42
To: [openhtm:discussion]
Subject: [openhtm:discussion] Few questions about HTM theory
Hi, I've recently found out about HTMs and they seem very interesting. I'm just an amateur programmer and I have no knowledge of statistical models but I'm determined to understand how this theory works :)
I've read the Numenta documentation and looked at your project and I started to write my own implementation of it to better understand this. I have some questions I need help with.
When initializing a region we create proximal synapses to a random set of inputs for each column. Is it possible that with this randomness some input bits don't have any synapses connected to them, and does it matter if so?
The Numenta documentation says each column has a natural center over the input and that synapses near the center have a higher permanence when initializing, but it does not say by how much, why it is necessary, or how I should calculate it.
When permanence of distal synapses falls to zero should we delete them? I don't see anything mentioned about this in the documentation.
Numenta documentation talks about Xth input bit but then it also talks about center of input. Is the input supposed to be a matrix or just a series of bits and does it matter which?
How big should a region be compared to the input and how many regions are needed? I know that larger regions mean more complex patterns and more regions mean almost the same, but is a region supposed to be larger than the input or smaller, and how many regions are needed to solve some usual problems?
When making predictions how do I know how far ahead in timesteps is the prediction? Also can I calculate the probability of the prediction based on past events?
How do I convert text or images to input that makes sense for a region? With images I mean not black and white dots but let's say gray or even colored. With text do I have to give input one letter at a time (because the input size is fixed)?
I didn't see anything written about naming patterns in the documentation. How do I know which set of columns correspond to a pattern I want to know about? Do I have to tell the region that current input contains patterns "A", "B" and "C", then later I tell ok now this input contains "B" and "C" and it compares with previous input to find the common columns so it can narrow down which columns correspond to which names? If that is true then how would I do this and if I have multiple regions in layers which region do I tell this and which region should I later ask about it? Also is it possible with boosting and that stuff that columns that previously corresponded to "A" now correspond to something else or "A" is now different columns?
Is there a more thorough documentation than Numenta's on HTMs that I could read?
Thank you
Yes, if I write everything from scratch on my own then I can remember how it works better. I don't know if you have tried this before but for me if I read something I can remember a little bit but if I write it I can remember it a lot better. A good understanding of this theory is much more important than a good implementation for me. :)
I understand your point of view. For you it is easier to develop from scratch.
Anyway, we need developers, so if you feel comfortable with HTM, let us know so that you can share your knowledge as a developer member of our project.
Last edit: David Ragazzi 2013-03-15
Can you answer these questions, please?
Hi Lauri,
It looks like I will have to be the one to address your questions. OpenHTM originated from a fork of my original HTM implementation back in 2011, so I guess I am one of the more experienced people with the algorithm itself.
Based on your level of detail it appears you have been closely studying the Numenta documentation and other information, which is great! Here are my thoughts on your questions:
1)
That's right. With random initialization it is possible that some input bits end up with no connections. That is always a risk; however, if you choose your parameters sensibly the probability of it happening is extremely low. Columns usually have heavily overlapping input fields: the pool of input bits a column can connect to overlaps with the pools of many nearby columns, so each input bit will usually be connected to many columns. The odds of a given input bit connecting to nothing are so low that you can assume it won't happen. Even if a few bits do get ignored by the network, it is generally not a big deal, because the HTM considers aggregate sets of bits during its processing, so any one bit does not have much influence on the results.
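To put a rough number on "extremely low", here is a minimal sketch under a simplifying assumption that is not in the post above: every column draws its synapses uniformly at random from the whole input, rather than from a local overlapping pool. The function and parameter names are made up for illustration.

```python
import random

def prob_bit_unconnected(num_inputs, num_columns, synapses_per_column):
    """Chance that one particular input bit is picked by no column.

    Assumes (for illustration only) that each column samples its
    synapses uniformly, without replacement, from all input bits.
    """
    miss_one_column = 1.0 - synapses_per_column / num_inputs
    return miss_one_column ** num_columns

def count_unconnected_bits(num_inputs, num_columns, synapses_per_column, seed=0):
    """Simulate one random initialization and count the orphaned bits."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(num_columns):
        covered.update(rng.sample(range(num_inputs), synapses_per_column))
    return num_inputs - len(covered)

if __name__ == "__main__":
    # 100x100 input, 50x50 columns, 100 potential synapses per column.
    print(prob_bit_unconnected(10000, 2500, 100))
    print(count_unconnected_bits(10000, 2500, 100))
```

With local, overlapping pools the exact numbers differ, but the same reasoning applies: the more columns whose pools include a bit, the smaller the chance that all of them skip it.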
2)
First of all, this is merely an option; HTM can be configured to have columns connect to bits from anywhere. For this situation, though, try to think of vision as an example. If an HTM were to process a 2D image, think of the HTM itself as another 2D matrix of columns. If you were to overlay the HTM column matrix on top of the 2D image, it may make sense for each column to connect mostly to the input pixels that are nearest to it in the overlay. This is what "natural center" means: each column has a natural position in the input if the HTM network is overlaid on top of the input. During initialization we might then want the input pixels closest to each column to be given a higher weight, as a way to indicate that "if this column is active it probably means pixels in a close radius to this column have high activity". As for how much to bias these weights, that is another parameter, and Numenta does not have a good answer because it depends largely on the data in question and, more importantly, it is still an open research topic. Nobody is quite sure yet.
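One possible way to express the overlay idea in code is sketched below. Everything specific in it is an assumption chosen for illustration: the linear falloff with distance, the `radius`, `base_perm` and `center_bias` values, and the function names. As noted above, nobody has settled on how strong the bias should be.

```python
import math
import random

def column_center(col_x, col_y, cols_w, cols_h, in_w, in_h):
    """Map a column's grid position onto input coordinates (the overlay)."""
    cx = (col_x + 0.5) * in_w / cols_w
    cy = (col_y + 0.5) * in_h / cols_h
    return cx, cy

def init_proximal_synapses(col_x, col_y, cols_w, cols_h, in_w, in_h,
                           radius=10.0, num_synapses=30,
                           base_perm=0.2, center_bias=0.1, seed=0):
    """Pick random input pixels near the column's natural center.

    Returns {(x, y): permanence}. Pixels closer to the center start
    with a slightly higher permanence (hypothetical linear falloff).
    """
    rng = random.Random(seed)
    cx, cy = column_center(col_x, col_y, cols_w, cols_h, in_w, in_h)
    candidates = [(x, y)
                  for y in range(in_h) for x in range(in_w)
                  if math.hypot(x + 0.5 - cx, y + 0.5 - cy) <= radius]
    chosen = rng.sample(candidates, min(num_synapses, len(candidates)))
    synapses = {}
    for (x, y) in chosen:
        dist = math.hypot(x + 0.5 - cx, y + 0.5 - cy)
        closeness = 1.0 - dist / radius  # 1 at the center, 0 at the edge
        synapses[(x, y)] = base_perm + center_bias * closeness
    return synapses

if __name__ == "__main__":
    syn = init_proximal_synapses(25, 25, 50, 50, 100, 100)
    for pixel, perm in sorted(syn.items())[:5]:
        print(pixel, round(perm, 3))
```

Setting `center_bias` to zero and `radius` large enough to cover the whole input gives the "connect to bits from anywhere" configuration.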
3)
I have thought about doing this too. The Numenta doc does not say anything about deletion. Instead it suggests synapses exist forever, even if permanence stays at zero for a long time. With that said, there have been discussions about performing a delete after some threshold (zero for x time steps) is reached. However this is still very much in the experimental phase and nobody is quite sure what the behavior should really be here.
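The "zero for x time steps" idea could be sketched like this. It is an experiment, not something defined in the Numenta doc; the `Synapse` class, the `steps_at_zero` counter and the `max_steps_at_zero` threshold are all hypothetical names.

```python
class Synapse:
    """A distal synapse with a counter for how long it has sat at zero."""

    def __init__(self, source_cell, permanence):
        self.source_cell = source_cell
        self.permanence = permanence
        self.steps_at_zero = 0

def adjust_and_prune(synapses, deltas, max_steps_at_zero=100):
    """Apply permanence changes for one time step, then drop stale synapses.

    deltas maps source_cell -> permanence change for this step.
    A synapse is removed once its permanence has been exactly zero
    for max_steps_at_zero consecutive steps.
    """
    kept = []
    for syn in synapses:
        delta = deltas.get(syn.source_cell, 0.0)
        syn.permanence = min(1.0, max(0.0, syn.permanence + delta))
        if syn.permanence == 0.0:
            syn.steps_at_zero += 1
        else:
            syn.steps_at_zero = 0
        if syn.steps_at_zero < max_steps_at_zero:
            kept.append(syn)
    return kept
```

Setting `max_steps_at_zero` very high gives back the documented behavior where synapses effectively exist forever.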
4)
Related to your question #2. The input only needs to be an array of bits. In the case of images/vision, however, it is helpful to consider the input (and the HTM region) as laid out in a 2D matrix. As I mentioned in #2, this would then establish the natural center location of each column when overlaid on the image matrix input. In general, though, the input is just an array of bits, and the HTM network itself can be just an array of columns. The "shape" of the input or the network is only relevant as a way to define the initial connections of columns to input bits. If the structure of the input is unknown, columns can connect to any random input bits and the HTM can still pick up sequences as the input changes; there is no need for a natural center or anything like it in that case.
5)
This is another ongoing question. Generally speaking, however, the region should be smaller than the input. For example, if the input is 100x100 then a typical first region might have 50x50 columns. Yes, this means that there exist sets of input values that all map to a single region state. Most of the time this is what we want. The input values that map to one state would all be seen as extremely similar inputs, to the point that for the purpose of sequences they might as well be the same. This is similar to how a human might interpret a set of extremely similar images as exactly the same, at least in terms of what they are associated with or what might happen next.
As for multiple regions, it is still very unclear how they are intended to work effectively. I believe even Numenta's Grok is only using a single big region rather than multiple (at least as far as I can tell). There are a lot of issues around the behavior of multiple connected regions; it deserves its own thread for discussion. For now just know that most experiments have been done with a single region only.
6)
Ah, good question. When segments are added they are tagged with their time step prediction value. Meaning, when the HTM is learning, it can say "when this segment fires, it means the cell it is activating is being predicted to happen in 6 time steps". So if a bunch of cells are predicted, we know how many time steps away each prediction is based on which segments caused the activation to begin with. There exist "1 time step segments", "2 time step segments", "3 time step segments", … "n time step segments". In the Numenta doc, when you see "sequence segments" they are talking about the "1 time step segments", or those that are predicting cells to happen in the very next time step.
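A minimal sketch of the tagging idea, assuming a very stripped-down segment (a set of source cells plus an activation threshold). The class and field names are hypothetical and the learning side is left out entirely.

```python
class Segment:
    """A distal segment tagged with how far ahead it predicts."""

    def __init__(self, steps_ahead, synapse_sources, activation_threshold=2):
        self.steps_ahead = steps_ahead              # 1 == "sequence segment"
        self.synapse_sources = set(synapse_sources)
        self.activation_threshold = activation_threshold

    def is_active(self, active_cells):
        overlap = len(self.synapse_sources & active_cells)
        return overlap >= self.activation_threshold

def predicted_steps(cell_segments, active_cells):
    """For each predicted cell, list the time-step tags of its active segments.

    cell_segments maps cell id -> list of Segment.
    """
    predictions = {}
    for cell, segments in cell_segments.items():
        steps = sorted({seg.steps_ahead for seg in segments
                        if seg.is_active(active_cells)})
        if steps:
            predictions[cell] = steps
    return predictions

if __name__ == "__main__":
    cell_segments = {
        "c1": [Segment(1, {"a", "b"}), Segment(6, {"a", "b", "x"})],
        "c2": [Segment(2, {"y", "z"})],
    }
    print(predicted_steps(cell_segments, {"a", "b"}))
```

A cell can be predicted for several horizons at once, which is why the result is a list of step counts per cell rather than a single number.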
7)
Another good question. Unfortunately this one is still very open. There are a lot of ideas floating around for ways to do this, but nothing so far has really stood out as a particularly effective solution. Even Numenta has not solved this one; they have instead been focusing Grok on much simpler structured data. Things as simple as temperature readings over time, utility usage over time, state of machinery over time, etc. These inputs are much easier to convert to appropriate HTM bit representations. Things like text and images are much richer, in that a single input can potentially represent a quite complex item. In order for the HTM to pick up proper sequences we need proper representations, and so far with text and images we generally do not know how to do this effectively. There are lots of working theories and experiments, but the discussion is ongoing.
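For the simple structured case (a temperature reading, say), one common-sense encoding is a sliding block of active bits, so that nearby values share bits and distant values do not. The sketch below is only an illustration of that idea, not Grok's actual encoder; `num_bits`, `active_bits` and the function name are assumptions.

```python
def encode_scalar(value, min_val, max_val, num_bits=40, active_bits=8):
    """Encode a number as num_bits bits with one contiguous run of 1s.

    Values outside [min_val, max_val] are clipped. Nearby values get
    overlapping runs; distant values get disjoint ones.
    """
    if not min_val < max_val:
        raise ValueError("min_val must be less than max_val")
    if not 0 < active_bits <= num_bits:
        raise ValueError("active_bits must be in 1..num_bits")
    clipped = min(max(value, min_val), max_val)
    num_buckets = num_bits - active_bits + 1
    fraction = (clipped - min_val) / (max_val - min_val)
    start = int(round(fraction * (num_buckets - 1)))
    return [1 if start <= i < start + active_bits else 0
            for i in range(num_bits)]

def overlap(a, b):
    """Number of positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))

if __name__ == "__main__":
    print(overlap(encode_scalar(20, 0, 100), encode_scalar(22, 0, 100)))
    print(overlap(encode_scalar(20, 0, 100), encode_scalar(80, 0, 100)))
```

Nothing this simple is known to work for text or images, which is exactly the open problem described above.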
8)
Another very often asked question. Put another way: given a set of active or predicted columns, how can I reconstruct the original input that corresponds to it? The answer so far is that you can't really, or at least there is nothing in the Numenta doc that defines how. Part of the problem is that if you have a region that is smaller than your input (as mentioned in #5), some information was lost on the way in. Since many inputs can map to a single region state, it is impossible to know exactly what the input was. The best we can do is an approximation, which for most purposes is usually enough. However, even the approach for reconstructing the approximation is not well defined. One way might be to take the most strongly connected input bits from each of the active columns and keep only the top x%, to end up with a typical number of active input bits per input. Others have tried a completely separate approach where they store a full mapping of input bits to active columns (and the reverse) so they can always go back. This requires a lot of extra memory to hold the reverse mapping for all inputs, so for larger (non-toy) projects it is not an ideal solution.
Finally, yes, I realized this about boosting as well. You are right: boosting can result in a situation where "A" once activated certain columns and later ends up activating others. This is bad if it happens because it would break all the existing learned temporal sequences. For a lot of my early experiments I kept spatial learning to a minimum (or even disabled it) for this very reason. In theory the network should adapt itself over time, but this assumes that spatial changes are very slow and infrequent, so you have to be careful with your parameters.
As for more documentation, there is not much that I know of offhand. There have been a few research papers by various grad students mentioning HTM, but nothing much more than what you have just read in my response. If others know of anything out there I would like to know as well.
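The "strongest connected bits, keep the top x%" idea might look roughly like the sketch below. It is one possible reading of that suggestion, not a defined algorithm: summing permanences as votes, the `connected_perm` threshold and the `keep_fraction` parameter are all assumptions.

```python
def reconstruct_input(active_columns, num_inputs,
                      connected_perm=0.2, keep_fraction=0.05):
    """Approximate the input that would produce the given active columns.

    active_columns is a list of {input_index: permanence} dicts, one per
    active column. Each connected synapse votes for its input bit with
    its permanence; the top keep_fraction of bits are switched on.
    """
    scores = [0.0] * num_inputs
    for synapses in active_columns:
        for index, perm in synapses.items():
            if perm >= connected_perm:
                scores[index] += perm
    num_keep = max(1, int(num_inputs * keep_fraction))
    ranked = sorted(range(num_inputs), key=lambda i: scores[i], reverse=True)
    keep = {i for i in ranked[:num_keep] if scores[i] > 0.0}
    return [1 if i in keep else 0 for i in range(num_inputs)]

if __name__ == "__main__":
    columns = [{0: 0.9, 1: 0.3, 2: 0.1}, {0: 0.5, 3: 0.25}]
    print(reconstruct_input(columns, num_inputs=10, keep_fraction=0.2))
```

Unlike the full reverse-mapping approach, this needs no extra memory beyond the synapses the region already stores, at the cost of only ever returning an approximation.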
Barry
Thank you, your answers were clear and helpful. :)
Barry is just THE man..
Barry,
In reference to your answer to #5, do you have any thoughts on motor neurons and their connected output bits? By this I mean, should the output bit be just another bit in the input grid that is instead activated by the column firing rather than vice versa and which would then be connected (randomly?) to the original sensory columns, or should the output bits be separate, with their own specialized columns which fire only on input from connected synapses? I think that at the very minimum, the output bits should also have input proximal dendrites which would then be able to be, for lack of a better phrase, "self aware", by which I mean able to recognize and model their own actions. If my output neurons fire, I want my input neurons to know it and model the results of those actions. Does this make any sense?
Last edit: Barry Matt 2013-03-23
Yes I have thought about how motor/outputs might work. Your ideas are similar to what I have thought. I have considered that outputs might be connected similar to inputs. Just as we have an array/matrix of input bits, we may have an array/matrix of output bits. However, these output bits are actually motor cells, in other words when certain output bits are active it means certain motor actions are taken. Maybe some subset of bits indicate a particular eye movement or hand motion.
Going further, you can imagine that if a particular eye movement is made, over time the HTM might come to expect/predict that its input will change in a predictable manner. If I move my hand in front of me I expect to see my hand move, via visual input, in a certain way.
The HTM is great at learning temporal sequences, so if the state of the motor out bits were involved in the sequence learning we could train the HTM to learn what it expects to happen when certain motor outputs are active. Over time it will know what will happen when certain motor actions are taken. Perhaps in the future when we "see" a doorknob the HTM will predict that a particular motor state (hand on door then turn) will result in a new input of seeing the door open.
This is just an initial thought, but maybe you get the idea. It does not seem a stretch at all to have the HTM sequences include motor outs that are connected to the system itself. The predictions can work essentially the same way. The motor out states can themselves be predictions for what to do/happen next.
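As a very rough sketch of "motor bits take part in the sequence learning", the simplest wiring is to append the motor bits to the sensory bits so the region sees both in every time step. This layout is purely an assumption for illustration, and the function names are made up.

```python
def build_region_input(sensory_bits, motor_bits):
    """Concatenate sensory and motor bits into one input array."""
    return list(sensory_bits) + list(motor_bits)

def split_region_input(bits, num_sensory):
    """Split a (possibly predicted) input array back into its two parts."""
    return bits[:num_sensory], bits[num_sensory:]

if __name__ == "__main__":
    sensory = [0, 1, 1, 0]   # e.g. what the "eye" currently sees
    motor = [1, 0]           # e.g. which movement is being made
    combined = build_region_input(sensory, motor)
    print(combined)
    print(split_region_input(combined, len(sensory)))
```

If a predicted input is split the same way, the motor half can be read as "what to do next" and the sensory half as "what I expect to see".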
Ok, based on that, let me ask you this follow-up question. If you want to reward the synapses that lead to preferred outputs and inhibit the ones that lead to less desired outputs, how do you retroactively tease out cause and effect when the actual output that is responsible for the good/bad result may be many inputs or perhaps even many outputs in the past? For example, in my pet currency trading example, if you have an HTM making trades for you and it makes a particularly good one and then goes on to make several other mediocre or bad ones, how do you boost the synapses responsible for making the good choice? Or do you think it is enough to boost overall learning rates when things are going well and inhibit them when things are going poorly, using some sort of "happiness" multiplier?
Also, would the "output" phase be best accomplished at the same time as either the spatial or temporal learning phases, or as a separate phase altogether once both the spatial and temporal phases are complete?
Last edit: Ian 2013-03-26