
Stripped-out compilable Nupic under VS2012 from 5.June.2013

Itay
2013-06-08
2013-07-04
  • Itay

    Itay - 2013-06-08

    Hi,
    I managed to strip out the C++ NuPIC external components that are used for testing, intercommunication with Python, or high-level file interfaces, to expose the internal temporal and spatial poolers.
    We can use this version to experiment and even expose the temporal and spatial poolers as DLLs to C# for further testing.
    I am providing a file with the stripped-out version (downloaded from the NuPIC master repository on 05.June.2013), compilable under Visual Studio 2012, with a main.cpp that shows the activation of NuPIC's temporal pooler.
    The file is 30 MB in size. I will see how I can deliver it (via the forum or via downloads), stay tuned.

     
  • Itay

    Itay - 2013-06-08
     
    • David Ragazzi

      David Ragazzi - 2013-06-08

      Hi Itay,

      This is very interesting. I wonder how you managed this? Did you use some automated tool, or was it just by hand?

      You could post this in NuPIC mail list.

      David

       
  • Itay

    Itay - 2013-06-08

    By hand; I worked my way from the external components inwards. At first I tried to include the Apache "apr" library, regex, and so forth, but I discovered it's just too much work because they depend too heavily on Unix, and I don't have knowledge of those libraries. It took several hours, but I guess that if you start with the "algorithms" folder and only try to compile that, it shouldn't take more than half an hour to get it to compile.

    There seem to be many questionable code files. For example, what is "Grouper", and why does it claim to be a "temporal pooler"? It seems to be the main class for the region implementation. On the other hand, no C++ files are using "Cells4" (which should be the main temporal pooler..). I don't know what's happening on the Python counterpart.

     
  • JRowe

    JRowe - 2013-06-10

    I'm not very familiar with C++, but I do thank you for your efforts! I got the project to compile, and I'm looking through the code now. It looks like it might be very easy to create a wrapper function in DLL form that makes full use of the library.

    It looks like the output is initialized by the calling program and then filled in later by "compute", so the main things to consider once you build your region are the input and output arrays. It looks like all dimensionality is handled implicitly, so you simply update the input, compute, retrieve the output, and repeat.
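    Under that reading, the calling pattern could be sketched roughly as below. This is an illustrative stand-in, not the actual NuPIC API: the class name, the toy activation rule, and the `compute` signature are all hypothetical; only the buffer-ownership pattern (caller allocates input and output, the pooler fills the output each step) is what the post describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pooler: the caller owns both buffers and the
// pooler fills the output on each compute() call. Not the real NuPIC class.
class SimplePooler {
public:
    explicit SimplePooler(std::size_t n) : size_(n) {}

    // Fill the caller-owned output buffer from the caller-owned input buffer.
    void compute(const std::vector<float>& input, std::vector<float>& output) {
        for (std::size_t i = 0; i < size_; ++i)
            output[i] = input[i] > 0.5f ? 1.0f : 0.0f;  // toy activation rule
    }

private:
    std::size_t size_;
};

// One step of the loop the post describes: initialize output, update input,
// compute, retrieve output; the caller repeats this for each time step.
std::vector<float> runStep(SimplePooler& pooler, const std::vector<float>& input) {
    std::vector<float> output(input.size(), 0.0f);  // caller initializes output
    pooler.compute(input, output);                  // pooler fills it in
    return output;                                  // caller retrieves output
}
```

    A DLL wrapper would essentially export this loop: keep the pooler instance alive across calls and expose an entry point that takes the input array and writes into the output array.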

    I was looking through the segmentUpdate portions and found this kinda funny:

    // TODO: The following logic seems really convoluted. Why can't we
    // increment/decrement inline here? Also, why can't we grow potential new
    // synapses instead of creating a list of all synapses and then erasing
    // them from this list? Seems like a lot of extra memory thrashing.
    // Also, why don't we just send a list of synapse indices into
    // updateSynapses? It would be cleaner and avoid the need to keep the
    // synapses sorted.

    I'm glad I'm not the only person who found the segment update part of the algorithm really convoluted.

     
  • JRowe

    JRowe - 2013-06-15

    So you guys seem to have dropped off the face of the earth; are you all hacking on the new NuPIC system?

     
  • Doug King

    Doug King - 2013-06-15

    I can only speak for myself, but would like to hear from others:

    I think we are all very interested in the NuPIC code. I think we are struggling, to varying degrees, with the NuPIC code - it is convoluted, not well documented, and written in a combination of Python and C++. But it works (inside of Numenta), and that is the most important thing.

    We are following the discussions, and some of them are shedding light on how to get results with the CLA. I believe our code in openHTM is in good shape to get results - we were stuck for a while, but I think we underestimated how important tuning parameters were, as well as encoding the input into a sparse representation and creating inputs that are well suited to CLA processing - for example, a repeating sine wave (or bouncing ball) is not the best test input. See the NuPIC post by Jeff on this.

    So I believe that openHTM could be a parallel platform going forward, with a bit more work. There is also the possibility of using the NuPIC core engine written in C++ and wiring that into our nice IDE/experimental platform - that would also give us a full GPL open-source license, not the 'research only' version we have now.

    The question is: why do that, what would we gain from another project, a fork of NuPIC, and is it worth the energy? I think the NuPIC community will pick up speed, and big gains will be made in one year, overshadowing anything we can do here. If key members here move to NuPIC, then openHTM may be a dead end. That would be a shame, because it is a nice, clean implementation of the CLA with a UI that makes experimenting with and understanding the CLA easier than other code I have seen. Other points are a common API for loading and saving experiment data, a common input format, prediction visualization, parallel CPU code for performance, and more.

    For all of us here, openHTM has provided a great learning opportunity, with some very smart people working at understanding the CLA without the support of Numenta. It is also possible that Numenta was looking at this project and came to the realization that open sourcing NuPIC was a smart move - you can see what we accomplished here in a few months. Google is doing their own Deep Belief Network with Ray Kurzweil, Geoffrey Hinton and Andrew Ng - along with some of their brightest, with an unlimited pool of resources (server farms and money) - so Numenta will have to compete in this area with open source. It's the right move for them.

    I would like to hear further thoughts from anyone in the openHTM community about this.

     
  • Itay

    Itay - 2013-06-15

    Hi
    I don't know what other OpenHTM members are doing. I didn't hear anything from Uwe, Barry, or Nick. I know Michael Ferrier is busy with his own experiments, and David is also struggling with NuPIC.
    Indeed, the progress of OpenHTM has come to a halt. But what can I do about this? It's really a matter of setting objectives, cooperating, and deciding on a timeline - something we lacked in the last month. As long as we don't cooperate on this problem, we will continue to struggle and won't make much progress.
    In recent days I have struggled to get the FDRCSpatial class (spatial pooler) to run. It seems that this class is buggy, and without proper examples of using it I don't think I can get it to work. I don't know what FDRSpatial is, or what the difference is between FDRSpatial and FDRCSpatial. Maybe FDRSpatial works well, but it looks like FDRCSpatial is the more up-to-date one. Anyway, instead I must rely on the Python version. I don't know Python (at all!), and getting the Python FDRCSpatial version to work could be really problematic for me - a little project in itself.
    I wish I had something better to say, but I think I have lost my enthusiasm (temporarily) because I feel like I'm the only one who has worked substantially on the OpenHTM project in the last month, and I don't like fighting these issues alone.

     
    • Doug King

      Doug King - 2013-06-15

      Thanks for the honesty. I would like to hear from more members and all who have contributed ideas and code about their feelings and thoughts.

      Itay, you have contributed greatly since you stepped up and joined. I can only speak for myself, but I was intimidated by the CLA - not the code, but the ways to debug and understand what was going on and what should be happening. For all I know, our code is good right now, but we may be feeding it poor tests, or we may have poor parameter tuning, or both. See Jeff's posts on this topic. Numenta has spent a considerable amount of time on sparse encoding of input, and on tuning of CLA parameters. They even automated tuning with Particle Swarm Optimization to help speed things up. This requires running hundreds or thousands of tests on a certain problem set before you can see good results from parameter tuning alone. I believe they use multiple Amazon compute instances to speed up the optimization when tuning.

      I had many thoughts and theories on what was the problem with our implementation, some of it centered around tuning settings, and the input encoding of the problem. I did not want to cause confusion among those that were discussing the hard-core CLA code and details - I thought it was important that you guys stepped through the logic without being sidetracked by (my) other ideas. You did find a number of issues in the code and I think also came away with a better understanding of the algorithm.

      Progress came to a halt due to several things, in my opinion - difficulty getting results with our test input (bouncing ball, AABAAB, etc.), Numenta's announcement of open-source NuPIC, and a lack of focus and direction on our project, probably due to waiting on the 'unknown bugs' in the CLA to be solved.

      I still believe we have the makings of a good code base and a good team, but there is a lack of enthusiasm because of the hard issues of the CLA and no solid direction. We were thinking that the CLA white paper and algorithm might be incomplete, and we were just guessing, and it's hard to move forward under those conditions. Now that Numenta has open sourced, I think we can see we were not far off, but we were missing the experimental part of the equation. We underestimated how much time and effort needs to go into creating the input and running many iterations of learning before getting any idea of what is going on. Stepping through simple examples was only enough to find code bugs.

      So in summary, unless there is still some passion for making openHTM a working platform, and some solid direction among members about what openHTM will become, I don't think we will have much of a chance. I would like to see a .NET version of the CLA - or a C++ core with a .NET IDE around it. I think there are many .NET programmers out there who would appreciate such a version. I would think it would need to stay in sync with the NuPIC core and the NuPIC API for sharing data and settings (assuming the NuPIC group creates a good API - we may be able to influence that part of the NuPIC project). The big question is: do we put our efforts here, or in with the NuPIC project?

      David is / was passionate and a driving force on this project - I would like to hear what he has to say.

       
  • David Ragazzi

    David Ragazzi - 2013-06-16

    Hi guys,

    First of all, let's go to the facts:

    1. OpenHTM didn't die; we simply paused the project a little since Numenta released NuPIC as open source and we were anxious to finally see its code.
    2. OpenHTM is not, and never was, a NuPIC rival. Quite the contrary: it is an alternative and a starting point for those wishing to learn about the CLA while Numenta's plans were closed.

    That said, as NuPIC is open source now, I believe OpenHTM fulfilled this initial role nicely, and now it's time to evolve. It's time we stop seeing OpenHTM as a NuPIC clone, and see it instead as a project that could bring new and emergent ideas beyond those we already know.

    In these months I worked with OpenHTM, I learned about a number of approaches similar to HTM but fully independent. Among these:

    • O'Reilly & Munakata (University of Colorado);
    • Deep Belief Networks (Ray Kurzweil, Geoffrey Hinton and Andrew Ng)

    I remember very well that Jeff got very curious about Mike's thesis, which mixed Jeff's theory with another approach similar to HTM (Dr. Frank from Colorado? I don't remember..)

    So I envision OpenHTM as an online lab where we could gather and test the best approaches in a single solution.

    OpenHTM could be a research reference, as it gives free and easy access to those who wish to learn about and simulate cortical intelligence. In OpenHTM anyone is welcome to contribute to the major objective (i.e. understanding the brain) without depending on the will or "blessing" of a company or university. We are free, and always will be free, to implement our own ideas. No one pays our salary; we are here because we like it.

    So I'm fully in favor of a solution that aggregates the best from all current approaches in AI, no matter whether they come from ourselves, Numenta, Colorado, Google, whoever.

    OpenHTM should be a project that aggregates the best and most successful ideas in AI research.

    David
    PS: I just finished my own implementation for my dissertation, and this week I will present some ideas.

     

    Last edit: David Ragazzi 2013-06-16
    • Barry Matt

      Barry Matt - 2013-06-16

      I agree with David, especially about our direction from here on out.

      I feel our direction has been very unclear recently. What we "should" be working on is not obvious. I mean, yes, "try to make the algorithm better", or at least show more conclusive results, but this is still very vague. What kind of results are we expecting, exactly? What sort of input is HTM expected to work well with? What kind of predictions do we expect (exactly right, or mostly right to account for noise)?

      There are lots of little details where we are not sure exactly what we think the results should look like for any given input, or what inputs the HTM is best able to accept.

      So a lot of our work has taken much more of a research approach, where the objectives are not well defined but rather "what is the best we can do with this type of cortex-based algorithm?" That sort of task is not something that a given person can be "assigned to" and marked as done when finished. Instead it is a much longer-term task that will not have any exact finish.

      Now that the NuPIC code is released, we have been studying it recently, and if you have looked at it so far you will notice it is not exactly the most well-organized or well-documented code around. Just understanding what is going on in their code will itself take some time, and of course that means less activity in OpenHTM.

      With that said, going back to what David said, I agree with his response. I think from here on out OpenHTM should start forking in its own unique direction. Yes, what we have so far is based on the original HTM algorithm document, but I would like to see if we can take it some steps further. I know some may ask why go further if we are unclear about HTM as it is; well, part of our task, to me at least, is to ask "what can we do from here to make this project more useful to us?" What should we change so it will work more like we might expect? I don't think we need to mirror NuPIC, especially not now that its code is released.

      I feel OpenHTM should go forward as a research platform more than a real-world end product. Perhaps real-world uses will come in time, but we are certainly not there now.

      I would like to try some new experiments with other ideas in computational neuroscience, like some of the projects David mentioned. I already have a few ideas (based on a few papers/texts others have linked in this forum) to try a very simple approach for integrating a basic form of motor control given prediction states of the HTM. This is of course highly experimental, but that is exactly what I think OpenHTM should be, at least for the time being. We have very nice visualization tools that would be perfect in aiding experimental approaches, let's put them to good use.

      Finally, if some of our work overlaps with NuPIC, we can consider submitting the occasional pull request to the NuPIC project based on successful experiments. Of course this is only for work that is compatible with NuPIC, which much of our work going forward will likely not be. However, I would like to keep communication with the NuPIC project open, sharing ideas whenever possible. It would be even better if we could eventually provide a .NET wrapper that calls into the NuPIC C++ libraries; perhaps we could even use our own visualization tools if it makes sense.

      I will not be leaving OpenHTM anytime soon! Brain research is a task for the long term, and I hope we can use OpenHTM to continue working together on future experiments and have discussions about the latest neuroscience work.

       
      • Doug King

        Doug King - 2013-06-17

        Thanks David and Barry and Itay. So there is still passion for openHTM - :-).

        I believe we are on similar tracks with our thinking about this project. Maybe we need a project reset in a few weeks to set direction, project goals and improve our project process. In the meantime we can absorb the NuPic code and take time out to think about our personal goals as they relate to this technology.

        I also believe in openHTM as a clean platform for experimentation. The strengths of openHTM are:
        • Easy-to-understand, well-documented code
        • Strong IDE and visual support for understanding connections and experimenting
        • Easy to implement new GUI features in Windows
        • The ability to try out theories and features independent of NuPIC (motor control, etc.)

        So I agree - OpenHTM should start forking in its own unique direction. I too would like to move forward with openHTM if we can focus our energy on clear goals. I don't plan on doing any NuPic code development, but I will be following NuPic code and architecture, and I will be trying to use the NuPic platform to experiment with real world data sets when it is ready. I will also be interested in ways of encoding data and finding an efficient methodology for tuning parameters which should apply to both openHTM and NuPic. Much of what we do will cross over to both implementations.

        I do think openHTM can be more than just an experimental platform. If done correctly, the core engine should be performant enough to use for real-world applications. New directions such as trying out motor control, GPU acceleration, hierarchy, etc. could be done in ways that will not impact the core until they are proven out. The goal would be to keep the core stable and performant, and build out the core as ideas are proven to work. I would like to see openHTM stay compatible with NuPIC formats and the core engine data structures if possible. If it is not a lot of work on our part to be compatible, and the NuPIC APIs make sense, we should try to be compatible. But if openHTM needs to implement new ideas that are not compatible, we shouldn't force it to be compatible. NuPIC currently saves trained networks via a Python serializer (pickling), which is not something I think we want to do, so we may have to forget about compatibility :-( or get NuPIC to adopt a better approach :-). Anyway, these are the types of things we would want to discuss as a group if we reset our project goals.

        So onward with openHTM. I will wait to hear more opinions (Nick? Uwe? others), take some time to check out the NuPIC code and David's new code, play with the current openHTM to see if I can get better results from parameter tuning and other datasets, and hopefully hear from you guys how you want to move forward.

        Cheers,

        Doug

         
  • David Ragazzi

    David Ragazzi - 2013-06-28

    Hi guys,

    It's very good to know we all wish new purposes for openHTM! (although maybe now it would be interesting to change the name to something not tied to a specific technology, like openCortex or another)

    Did you see the last changes? Now openHTM has an IDE like Visual Studio, in which one can adjust the layout according to his preference.

    Another tool that was added is a tree-node control which allows users to choose a region or sensor and then either configure it (when the simulation is off) or view its details (when the simulation is on).

    The input for a higher region will be composed of the outputs of all its lower regions and sensors. For example, a higher region with 3 lower regions and 1 sensor will have its input composed of:
    -25% - lower region R1.1
    -25% - lower region R1.2
    -25% - lower region R1.3
    -25% - sensor S1.1
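    The composition above can be sketched as a simple concatenation: the higher region's input vector is the outputs of its children joined end to end, so four equally sized children each contribute 25% of the input. This is only an illustrative sketch of the idea; `composeInput` and the child-output layout are assumptions, not the actual openHTM code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: build a higher region's input by concatenating the
// output vectors of its lower regions and sensors, in a fixed order. With
// four equal-length children, each one owns 25% of the resulting input.
std::vector<int> composeInput(const std::vector<std::vector<int>>& childOutputs) {
    std::vector<int> input;
    for (const std::vector<int>& out : childOutputs)
        input.insert(input.end(), out.begin(), out.end());  // append child output
    return input;
}
```

    A PercentageInputPerColumn-style property would then let a column draw its potential synapses from more than one child's slice of this concatenated vector, which is what allows coincidences between regions to be found.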

    Note that the PercentageInputPerColumn property gives the CLA the ability to find coincidences between 2 regions and predict better when ambiguity happens (finding coincidences is the famous belief propagation concept from Bayesian networks). This happens because a column can receive inputs from more than one lower region. Uwe posted an example of this a while ago.

    I believe this can also be useful for deciding which is the better prediction for a sequence (when temporal context forking happens, for example). Even Grok doesn't implement hierarchy, and Jeff already said (in other words) that they don't think much about temporal context forking. So if we do it, we will make a good contribution. Furthermore, I think hierarchy is a powerful feature for the CLA to solve many problems and extensively increase the number of real-world problems that HTM can address.

    About refactoring, I have 2 questions before I continue:

    • LocalityRadius: since this (as far as I understand) limits the area in which columns can form synapses to the input, is it really needed, to the point of increasing the complexity of the code? Could this be an obstacle to handling repeated similar input, since it prevents columns from choosing other bits in the input?
    • NumberPredictionSteps: could we simplify this code using state machines? MaxPredictionSteps could be the Global.T (the context size of the state machine), among other simplifications.

    Best, David

     
  • Uwe Kirschenmann

    Hello! I was off for four weeks (diving in Bali) and I am now back to read and see what happened. I also checked out Numenta's code but so far have my problems with it (at first it did not compile and build at all; now at least the first experiment is running). Was anybody able to get this whole thing to work and compile as a project? I am not experienced with C++ on Linux, so I guess it is not easy. I welcome any tips and ideas here! From a first look at the documentation I can just agree with the other comments.

    openHTM: I strongly agree that we carry on with the project as David mentioned and, according to Doug and Barry, also make use of Numenta's code. Itay extracted that already, if I got it right. But I also have to agree that we were a little unfocussed. This is, as Barry mentioned, also due to the nature of the project. I, for example, started experimenting with the CLA setup pretty early and tried to adjust parameters, but was always unsure of how to contribute more in coding, as there were many construction sites - I did not want to commit nonsense. As a consequence (if I was not the only one who did that), integration got lost. I of course would like to get more concrete in helping, but wonder how to do that (apart from writing papers :-)).
    My opinion is that we have a very good and COMMITTED team, but should find ways of structuring ourselves more. For example:
    David, Nick and Itay are very good developers, always needing more stuff to get their hands on.
    Barry is a very logical and good thinker and implementer, very exact.
    Doug is excellent at having an overview of and integrating different perspectives. David is very good at organizing, etc. Couldn't we use that knowledge?

     
    • Itay

      Itay - 2013-07-04

      Hi, welcome back..
      I think we really need someone who knows Python, or can help with the Python, in order to extract the algorithms and run them.
      You said you could run the first experiment. Can you put together a tutorial on how to run the functions in Python? Is this easy or hard?

       
      • David Ragazzi

        David Ragazzi - 2013-07-04

        Uwe:
        Welcome back! Since the refactoring is done and Numenta released its code, we need theoretical knowledge in order to discuss related questions like hierarchy and others. I will present some ideas (about project and team organization) to discuss.

        Itay:
        I could convert Python code to C#; however, I don't know the Numenta C++ code well enough to extract it. If someone is able to do that, I can convert the code.

         

