From: James B. <jab...@gm...> - 2018-05-21 17:47:30
On Mon, May 21, 2018 at 9:37 AM, Richard Fateman <fa...@be...> wrote:
> ...
> It is hard to know how to interpret Marvin Minsky's statement
> (and unfortunately, not possible to ask him ...) but I read it
> as having to do with Machine Learning (ML), where one is
> given a "collection of experiences" and wants to "make predictions".
> I think he is saying that CSK stuff, appropriately approximated
> would be better than what goes on today.

If I may speak with some priority, if not authority, as the originator of the idea for the Hutter Prize for Lossless Compression of Human Knowledge <http://prize.hutter1.net/#history> (and as one of the people who walked out when Minsky took the stage to give the plenary speech at IEEE's second ICNN):

Machine Learning is best viewed, to a first approximation, as lossless compression in the service of unsupervised learning. What Minsky is saying is that although one cannot prove one has found the optimal compression of a given set of "observations" (a data set), one can obviously compare degrees of compression with a single parameter: size.

What is the connection between compression and simplification? To ask the question is to answer it, but for one parameter: the choice of Universal Turing Machine (UTM). As you point out, the UTM can be Maxima itself. However, as you (and "Understanding Expression Simplification") point out, that choice leads to questions such as "Which is simpler, 9^9... or 387420489?" To characterize such questions as indicating the "irrelevance" of framing CAS simplification in terms of compression (even when bolstered by the incomputability of Kolmogorov complexity) is throwing the baby out with the bathwater. Minsky (and others before him) dealt with the incomputability argument for "irrelevance". The other argument(s) for "irrelevance" will be dealt with presently.

Let me break down "compression of observations" in response to Minsky's expert-systems approach:

> Marvin Minsky provided some of the impetus to build Macsyma,
> the predecessor to Maxima. The argument being that if we
> built enough experts (Macsyma being an expert on math -- a
> better-defined domain than, say, literature) we could merge them
> via some meta-expert into a more general intelligence.
> Essays of later work (Society of Minds) push further in
> various directions. I don't know how Minsky's thinking evolved
> to end up promoting Algorithmic Probability, and I
> may even be mischaracterizing it as a kind of ML,
> since I have not studied it.

In a CAS, there are two levels of "observation":

1) Treating an expression as an "observation".
2) Observing what experts do to simplify expressions.

In the "9^9 vs 387420489" question, we are dealing with #1, which really just boils down to the question "Can the choice of UTM be treated as arbitrary in practice?" The answer is, "It can't." Science is a low-context culture <https://en.wikipedia.org/wiki/High-context_and_low-context_cultures>. The choice of UTM for science imposes minimal context for its specification. Think of the golden record sent out with the Voyager spacecraft to communicate with alien life forms. In such a low-context setting, the smallest description of a UTM would be something like the axioms of set theory. This leads to the (presently vague) notion of a "natural UTM": the UTM with the simplest specification in a low-context culture.
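To make the dependence on the choice of machine concrete, here is a toy back-of-the-envelope sketch (in Python only for concreteness; the 40-symbol price of exponentiation is a number I made up for illustration, not a real Kolmogorov complexity):

# Toy description-length comparison for "9^9" vs "387420489".
# All costs below are illustrative assumptions, not real complexities.

# Machine A: a "rich" machine whose primitives already include exponentiation.
# An expression is charged only for the characters needed to write it down.
def cost_rich(expr_text):
    return len(expr_text)

# Machine B: a "minimal" (low-context) machine with no built-in exponentiation.
# Using '^' means also shipping a definition of exponentiation, priced here
# at a flat 40 symbols (an arbitrary stand-in for its true description length).
EXP_DEFINITION_COST = 40

def cost_minimal(expr_text):
    extra = EXP_DEFINITION_COST if '^' in expr_text else 0
    return len(expr_text) + extra

for expr in ("9^9", "387420489"):
    print(expr, cost_rich(expr), cost_minimal(expr))

Under machine A the short form "9^9" wins (3 characters vs 9); under machine B the literal digits win (9 vs 3 + 40). "Which is simpler?" depends entirely on which machine you charge the description against.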
Now, before you jump to the conclusion that this implies "387420489" is simpler than "9^9" because the algorithm for exponentiation needn't be included in the program outputting those digits, keep in mind that one expression is only one observation. With each new expression (say "10000000000"), the totality of "observations" demands a potentially different optimal lossless compression. The length of "387420489 10000000000" may _still_ be inadequate to justify inclusion of the exponentiation algorithm, but at some point it may well be justified. One can argue that these additional observations amount to "increasing context" and hence take us ever further from the notion of a "natural UTM" -- and you would be exactly right! All of these additional "observations" (expressions) might be thought of as the "use cases" of Maxima, and it is the job of "experts" to treat all of those observations as the practical heuristic in the choice of UTM, in the sense that they are working to achieve maximal compression of all expressions that will ever challenge Maxima's simplification.

So, for instance, rather than blindly iterating through all expressions to find those that are both smaller than the original and equal to it, there are obvious transforms one can apply to an expression to produce expressions that are _known_ to be equal to the original. This is known as "algebra". Moreover, rather than assuming that the space of expression complexity is convex (i.e., that to reach the optimum all you have to do is take steps, each of which decreases the size), one can apply known techniques to escape from local minima by temporarily increasing the length of the expression. This is not only what humans do, it is what a wide array of search techniques do, such as adaptive simulated annealing, genetic programming, symbolic regression, etc. (A toy sketch of such a search is appended at the end of this message.)

Now, for the coup de grâce: #2 (observing what experts do to simplify expressions) relates directly to practical efforts in expertise acquisition, most fabulously demonstrated by AlphaGo from Google's DeepMind team (co-founded by one of Hutter's PhD students). For decades, Go had been considered uncrackable by machine learning because it is so "intuitive". Although they were able to achieve world-class expertise by observing games played by world-class experts -- and that would be adequate for Maxima -- AlphaGo Zero was able to master the game of Go without human knowledge <https://www.nature.com/articles/nature24270>. Treat algebra (and calculus) as a "game" with "rules" that requires the "intuition" of "experts", and something may well be done along these lines -- even if not to the degree achieved by AlphaGo Zero.

> "It seems to me that the most important discovery since Gödel was the
> discovery by Chaitin, Solomonoff and Kolmogorov of the concept called
> Algorithmic Probability which is a fundamental new theory of how to make
> predictions given a collection of experiences and this is a beautiful
> theory, everybody should learn it, but it’s got one problem, that is, that
> you cannot actually calculate what this theory predicts because it is too
> hard, it requires an infinite amount of work. However, it should be
> possible to make practical approximations to the Chaitin, Kolmogorov,
> Solomonoff theory that would make better predictions than anything we have
> today. Everybody should learn all about that and spend the rest of their
> lives working on it."
>
> Marvin Minsky
> Panel discussion on The Limits of Understanding
> World Science Festival, NYC, Dec 14, 2014
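As promised above, here is a toy sketch of that kind of search, written in Python with sympy purely so the example is self-contained in one message -- it is not Maxima's simplifier, and the transform list, size measure, and cooling schedule are arbitrary choices of mine for illustration:

# Annealing-style search over equality-preserving rewrites.
# Illustration only: the transform set, the size measure and the
# cooling schedule are arbitrary stand-ins.
import math
import random

from sympy import (symbols, sin, cos, expand, factor, cancel,
                   together, trigsimp, count_ops)

x, y = symbols('x y')

# Each transform returns an expression algebraically equal to its input.
TRANSFORMS = [expand, factor, cancel, together, trigsimp]

def size(expr):
    # Crude proxy for the "description length" of an expression.
    return count_ops(expr) + len(expr.free_symbols)

def anneal(expr, steps=200, temperature=5.0, cooling=0.97, seed=0):
    rng = random.Random(seed)
    current = best = expr
    for _ in range(steps):
        candidate = rng.choice(TRANSFORMS)(current)
        delta = size(candidate) - size(current)
        # Accept strict improvements, and occasionally accept a *larger*
        # expression so the walk can climb out of a local minimum.
        if delta <= 0 or rng.random() < math.exp(-delta / max(temperature, 1e-9)):
            current = candidate
            if size(current) < size(best):
                best = current
        temperature *= cooling
    return best

e = (x**2 - y**2)/(x - y) + sin(x)**2 + cos(x)**2
print(anneal(e))   # a smaller expression provably equal to e

The point is only that nothing in this picture requires enumerating all smaller expressions: the search moves through expressions already known to be equal to the original, and the temperature term is what lets it climb out of local minima.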