From: James B. <jab...@gm...> - 2018-05-24 01:41:32
There are bound to be problems for which the approach won't be helpful. However, I find it hard to believe it wouldn't be a useful addition to the toolbox and expand Maxima's accessibility and utility -- perhaps quite a bit. Go's 19x19 board permits 3 states per position, resulting in the equivalent of about 572 bits of information (361 * log2(3)). A mathematical expression using 6-bit characters could be up to 95 characters long and fit within the same state space. In Go, a "move" is the placement of a stone on one of the 361 positions. In algebraic simplification, a "move" applies a known identity -- let's say there are 361 algebraic identities permitted. In Go, you win by maximizing territory. In algebraic simplification, you win by minimizing the number of bits in the expression.

AlphaGo learned how to play by observing records of games played. A similar database could be compiled by a company like Wolfram -- recording what experts do to simplify various expressions. Since Champaign-Urbana was my stomping ground back in the halcyon days of PLATO, maybe I'll take it to one of the acquaintances Steve Wolfram and I have in common.

On Tue, May 22, 2018 at 3:05 PM, Stavros Macrakis (Σταῦρος Μακράκης) <mac...@al...> wrote:

> Nice theoretical framework. But it's not clear how it solves practical problems.
>
> Consider the inequation 0 ≤ P, where P is some multivariate polynomial. It can be simplified to True when P is always non-negative, or False when P is always negative (which is awfully well compressed, I trust you agree?). There is even an algorithm. Unfortunately, the problem is NP-hard. I don't see how ML is going to help there, and that is about as trivial a case as I can imagine.
>
> As for exploring sequences of algebraic transformations, yes, that can sometimes be useful. Attached please find some simple code that finds all the fixed points of repeated application of a list of functions to an expression.
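Stavros's attachment isn't preserved in this archive, but the idea -- exhaustively closing an expression under a list of rewrites, then keeping the forms that no rewrite changes -- can be sketched in a few lines of Python. The string rewrites below are illustrative stand-ins, not the attached Maxima code:

```python
def fixed_points(expr, rewrites):
    """Exhaustively close `expr` under `rewrites`; return the forms
    that every rewrite leaves unchanged (the fixed points)."""
    seen = {expr}
    frontier = [expr]
    while frontier:                      # breadth-first closure
        new = []
        for e in frontier:
            for rw in rewrites:
                r = rw(e)
                if r not in seen:
                    seen.add(r)
                    new.append(r)
        frontier = new
    return {e for e in seen if all(rw(e) == e for rw in rewrites)}

# Toy rewrites on strings: drop additive zeros and unit coefficients.
rewrites = [lambda s: s.replace("+0", ""),
            lambda s: s.replace("1*", "")]
print(fixed_points("x+0+1*y", rewrites))  # {'x+y'}
```

This only terminates when the rewrites generate finitely many forms; as Stavros notes next, there is plenty of room to be cleverer than exhaustive enumeration.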
> Obviously there is a lot of room for being cleverer about the search strategy than doing an exhaustive enumeration. There is also a lot of room for additional kinds of manipulation. Still, it can occasionally be useful on small expressions.
>
> Now consider, for example, x^2+3*x*y+y^2 (13 chars). That can be expressed in fewer characters through an elementary sequence of algebraic manipulations: x^2+3*x*y+y^2 = (x^2+2*x*y+y^2)+x*y = (x+y)^2+x*y (11 chars). Is a search going to find that by splitting up the coefficient 3 in different ways? And how will that scale to larger expressions?
>
> Now consider 21*x^6-63*x^5+105*x^4-101*x^3+87*x^2+27*x+21 (44 characters); I can express that in 28 characters, and finding that solution isn't too hard, but I'll leave it as an exercise for the reader. The point is that even compressing a very, very simple case like a univariate polynomial is not trivial.
>
> -s
>
> On Mon, May 21, 2018 at 1:46 PM, James Bowery <jab...@gm...> wrote:
>
>> On Mon, May 21, 2018 at 9:37 AM, Richard Fateman <fa...@be...> wrote:
>>
>>> ...
>>> It is hard to know how to interpret Marvin Minsky's statement (and unfortunately, not possible to ask him ...) but I read it as having to do with Machine Learning (ML), where one is given a "collection of experiences" and wants to "make predictions". I think he is saying that CSK stuff, appropriately approximated, would be better than what goes on today.
>>
>> If I may speak with some priority, if not authority, as the originator of the idea for the Hutter Prize for Lossless Compression of Human Knowledge <http://prize.hutter1.net/#history> (and as one of the people who walked out when Minsky took the stage to give the plenary speech at IEEE's second ICNN):
>>
>> Machine Learning is best viewed, to a first approximation, as lossless compression to achieve unsupervised learning.
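The (x+y)^2+x*y manipulation above is easy to verify mechanically -- a quick sanity check in plain Python, with numeric spot-checks standing in for a CAS proof:

```python
import random

before = lambda x, y: x**2 + 3*x*y + y**2   # the 13-character form
after  = lambda x, y: (x + y)**2 + x*y      # the 11-character form

random.seed(0)
for _ in range(1000):                       # spot-check the identity
    x, y = random.randint(-99, 99), random.randint(-99, 99)
    assert before(x, y) == after(x, y)

assert len("x^2+3*x*y+y^2") == 13
assert len("(x+y)^2+x*y") == 11
```

Checking a candidate form is the cheap part; the hard problem, as Stavros says, is the search that finds the shorter form in the first place.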
>> What Minsky is saying is that although one cannot prove one has found the optimal compression of a given set of "observations" (data set), one can obviously compare degrees of compression with a single parameter: Size.
>>
>> What is the connection between compression and simplification?
>>
>> To ask the question is to answer it, but for one parameter: choice of Universal Turing Machine (UTM).
>>
>> As you point out, the UTM can be Maxima itself. However, as you (and "Understanding Expression Simplification") point out, that choice leads to questions such as "Which is simpler, 9^9 or 387420489?" However, to characterize such questions as indicating the "irrelevance" of framing CAS in terms of compression (even when bolstered by Kolmogorov complexity's incomputability) is throwing the baby out with the bathwater. Minsky (and others before him) dealt with the incomputability argument for "irrelevance". The other argument(s) for "irrelevance" will be dealt with presently.
>>
>> Let me break down "compression of observations" in response to Minsky's expert-systems approach:
>>
>>> Marvin Minsky provided some of the impetus to build Macsyma, the predecessor to Maxima. The argument being that if we built enough experts (Macsyma being an expert on math -- a better-defined domain than, say, literature) we could merge them via some meta-expert into a more general intelligence. Essays of later work (Society of Mind) push further in various directions. I don't know how Minsky's thinking evolved to end up promoting Algorithmic Probability, and I may even be mischaracterizing it as a kind of ML, since I have not studied it.
>>
>> In a CAS, there are two levels of "observation":
>>
>> 1) Treating an expression as an "observation".
>> 2) Observing what experts do to simplify expressions.
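The "single parameter: Size" point above is easy to demonstrate: two candidate compressions of the same observations are compared by length alone, with no proof of optimality needed. A sketch using a general-purpose compressor as one of the candidates:

```python
import zlib

# Highly regular "observations": powers of 9, repeated.
data = b"9 81 729 6561 59049 531441 " * 20

raw_size = len(data)                    # candidate 0: store verbatim (540 bytes)
gz_size = len(zlib.compress(data, 9))   # candidate 1: generic compressor
# Candidate 2, a "model" that knows the structure (one 27-byte period plus
# a repeat count), would be shorter still -- no optimality proof required
# to rank the three: just compare sizes.

assert gz_size < raw_size
print(raw_size, gz_size)
```

The ranking is the whole comparison; nothing about the internals of the candidates needs to be understood to prefer the smaller one.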
>> In the "9^9 vs 387420489" question, we are dealing with #1, which really just boils down to the question "Can the choice of UTM be treated as arbitrary in practice?"
>>
>> The answer is, "It can't."
>>
>> Science is a low-context culture <https://en.wikipedia.org/wiki/High-context_and_low-context_cultures>. The choice of UTM for science imposes minimal context for its specification. Think about the golden phonograph record sent out with the Voyager spacecraft to communicate with alien life forms. In this low-context setting, the smallest description of a UTM would be something like the axioms of set theory. This leads to the (presently vague) notion of a "natural UTM" which has the simplest specification in a low-context culture.
>>
>> Now, before you jump to the conclusion that this implies "387420489" is simpler than "9^9" because the algorithm for exponentiation needn't be included in the program outputting those digits, keep in mind that one expression is only one observation. With each new expression (say "10000000000") the totality of "observations" demands a potentially different optimal lossless compression. The length of "387420489 10000000000" may _still_ be inadequate to justify inclusion of the exponentiation algorithm, but at some point, it may well be justified.
>>
>> One can argue that these additional observations amount to "increasing context" and hence take us ever further from the notion of a "natural UTM" -- and you would be exactly right! All of these additional "observations" (expressions) might be thought of as the "use cases" of Maxima -- and it is the job of "experts" to treat all of these observations as the practical heuristic in the choice of UTM, in the sense that they are working to achieve maximal compression of all expressions ever to challenge Maxima's simplification.
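The "387420489 10000000000" argument above can be made concrete with a toy cost model: program size measured in characters, with an assumed one-time cost for defining exponentiation in the UTM (the constant 15 is illustrative, not derived from any real machine):

```python
EXP_DEF_COST = 15  # assumed one-time cost of defining "^" in the UTM

def literal_cost(numbers):
    # Program that just lists the digits of each observation.
    return sum(len(str(n)) for n in numbers)

def exp_cost(pairs):
    # Program that defines "^" once, then writes each observation as b^e.
    return EXP_DEF_COST + sum(len(f"{b}^{e}") for b, e in pairs)

# One observation: digits win (9 chars vs 15+3).
assert literal_cost([9**9]) < exp_cost([(9, 9)])
# Two observations: _still_ inadequate (9+11 chars vs 15+3+5).
assert literal_cost([9**9, 10**10]) < exp_cost([(9, 9), (10, 10)])
# Enough observations, and the exponentiation algorithm pays for itself
# (9+11+21 chars vs 15+3+5+5).
assert literal_cost([9**9, 10**10, 10**20]) > exp_cost([(9, 9), (10, 10), (10, 20)])
```

The crossover point depends entirely on the assumed definition cost -- which is exactly the "choice of UTM" being argued about.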
>> So, for instance, rather than blindly iterating through all expressions to find which are both smaller than the original and equal to the original, there are obvious transforms one can apply to an expression to produce expressions that are _known_ to be equal to the original. This is known as "algebra". Moreover, rather than assuming that the space of expression complexity is convex (i.e., that to get to the optimum all you have to do is take steps, each of which decreases the size), one can apply known techniques to escape from local minima by temporarily increasing the length of the expression. This is not only what humans do, it is what a wide array of search techniques do, such as adaptive simulated annealing, genetic programming, symbolic regression, etc.
>>
>> Now, for the coup de grâce:
>>
>> #2 (observing what experts do to simplify expressions) relates directly to practical efforts in expertise acquisition, most fabulously demonstrated by Google DeepMind's AlphaGo team (DeepMind was co-founded by one of Hutter's PhD students). For decades, Go had been considered uncrackable by machine learning because it is so "intuitive". Although they were able to achieve world-class expertise by observing games by world-class experts -- and that would be adequate for Maxima -- AlphaGo Zero was able to master the game of Go without human knowledge <https://www.nature.com/articles/nature24270>. Treat algebra (and calculus) as a "game" with "rules" that require the "intuition" of "experts" and something may well be achievable -- even if not to the degree achieved by AlphaGo Zero.
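The escape-from-local-minima point can be sketched with a generic simulated-annealing acceptance rule on a toy "complexity landscape". The landscape, cooling schedule, and seed below are all illustrative stand-ins for a real expression space:

```python
import math
import random

# Toy landscape: sizes[i] is the "size" of expression i; neighbors differ
# by one index. Greedy descent from 0 gets stuck at the local minimum i=2.
sizes = [5, 4, 3, 6, 7, 6, 4, 2, 1, 2]

def anneal(start, steps=5000, t0=5.0):
    random.seed(1)                                # fixed seed for repeatability
    cur = best = start
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9           # simple linear cooling
        cand = max(0, min(len(sizes) - 1, cur + random.choice((-1, 1))))
        delta = sizes[cand] - sizes[cur]
        # Accept downhill moves always; accept uphill moves with probability
        # exp(-delta/t) -- the temporary size increases that let the search
        # climb out of the local minimum at i=2.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur = cand
        if sizes[cur] < sizes[best]:
            best = cur
    return best

assert sizes[anneal(0)] < 3   # found something smaller than the local minimum
```

A purely greedy descent over the same landscape would stop at size 3; the occasional accepted uphill step is the entire difference.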
>>> "It seems to me that the most important discovery since Gödel was the discovery by Chaitin, Solomonoff and Kolmogorov of the concept called Algorithmic Probability which is a fundamental new theory of how to make predictions given a collection of experiences and this is a beautiful theory, everybody should learn it, but it's got one problem, that is, that you cannot actually calculate what this theory predicts because it is too hard, it requires an infinite amount of work. However, it should be possible to make practical approximations to the Chaitin, Kolmogorov, Solomonoff theory that would make better predictions than anything we have today. Everybody should learn all about that and spend the rest of their lives working on it."
>>>
>>> Marvin Minsky
>>> Panel discussion on The Limits of Understanding
>>> World Science Festival, NYC, Dec 14, 2014