From: James B. <jab...@gm...> - 2018-05-21 17:47:30
On Mon, May 21, 2018 at 9:37 AM, Richard Fateman <fa...@be...> wrote:
> ...
> It is hard to know how to interpret Marvin Minsky's statement
> (and unfortunately, not possible to ask him ...) but I read it
> as having to do with Machine Learning (ML), where one is
> given a "collection of experiences" and wants to "make predictions".
> I think he is saying that CSK stuff, appropriately approximated
> would be better than what goes on today.

If I may speak with some priority, if not authority, as the originator of the idea for the Hutter Prize for Lossless Compression of Human Knowledge <http://prize.hutter1.net/#history> (and as one of the people who walked out when Minsky took the stage to give the plenary speech at IEEE's second ICNN):

Machine Learning is best viewed, to a first approximation, as lossless compression in the service of unsupervised learning. What Minsky is saying is that although one cannot prove one has found the optimal compression of a given set of "observations" (a data set), one can obviously compare degrees of compression with a single parameter: size.

What is the connection between compression and simplification? To ask the question is to answer it, but for one parameter: the choice of Universal Turing Machine (UTM). As you point out, the UTM can be Maxima itself. However, as you (and "Understanding Expression Simplification") point out, that choice leads to questions such as "Which is simpler, 9^9... or 387420489?" To characterize such questions as indicating the "irrelevance" of framing CAS simplification in terms of compression (even when bolstered by the incomputability of Kolmogorov complexity) is throwing the baby out with the bathwater. Minsky (and others before him) dealt with the incomputability argument for "irrelevance". The other argument(s) for "irrelevance" will be dealt with presently.

Let me break down "compression of observations" in response to Minsky's expert-systems approach:

> Marvin Minsky provided some of the impetus to build Macsyma,
> the predecessor to Maxima. The argument being that if we
> built enough experts (Macsyma being an expert on math -- a
> better-defined domain than, say, literature) we could merge them
> via some meta-expert into a more general intelligence.
> Essays of later work (Society of Minds) push further in
> various directions. I don't know how Minsky's thinking evolved
> to end up promoting Algorithmic Probability, and I
> may even be mischaracterizing it as a kind of ML,
> since I have not studied it.

In a CAS, there are two levels of "observation":

1) Treating an expression as an "observation".
2) Observing what experts do to simplify expressions.

In the "9^9 vs 387420489" question, we are dealing with #1, which really just boils down to the question "Can the choice of UTM be treated as arbitrary in practice?" The answer is, "It can't." Science is a low-context culture <https://en.wikipedia.org/wiki/High-context_and_low-context_cultures>. The choice of UTM for science imposes minimal context for its specification. Think of the golden record sent out with the Voyager spacecraft to communicate with alien life forms. In such a low-context setting, the smallest description of a UTM would be something like the axioms of set theory. This leads to the (presently vague) notion of a "natural UTM": the UTM with the simplest specification in a low-context culture.
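To make the dependence on the choice of machine concrete, here is a toy back-of-the-envelope sketch (in Python only for concreteness; the 40-symbol price of exponentiation is a number I made up for illustration, not a real Kolmogorov complexity):

# Toy description-length comparison for "9^9" vs "387420489".
# All costs below are illustrative assumptions, not real complexities.

# Machine A: a "rich" machine whose primitives already include exponentiation.
# An expression is charged only for the characters needed to write it down.
def cost_rich(expr_text):
    return len(expr_text)

# Machine B: a "minimal" (low-context) machine with no built-in exponentiation.
# Using '^' means also shipping a definition of exponentiation, priced here
# at a flat 40 symbols (an arbitrary stand-in for its true description length).
EXP_DEFINITION_COST = 40

def cost_minimal(expr_text):
    extra = EXP_DEFINITION_COST if '^' in expr_text else 0
    return len(expr_text) + extra

for expr in ("9^9", "387420489"):
    print(expr, cost_rich(expr), cost_minimal(expr))

Under machine A the short form "9^9" wins (3 characters vs 9); under machine B the literal digits win (9 vs 3 + 40). "Which is simpler?" depends entirely on which machine you charge the description against.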
Now, before you jump to the conclusion that this implies "387420489" is simpler than "9^9" because the algorithm for exponentiation needn't be included in the program outputting those digits, keep in mind that one expression is only one observation. With each new expression (say "10000000000"), the totality of "observations" demands a potentially different optimal lossless compression. The length of "387420489 10000000000" may _still_ be inadequate to justify inclusion of the exponentiation algorithm, but at some point it may well be justified. One can argue that these additional observations amount to "increasing context" and hence take us ever further from the notion of a "natural UTM" -- and you would be exactly right! All of these additional "observations" (expressions) might be thought of as the "use cases" of Maxima, and it is the job of "experts" to treat all of those observations as the practical heuristic in the choice of UTM, in the sense that they are working to achieve maximal compression of all expressions that will ever challenge Maxima's simplification.

So, for instance, rather than blindly iterating through all expressions to find those that are both smaller than the original and equal to it, there are obvious transforms one can apply to an expression to produce expressions that are _known_ to be equal to the original. This is known as "algebra". Moreover, rather than assuming that the space of expression complexity is convex (i.e., that to reach the optimum all you have to do is take steps, each of which decreases the size), one can apply known techniques to escape from local minima by temporarily increasing the length of the expression. This is not only what humans do, it is what a wide array of search techniques do, such as adaptive simulated annealing, genetic programming, symbolic regression, etc. (A toy sketch of such a search is appended at the end of this message.)

Now, for the coup de grâce: #2 (observing what experts do to simplify expressions) relates directly to practical efforts in expertise acquisition, most fabulously demonstrated by AlphaGo from Google's DeepMind team (co-founded by one of Hutter's PhD students). For decades, Go had been considered uncrackable by machine learning because it is so "intuitive". Although they were able to achieve world-class expertise by observing games played by world-class experts -- and that would be adequate for Maxima -- AlphaGo Zero was able to master the game of Go without human knowledge <https://www.nature.com/articles/nature24270>. Treat algebra (and calculus) as a "game" with "rules" that requires the "intuition" of "experts", and something may well be done along these lines -- even if not to the degree achieved by AlphaGo Zero.

> "It seems to me that the most important discovery since Gödel was the
> discovery by Chaitin, Solomonoff and Kolmogorov of the concept called
> Algorithmic Probability which is a fundamental new theory of how to make
> predictions given a collection of experiences and this is a beautiful
> theory, everybody should learn it, but it’s got one problem, that is, that
> you cannot actually calculate what this theory predicts because it is too
> hard, it requires an infinite amount of work. However, it should be
> possible to make practical approximations to the Chaitin, Kolmogorov,
> Solomonoff theory that would make better predictions than anything we have
> today. Everybody should learn all about that and spend the rest of their
> lives working on it."
>
> Marvin Minsky
> Panel discussion on The Limits of Understanding
> World Science Festival, NYC, Dec 14, 2014
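As promised above, here is a toy sketch of that kind of search, written in Python with sympy purely so the example is self-contained in one message -- it is not Maxima's simplifier, and the transform list, size measure, and cooling schedule are arbitrary choices of mine for illustration:

# Annealing-style search over equality-preserving rewrites.
# Illustration only: the transform set, the size measure and the
# cooling schedule are arbitrary stand-ins.
import math
import random

from sympy import (symbols, sin, cos, expand, factor, cancel,
                   together, trigsimp, count_ops)

x, y = symbols('x y')

# Each transform returns an expression algebraically equal to its input.
TRANSFORMS = [expand, factor, cancel, together, trigsimp]

def size(expr):
    # Crude proxy for the "description length" of an expression.
    return count_ops(expr) + len(expr.free_symbols)

def anneal(expr, steps=200, temperature=5.0, cooling=0.97, seed=0):
    rng = random.Random(seed)
    current = best = expr
    for _ in range(steps):
        candidate = rng.choice(TRANSFORMS)(current)
        delta = size(candidate) - size(current)
        # Accept strict improvements, and occasionally accept a *larger*
        # expression so the walk can climb out of a local minimum.
        if delta <= 0 or rng.random() < math.exp(-delta / max(temperature, 1e-9)):
            current = candidate
            if size(current) < size(best):
                best = current
        temperature *= cooling
    return best

e = (x**2 - y**2)/(x - y) + sin(x)**2 + cos(x)**2
print(anneal(e))   # a smaller expression provably equal to e

The point is only that nothing in this picture requires enumerating all smaller expressions: the search moves through expressions already known to be equal to the original, and the temperature term is what lets it climb out of local minima.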