From: Henry B. <hb...@pi...> - 2023-03-28 00:35:07
I haven't used any of these so-called "ChatX" programs, but would it help if you asked them to "show your work," or "can you show me a proof of what you just said"? I'd guess that more progress is made in math through new *proof techniques* than through new results, per se.

-----Original Message-----
From: Richard Fateman <fa...@gm...>
Sent: Mar 27, 2023 2:55 PM
To: Viktor T. Toth <vt...@vt...>
Cc: <max...@li...>
Subject: Re: [Maxima-discuss] Warning: don't use "an AI" for math. Bard from Google. Bing from Microsoft

The Microsoft paper is interesting (https://arxiv.org/pdf/2303.12712.pdf), but it repeatedly makes the point that the technology is not going to get the right answer all the time. GPT-4 is usually better than ChatGPT on different examples. Worth reading, in my opinion, if you are curious about examples.

The idea that GPT-n would be a clearing-house for distributing questions to different experts, which could then provide answers, seems to rely on two (in general, implausible) assumptions:

1. Whatever GPT-n is made of, it is sufficient to determine whether a question requires an expert on calculus, math proofs, geometry, cooking, travel, philosophy, molecular biology, Shakespeare, media, electronics, automated driving, etc. etc., and maybe which (human or programming) language to use.

2. There are experts to answer the questions. Presumably NOT large-language-model programs, but something that "works better".

In my experience (going back to residence at the MIT AI Lab circa 1968-74, while Macsyma was young), "AI", a deliberately inflammatory term, was used to describe "stuff we would like to do with a computer but haven't figured out how to program." When a program was written to do the job, it ceased to be AI and became something else; Macsyma, for instance, became "symbolic mathematical computing". People who touted Macsyma as a success of AI had no knowledge of how Macsyma worked, and were essentially marketeers or perhaps (typically) ill-informed journalists.

Now these new programs are quite surprising in particular answers, but the notion that they know something, as opposed to parroting words, is not supported. An example from that paper:

Prompt: Can you write a proof that there are infinitely many primes, with every line that rhymes?

<yes, it sort of does it> starting with

Yes, I think I can,
Though it might take a clever plan.
I'll start by noting Euclid's proof,
Which shows that primes aren't just aloof.
....
----------------------

So it must understand primes and factorials (a factorial is used in the standard proof). Thus when I ask the question "Tell me a value of i>1 for which i! is prime", you might expect it to say something like "2! is 2, which is a prime number."

Uh, no. It says: "There is no value of i>1 for which i! is a prime number. This is because i! has more than two factors, so it is not a prime number (https://www.geeksforgeeks.org/prime-numbers/)."

It thinks that "The largest known prime number (as of February 2023) is 2^82,589,933 − 1, a number which has 24,862,048 digits when written in base 10. It was found via a computer volunteered by Patrick Laroche of the Great Internet Mersenne Prime Search (GIMPS) in 2018 (https://en.wikipedia.org/wiki/Largest_known_prime_number)." The largest known Mersenne prime is indeed 2^82589933-1, according to Wikipedia. And if asked, Bing says there are an infinite number of prime numbers.
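As a sanity check, the i! question is one Maxima answers immediately. A minimal sketch (session prompts and output formatting approximate; note that for i >= 3, i! is divisible by both 2 and 3, so i = 2 is the only value that can work):

(%i1) primep(2!);
(%o1) true
(%i2) for i:2 thru 10 do if primep(i!) then print(i);
2
(%o2) done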
Now the question might arise: which expert should be tagged to answer such questions? A program like Maxima, Mathematica, Maple, etc. can calculate certain things, but can it generate Euclid's proof that there are an infinite number of primes? (FYI, Wolfram Alpha, asked for the largest prime, says "no such number exists.")

Say you have an algebra word problem. Here's one:

A cell phone company charges a monthly rate of $12.95 and $0.25 a minute per call. The bill for m minutes is $21.20. What is m?

Wolfram Alpha, which one might think would be the expert for this, parses the question and gets quite confused; it thinks the question involves 0.25*a. Bing, on the other hand, computes m=33. Correctly.
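Once the equation is extracted from the words, the solving step itself is trivial for Maxima. A minimal sketch (prompts and output formatting approximate; Maxima will also print a float-to-rational conversion notice, omitted here):

(%i1) solve(12.95 + 0.25*m = 21.20, m);
(%o1) [m = 33]

The hard part is evidently the parsing, not the solving.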
My conclusion: see items 1 and 2 in the first paragraph.

.....
Back to work...
RJF

On Mon, Mar 27, 2023 at 12:16 AM Viktor T. Toth <vt...@vt...> wrote:

Apparently, it is clever enough to know what it doesn't know. Never mind Wolfram; there was a Microsoft Research paper suggesting the same. I got the inspiration from that paper to conduct my own little experiment, acting as the AI's "mechanical Turk" on request, providing it with a SEARCH and a CALC function for things with which it required assistance. Not only was it able to use these tools efficiently even with minimal instruction, it also knew when NOT to use them. Here's my version of the experiment, with a link to the MR paper: https://spinor.info/weblog/?p=11685

Viktor

> -----Original Message-----
> From: Raymond Toy <toy...@gm...>
> Sent: Friday, March 24, 2023 2:30 PM
> To: max...@li...
> Subject: Re: [Maxima-discuss] Warning: don't use "an AI" for math. Bard from Google. Bing from Microsoft
>
> On 3/24/23 8:46 AM, Richard Fateman wrote:
> > I've seen a suggestion that the AI would be clever enough to understand what it does not know, and send certain questions off to a computer algebra system, like Wolfram Alpha. How would that work if it doesn't know that it doesn't know something?
>
> https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/
>
> I think Wolfram Alpha does a better job, but even in that article it says Alpha didn't know the distance between Chicago and Tokyo in January.
>
> On Thu, Mar 23, 2023 at 10:07 PM Viktor T. Toth <vt...@vt...> wrote:
>
> The AI can be great on concepts but suck on details. They still make fairly elementary mistakes in logical deduction and can fail even simple arithmetic (although GPT4 is noticeably improved in this regard). Once you spot an error and point it out, they often correct it successfully. But neural nets are neither databases in the conventional sense, nor calculators (or, for that matter, computer algebra engines), so factual errors or errors in deduction will not go away. (Just like humans...)
>
> Here's my interaction with GPT3.5 moments ago, with some lines omitted:
>
> vttoth: Give me the powers of 2 from 0 to 73.
> GPT3.5: Sure, here are the powers of 2 from 0 to 73:
> 2^0 = 1
> [...]
> 2^60 = 1,152,921,504,606,846,976
> 2^61 = 2,305,843,009
>
> vttoth: You were interrupted by the token limit at 2^61. Can you continue from that line?
> GPT3.5: Sure, sorry about that. Here are the remaining powers of 2:
> 2^61 = 4,611,686,018,427,387,392
> [...]
> 2^72 = 9,444,732,965,739,289,378,816
> 2^73 = 18,889,465,931,478,578,757,632
>
> vttoth: Wait, shouldn't 2^61 be 2305843009213693952? You started with the correct digits before you were interrupted, but when you resumed, you were off by a factor of two.
> GPT3.5: You are absolutely correct, I apologize for the mistake. The correct value for 2^61 is indeed 2,305,843,009,213,693,952. Thank you for bringing this to my attention.
>
> vttoth: Can you print the correct values starting from 2^61?
> GPT3.5: Sure, here are the correct values for the remaining powers of 2, starting from 2^61:
> 2^61 = 2,305,843,009,213,693,952
> [...]
> 2^73 = 9,444,732,965,739,290,427,392
> I apologize for the earlier mistake and I hope this clears up any confusion.
>
> Notice, by the way, that its 2^61 value on the second try was not even exactly twice the correct value; it appears to have borrowed some digits from 2^73. But on the final attempt, it produced correct values.
>
> Bing Chat (GPT4, I believe) fared better:
>
> vttoth: Give me the powers of 2 from 0 to 73.
> Bing: Sure! Here are the powers of 2 from 0 to 73:
> 2^0 = 1
> [...]
> 2^25 = 33554432
> ...
> Would you like me to continue?
>
> vttoth: Please continue.
> Bing: Sure! Here are the rest of the powers of 2 from 26 to 73:
> 2^26 = 67108864
> [...]
> 2^46 = 70368744177664
> ...
> Would you like me to continue?
>
> vttoth: Please continue.
> Bing: Sure! Here are the rest of the powers of 2 from 47 to 73:
> 2^47 = 140737488355328
> [...]
> 2^67 = 147573952589676412928
> ...
> Would you like me to continue?
>
> vttoth: Please continue.
> Bing: Sure! Here are the rest of the powers of 2 from 68 to 73:
> 2^68 = 295147905179352825856
> 2^69 = 590295810358705651712
> 2^70 = 1180591620717411303424
> 2^71 = 2361183241434822606848
> 2^72 = 4722366482869645213696
> 2^73 = 9444732965739290427392
> Is there anything else you would like to know?
>
> This time, GPT was successful; at other times, with different tasks, it went astray, and I noticed that often my attempts to help just confused it more, even with GPT4. It's eager, though, and I even managed to get it to draw me a funny cat in TikZ and compose a short, playful piece of music in LilyPond :-)
>
> Viktor
>
> > -----Original Message-----
> > From: Richard Fateman <fa...@gm...>
> > Sent: Thursday, March 23, 2023 4:46 PM
> > To: <max...@li...>
> > Subject: [Maxima-discuss] Warning: don't use "an AI" for math. Bard from Google. Bing from Microsoft
> >
> > I asked Bard for a table of powers of 2 from 0 to 73. The table stopped at 57. So I asked:
> >
> > make a table of powers of 2 from 57 to 73.
> >
> > It responded... (note there is a horizontal scroll bar)
> >
> > Sure, here is a table of powers of 2 from 57 to 73:
> >
> > Power  Value
> > 57     262
> > 58     524
> > 59     1048
> >
> > <snip .. it did go to 73, wrong..>
> >
> > Same question to Bing. From that AI, the table ended at 2^31. And not very well:
> >
> > 29  536870912
> > 30  1073741824
> > 31  -2147483648
> > ...
> >
> > So when people say these programs can "do math" (pass AP exams, etc.), we could react by asking those people to distinguish between "doing math" and "passing tests ostensibly intended to determine if a student can answer some simple math questions based on a standard list of typical questions".
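For completeness: the arithmetic that tripped up Bard and Bing above is exact and instantaneous in any bignum-capable system, Maxima included. A minimal sketch (output formatting approximate):

(%i1) 2^61;
(%o1) 2305843009213693952
(%i2) 2^73;
(%o2) 9444732965739290427392
(%i3) makelist(2^i, i, 57, 60);
(%o3) [144115188075855872, 288230376151711744, 576460752303423488, 1152921504606846976]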