“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
Compared with interacting with the equivalent of Alexa, that's a remarkable difference in 5 years.
Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I originally tried to include in the library but couldn't find an approach fast enough when dealing with large vectors (say, 15,000 dimensions and up).
While it wasn't able to give perfect Rust code that compiled on the very first try, it was able to fix all the bugs in one more try after I pasted in all the compiler problems flagged by VS Code. In contrast, GPT-4o usually would take dozens of tries to fix all the many Rust type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude 3.5 Sonnet is just plain stupid when it comes to Rust for some reason.
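To give a flavor of the kind of routine involved, here is a minimal histogram-based sketch of normalized mutual information I'm writing down purely for illustration -- it is not the code from the chat or from the library (the actual Rust changes are in the diff linked below):

    // Illustrative sketch only: histogram-based normalized mutual information
    // between two f64 vectors, NMI = I(X;Y) / sqrt(H(X) * H(Y)),
    // with values bucketed into `bins` equal-width bins.

    fn bin_values(x: &[f64], bins: usize) -> Vec<usize> {
        let min = x.iter().cloned().fold(f64::INFINITY, f64::min);
        let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let width = (max - min).max(f64::EPSILON);
        x.iter()
            .map(|&v| (((v - min) / width) * (bins as f64 - 1.0)).round() as usize)
            .collect()
    }

    fn entropy(counts: &[f64], n: f64) -> f64 {
        counts
            .iter()
            .filter(|&&c| c > 0.0)
            .map(|&c| {
                let p = c / n;
                -p * p.ln()
            })
            .sum()
    }

    fn normalized_mutual_information(x: &[f64], y: &[f64], bins: usize) -> f64 {
        assert_eq!(x.len(), y.len());
        let n = x.len() as f64;
        let (bx, by) = (bin_values(x, bins), bin_values(y, bins));

        // Joint and marginal histograms built in a single pass.
        let mut joint = vec![0.0; bins * bins];
        let (mut mx, mut my) = (vec![0.0; bins], vec![0.0; bins]);
        for (&i, &j) in bx.iter().zip(by.iter()) {
            joint[i * bins + j] += 1.0;
            mx[i] += 1.0;
            my[j] += 1.0;
        }

        let (hx, hy, hxy) = (entropy(&mx, n), entropy(&my, n), entropy(&joint, n));
        let mi = hx + hy - hxy;
        mi / (hx * hy).sqrt().max(f64::EPSILON)
    }

    fn main() {
        // Two strongly related 15,000-dimensional vectors should score near 1.
        let x: Vec<f64> = (0..15_000).map(|i| (i as f64).sin()).collect();
        let y: Vec<f64> = x.iter().map(|v| v * 2.0 + 0.1).collect();
        println!("NMI ~= {:.3}", normalized_mutual_information(&x, &y, 64));
    }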
I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard-pressed to find many humans capable of helping with (at least not without shelling out $500k+/year in compensation).
And it's not just the performance optimization and relatively bug-free code -- it's the creative problem solving and synthesis of huge amounts of core mathematical and algorithmic knowledge plus contemporary research results, combined with a strong ability to understand what you're trying to accomplish and make it happen.
Here is the diff to the code file showing the changes:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
For example, I asked a pretty simple question here and it got completely confused:
https://moorier.com/math-chat-1.png https://moorier.com/math-chat-2.png https://moorier.com/math-chat-3.png
(Full chat should be here: https://chatgpt.com/share/66e5d2dd-0b08-8011-89c8-f6895f3217...)
This is like when you’re being interviewed for a programming job and the interviewer explains some problem to you that it took their team months to figure out, and then they’re disappointed you can’t whiteboard out the solution they came up with in 40 minutes without access to google.
AIs have no emotional barrier to wasted effort, which makes them better reasoners than their innate ability would suggest.
(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I had spent a lot of time puzzling over, and an LLM instantly figured out it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on the manifold: reversing the manifold's orientation is the same as swapping i with -i. I understand this now solely because an LLM suggested looking at it.) Of course, I'm sure this isn't original LLM thinking -- the math is certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's absolutely impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm thinking very carefully about how best to make use of such an incredibly powerful but alien tool...)
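For concreteness, the sign convention in question is standard material (my summary, not anything from the commenter's chat):

    % Cauchy-Riemann equations for f = u + iv in a chart z = x + iy:
    \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad
    \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}
    % Swapping i with -i (equivalently, reversing the chart's orientation)
    % flips the signs, giving the anti-holomorphic version:
    \frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y}, \qquad
    \frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}
    % On a non-orientable manifold there is no consistent global choice between
    % the two conventions, which is the inconsistency referred to above.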
But on the other hand it misses important details and hallucinates, just like GPT-4o. And it can need a lot of hand-holding and correction to get to the right answer, so much so that sometimes you wonder if it would have been easier to just do it yourself. Only this time it's worse because you're waiting 20-60 seconds for an answer.
I wonder if what it excels at is just the stuff that I don't need it for. I'm not in classic STEM, I'm in software engineering, and o1 isn't so much better that it justifies the wait time (yet).
One area I haven't explored is using it to plan implementation or architectural changes. I feel like it might be better for this, but need the right problems to throw at it.
Coming from Terence Tao that seems pretty remarkable to me?
Any other takes by mathematicians out there?
Just off the top of my head, maybe an RLHF run performed by academic experts and geared towards “creative applications” could get us farther than we are? Given how much the original RLHF run cost even with underpaid workers in developing countries, that might be exorbitantly expensive, but it's worth a dream. Perhaps as a governmental or NGO-driven open source initiative…
Of course, a core problem here is defining “creativity” in stringent — or in Chomsky’s words, “scientific” — terms. RLHF dodged that a bit by leaning on the intuitive capabilities of your human critics. I’m constantly opining about how LLMs solved the frame problem, but perhaps it’s better characterized as a partial solution for a relatively easy/basic environment: stories about the real world. The Abstract/Academic/Scientific Frame Problem might be another breakthrough away, yet…
I've tried a variety of ways to ask various LLMs to help solve this. Finally, with access to ChatGPT o1-preview, I was able to get a good answer. The first answer was wrong, but with a little more prompting and clarification I got the answer I wanted: how to relate the positions of P0, P1, P2, and P3 so that a Bézier curve could be G3 continuous. This isn't something unknown, since many CAD programs can do it already, but I had not been able to find the answer I was looking for in a form that was useful to me.
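For reference, these are the raw ingredients any such relation has to combine -- this is just textbook Bézier math, not the specific formula the chat produced:

    % Cubic Bezier curve and its derivatives evaluated at t = 0:
    B(t)    = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) t^2 \, P_2 + t^3 P_3
    B'(0)   = 3 (P_1 - P_0)
    B''(0)  = 6 (P_0 - 2 P_1 + P_2)
    B'''(0) = 6 (P_3 - 3 P_2 + 3 P_1 - P_0)
    % G^3 continuity at a join requires matching position, tangent direction,
    % curvature, and the arc-length derivative of curvature, all of which are
    % functions of these endpoint derivatives.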
I don't really know where that puts o1-preview relative to a math grad student, but after spending tons of time over a couple years on this pet project, getting an answer from a chat bot was one of the more magical moments I've had with technology in a long time.
Math grad students everywhere now have a benchmark to determine if Terry Tao considers them to be mediocre or incompetent.
If you know the contours of the answer and can describe what you are looking for, it can quickly find it for you.
If they are overly optimistic, perhaps it would be good to hear the opinions of Wiles and Perelman.
As LLMs continue to improve I feel like anyone making a living doing the "99% perspiration" part of intellectual labor is about to enter a world of hurt.
There’s at least a “complexity” if not a “problem” in terms of judging models that to a first approximation have been trained on “everything”.
Have people tried putting these things up against serious mathematical problems that are well studied? With or without Lean hinting, has anyone gotten, like, the Shimura-Taniyama conjecture/proof out?
Appreciate the no fucks given categorization of grad students.
This is the most important part, imo. A big goal should be an AI system coming up with its own discoveries and ideas. It's really unclear how we get from the current paradigm to it coming up with something like general relativity, the way Einstein did. Does it require embodiment?
Is that an accurate description? I thought it just runs the LLM for longer, and multiple times, and truncates the beginning of the output.
Find a, b, c distinct positive integers satisfying a^3 + b^3 = c^4. Hint: try dividing all sides by c^3, then giving values to (a/c) and (b/c).
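For what it's worth, spelling out where the hint leads (my own working, not necessarily the answer the poster had in mind):

    % Following the hint, divide both sides by c^3:
    (a/c)^3 + (b/c)^3 = c
    % Take a = x c and b = y c for integers x, y >= 2 with x \ne y. Then
    c = x^3 + y^3, \quad a = x (x^3 + y^3), \quad b = y (x^3 + y^3)
    % and indeed a^3 + b^3 = (x^3 + y^3) c^3 = c \cdot c^3 = c^4.
    % Example (x = 2, y = 3): (a, b, c) = (70, 105, 35), with
    70^3 + 105^3 = 343000 + 1157625 = 1500625 = 35^4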
Given the log scale of compute required to improve performance, there is no guarantee that the ratio can be improved that much in a few years.
But I was curious and asked something very simple, Euclid's first postulate, and I got this answer:
Euclid's Postulate 1: "Through any two points, there is exactly one straight line."
In fact Euclid's Postulate 1 is "To draw a straight line from any point to any point." http://aleph0.clarku.edu/~djoyce/java/elements/bookI/bookI.h...
I think the AI's answer is not correct; it may be some textbook interpretation, but I was expecting Euclid's exact wording.
Edit: Google's Gemini gives the exact wording of the postulate and then comments that this means you can draw one line between two points. I think this is better.
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
It performs way better than undergrads. Funny he didn't point that out but only made some slight about it being a bad graduate student. Don't believe me? Open the book and ask away. It's amazing, even if it is a “mediocre graduate student” -- which is still far better than a good graduate student or professor who gives you no help or time for all that money you forked over.
It's already worth the money; ignore this shitty write-up by someone who doesn't need its help.
I work in a field related to operations research (OR), and ChatGPT 4o has ingested enough of the OR literature that it's able to spit out very useful Mixed Integer Programming (MIP) formulations for many "problem shapes". For instance, I can give it a logic problem like "i need to put i items in n buckets based on a score, but I want to fill each bucket sequentially" and it actually spits out a very usable math formulation. I usually just need to tweak it a bit. It also warns against weak formulations where the logic might fail, which is tremendously useful for avoiding pitfalls. Compare this to the old way, which is to rack my brain over a weekend to figure out a water-tight formulation of MIP optimization problem (which is often not straightforward for non-intuitive problems). GPT has saved me so much time in this corner of my world.
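For a sense of what such a formulation looks like, here is a generic sketch for the bucket example, written under assumed bucket capacities C_j -- my own reconstruction, not GPT's actual output:

    % Assumed notation: x_{ij} = 1 if item i is placed in bucket j,
    % y_j = 1 if bucket j is opened, C_j = capacity of bucket j.
    \sum_{j} x_{ij} = 1                  \quad \forall i   % each item lands in exactly one bucket
    \sum_{i} x_{ij} \le C_j \, y_j       \quad \forall j   % only opened buckets receive items, up to capacity
    y_{j+1} \le y_j                      \quad \forall j   % buckets are opened in order
    \sum_{i} x_{ij} \ge C_j \, y_{j+1}   \quad \forall j   % the next bucket opens only once this one is full
    x_{ij},\, y_j \in \{0, 1\}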
Yes, you probably wouldn't be able to use ChatGPT well for this purpose unless you understood MIP optimization in the first place -- and you do need to break down the problem into smaller chunks so GPT can reason in steps -- but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
Aside: a lot of people who complain on HN that (paid/good -- only Sonnet 3.5 and GPT-4o are in this category) LLMs are useless to them probably (1) do not know how to use LLMs in a way that maximizes their strengths; (2) have expectations that are too high based on the hype, expecting one-shot magic bullets; or (3) are in a domain where LLMs are really not good. But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
Many of us who have discovered how to exploit LLMs in their areas of strength -- and know how to check for their mistakes -- often find them providing significant leverage in our work.