Quite an insightful read! Thank you for sharing your balanced and nuanced take on the topic.
Excellent post!
One would think we should have reached the point where the disparity between lofty AI rhetoric and reality on the ground would be decreasing. Yet this post shows just the opposite. I'm hardly the first to say this, but at some point the many hundreds of billions of dollars pouring into frontier model development has to show a greater return on investment, or the risk of industry collapse will grow uncomfortably high.
I have more hope in quantum computing assisting fields such as CFD, QCD, and quantum physics generally than in AI. So I guess I'm saying I'm tired of trying to optimize ML/AI to give reliable results.
Great piece. Thanks for keeping it real. The hype is blinding and extremely well funded.
Having done a bit of research and having a strong desire to get things right, I find this very disturbing. Thanks for highlighting this. Researchers using AI in their work (not necessarily AI researchers) should be more open about publishing failures.
Great and informative piece.
The combination of the file drawer problem and the lack of transparency about dependence on hyperparameters, etc., makes it very hard to accurately assess impact and progress.
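To make the worry concrete, here is a toy simulation (made-up numbers, not data from any real study): sweep many hyperparameter settings on a small benchmark with no real signal, file away the mediocre runs, and report only the best one. The "published" score looks well above chance even though every configuration is guessing.

```python
# Toy simulation of the file-drawer + hyperparameter-selection effect.
# Every configuration is pure guessing on a 100-example test set, so true
# accuracy is chance level (0.5); reporting only the best run inflates it.
import numpy as np

rng = np.random.default_rng(42)
n_configs, n_test = 50, 100  # 50 hyperparameter settings, 100 test examples

# Accuracy of each guessing run ~ Binomial(n_test, 0.5) / n_test
scores = rng.binomial(n_test, 0.5, size=n_configs) / n_test

print(f"mean accuracy over all runs (honest): {scores.mean():.3f}")  # ~0.50
print(f"best accuracy (what gets reported):   {scores.max():.3f}")   # often ~0.60
```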
I'm working on a large-scale physics research project right now. I'm not myself a physicist but I work with a lot of them. I can already tell that AI is accelerating this work, but it isn't in the way that you describe. It's not like they are training advanced AI models to do something that sounds really cutting-edge with AI.
Instead, much of the day-to-day work of a physicist, at least in some fields, is basic Python programming. And the LLMs are really good at this! Better than many physicists. Someone can be an excellent physicist, top 1%, but a mediocre Python programmer. And the LLMs already know all the details of astropy; they are good at converting one file format to another, cleaning data, and all the other mundane tasks that soak up physicists' time.
If the AI can quickly do the most boring 50% of your tasks, suddenly you're accelerated to twice the speed. Plus, for most physicists, this frees them up to spend more time on the *interesting* stuff.
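For a concrete (and entirely hypothetical) example of the kind of grunt work I mean, here is the sort of snippet an LLM will write correctly on the first try: read a FITS table with astropy, drop rows with missing values, and write a CSV. The file names and column names below are invented for illustration.

```python
# Hypothetical example of routine data wrangling: FITS table -> cleaned CSV.
import numpy as np
from astropy.table import Table

def fits_to_clean_csv(fits_path, csv_path, required_cols):
    """Read a FITS binary table, drop rows with NaNs in required columns, write CSV."""
    tbl = Table.read(fits_path)  # astropy auto-detects the FITS format
    keep = np.ones(len(tbl), dtype=bool)
    for col in required_cols:
        keep &= ~np.isnan(np.asarray(tbl[col], dtype=float))
    tbl[keep].write(csv_path, format="csv", overwrite=True)

# e.g. fits_to_clean_csv("observations.fits", "observations_clean.csv", ["ra", "dec", "flux"])
```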
What else could LLMs accelerate, apart from coding and working with text?
And only when following some recommendations and emerging best practices, e.g. "chain of vibes": https://blog.thepete.net/blog/2025/04/14/chain-of-vibes/
But this does not count as “AI doing physics.” Instead, it’s AI doing the grunt work that used to be done by overqualified grad students. It is a step forward.
Thanks for the excellent post! The lack of sound evaluation methods and the reliance on a handful of benchmarks in AI and ML are a huge problem. Similarly, the cherry-picking of favourable results and data sets is just plain old scientific malpractice and should be treated as such!
This is super interesting. Thank you for sharing!
I am very excited about AI, but in scientific work such as mine, it is not a substitute for honest mathematical modeling.
A math PDE captures the essence of the problem. A neural net will find a best fit for your samples. Highly suspect.
Google had good luck with neural nets for weather modeling, though I don't think they can offer guarantees. If your weather pattern resembles the historical data, they will fit it well. Otherwise they may give junk.
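A toy illustration of that last point (a made-up regression problem, not Google's weather model): a small neural net fit to samples of a smooth function does fine inside the range it was trained on and typically returns junk well outside it.

```python
# Sketch: neural-net curve fitting interpolates well but extrapolates poorly.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, size=(500, 1))  # the "historical" regime
y_train = np.sin(x_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(x_train, y_train)

for x in (0.5, 3 * np.pi):  # inside vs. far outside the training range
    print(f"x={x:6.2f}  prediction={net.predict([[x]])[0]:7.3f}  truth={np.sin(x):7.3f}")
# The in-range prediction is close; the out-of-range one is usually far off.
```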
YET!
Are there particular LLM models that folks here have found more reliable than others?
Insightful article. I find that publication bias is understudied and not talked about enough. We looked at its role in overoptimism in ML-driven science here:
https://www.cell.com/patterns/fulltext/S2666-3899(25)00033-9
I didn’t realise a scientific experiment could be “unsuccessful” unless the experiment can’t be completed. Maybe that’s what you mean, but I’m a non-scientist, so forgive me if I’m missing something obvious.
There's an interesting take on what constitutes research study "failure" in this Stack Exchange answer by scientist Jake Beal (his LinkedIn bio: https://www.linkedin.com/in/jake-beal/)
https://academia.stackexchange.com/a/41676
"A "failure" is any case where you didn't get what you wanted in the study. This might be a negative result, but it might also be due to error, mistakes, design problems, management problems, etc."
"A "negative result" is a special type of failure, which clearly establishes that the system that you are dealing with could not produce the result you wanted or expected."
An instance where an experiment couldn't be completed would fall under the first, broader category.
An instance where an experiment or other study returned results that did "not confirm what you expect or did not come out statistically significant" (wording from https://goldbio.com/articles/article/Publishing-Failure-in-Science) would fall within the second, narrower category of "negative result."
Thank you, Aron! That's much clearer to me now. Although if I were a scientist, I'd worry about calling a result where my hypothesis was invalidated a "failure", because that would introduce unnecessary psychological bias into the process.
AI is still in very early years. It is on an exponential improvement curve.
I suggest taking another level-set on AI's capabilities in one to two years and comparing them to your present-day expectations. I am certain the improvement will be significant.
How do you know that “AI is on exponential improvement curve”? Any references to peer reviewed research?
Given that AI is only very early in development, it would be impossible to find any valid "peer reviewed research" [lol]
But here are some reading materials from a known expert in the AI field that will hopefully satisfy you. Do you think that 2027 or 2028 would qualify as exponential?
-----
Daniel Kokotajlo
@DKokotajlo
"How, exactly, could AI take over by 2027?"
https://x.com/DKokotajlo/status/1907826614186209524
AND
An Interview With the Herald of the Apocalypse
May 15, 2025
Hosted by Ross Douthat - Mr. Douthat is an Opinion columnist and the host of the “Interesting Times” podcast.
The Forecast for 2027? Total A.I. Domination.
Losing your job may be the best-case scenario.
Below is an edited transcript of an episode of “Interesting Times.” We recommend listening to it in its original form for the full effect. You can do so using the player above or on the NYT Audio app, Apple, Spotify, Amazon Music, YouTube, iHeartRadio or wherever you get your podcasts.
----
Ross Douthat: How fast is the artificial intelligence revolution really happening? What would machine superintelligence really mean for ordinary human beings? When will Skynet be fully operational?
Are human beings destined to merge with some kind of machine god — or be destroyed by our own creation? What do A.I. researchers really expect, desire and fear?
My guest today is an A.I. researcher who’s written a dramatic forecast suggesting that we may get answers to all of those questions a lot sooner than you might think. His forecast suggests that by 2027, which is just around the corner, some kind of machine god may be with us, ushering in a weird, post-scarcity utopia — or threatening to kill us all.
...
https://www.nytimes.com/2025/05/15/opinion/artifical-intelligence-2027.html
Thanks for the links. AI 2027 is sci-fi dressed up as scientific prediction and not very convincing to me. They would need to show alternative scenarios and/or give much more solid substantiation for the one they've picked as the most likely. A lot of their arguments rest on very subjective guesses, and IMHO they dramatically underestimate how hard it is to change physical reality.
Did you actually read the whole AI 2027 article, or was it TL;DR?
IMO, too many people predicting the future are worried about their future job or life prospects and find solace by clinging to how relatively slowly things occurred in the past.
By all measures, the speed of technological and societal change has accelerated significantly over the past 150 years, and there is no reason to expect that speed to slow down, especially in computing power and AI development.
'Prediction is very difficult, especially if it's about the future.'
-- Niels Bohr
If you are interested in reading an SF series about how the near-term future of AI could play out, I suggest checking out:
From Wikipedia: "The Singularity series by William Hertling is a collection of science fiction novels that explore the implications of artificial intelligence on society, the emergence of a technological singularity, and the challenges humanity faces as it integrates more closely with the technology it creates. The series is known for its realistic depiction of AI development and the ethical, societal, and personal dilemmas that arise from the blurring lines between human and machine intelligence."
It's 4 books and an easy, kind of fun read.
You might also want to ponder this 30-year-old short story from Wired magazine (when it used to be worth reading):
Issue 3.03 - Mar 1995
Faded Genes
By Greg Blonder
In 2088, our branch on the tree of life will come crashing down, ending a very modest (if critically acclaimed) run on planet earth. The culprit? Not global warming. Not atomic war. Not flesh-eating bacteria.
Not even too much television. The culprit is the integrated circuit - aided by the surprising power of exponential growth. We will be driven to extinction by a smarter and more adaptable species - the computer. And our only hope is to try and accelerate human evolution with the aid of genetic engineering.
...
https://www.wired.com/1995/03/blonder-if/
While the concern is valid, the analysis falls short by conflating fundamentally distinct concepts: optimization tools, machine learning (ML), and artificial intelligence (AI). It’s a common misconception to equate ML with AI, despite their significant differences in scope and application.
Check out "Reframing 'AI for Science': Moving Beyond the Hype": https://www.linkedin.com/pulse/reframing-ai-science-moving-beyond-hype-sashikumaar-ganesan-2tmqc
Isn’t ML a subfield of AI? … and LLMs a subfield of ML?
chatGPT-ass comment