30 Comments
Gianni Berardi

Claude is for sure the best for vibe coding.

https://v0.dev/ is the silver medal

IMVHO

Ksenia Se

Thinking alike :) we just published a deep dive into 15 agents https://github.com/The-Focus-AI/june-2025-coding-agent-report

Oleg Alexandrov

Coding agents are still useful only in small incremental doses. I use one myself, but any time one slips into caffeinated vibe mode, the next day one needs to throw it all out and build it back in small pieces.

Chuck Mire

I am over 81 years old, and I used to program standalone Microsoft Executable Utilities that I would post online. I saw that these old utilities might not run in native mode with the advent of ARM computers. Rather than letting them possibly die, I use ChatGPT to convert some of them to successful standalone JavaScript utilities that I have uploaded here:

https://qb45.org/files.php?cat=2

Though this is “Vibe Coding” since I didn’t first start to learn JavaScript due to my age, I was able to succeed because I knew in intimate detail how my Windows executables were structured and was able to give precise prompts to ChatGPT.

David Lobron

Wow, this is fascinating. I use Cursor, with Claude as its backend. I recently went down a rabbit-hole where Cursor helped me refine a unit test. The test result got better and better, but I realized partway through that I was simply overfitting on that test, and the code was actually hiding problems. I find that Cursor generally helps a lot, but it doesn't replace human judgement in many situations.

Sam Tobin-Hochstadt

I think the "programming in English" idea is wrong, in a few important ways. Most directly, as you noted, getting coding agents to work for you requires that you already know how to program. More generally, working with AI tools is, as Ethan Mollick keeps saying, a management task. Dealing with a coding agent (especially a fairly autonomous one like Codex) is like dealing with an army of junior developers. You tell them things in English, but also with references to code, and you have to read and review the code and potentially debug it in order to make the whole thing work. Most people who manage individual developers at tech companies are former (or current) programmers.

Maybe things will eventually evolve past this stage -- the head of search at Google, for example, doesn't write or read code as part of management. But the leap to that kind of autonomy is at least as big as the leap that got us to this stage in the first place.

Timothy B. Lee

I'm not sure I'm following you here. You are saying that vibe-coding isn't programming because you have to know how to program in order to vibe-code effectively?

When the industry first made the transition from machine code to compiled languages in the 1950s and 1960s, I assume there were a lot of programmers who straddled the line between the two—they'd write some programs in Fortran and some in assembly, or maybe they'd write most of their program in Fortran but hand-code the most performance-sensitive parts in assembly. I don't think it would have made sense to say that writing Fortran is not programming because you have to also know assembly to be good at it.

By the same token, in the future programming will involve writing instructions for the computer in a mixture of English and traditional programming languages like Python. Arguing over definitions isn't that interesting, but I don't see why we can't describe the process of writing English prompts for a coding agent as a form of programming, especially if the prompt is detailed instructions that the agent is going to transform fairly directly into traditional code.

Sam Tobin-Hochstadt

I think that as things stand, working with AI programming tools is very different from working with a compiler. With a compiler, you basically treat the result as a black box that you almost never look at. But that isn't like (at least my) experience using AI tools. Instead the product of the tool is code, which you then have to deal with.

Instead I believe that the right model for thinking about an AI tool is the human programmer, and that's in fact the way that Codex and Claude Code and GitHub's new agent present themselves. You communicate with them in natural language as you do with a colleague, but they produce code that you (maybe) read like a colleague would.

Robert Ruzitschka

I think the metaphor is not correct. A higher level programming language implements the machine code that is ultimately needed in a predictable and efficient way. If you use println, you can be assured that the machine code generated that implements this abstraction is highly optimized and reliable. In the case of vibe coding you can’t know this: the model generates, in the best case, working code, so the required functionality is implemented, but you can’t say if the solution is reliable and optimized, because the code is generated based on training data and a statistical process that is completely obfuscated from you. So if you want to achieve pro results you need to be able to read the generated code, understand the algorithms implemented, and know whether there are more efficient ways of implementing them. Then you can guide the agent to refactor. Ultimately you will just save the time for typing. I don’t see how this can change in the future. All of these problems are inherent to the design of LLMs.

TL;DR: compilers are a different category of abstraction, and proper vibe coding (coding in English as a programming language) for more complex problems is far away.

Sam Tobin-Hochstadt

I also want to put this a different way. People who contract for software systems but aren't going to work with the code basically always work through someone who manages the relationship between the customer and the developer. Sometimes that person is a manager and the customer is an executive. Sometimes the person has that as their job description, as at a place like Accenture. Sometimes those roles are filled by the same person, in a one person consultancy.

But that role is very different from being a programmer, as anyone who works as a one person consultant will say. Codex and Claude Code are fulfilling the job of a programmer and need to be managed like one. Bolt.new and similar are trying to fulfill both roles with AI, which is obviously more challenging.

Michael S Ferrell

I worked at IBM for 30 years, starting out coding in mainframe assembly code - real bit pushing! At the end of my development career I was writing documents about requirements for software systems and suggestions on implementation options, and reviewing documents written by other software architects. Should we distinguish "coding" per se from "software engineering"?

Jim

This was fun, I really enjoyed reading about the results of your experiment.

AnthonyCV

>People like to talk about coding agents replacing engineers, but I think it makes more sense to think about this the way Andrej Karpathy put it a couple of years ago: “The hottest new programming language is English.”

Yes, but also, there are a finite number of translation and creation steps happening between thought and thing. If an English description of code can be translated to C++ or Python code, then the remaining unknown is how hard it is going to be, not now but for future coding agents, to translate the concept "I want a piece of software that does X" into the English language prompts that get agents to generate the English language code that gets other agents to generate the Python code needed. And how hard it is going to be to evaluate the results.

David M Lewis

I am reminded that at a previous company, we were put through a class on effective specifications. The manager who ran my session began by saying that they had analyzed a large number of internal error reports, and that roughly two-thirds of the errors could be traced to specifications that were either vague or not internally consistent. That fits very well with what you’re saying about how to use these tools.

werdnagreb

This was really interesting. What I’m taking away from this is that the assistants that are more vibe-y are good for prototyping only. You probably don’t want to build anything that you’re concerned about getting hacked. (This can be super powerful for designers who just want to get their ideas out.)

The others are for professional engineers and still need a lot of hand holding, but are good for speeding repetitive tasks up.

One thing you didn’t mention is the quality of the code they wrote. Six months from now, is there going to be a problem when you need to make a change?

Timothy B. Lee

Yes, this is exactly right! The vibe-coding tools are good for prototyping but don't use them for production code.

Aaron still does conventional code review on all the code generated by the AI, and he didn't mention it being noticeably worse than human-written code. But I think this is part of why it's important to provide a lot of context—you need to explicitly tell the agent what characteristics make code good at your specific organization and provide feedback when they don't do it right.

Nimish Sanghi

The point you make in the second-to-last paragraph, on why we still need programmers who can use English rather than C++/Java/Python as the coding medium to bring a system into being, is the key. A lot of people don't get it, or it gets lost in the hype of "AI will replace programmers."

On your experiment: I too have had a better experience with Cursor than with other tools.

Jurgita

I really enjoyed reading about your experience testing different agents! I recently tried Claude Code and was genuinely impressed: for the first time, I felt like an AI agent was actually useful for web development (for someone with limited web development experience). I would love to see what your best AI-generated website looked like, if you are open to sharing. I am curious to see what others are achieving with these tools. Sometimes I wonder if my expectations are just a bit too high.

Timothy B. Lee

I thought about trying to publish the best websites, but I'm worried that the data might be inaccurate, which would be unfair to Waymo. I didn't notice any inaccuracies, but I didn't check it super carefully. Which honestly is related to the broader thesis of the piece—you can fairly easily get a coding agent to make something that looks superficially right, but it takes extra work to generate something you're confident is correct.

Kit

You could write a legal contract in plain old English. And that works. Unless something goes wrong. And something always goes wrong. Then you need lawyers and judges figuring out what was actually written. Legalese actually serves a purpose.

Beyond a certain level of complexity, capturing complex logic in a natural language grows very unnatural.

Chuck Mire

My favorite quote about programmers, by Chuck Mire:

“The programmer, like the poet, works only slightly removed from pure thought-stuff.

He builds his castles in the air, from air, creating by exertion of the imagination.

Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself.

The magic of myth and legend has come true in our time.

One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.”

- Frederick P. Brooks -

Jonathan Aquino

Why no love for Amazon Q Developer CLI? It uses Claude under the covers and costs only $20/month for seemingly unlimited access.

Timothy B. Lee

I had not heard of it! I talked to a number of programmers in preparation for this story and nobody mentioned it.
