Coding agents are still useful only in small incremental doses. I use one myself, but any time it slips into caffeinated vibe mode, the next day I need to throw it all out and rebuild it in small pieces.
I am over 81 years old, and I used to program standalone Microsoft Executable Utilities that I would post online. I saw that these old utilities might not run in native mode with the advent of ARM computers. Rather than letting them possibly die, I used ChatGPT to convert some of them into working standalone JavaScript utilities, which I have uploaded here:
https://qb45.org/files.php?cat=2
Though this is “Vibe Coding,” since at my age I never set out to learn JavaScript first, I was able to succeed because I knew in intimate detail how my Windows executables were structured and could give precise prompts to ChatGPT.
Wow, this is fascinating. I use Cursor, with Claude as its backend. I recently went down a rabbit-hole where Cursor helped me refine a unit test. The test result got better and better, but I realized partway through that I was simply overfitting on that test, and the code was actually hiding problems. I find that Cursor generally helps a lot, but it doesn't replace human judgement in many situations.
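To make that failure mode concrete, here is a minimal sketch in Python with entirely hypothetical names (not from the experiment above): the first test has been nudged until it passes and now locks in the buggy behaviour, while the second is the one a reviewer would actually want.

```python
# Minimal sketch of the test-overfitting trap; all names are hypothetical.

def parse_price(text: str) -> int:
    """Parse a price like '$12.50' into cents.
    Buggy: '$12.5' becomes 1205 cents because '.5' is read as 5 cents, not 50."""
    dollars, _, cents = text.strip("$").partition(".")
    return int(dollars) * 100 + int(cents or 0)

def test_parse_price_overfitted():
    # Adjusted until green: it asserts what the code does, not what it should do.
    assert parse_price("$12.5") == 1205

def test_parse_price_correct():
    # The assertion a reviewer would want; it fails and exposes the bug.
    assert parse_price("$12.5") == 1250
```

Iterating on the first test makes the result look "better and better" while the underlying bug stays hidden, which is exactly where human judgement has to step in.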
I think the "programming in English" idea is wrong, in a few important ways. Most directly, as you noted, getting coding agents to work for you requires that you already know how to program. More generally, working with AI tools is, as Ethan Mollick keeps saying, a management task. Dealing with a coding agent (especially a fairly autonomous one like Codex) is like dealing with an army of junior developers. You tell them things in english, but also with references to code, and you have to read and review the code and potentially debug it in order to make the whole thing work. Most people who manage individual developers at tech companies are former (or current) programmers.
Maybe things will eventually evolve past this stage -- the head of search at Google, for example, doesn't write or read code as part of management. But the leap to that kind of autonomy is at least as big as the leap that got us to this stage in the first place.
I'm not sure I'm following you here. You are saying that vibe-coding isn't programming because you have to know how to program in order to vibe-code effectively?
When the industry first made the transition from machine code to compiled languages in the 1950s and 1960s, I assume there were a lot of programmers who straddled the line between the two—they'd write some programs in Fortran and some in assembly, or maybe they'd write most of their program in Fortran but hand-code the most performance-sensitive parts in assembly. I don't think it would have made sense to say that writing Fortran is not programming because you have to also know assembly to be good at it.
By the same token, in the future programming will involve writing instructions for the computer in a mixture of English and traditional programming languages like Python. Arguing over definitions isn't that interesting, but I don't see why we can't describe the process of writing English prompts for a coding agent as a form of programming—especially if the prompt is detailed instructions that the agent is going to transform fairly directly into traditional code.
I think that as things stand, working with AI programming tools is very different from working with a compiler. With a compiler, you basically treat the result as a black box that you almost never look at. But that isn't what using AI tools is like, at least in my experience. Instead, the product of the tool is code, which you then have to deal with.
Instead, I believe that the right model for thinking about an AI tool is the human programmer, and that's in fact the way that Codex and Claude Code and GitHub's new agent present themselves. You communicate with them in natural language as you would with a colleague, but they produce code that you (maybe) read, just as you would a colleague's.
I think the metaphor is not correct. A higher-level programming language implements the machine code that is ultimately needed in a predictable and efficient way. If you use println, you can be assured that the machine code generated for that abstraction is highly optimized and reliable. In the case of vibe coding you can't know this: in the best case the model generates working code, so the required functionality is implemented, but you can't say whether the solution is reliable and optimized. The code is generated from training data by a statistical process that is completely obfuscated from you. So if you want to achieve professional results you need to be able to read the generated code, understand the algorithms it implements, and know whether there are more efficient ways to implement them. Then you can guide the agent to refactor. Ultimately you will just save the time spent typing. I don't see how this can change in the future; all of these problems are inherent to the design of LLMs.
TL;DR: compilers are a different category of abstraction, and proper vibe coding (coding in English as a programming language) for more complex problems is far away.
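A small illustration of that point, using hypothetical functions (not from the post): a purely functional test cannot distinguish these two implementations, so only reading the generated code tells you whether you got the quadratic version or the linear one.

```python
# Two functionally equivalent implementations; a pass/fail test treats them the same.

def has_duplicates_quadratic(items: list) -> bool:
    # The kind of "working" code an agent might hand you: correct, but O(n^2).
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items: list) -> bool:
    # The refactor you can only ask for after reading and understanding the first version.
    return len(set(items)) != len(items)

# Both pass the same checks; the difference only shows up in the code (and at scale).
for sample in ([1, 2, 3, 2], [1, 2, 3], []):
    assert has_duplicates_quadratic(sample) == has_duplicates_linear(sample)
```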
I worked at IBM for 30 years, starting out coding in mainframe assembly - real bit pushing! At the end of my development career I was writing documents about requirements for software systems and suggestions on implementation options, and reviewing documents written by other software architects. Should we distinguish "coding" per se from "software engineering"?
This was fun, I really enjoyed reading about the results of your experiment.
>People like to talk about coding agents replacing engineers, but I think it makes more sense to think about this the way Andrej Karpathy put it a couple of years ago: “The hottest new programming language is English.”
Yes, but also, there are a finite number of translation and creation steps happening between thought and thing. If an English description of code can be translated to C++ or Python code, then the remaining unknown is how hard it is going to be, not now but for future coding agents, to translate the concept "I want a piece of software that does X" into the English language prompts that get agents to generate the English language code that gets other agents to generate the Python code needed. And how hard it is going to be to evaluate the results.
I am reminded that at a previous company, we were put through a class on effective specifications. The manager who ran my session began by saying that they had analyzed a large number of internal error reports, and that roughly two-thirds of the errors could be traced to specifications that were either vague or not internally consistent. That fits very well with what you’re saying about how to use these tools.
This was really interesting. What I’m taking away from this is that the assistants that are more vibe-y are good for prototyping only. You probably don’t want to build anything that you’re concerned about getting hacked. (This can be super powerful for designers who just want to get their ideas out.)
The others are for professional engineers and still need a lot of hand-holding, but they're good for speeding up repetitive tasks.
One thing you didn't mention is the quality of the code they wrote. Six months from now, is there going to be a problem when you need to make a change?
The point you make in the second-to-last paragraph, about why we still need programmers who can use English as well as C++/Java/Python as the coding medium to give effect to a system, is the key. A lot of people don't get it, or it gets lost in the hype of "AI will replace programmers."
On your experiment: I too have had a better experience with Cursor than with other tools.
Claude is for sure the best for vibe coding.
https://v0.dev/ is the silver medal
IMVHO
Thinking alike :) We just published a deep dive into 15 agents: https://github.com/The-Focus-AI/june-2025-coding-agent-report