I went hands-on with ChatGPT Codex and the vibe was not good – here’s what happened

viceblur5gettyimages-1371081152 — Aleksandra Konoplia/Getty Images

ZDNET’s key takeaways

ChatGPT Codex wrote code and saved me time.
It also created a serious bug, but it was able to recover.
Codex is still based on the GPT-4 LLM architecture.

Well, vibe coding this is not. I found the experience to be slow, cumbersome, stressful, and incomplete. But it all worked out in the end.

ChatGPT Codex is ChatGPT’s agentic tool dedicated to code writing and modification. It can access your GitHub repository, make changes, and issue pull requests. You can then review the results and decide whether or not to incorporate them.

Also: How to move your codebase into GitHub for analysis by ChatGPT Deep Research – and why you should

My primary development project is a PHP and JavaScript-based WordPress plugin for site security. There’s a main plugin available for free, and some add-on plugins that enhance the capabilities of the core plugin. My private development repo contains all of this, as well as some maintenance plugins I rely on for user support.

This repo contains 431 files. This is the first time I’ve tried to get an AI to work across my entire ecosystem of plugins in a private repository. I previously used Jules to add a feature to the core plugin, but because it only had access to the core plugin’s open source repository, it couldn’t take into account the entire ecosystem of products.

Earlier last week, I decided to give ChatGPT Codex a run at my code. Then this happened.

GPT-5 launched

On Thursday, GPT-5 slammed into the AI world like a freight train. Initially, OpenAI tried to force everyone to use the new model. Subsequently, they added legacy model support when many of their customers went ballistic.

I ran GPT-5 against my set of programming tests, and it failed half of them. So, I was particularly curious about whether Codex still supported the GPT-4 architecture or would force developers into GPT-5.

However, when I queried Codex five days after GPT-5 launched, the AI responded that it was still based on “OpenAI’s GPT-4 architecture.”

not-gpt-5 — Screenshot by David Gewirtz/ZDNET

I took two things from that:

OpenAI isn’t ready to move Codex coding to GPT-5 (which, recall, failed half my tests).
The results, conclusions, and screenshots I took of my Codex tests are still valid, since Codex is still based on GPT-4.

With that, here is the result of my still-very-much-not-GPT-5 look at ChatGPT Codex.

Getting started

My first step was asking ChatGPT Codex to examine the codebase. I used the Ask mode of Codex, which does analysis, but doesn’t actually change any code.

examine — Screenshot by David Gewirtz/ZDNET

I hoped for an analysis as deep and comprehensive as the one I received from ChatGPT Deep Research a few months ago, but instead, I received a much less complete analysis.

overview — Screenshot by David Gewirtz/ZDNET

I found a more effective approach was to ask Codex to do a quick security audit and let me know if there were any issues. Here’s how I prompted it.

Identify any serious security concerns. Ignore plugins Anyone With Link, License Fixer, and Settings Nuker. Anyone With Link is in the very early stages of coding, and is not ready for code review. License Fixer and Settings Nuker are specialty plugins that do not need a security audit.

Codex identified three main areas for improvement.

security-issues — Screenshot by David Gewirtz/ZDNET

All three areas were valid, although I am not prepared to modify the serialization data structure at this time, because I’m saving that for a complete preferences overhaul. The $_POST complaint is managed, but with a different approach than Codex noticed.

Also: The best AI for coding in 2025 (and what not to use)

The third area, the nonce and cross-site request forgery (CSRF) risk, was something worth changing right away. While access to the user interface for the plugin is assumed to be determined by login role, the plugins themselves don’t explicitly check that the person submitting the plugin settings for action is allowed to do so.

That’s what I decided to ask Codex to fix.
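For readers unfamiliar with the pattern, here is a minimal sketch of the two checks involved: an explicit permission check plus a nonce tied to the user and the action. It is written in Python rather than the plugin’s PHP, it is not the code Codex produced, and every name in it (SECRET_KEY, make_nonce, can_save_settings, the "administrator" role string) is a hypothetical stand-in. WordPress ships its own helpers for this, such as wp_verify_nonce() and current_user_can(), and its real nonces also expire over time, which this sketch leaves out.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real site would use a per-site secret.
SECRET_KEY = b"example-secret"


def make_nonce(user_id: int, action: str) -> str:
    """Derive a token bound to one user and one action."""
    message = f"{user_id}:{action}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def can_save_settings(user_id: int, role: str, action: str, submitted_nonce: str) -> bool:
    """Accept the request only if the role is allowed AND the nonce matches."""
    if role != "administrator":  # explicit permission check, not assumed from the UI
        return False
    expected = make_nonce(user_id, action)
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, submitted_nonce)


token = make_nonce(1, "save_settings")
print(can_save_settings(1, "administrator", "save_settings", token))     # True
print(can_save_settings(1, "subscriber", "save_settings", token))        # False
print(can_save_settings(1, "administrator", "save_settings", "forged"))  # False
```

The point of the sketch is that the server-side handler makes both checks itself, instead of trusting that only authorized users can reach the settings screen.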

Fixing the code

Next up, I instructed Codex to make fixes in the code. I changed the setting from Ask mode to Code mode so the AI would actually attempt changes. As with ChatGPT Agent, Codex spins up a virtual terminal to do some of its work.

terminal — Screenshot by David Gewirtz/ZDNET

When the process completed, Codex showed a diff (the difference between original and to-be-modified code).

diff-1 — Screenshot by David Gewirtz/ZDNET
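If you have not worked with diffs before, the following is a small sketch of what one looks like, generated with Python’s standard difflib module. The two code snippets being compared are invented for illustration (save, store, and check_permission are hypothetical names) and have nothing to do with the plugin’s actual code or with Codex’s output.

```python
import difflib

# Hypothetical "before" and "after" versions of a tiny function.
original = [
    "def save(settings):",
    "    store(settings)",
]
modified = [
    "def save(settings):",
    "    check_permission()",
    "    store(settings)",
]

# unified_diff yields header lines, then context lines (leading space),
# removed lines (leading "-"), and added lines (leading "+").
diff_lines = list(
    difflib.unified_diff(original, modified, fromfile="before", tofile="after", lineterm="")
)
print("\n".join(diff_lines))
```

Unchanged lines appear as context, and the one added line is marked with a leading plus sign, which is what makes a small, targeted change easy to review.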

I was heartened to see that the changes were quite surgical. Codex didn’t try to rewrite large sections of the plugin; it just modified the small areas that needed improvement.

In a few areas, it dug in and changed a few more lines, but those changes were still quite specific to the original prompt.

At one point, I was curious to know why it added a new foreach loop to iterate over an array, so I asked.

added-question — Screenshot by David Gewirtz/ZDNET

As you can see above, I got back a fairly clear response on its reasoning. It made sense, so I moved on, continuing to review Codex’s proposed changes.

All told, Codex proposed making changes to nine separate files. Once I was satisfied with the changes, I clicked Create PR. That creates a pull request, which is how any GitHub user suggests changes to a codebase. Once the PR is created, the project owner (me, in this case) has the option to approve those changes, which adds them into the actual code.

It’s a good mechanism, and Codex does a clean job of working within GitHub’s environment.

pr-request-1 — Screenshot by David Gewirtz/ZDNET

Once I was convinced the changes were good, I merged Codex’s work back into the main codebase.

Houston, we have a problem

I brought the changes down from GitHub to my test machine and tried to run the now-modified plugin. Wait for it…

boom — Screenshot by David Gewirtz/ZDNET

Yeah. That’s not what’s supposed to happen. To be fair, I’ve generated my own share of error screens just like that, so I can’t really get angry at the AI.

Instead, I took a screenshot of the error and passed it to Codex, along with a prompt telling Codex, “Selective Content plugin now fails after making changes you suggested. Here are the errors.”

It took the AI three minutes to suggest a fix, which it presented to me in a new diff.

diff-3 — Screenshot by David Gewirtz/ZDNET

I merged that change into the codebase, once again brought it down to my test server, and it worked. Crisis averted.

No vibe, no flow

When I’m not in a rush and I have the time, coding can provide a very pleasant state of mind. I get into a kind of flow with the language, the machine, and what feels like a connection between my fingers and the computer’s CPU. Not only is it a lot of fun, but it can also be emotionally transcendent.

Working with ChatGPT Codex was not fun. It wasn’t hateful. It just wasn’t fun. It felt more like exchanging emails with a particularly recalcitrant contractor than having a meeting of the minds with a coding buddy.

Also: How to use GPT-5 in VS Code with GitHub Copilot

Codex provided its responses in about 10 or 15 minutes, whereas the same code would probably have taken me a few hours.

Would I have created the same bug as Codex? Probably not. As part of the process of thinking through that algorithm, I most likely would have avoided the mistake Codex made. But I undoubtedly would have created a few more bugs based on mistyping or syntax errors.

To be fair, had I introduced the same bug as Codex did, it would have taken me considerably longer than three minutes to find and fix it. Add another hour or so at least.

So Codex did the job, but I wasn’t in flow. Usually, when I code and I’m inside a particular file or subsystem, I do a lot of work in that area. It’s like cleaning day. If you’re cleaning one part of the bathroom, you might as well clean all of it.

But Codex clearly works best with small, simple instructions. Give it one class of change, and work through that one change before introducing new elements. Like I said, it does work and it is a useful tool. But using it definitely felt like more of a chore than programming usually does, even though it saved me a lot of time.

Also: Google’s Jules AI coding agent built a new feature I could actually ship – while I made coffee

I don’t have tangible test results, but after testing Google’s Jules in May and ChatGPT’s Codex now, I get the impression that Jules is able to get a deeper understanding of the code. At this point, I can’t really support that assertion with a lot of data; it’s just an impression.

I’m going to try running another project through Jules. It will be interesting to see if Codex changes much once OpenAI feels safe enough to incorporate GPT-5. Let’s keep in mind that OpenAI eats its own dog food with Codex, meaning it uses Codex to build its code. They might have seen the same iffy results I found in my tests. They might be waiting until GPT-5 has baked for a bit longer.

Have you tried using AI coding tools like ChatGPT Codex or Google’s Jules in your development workflow? What kinds of tasks did you throw at them? How well did they perform? Did you feel like the process helped you work more efficiently? Did it slow you down and take you out of your coding flow?

Do you prefer giving your tools small, surgical jobs, or are you looking for an agent that can handle big-picture architecture and reasoning? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
