
    OpenAI’s Codex is part of a new cohort of agentic coding tools

    By Y U Raju, May 20, 2025

    Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape.

    From GitHub’s early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as an exceptionally intelligent form of autocomplete. The tools generally live in an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it’s finished is largely out of reach. 

    But these new agentic coding tools, led by products like Devin, SWE-Agent, OpenHands, and the aforementioned OpenAI Codex, are designed to work without users ever having to see the code. The goal is to operate like the manager of an engineering team, assigning issues through workplace systems like Asana or Slack and checking in when a solution has been reached. 

    For believers in highly capable AI, it's the next logical step in a natural progression of automation taking over more and more software work.

    “In the beginning, people just wrote code by pressing every single keystroke,” explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. “GitHub Copilot was the first product that offered real auto-complete, which is kind of stage two. You’re still absolutely in the loop, but sometimes you can take a shortcut.” 

    The goal for agentic systems is to move beyond developer environments entirely, instead presenting coding agents with an issue and leaving them to resolve it on their own. “We pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously,” says Lieret.

    It’s an ambitious aim, and so far, it’s proven difficult.

    After Devin became generally available at the end of 2024, it drew scathing criticism from YouTube pundits, as well as a more measured critique from an early client at Answer.AI. The overall impression was a familiar one for vibe-coding veterans: with so many errors, overseeing the models takes as much work as doing the task manually. (While Devin's rollout has been a bit rocky, it hasn't stopped investors from recognizing the potential: in March, Devin's parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.)

    Even supporters of the technology caution against unsupervised vibe-coding, seeing the new coding agents as powerful elements in a human-supervised development process.

    “Right now, and I would say, for the foreseeable future, a human has to step in at code review time to look at the code that’s been written,” says Robert Brennan, the CEO of All Hands AI, which maintains OpenHands. “I’ve seen several people work themselves into a mess by just auto-approving every bit of code that the agent writes. It gets out of hand fast.”

    Hallucinations are an ongoing problem as well. Brennan recalls one incident in which, when asked about an API that had been released after the OpenHands agent’s training data cutoff, the agent fabricated details of an API that fit the description. All Hands AI says it’s working on systems to catch these hallucinations before they can cause harm, but there isn’t a simple fix.

    Arguably the best measures of agentic programming progress are the SWE-Bench leaderboards, where developers can test their models against a set of real issues drawn from open-source GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that one of the models powering Codex, codex-1, can do better, listing a 72.1% score in its announcement, although that score came with a few caveats and hasn't been independently verified.

    The concern among many in the tech industry is that high benchmark scores don’t necessarily translate to truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they’re going to require significant oversight from human developers – particularly when tackling complex systems with multiple stages.
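    As a rough, back-of-the-envelope illustration of why multi-stage work raises the stakes, assume (purely hypothetically) that each stage of a task behaves like an independent problem the agent solves three times out of four. The chance of clearing every stage then shrinks geometrically with the number of stages n. Real tasks are not independent draws, so this is a sketch of the intuition, not a measured result:

    ```latex
    % Hypothetical: p = per-stage success rate, n = number of stages, stages assumed independent
    P(\text{all } n \text{ stages succeed}) = p^{n},
    \qquad p = 0.75,\; n = 3 \;\Rightarrow\; 0.75^{3} \approx 0.42
    ```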

    As with most AI tools, the hope is that foundation models will keep improving at a steady pace, eventually enabling agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial for getting there.

    “I think there is a little bit of a sound barrier effect,” Brennan says. “The question is, how much trust can you shift to the agents, so they take more out of your workload at the end of the day?”


