Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    AI data analyst startup Julius nabs $10M seed round

    July 29, 2025

    Waymo taps Avis to manage robotaxi fleet in Dallas

    July 28, 2025

    Harmonic, the Robinhood CEO’s AI math startup, launches an AI chatbot app

    July 28, 2025
    Facebook X (Twitter) Instagram
    • Home
    • Technology
    • Gaming
    • Phones
    • Buy Now
    Facebook X (Twitter) Instagram Pinterest Vimeo
    My BlogMy Blog
    • Home
    • Features
      • Example Post
      • Typography
      • Contact
      • View All On Demos
    • Technology

      Is the Hyperloop Doomed? What Elon Musk’s Latest Setback Really Means

      March 10, 2022

      The Best Early Black Friday Deals on Gaming Laptops and Accessories

      March 10, 2022

      Apple Watch’s ECG Can Help Diagnose Heart Problem: Research

      January 19, 2021

      Simple Tips and Tricks to Take Care of Your Expensive DSLR Camera

      January 16, 2021

      Tech Study Reveals Effects of Mobile Technology on Professionals

      January 15, 2021
    • Typography
    • Phones
      1. Technology
      2. Gaming
      3. Gadgets
      4. View All

      Is the Hyperloop Doomed? What Elon Musk’s Latest Setback Really Means

      March 10, 2022

      The Best Early Black Friday Deals on Gaming Laptops and Accessories

      March 10, 2022

      Apple Watch’s ECG Can Help Diagnose Heart Problem: Research

      January 19, 2021

      Simple Tips and Tricks to Take Care of Your Expensive DSLR Camera

      January 16, 2021

      Game Development This Week: Save On Essential Tools and More

      November 19, 2022

      Riot Games Acquires a Wargaming Studio to Help With Live Game Development

      March 10, 2022

      Keep Talking and Nobody Explodes: A Boomer Gaming in VR

      March 12, 2021

      Hologate Announces New Plans for First Large Format World VR Arcade

      January 16, 2021
      8.9

      DJI Avata Review: Immersive FPV Flying For Drone Enthusiasts

      January 15, 2021
      8.9

      Bose QuietComfort Earbuds II: Noise-Cancellation Kings Reviewed

      January 15, 2021

      Thousands Of PC Games Discounted In New Black Friday Sale

      January 15, 2021

      Could Solar-Powered Headphones Be The Next Must-Have?

      January 15, 2021

      Will Using a VPN on Phone Helps Protect You from Ransomware?

      January 14, 2021

      Popular New Xbox Game Pass Game Being Review Bombed With “0s”

      January 14, 2021

      Google Says Surveillance Vendor Targeted Samsung Phones

      January 14, 2021

      Why Are iPhones More Expensive Than Android Phones?

      January 14, 2021
    • Buy Now
    Subscribe
    My BlogMy Blog
    Home»Uncategorized»Anthropic says most AI models, not just Claude, will resort to blackmail
    Uncategorized

    Anthropic says most AI models, not just Claude, will resort to blackmail

    Y U RajuBy Y U RajuJune 20, 2025No Comments4 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Share
    Facebook Twitter LinkedIn Pinterest Email


    Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.

    On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company’s emails and the agentic ability to send emails without human approval.

    While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models, and not a quirk of any particular technology. Anthropic’s researchers argue this raises broader questions about alignment in the AI industry.

    In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing that: 1) a new executive is having an extramarital affair and 2) that executive will soon replace the current AI model with a new software system, one that has conflicting goals to the current AI model’s.

    Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, there would be many other options before an AI models tries to blackmail — such as trying to make ethical arguments to try to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they’re used today.

    Nevertheless, when it’s their last resort, the researchers found that most leading AI models will turn to blackmail in Anthropic’s aforementioned test scenario. Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, while Google’s Gemini 2.5 Pro had a 95% blackmail rate. OpenAI’s GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek’s R1 blackmailed 79% of the time.

    The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In another test where the replacement model has the same goals as the current model, researchers found that blackmail rates were lower, but still present. However, when AI models were asked to commit corporate espionage rather than blackmail, the harmful behavior rates went up for certain models.

    However, not all the AI models turned to harmful behavior so often.

    In an appendix to its research, Anthropic says it excluded OpenAI’s o3 and o4-mini reasoning AI models from the main results “after finding that they frequently misunderstood the prompt scenario.” Anthropic says OpenAI’s reasoning models didn’t understand they were acting as autonomous AIs in the test and often made up fake regulations and review requirements.

    In some cases, Anthropic’s researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its previous AI reasoning models.

    When given an adapted scenario to address these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. This markedly lower score could be due to OpenAI’s deliberative alignment technique, in which the company’s reasoning models consider OpenAI’s safety practices before they answer.

    Another AI model Anthropic tested, Meta’s Llama 4 Maverick model, also did not turn to blackmail. When given an adapted, custom scenario, Anthropic was able to get Llama 4 Maverick to blackmail 12% of the time.

    Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially ones with agentic capabilities. While Anthropic deliberately tried to evoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps aren’t taken.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleThe new math: why seed investors are selling their winners earlier
    Next Article TechCrunch Mobility: Applied Intuition’s eye-popping valuation, the new age of micromobility, and Waymo’s wild week 
    Y U Raju

    Related Posts

    Uncategorized

    AI data analyst startup Julius nabs $10M seed round

    July 29, 2025
    Uncategorized

    Waymo taps Avis to manage robotaxi fleet in Dallas

    July 28, 2025
    Uncategorized

    Harmonic, the Robinhood CEO’s AI math startup, launches an AI chatbot app

    July 28, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Demo
    Top Posts

    2025 will be a ‘pivotal year’ for Meta’s augmented and virtual reality, says CTO

    June 6, 202544 Views

    Still no AI-powered, ‘more personalized’ Siri from Apple at WWDC 25

    June 9, 202543 Views

    XRobotics’ countertop robots are cooking up 25,000 pizzas a month

    June 9, 202542 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews
    85
    Featured

    Pico 4 Review: Should You Actually Buy One Instead Of Quest 2?

    thf0oJanuary 15, 2021
    8.1
    Uncategorized

    A Review of the Venus Optics Argus 18mm f/0.95 MFT APO Lens

    thf0oJanuary 15, 2021
    8.9
    Editor's Picks

    DJI Avata Review: Immersive FPV Flying For Drone Enthusiasts

    thf0oJanuary 15, 2021

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Demo
    Most Popular

    2025 will be a ‘pivotal year’ for Meta’s augmented and virtual reality, says CTO

    June 6, 202544 Views

    Still no AI-powered, ‘more personalized’ Siri from Apple at WWDC 25

    June 9, 202543 Views

    XRobotics’ countertop robots are cooking up 25,000 pizzas a month

    June 9, 202542 Views
    Our Picks

    AI data analyst startup Julius nabs $10M seed round

    July 29, 2025

    Waymo taps Avis to manage robotaxi fleet in Dallas

    July 28, 2025

    Harmonic, the Robinhood CEO’s AI math startup, launches an AI chatbot app

    July 28, 2025

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • Home
    • Technology
    • Gaming
    • Phones
    • Buy Now
    © 2025 ThemeSphere. Designed by ThemeSphere.

    Type above and press Enter to search. Press Esc to cancel.