The new model demonstrates enhanced planning capabilities and sustains productivity over longer work sessions than its predecessor, according to the company. It achieves state-of-the-art results on Terminal-Bench 2.0, an agentic coding evaluation, and leads frontier models on Humanity's Last Exam, a multidisciplinary reasoning test.
On GDPval-AA, which measures performance on knowledge work tasks across finance, legal, and other professional domains, Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points and its predecessor by 190 points, according to independent testing by Artificial Analysis.
The model features refined debugging and code review capabilities, allowing it to identify its own errors more effectively. Early access partners including GitHub, Notion, and Cursor reported improvements in handling complex, multi-step coding tasks and navigating large codebases.
"Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day," said Mario Rodriguez, Chief Product Officer at GitHub.
Anthropic introduced several API features alongside the release, including adaptive thinking, which allows the model to determine when deeper reasoning is beneficial, and context compaction, which automatically summarizes older context in long conversations. The company also implemented four effort levels (low, medium, high, and max), giving developers granular control over the model's resource allocation.
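For developers, the effort setting will most likely surface as a request parameter. As a rough sketch of how that could look from Anthropic's Python SDK (the model ID and the "effort" field name below are assumptions based on the announcement, not confirmed API surface):

```python
import anthropic

# Minimal sketch, assuming a hypothetical model ID and effort field name.
# Check Anthropic's API documentation for the names that actually shipped.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # hypothetical model ID
    max_tokens=4096,
    # Forward the announced effort control as an extra request field;
    # the release describes four levels: "low", "medium", "high", "max".
    extra_body={"effort": "high"},
    messages=[{"role": "user", "content": "Review this diff for bugs: ..."}],
)
print(response.content[0].text)
```

The `extra_body` option is the SDK's standard mechanism for passing request fields it does not yet model natively, which makes it a reasonable stand-in until the parameter is documented.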
"Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day," said Mario Rodriguez, Chief Product Officer at GitHub.
Anthropic introduced several API features alongside the release, including adaptive thinking, which allows the model to determine when deeper reasoning is beneficial, and context compaction, which automatically summarizes older context in long conversations. The company also implemented four effort levels, low, medium, high, and max, giving developers granular control over the model's resource allocation.
The one-million-token context window is a fivefold expansion over the 200,000-token window of previous Opus models, though premium pricing applies to prompts exceeding 200,000 tokens. The model also supports outputs of up to 128,000 tokens.
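Because the premium tier begins at a hard token threshold, it may be worth measuring a prompt before sending it. The SDK's token-counting endpoint supports that kind of pre-flight check; in the sketch below, the model ID and file name are placeholders:

```python
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

PREMIUM_THRESHOLD = 200_000  # tokens; premium rates apply above this, per the release

# Hypothetical example payload: a large dump of source files.
prompt = Path("large_codebase_dump.txt").read_text()

# Count tokens server-side without actually running the model.
count = client.messages.count_tokens(
    model="claude-opus-4-6",  # hypothetical model ID
    messages=[{"role": "user", "content": prompt}],
)

if count.input_tokens > PREMIUM_THRESHOLD:
    print(f"{count.input_tokens} input tokens: premium long-context rates apply")
else:
    print(f"{count.input_tokens} input tokens: standard rates")
```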
In safety evaluations, Opus 4.6 demonstrated low rates of misaligned behaviors including deception and sycophancy, matching the safety profile of Claude Opus 4.5. The company conducted what it describes as the most comprehensive safety testing of any model to date, including new evaluations for user wellbeing and enhanced cybersecurity safeguards.
Harvey, a legal technology company, reported that the model achieved a 90.2% score on BigLaw Bench, with 40% perfect scores. Thomson Reuters noted meaningful improvements in long-context performance for research workflows.
The model is available through claude.ai, Anthropic's API, and major cloud platforms. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, with premium rates for extended context usage.
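As a back-of-the-envelope illustration of those rates (the long-context multiplier below is an assumed placeholder; Anthropic publishes the actual premium rates separately):

```python
# Cost estimate at the published rates: $5 per million input tokens,
# $25 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token
PREMIUM_THRESHOLD = 200_000      # premium input pricing above this threshold
PREMIUM_MULTIPLIER = 2.0         # assumption, not a published figure

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the rates above."""
    multiplier = PREMIUM_MULTIPLIER if input_tokens > PREMIUM_THRESHOLD else 1.0
    return input_tokens * INPUT_RATE * multiplier + output_tokens * OUTPUT_RATE

# Example: a 300K-token prompt with a 20K-token reply -> $3.50 under these assumptions
print(f"${estimate_cost(300_000, 20_000):.2f}")
```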
Anthropic also released upgrades to Claude in Excel and introduced Claude in PowerPoint as a research preview, enabling the model to generate presentations while maintaining brand consistency with existing templates.