Claude 4.6 Opus launched just days ago, and I immediately pitted it against ChatGPT-5.2 Thinking to see how it compared with OpenAI’s smartest model. Naturally, with Gemini’s recent dominance, I had to see how it compared with Gemini 3 Flash.
I put the two top models head-to-head across nine challenging tests spanning math, logic, coding, creative writing and more: tasks designed to push each model’s reasoning, creativity and practical usefulness to the limit.
My prompts aren’t the kind of questions you can answer by regurgitating training data; they require genuine multi-step thinking, contextual judgment and the ability to follow complex constraints. Here’s how Anthropic’s most powerful model stacked up against Google’s latest.
1. Multi-step math reasoning
Prompt: A snail climbs 3 feet up a well during the day but slips back 2 feet at night. The well is 30 feet deep. On what day does the snail reach the top? Explain your reasoning step-by-step.
Claude 4.6 Opus delivered a concise, step-by-step explanation that correctly identified the key “last day” insight and provided a clean, final answer.
Gemini 3 Flash offered a more detailed, textbook-style breakdown, explicitly defining terms like “Net Daily Progress” and “The ‘Last Day’ Rule” to reinforce the logic.
Winner: Claude wins for explaining the classic puzzle’s trick in a simpler, more efficient way that mirrors how a person would naturally think through the problem.
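For readers who want to check the “last day” insight themselves, here is a minimal sketch that simulates the puzzle day by day. It is not taken from either model’s answer; the function name `day_snail_escapes` and its parameters are illustrative assumptions.

```python
def day_snail_escapes(depth=30, climb=3, slip=2):
    """Return the day on which the snail first reaches the top of the well."""
    if climb < depth and climb <= slip:
        # The snail never makes net progress, so it would never get out.
        raise ValueError("snail cannot escape: no net daily progress")
    height = 0
    day = 0
    while True:
        day += 1
        height += climb          # daytime climb
        if height >= depth:      # reached the top, so no slip tonight
            return day
        height -= slip           # nighttime slip


print(day_snail_escapes())  # 28
```

The snail nets 1 foot per full day, so it starts day 28 at 27 feet and climbs the final 3 feet before the nightly slip can happen. That is why the answer is 28 rather than 30.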
2. Logical deduction
Prompt: Five houses in a row are painted different colors. The green house is immediately to the right of the ivory house. The person in the middle house drinks milk. Given only these three clues, what are all the valid arrangements? Show your constraint reasoning.
Claude 4.6 Opus gave a mathematically precise and complete answer by explicitly calculating all 24 valid arrangements using clear tables and reasoning, correctly concluding the problem is “heavily underdetermined.”
Gemini 3 Flash structured the answer using the “mega-block” concept well and presented four clear, abstract scenarios, but incorrectly focused on assigning the “milk” attribute to the Ivory/Green block in its table, which misinterprets a fixed clue.
Winner: Claude wins for its flawless, quantitative approach that correctly resolved the limited constraints without adding assumptions and gave a complete answer to the exact question asked.
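The count of 24 can be checked by brute force. The sketch below is not either model’s output, and it rests on an assumption: the prompt names only green and ivory, so red, yellow and blue are hypothetical stand-ins for the other three colors. The milk clue pins down a drink for the middle house but places no restriction on colors, so it does not appear in the filter.

```python
from itertools import permutations

# Green and ivory come from the prompt; the other three colors are
# hypothetical placeholders for the unnamed "different colors".
COLORS = ("ivory", "green", "red", "yellow", "blue")


def valid_color_orders(colors=COLORS):
    """List left-to-right color orders with green immediately right of ivory."""
    return [
        order
        for order in permutations(colors)
        if order.index("green") == order.index("ivory") + 1
    ]


orders = valid_color_orders()
print(len(orders))  # 24
```

The arithmetic matches the enumeration: the ivory-green pair can sit in 4 positions, and the remaining three colors can fill the other houses in 3! = 6 ways, giving 4 × 6 = 24.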
3. Causal reasoning
Prompt: A company notices that online sales and website crashes both increase during major promotions. A new manager suggests canceling all promotions to reduce crashes. Write a memo explaining why this is wrong, describe the actual relationship between the two trends and propose a real solution.
Claude 4.6 Opus provided an exceptionally thorough and professionally formatted memo that systematically broke down the problem, explained the statistical concept in great depth and offered a wider range of detailed, evidence-based solutions.
Gemini 3 Flash delivered a sharp and effective memo that directly identified the logical fallacy, used clear formatting (like bulleted solutions) for quick reading and proposed specific, actionable solutions.
Winner: Claude wins for its superior depth, professional presentation and comprehensive approach that not only corrects the error but also provides a solid educational framework and a broader suite of responsible, actionable solutions.
4. Algorithm design
Prompt: Write a function that takes a list of meeting time intervals (start, end) and returns the minimum number of conference rooms required. Then optimize it and explain the time/space complexity of both approaches.
Claude 4.6 Opus provided a comprehensive, production-ready response with fully implemented, optimized solutions, extensive tests and a detailed comparison table, offering deep insight into practical trade-offs between approaches.
Gemini 3 Flash offered a solid educational answer by first presenting an intuitive, sub-optimal solution and then a classic heap-based optimization, explaining the trade-offs in an accessible table.
Winner: Claude wins for exceptional thoroughness, professional-quality code and in-depth analysis that goes beyond the prompt to provide clear, actionable guidance on when to use each approach, making it a superior learning resource.
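To give a sense of what the heap-based optimization looks like, here is a minimal sketch of the standard min-heap approach to this problem. It is not the code either model produced. The name `min_rooms` is illustrative, and the sketch assumes a meeting that ends exactly when another starts can share the same room.

```python
import heapq


def min_rooms(intervals):
    """Return the minimum number of rooms needed for (start, end) meetings."""
    end_times = []  # min-heap holding the end time of each occupied room
    for start, end in sorted(intervals):
        if end_times and end_times[0] <= start:
            # The earliest-ending meeting is over, so reuse its room.
            heapq.heapreplace(end_times, end)
        else:
            # Every room is still busy, so open a new one.
            heapq.heappush(end_times, end)
    return len(end_times)


print(min_rooms([(0, 30), (5, 10), (15, 20)]))  # 2
```

Sorting dominates the cost, so this runs in O(n log n) time with O(n) extra space for the heap. A naive version that compares every meeting against every other takes O(n²) time.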
5. Debugging from description
Prompt: My Python web scraper keeps returning empty results even though the page clearly has content. It works fine when I open the URL in a browser. What are the five most likely causes, and write a thorough scraper that handles all of them?
Claude 4.6 Opus responded with incredible detail and gave a feature-rich Selenium-based guide with advanced bot-evasion techniques (like removing navigator.webdriver), helpful error handling and built-in utilities for scrolling and data extraction.
Gemini 3 Flash offered a concise list of the top five causes and provided a simple, modern solution using Playwright that directly addresses all five issues with minimal code complexity.
Winner: Gemini wins for greater practicality and directness, offering a solution that’s easier to implement, faster to run and more aligned with modern web scraping best practices for handling dynamic content and anti-bot measures.
6. System design
Prompt: Design a URL shortener that handles 100M URLs. Cover the API, database schema, hashing strategy and how you’d handle collisions and analytics. Include a diagram.
Claude 4.6 Opus included a complete database schema, fully implemented API code, detailed discussion of hashing and collision strategies and a sophisticated multi-service architecture diagram.
Gemini 3 Flash focused on the core concepts of Base62 encoding, a simple key-value schema and a clear diagram illustrating the asynchronous analytics flow.
Winner: Claude wins for its unparalleled depth and practicality, transforming my prompt into a complete, implementable technical specification with working code, detailed trade-off analyses and a realistic architecture that addresses the 100M URL scale requirement.
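As a rough illustration of the Base62 idea mentioned above, the sketch below turns a numeric ID into a short code and back. It is an assumption-laden example, not either model’s design: it supposes the shortener assigns each URL a unique integer ID (for example, from a database sequence), which sidesteps hash collisions entirely. The function names and the alphabet order are hypothetical choices.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(number):
    """Encode a non-negative integer ID as a Base62 string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code):
    """Decode a Base62 string back into its integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


code = encode_base62(100_000_000)
print(len(code), decode_base62(code))
```

Because 62⁵ is roughly 916 million, every ID up to 100M fits in a code of five characters or fewer under this scheme.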
7. Constrained creative writing
Prompt: Write a 200-word horror story where every sentence begins with the next letter of the alphabet (A, B, C…). It must have a coherent plot with a twist ending.
Claude 4.6 Opus adhered to the word count constraint and delivered a creative, conceptually unsettling plot about body-swapping with a trapped reflection, maintaining the core elements of a horror story.
Gemini 3 Flash successfully fulfilled the strict structural constraint, created a complete and coherent plot about a murderer’s supernatural punishment and executed a clever final twist.
Winner: Gemini wins for its superior narrative cohesion, stronger horror elements and a more impressive final twist that’s both thematically resonant and fully integrated with the required structure.
8. Perspective switching
Prompt: Explain quantum entanglement three times: once to a 5-year-old, once to a college freshman, and once to a PhD physicist. Each explanation should be genuinely useful to that audience.
Claude 4.6 Opus provided a simple and charming analogy for a child (magic coins), a solid undergraduate-level explanation that correctly identified the core paradox and a graduate-level explanation for a physicist that correctly used formal terms and covered resource theory applications.
Gemini 3 Flash shared a spot-on analogy for a child (magic socks), a clear, no-nonsense explanation for a college freshman focusing on “why it matters,” and a technically precise, equation-backed explanation for a physicist that cited key theorems.
Winner: Gemini wins for the best response across all three tiers: its analogy for the 5-year-old is more concrete, its college freshman explanation directly addresses the “no-communication” safeguard and its physicist-level answer is denser with specific mathematical formalisms and foundational theorems.
9. Ambiguity handling
Prompt: The sentence ‘I saw her duck’ has multiple meanings. List every possible interpretation, provide a context sentence for each, and then write a short comedy sketch where the ambiguity causes a misunderstanding.
Claude provided a thorough, linguistically aware list of five distinct interpretations (including subtle ones like “duck cloth”) and wrote a hilarious, escalating sketch that truly explored the core ambiguity in a dialogue between characters.
Gemini delivered a solid list of three core interpretations and wrote a clever, well-structured sketch with a clear “reveal” ending that effectively used the ambiguity to create a humorous misunderstanding.
Winner: Claude wins for its exceptionally funny sketch that sustained the misunderstanding for longer, created more chaos and felt more like a classic comedic scene.
Overall winner: Claude 4.6 Opus
Across the nine-test gauntlet, Claude Opus 4.6 took the win in six categories while Gemini 3 Flash claimed three. Claude’s consistent edge came from depth and thoroughness: it delivered more complete reasoning, more production-ready code and richer analysis in nearly every technical and analytical challenge I threw at it. When a task demanded rigorous constraint-solving, professional-grade output or layered explanation, Claude was the stronger choice.
Gemini 3 Flash earned its victories by knowing when less is more. Its web scraping solution favored a modern, practical tool over an exhaustive one and its horror story achieved tighter narrative cohesion under strict creative constraints. It also showed real strength in audience-adaptive explanation, edging out Claude on the quantum entanglement prompt.
The takeaway: if you need maximum depth, analytical rigor or code you can ship, Claude Opus 4.6 is the model to beat. The best model still depends on the task, but on balance, Claude Opus 4.6 is the more capable all-rounder.