OpenAI Launches GPT-5.2: Most Advanced AI Model Yet

OpenAI Unveils GPT-5.2 with Major Improvements for Developers

OpenAI has released GPT-5.2, positioning it as its most advanced model for professional knowledge work and long-running AI agents. The company reports measurable improvements across coding, reasoning, and real-world task execution.

According to OpenAI, the average ChatGPT Enterprise user already saves 40-60 minutes daily using AI apps, with heavy users reporting time savings exceeding 10 hours per week. GPT-5.2 aims to unlock even greater productivity gains through enhanced capabilities in spreadsheet creation, presentation building, code generation, image perception, and complex multi-step project handling.

Breakthrough Performance on Professional Tasks

GPT-5.2 Thinking achieves a 70.9% win-or-tie rate against industry professionals on GDPval, a benchmark of well-specified knowledge work across 44 occupations. According to OpenAI, this is the first time one of its models has performed at or above human expert level on this evaluation.
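As a rough illustration of how a win-or-tie rate like the one above can be computed, here is a minimal sketch. The label names ("win", "tie", "loss") and the function name are hypothetical, and this is not GDPval's actual grading pipeline; it only shows the arithmetic behind the metric.

```python
from collections import Counter

# Hypothetical labels: each judgment records how the model's output compared
# with the human professional's output on one task.
VALID_LABELS = {"win", "tie", "loss"}


def win_or_tie_rate(judgments):
    """Return the fraction of judgments where the model won or tied."""
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments supplied")
    return (counts["win"] + counts["tie"]) / total


if __name__ == "__main__":
    sample = ["win", "tie", "loss", "win"]  # made-up data for illustration
    print(f"{win_or_tie_rate(sample):.1%}")
```

Ties count toward the rate, so a 70.9% figure does not mean the model was preferred outright in 70.9% of comparisons.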

The model produces outputs at over 11 times the speed and less than 1% of the cost of expert professionals, suggesting significant potential for AI-assisted workflows when paired with human oversight. Tasks evaluated include creating sales presentations, accounting spreadsheets, urgent care schedules, and manufacturing diagrams.

One GDPval evaluator noted the quality leap: "It appears to have been done by a professional company with staff, and has a surprisingly well-designed layout and advice for both deliverables, though with one we still have some minor errors to correct."

State-of-the-Art Coding Capabilities

For developers, GPT-5.2 Thinking posts strong results on software engineering benchmarks. The model achieves 55.6% on SWE-Bench Pro, which tests real-world engineering tasks across four programming languages, and 80% on SWE-bench Verified.

Early testing reveals particular strength in front-end development and complex UI work, including 3D elements. The model can reliably debug production code, implement feature requests, refactor large codebases, and ship fixes end-to-end with reduced manual intervention.

Jeff Wang, CEO of the AI-native IDE maker Windsurf, described GPT-5.2 as "the biggest leap for GPT models in agentic coding since GPT-5" and noted it has become "a SOTA coding model in its price range."

Enhanced Vision and Long-Context Understanding

GPT-5.2 Thinking roughly halves error rates on chart reasoning and software interface understanding compared to previous models. On CharXiv Reasoning, which tests interpretation of charts from scientific papers, the model achieves 88.7% accuracy with a Python tool enabled.

The model also sets a new standard for long-context reasoning, achieving nearly 100% accuracy on the 4-needle MRCR variant at context lengths up to 256,000 tokens. This lets professionals work with long documents (reports, contracts, research papers, transcripts) while maintaining coherence across hundreds of thousands of tokens.

Superior Tool Calling and Workflow Management

GPT-5.2 Thinking reaches 98.7% on Tau2-bench Telecom, demonstrating reliable tool usage across long, multi-turn customer support tasks. This translates to stronger end-to-end workflows for resolving complex cases that require coordinating multiple systems and generating final outputs.

AJ Orbach, CEO of Triple Whale, reported: "GPT-5.2 unlocked a complete architecture shift for us. We collapsed a fragile, multi-agent system into a single mega-agent with 20+ apps... It's faster, smarter, and 100x easier to maintain."

Scientific and Mathematical Reasoning

GPT-5.2 Pro achieves 93.2% on GPQA Diamond, a graduate-level benchmark of physics, chemistry, and biology questions, while GPT-5.2 Thinking scores 92.4%. On FrontierMath (Tier 1-3), a set of expert-level mathematics problems, GPT-5.2 Thinking solves 40.3%.

OpenAI reports that GPT-5.2 Pro is the first model to exceed 90% on ARC-AGI-1, achieving this milestone while reducing costs by approximately 390× compared to the previous o3-preview model.

Availability and Pricing

GPT-5.2 is rolling out today in ChatGPT for paid plans (Plus, Pro, Go, Business, Enterprise) in three variants: Instant, Thinking, and Pro. For developers, the model is immediately available via API.

API pricing is set at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs. Despite higher per-token costs than GPT-5.1, OpenAI reports that GPT-5.2's greater efficiency often results in lower overall costs for achieving specific quality levels.
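To make the pricing concrete, the sketch below estimates the cost of a single request from the rates quoted above. It assumes that cached tokens are a subset of the input tokens and that the 90% discount means cached input is billed at 10% of the input rate; the function and constant names are illustrative, not part of any OpenAI SDK.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = Decimal("1.75")
OUTPUT_USD_PER_M = Decimal("14")
# Assumption: the 90% discount applies to the input rate for cached tokens.
CACHED_DISCOUNT = Decimal("0.90")


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request under the assumptions above."""
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    uncached = input_tokens - cached_input_tokens
    cached_rate = INPUT_USD_PER_M * (1 - CACHED_DISCOUNT)
    total = (
        uncached * INPUT_USD_PER_M
        + cached_input_tokens * cached_rate
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200k input tokens (150k of them cached) and 10k output tokens.
    print(estimate_cost_usd(200_000, 10_000, cached_input_tokens=150_000))
```

Because output tokens cost eight times as much as input tokens at these rates, long generated outputs tend to dominate the bill unless the prompt is very large.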

The model supports a new "xhigh" reasoning effort parameter for tasks prioritizing maximum quality. OpenAI has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1 in the API.
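A request that opts into the new effort level might look like the sketch below. The model identifier "gpt-5.2" and the payload shape are assumptions: the shape follows the `reasoning={"effort": ...}` convention of the OpenAI Responses API, and the helper name is hypothetical. Check the official API reference for the exact identifiers before relying on it.

```python
def build_request(prompt, effort="xhigh", model="gpt-5.2"):
    """Build keyword arguments for a Responses-style API call.

    Assumptions: the model name "gpt-5.2" and the acceptance of "xhigh"
    as an effort value are taken from the article, not verified here.
    """
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    payload = build_request("Refactor this module and explain each change.")
    print(payload)
    # To send it with the official Python SDK (requires an API key):
    #   from openai import OpenAI
    #   client = OpenAI()
    #   response = client.responses.create(**payload)
    #   print(response.output_text)
```

Higher reasoning effort generally means more output tokens and latency, so it is worth reserving for tasks where quality matters more than cost or speed.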

Safety and Responsible Development

GPT-5.2 incorporates OpenAI's safe completion research, reducing hallucination rates by approximately 30% compared to GPT-5.1 Thinking. The model shows meaningful improvements in handling sensitive conversations related to mental health, suicide prevention, and emotional distress.

OpenAI is rolling out age prediction capabilities to automatically apply content protections for users under 18, building on existing parental controls and safety measures.

The company acknowledges ongoing challenges, including over-refusals in some contexts, and emphasizes continued work to balance safety with utility in practical applications.


Source: Introducing GPT-5.2