OpenAI Releases CoT-Control Evaluation Suite

CoT-Control

Release Date: 2026-03-05
Tasks: >13,000
Benchmarks Included: GPQA, MMLU-Pro, HLE, BFCL, SWE-Bench Verified
Key Finding: Low controllability in frontier models

On March 5, 2026, OpenAI released the open-source CoT-Control evaluation suite together with a related research paper. The suite comprises more than 13,000 tasks that assess whether reasoning models can control their chain of thought in order to evade monitoring. Frontier models, including GPT-5.4 Thinking, show low controllability.

Sources

https://openai.com/index/reasoning-models-chain-of-thought-controllability/
https://x.com/OpenAI/status/2029650046002811280