OpenAI Releases CoT-Control Evaluation Suite
CoT-Control
- Release Date: 2026-03-05
- Tasks: >13,000
- Benchmarks Included: GPQA, MMLU-Pro, HLE, BFCL, SWE-Bench Verified
- Key Finding: Low controllability in frontier models
OpenAI released the open-source CoT-Control evaluation suite and a related research paper on March 5, 2026. The suite comprises more than 13,000 tasks that assess whether reasoning models can deliberately control their chain of thought in order to evade monitoring. Frontier models, including GPT-5.4 Thinking, show low controllability on the suite.