Thinking Machines launches low-latency AI interaction model after Murati’s OpenAI exit

Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, released a demo and technical blog post on May 11, 2026, for an Interaction Model designed to process voice, video and text simultaneously with a response latency of 0.40 seconds. The company said the system uses a full-duplex architecture that handles input and output every 200 milliseconds inside a single Transformer, without a separate speech encoder. Its front-end model is a 276-billion-parameter mixture-of-experts system with 12 billion active parameters, paired with a background model for asynchronous reasoning, web search and tool use. Benchmark data cited by the company showed TML-Interaction-Small scoring 77.8 on FD-bench, compared with 46.8 for OpenAI’s GPT-realtime-2.0, with latency of 0.40 seconds versus 1.18 seconds. Thinking Machines previously raised about $2 billion in a round led by Andreessen Horowitz at a reported $12 billion valuation and has about 130 employees, many of them from OpenAI.
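For readers who want to put the reported figures side by side, the short Python sketch below derives a few ratios from them. It uses only the numbers the company cited. The function and constant names are illustrative and do not come from Thinking Machines, and none of the results are independent measurements.

```python
# Figures as reported by the company; nothing here is independently measured.
TOTAL_PARAMS_B = 276       # total parameters, in billions
ACTIVE_PARAMS_B = 12       # active parameters, in billions
TML_LATENCY_MS = 400       # reported TML-Interaction-Small latency
GPT_LATENCY_MS = 1180      # reported GPT-realtime-2.0 latency
TML_FD_BENCH = 77.8        # reported FD-bench score
GPT_FD_BENCH = 46.8        # reported FD-bench score
CHUNK_MS = 200             # reported input/output interval


def summarize() -> dict:
    """Derive simple ratios from the reported figures (illustrative helper)."""
    return {
        # Share of the model's parameters that are active at once
        "active_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # How many times lower the reported latency is
        "latency_ratio": GPT_LATENCY_MS / TML_LATENCY_MS,
        # Difference in reported FD-bench scores, in points
        "score_gap": TML_FD_BENCH - GPT_FD_BENCH,
        # Reported latency expressed in 200 ms intervals
        "intervals_per_response": TML_LATENCY_MS / CHUNK_MS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

On these figures, roughly one in 23 parameters is active at a time, and the reported latency corresponds to two of the 200-millisecond intervals.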

Editor: Jack Lee