Microsoft has unveiled the newest AI model in its Phi family, Phi-4-mini-flash-reasoning. It is an open small language model (SLM) designed to deliver strong reasoning efficiently on edge devices, in mobile applications, and in other resource-constrained environments: a compact model engineered for fast, on-device logical reasoning.
“This new model follows Phi-4-mini but is built on a new hybrid architecture that achieves up to 10 times higher throughput and a 2 to 3 times average reduction in latency, enabling significantly faster inference without sacrificing reasoning performance. Ready to power real-world solutions that demand efficiency and flexibility,” said Microsoft.
The new model follows its family member Phi-4-mini but is tuned for substantially better performance. Phi-4-mini-flash-reasoning is built on a new hybrid architecture that delivers up to 10 times the throughput of its predecessor and cuts average latency by a factor of two to three, enabling significantly faster inference without sacrificing reasoning quality. The model is available today on the NVIDIA API Catalog, Azure AI Foundry, and Hugging Face.
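For readers who want to try it, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repo id and the use of a chat template are assumptions based on how earlier Phi models were published, not details confirmed in the announcement.

```python
# Minimal sketch: loading Phi-4-mini-flash-reasoning from Hugging Face.
# The repo id and chat-template usage are assumptions based on how other
# Phi models are published; check the model card before relying on them.
# A recent transformers version (or trust_remote_code=True) may be needed
# for the hybrid architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-flash-reasoning"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve 3x + 5 = 20 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```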
For structured, mathematically oriented reasoning tasks, the 3.8-billion-parameter open model is fine-tuned on high-quality synthetic data while maintaining support for a 64k-token context length. In contrast to previous Phi models, Phi-4-mini-flash-reasoning introduces a new “decoder-hybrid-decoder” architecture called SambaY, which combines a novel Gated Memory Unit (GMU) with sliding-window attention and state-space models (Mamba) to improve long-context performance and reduce decoding complexity.
Microsoft claims that this configuration enables the model to interleave costly attention layers with lightweight GMUs while maintaining linear prefill computation time. As a result, inference efficiency improves substantially, making the model suitable for running on a single GPU and for latency-sensitive applications like adaptive learning apps and real-time teaching tools.
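Microsoft’s technical report holds the exact design, but as a rough intuition for why this saves compute, a gated memory unit can be sketched as an element-wise gate that lets a decoder layer reuse a memory state computed earlier in the network rather than running a fresh attention pass. The formulation below is an illustrative assumption, not the published SambaY equations.

```python
# Illustrative sketch of a gated memory unit (GMU), assuming a simple
# element-wise gating formulation; the exact SambaY design may differ.
import torch
import torch.nn as nn

class GatedMemoryUnit(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model)  # gate from the current hidden state
        self.out_proj = nn.Linear(d_model, d_model)   # mixes the gated memory back in

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: current layer's token representations, shape (batch, seq, d_model)
        # memory: cached state shared from an earlier layer, same shape
        gate = torch.sigmoid(self.gate_proj(hidden))
        return self.out_proj(gate * memory)  # element-wise gate, no new attention pass
```

A layer built this way costs roughly two matrix multiplications per token instead of a full attention computation, which is the intuition behind the throughput and latency claims above.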
Microsoft shared that Phi-4-mini-flash-reasoning outperforms models twice its size on benchmarks such as AIME24/25 and Math500, while delivering faster response times on the vLLM inference framework.
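Because those response-time numbers were measured on vLLM, a minimal offline-inference sketch on that framework might look like this, reusing the unverified repo id from the earlier example:

```python
# Minimal sketch: offline inference with vLLM, assuming the same
# (unverified) Hugging Face repo id used earlier.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-mini-flash-reasoning")  # assumed repo id
params = SamplingParams(temperature=0.6, max_tokens=512)

outputs = llm.generate(
    ["Prove that the sum of two even integers is even."], params
)
print(outputs[0].outputs[0].text)
```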
The release fits into Microsoft’s broader push for responsible AI, with safety shaped through supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). The company notes that all Phi models adhere to its core principles of transparency, privacy, and inclusiveness.
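Of the three post-training techniques, DPO is the most compact to write down: it nudges the model to prefer a human-chosen response over a rejected one, measured relative to a frozen reference model. The sketch below is the standard published DPO loss, not Microsoft’s specific pipeline.

```python
# Minimal sketch of the standard DPO loss (Rafailov et al., 2023),
# independent of Microsoft's specific training pipeline.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument is the summed log-probability of a full response
    # (chosen or rejected) under the policy or the frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers chosen over rejected.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```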