Microsoft has unveiled the newest AI model in its Phi family, Phi-4-mini-flash-reasoning. It is an open small language model (SLM) designed to deliver strong reasoning efficiently on edge devices, in mobile applications, and in other resource-constrained environments: a compact model engineered for fast, on-device logical reasoning.
“This new model follows Phi-4-mini but is built on a new hybrid architecture that achieves up to 10 times higher throughput and a 2 to 3 times average reduction in latency, enabling significantly faster inference without sacrificing reasoning performance. Ready to power real-world solutions that demand efficiency and flexibility,” said Microsoft.
The new model follows its family member Phi-4-mini but is tuned for substantially better performance. Phi-4-mini-flash-reasoning is built on a new hybrid architecture that delivers up to 10 times the throughput of its predecessor and cuts average latency by a factor of two to three, enabling significantly faster inference without sacrificing reasoning quality. The model is available today on the NVIDIA API Catalog, Azure AI Foundry, and Hugging Face.
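For readers who want to try it, a minimal loading sketch with the Hugging Face transformers library might look like the following. The repo id and the use of a chat template are assumptions based on how earlier Phi models were published, not details confirmed in the announcement.

```python
# Minimal sketch: loading Phi-4-mini-flash-reasoning from Hugging Face.
# The repo id and chat-template usage are assumptions based on how other
# Phi models are published; check the model card before relying on them.
# A recent transformers version (or trust_remote_code=True) may be needed
# for the hybrid architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-flash-reasoning"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve 3x + 5 = 20 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```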
For structured, mathematically oriented reasoning tasks, the 3.8-billion-parameter open model is fine-tuned on high-quality synthetic data while maintaining support for a 64k-token context length. In contrast to previous Phi models, Phi-4-mini-flash-reasoning introduces a new “decoder-hybrid-decoder” architecture called SambaY, which combines a novel Gated Memory Unit (GMU) with sliding-window attention and state-space models (Mamba) to improve long-context performance and reduce decoding complexity.
Microsoft claims that this configuration enables the model to interleave costly attention layers with lightweight GMUs while maintaining linear prefill computation time. As a result, inference efficiency improves substantially, making the model suitable for running on a single GPU and for latency-sensitive applications like adaptive learning apps and real-time teaching tools.
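Microsoft’s technical report holds the exact design, but as a rough intuition for why this saves compute, a gated memory unit can be sketched as an element-wise gate that lets a decoder layer reuse a memory state computed earlier in the network rather than running a fresh attention pass. The formulation below is an illustrative assumption, not the published SambaY equations.

```python
# Illustrative sketch of a gated memory unit (GMU), assuming a simple
# element-wise gating formulation; the exact SambaY design may differ.
import torch
import torch.nn as nn

class GatedMemoryUnit(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model)  # gate from the current hidden state
        self.out_proj = nn.Linear(d_model, d_model)   # mixes the gated memory back in

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: current layer's token representations, shape (batch, seq, d_model)
        # memory: cached state shared from an earlier layer, same shape
        gate = torch.sigmoid(self.gate_proj(hidden))
        return self.out_proj(gate * memory)  # element-wise gate, no new attention pass
```

A layer built this way costs roughly two matrix multiplications per token instead of a full attention computation, which is the intuition behind the throughput and latency claims above.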
Microsoft shared that Phi-4-mini-flash-reasoning outperforms models twice its size on benchmarks such as AIME24/25 and Math500, while delivering faster response times on the vLLM inference framework.
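Because those response-time numbers were measured on vLLM, a minimal offline-inference sketch on that framework might look like this, reusing the unverified repo id from the earlier example:

```python
# Minimal sketch: offline inference with vLLM, assuming the same
# (unverified) Hugging Face repo id used earlier.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-mini-flash-reasoning")  # assumed repo id
params = SamplingParams(temperature=0.6, max_tokens=512)

outputs = llm.generate(
    ["Prove that the sum of two even integers is even."], params
)
print(outputs[0].outputs[0].text)
```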
The release fits into Microsoft’s broader push for responsible AI, with safety shaped through supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). The company notes that all Phi models adhere to its core principles of transparency, privacy, and inclusiveness.
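Of the three post-training techniques, DPO is the most compact to write down: it nudges the model to prefer a human-chosen response over a rejected one, measured relative to a frozen reference model. The sketch below is the standard published DPO loss, not Microsoft’s specific pipeline.

```python
# Minimal sketch of the standard DPO loss (Rafailov et al., 2023),
# independent of Microsoft's specific training pipeline.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument is the summed log-probability of a full response
    # (chosen or rejected) under the policy or the frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers chosen over rejected.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```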