San Diego, October 27, 2025 - Qualcomm unveiled two new rack-scale inference solutions, the AI200 and AI250, pitching them as cost-efficient, high-memory options for operators running generative-AI workloads in data centres. The company says the AI200 will begin shipping in 2026, with the higher-performance AI250 following in 2027.
Qualcomm frames the two systems as purpose-built for inference rather than training, focusing on memory capacity and power efficiency, key factors for customers who need to serve large language models at scale without blowing out electricity or real-estate budgets. The AI200 is described in Qualcomm materials as supporting up to 768 GB of on-card memory in common configurations, while the AI250 is presented as a step up in memory bandwidth and overall rack throughput. These specs come from Qualcomm’s release and vendor briefings; independent benchmarks will be needed to validate real-world performance.
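To put the headline memory figure in context, a rough back-of-envelope estimate (not taken from Qualcomm's materials) shows why per-card capacity matters for inference: a model's weights occupy roughly parameter count times bytes per parameter, and the key-value cache grows with batch size and context length. The Python sketch below uses assumed, illustrative values throughout; the model size, precision, layer count, hidden dimension, context length, and batch size are hypothetical choices, not specifications of either product.

```python
# Illustrative back-of-envelope estimate of inference memory needs.
# All model sizes, precisions, and cache settings below are assumptions
# for illustration; they are not Qualcomm specifications.
# GB here means 1e9 bytes.

def weights_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights, in GB."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, hidden_dim: int, context_len: int,
                batch_size: int, bytes_per_value: float) -> float:
    """Rough KV-cache size: 2 (K and V) * layers * hidden dim * tokens * batch."""
    return (2 * n_layers * hidden_dim * context_len
            * batch_size * bytes_per_value) / 1e9

# Hypothetical 70B-parameter model served at 8-bit precision.
weights = weights_gb(70, 1.0)                       # ~70 GB of weights
cache = kv_cache_gb(n_layers=80, hidden_dim=8192,   # assumed architecture
                    context_len=32_768, batch_size=8,
                    bytes_per_value=1.0)            # ~344 GB of KV cache

total = weights + cache
print(f"Weights: {weights:.0f} GB, KV cache: {cache:.0f} GB, total: {total:.0f} GB")
print(f"Fits within a 768 GB card: {total <= 768}")
```

Under these assumed numbers, weights plus cache come to roughly 400 GB, which would sit comfortably inside 768 GB on a single card, whereas a lower-memory accelerator would have to shard the same workload across several devices. That, in essence, is the cost argument Qualcomm is making.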
“With Qualcomm AI200 and AI250, we’re redefining what’s possible for rack-scale AI inference. These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand,” said Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center, Qualcomm Technologies, Inc. “Our rich software stack and open ecosystem support make it easier than ever for developers and enterprises to integrate, manage, and scale already trained AI models on our optimized AI inference solutions. With seamless compatibility for leading AI frameworks and one-click model deployment, Qualcomm AI200 and AI250 are designed for frictionless adoption and rapid innovation.”
Qualcomm also flagged early commercial interest and go-to-market partnerships, naming HUMAIN among its early collaborators for deploying rack systems in regional cloud and sovereign-AI projects, a partnership intended to show that the company can sell integrated racks and services, not just chips. The market reaction was notable, with traders and analysts flagging Qualcomm’s move as a credible challenge to established inference suppliers.
The practical takeaway for data-centre operators is straightforward: these products aim to offer a middle path between raw performance and total cost of ownership. By increasing memory per accelerator and improving memory bandwidth at rack scale, Qualcomm hopes to lower the cost to serve large models for inference workloads where latency, energy use, and rack density matter most. Adoption will hinge on software-ecosystem support, model compatibility, and independent benchmarks that bear out the vendor’s claims.
For now, Qualcomm’s announcement is a clear sign that the inference hardware market is heating up, and that suppliers beyond the established GPU incumbents are willing to compete on the metrics operators care about: memory, power, and TCO.