OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It's the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta's Reality Labs and Roblox.
Through their work at the forefront of AI and system design, Bastani and Nourai saw firsthand how deep system architecture enables continuous, large-scale AI inference. Today, however, AI inference remains locked behind cloud APIs and hosted systems, a barrier for low-latency, private, and cost-efficient edge applications. OpenInfer changes that. The company wants to be agnostic to the kinds of devices on the edge, Bastani said in an interview with GamesBeat.
By enabling the seamless execution of large AI models directly on devices, from SoCs to the cloud, OpenInfer removes those barriers, enabling inference of AI models without compromising performance.
The implication? Imagine a world where your phone anticipates your needs in real time, translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, greater privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine: AI agent inference engine

Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they built Oculus Link together, showcasing their expertise in low-latency, high-performance system design. Bastani previously served as Director of Architecture at Meta's Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was Senior Director of Engineering for Engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.
OpenInfer is building the OpenInfer Engine, what they call an "AI agent inference engine" designed for unmatched performance and seamless integration.
To accomplish the first goal of unmatched performance, the first release of the OpenInfer Engine delivers 2-3x faster inference compared to Llama.cpp and Ollama for distilled DeepSeek models. This boost comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning, all without requiring changes to the models.
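For readers unfamiliar with what "handling of quantized values" involves, here is a minimal, generic sketch of symmetric int8 weight quantization, the kind of representation engines like Llama.cpp use for distilled models. This is background illustration only, not OpenInfer's actual implementation, which is not public.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store int8 values plus one float scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights on the fly at inference time."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)

# Storage drops from 4 bytes to 1 byte per weight; reconstruction
# error is bounded by half a quantization step.
assert np.allclose(w, w_approx, atol=scale / 2)
```

An inference engine's speed on such models depends heavily on how efficiently the dequantize step is fused into matrix multiplication and how well the quantized blocks fit cache lines, which is the class of optimization the article describes.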
To accomplish the second goal of seamless integration with easy deployment, the OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints simply by updating a URL. Existing agents and frameworks continue to function seamlessly, without any modifications.
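A "swap the URL" integration typically means the engine exposes an OpenAI-compatible HTTP API. The sketch below shows what that looks like for a client: the request payload is unchanged and only the base URL moves from a hosted service to a local endpoint. The URLs, port, endpoint path, and model name here are all placeholder assumptions, since OpenInfer has not published its API details.

```python
import json
import urllib.request

def chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against a given endpoint."""
    payload = {
        "model": "deepseek-r1-distill",  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Existing setup: a cloud-hosted inference API (placeholder host).
cloud = chat_request("https://api.example-cloud.com", "Hello")

# Drop-in local engine: only the base URL changes (placeholder port).
local = chat_request("http://localhost:8080", "Hello")

# The request body is byte-for-byte identical; only the host differs.
assert cloud.data == local.data
```

Because agent frameworks generally let you override the API base URL in one configuration value, this pattern is what lets existing tools keep working without code changes.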
"OpenInfer's advances mark a major leap for AI developers. By significantly boosting inference speeds, Behnam and his team are making real-time AI applications more responsive, accelerating development cycles, and enabling powerful models to run efficiently on edge devices. This opens new possibilities for on-device intelligence and expands what's possible in AI-driven innovation," said Ernestine Fu Mak, Managing Partner at Brave Capital and an investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models, outperforming industry leaders on edge devices. By designing inference from the ground up, they're unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
Future roadmap: Seamless AI inference across all devices
OpenInfer's launch is well-timed, especially in light of the recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations like DeepSeek reduce computational requirements for both training and inference, edge-based applications still struggle with performance and efficiency due to limited processing power. Running large AI models on consumer devices demands new inference methods that enable low-latency, high-throughput performance without relying on cloud infrastructure, creating significant opportunities for companies optimizing AI for local hardware.
"Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear hardware abstraction layer. This challenge makes deploying large models on compute-constrained platforms incredibly difficult, pushing AI workloads back to the cloud, where they become costly, slow, and dependent on network conditions. OpenInfer revolutionizes inference on the edge," said Gokul Rajaram, an investor in OpenInfer. Rajaram is an angel investor and currently a board member of Coinbase and Pinterest.
Specifically, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI inference performance on devices. Enterprises that need on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.
In mobile gaming, OpenInfer's technology enables ultra-responsive gameplay with real-time adaptive AI. On-device inference allows for reduced latency and smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
"At OpenInfer, our vision is to seamlessly integrate AI into every surface," said Bastani. "We aim to establish OpenInfer as the default inference engine across all devices, powering AI in self-driving cars, laptops, mobile devices, robots, and more."
OpenInfer has raised an $8 million seed round as its first round of financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind chief scientist Jeff Dean, Microsoft Experiences and Devices chief product officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.
"The current AI ecosystem is dominated by a few centralized players who control access to inference through cloud APIs and hosted services. At OpenInfer, we're changing that," said Bastani. "Our name reflects our mission: we're 'opening' access to AI inference, giving everyone the ability to run powerful AI models locally, without being locked into expensive cloud services. We believe in a future where AI is accessible, decentralized, and truly in the hands of its users."