From Lambda to Positron: Mitesh Agrawal’s Journey to Redefine AI Hardware
Eqvista spotlights Mitesh Agrawal, CEO of Positron AI, to explore the shift from GPU-dominated AI infrastructure to purpose-built inference hardware that delivers 3-3.5x better performance per dollar and up to 66% lower power usage.
Mitesh shares his journey from scaling Lambda’s GPU cloud to leading Positron’s FPGA-proven, US-manufactured systems optimised for Transformer models. He details innovations like >90% memory utilisation and multi-model concurrency, enabling customers like Cloudflare to consolidate racks without code changes, and discusses the $51.6M Series A funding that fuels ASIC development.
Positron targets enterprises, inference-as-a-service providers, and AI deployers facing skyrocketing deployment costs, positioning itself to chip away at Nvidia’s dominance by prioritising real-world economics over general-purpose flexibility. Mitesh emphasises pivotal breakthroughs, such as early production wins and capital efficiency—building the first system on just $12.5M—and offers advice for AI hardware innovators to focus on customer pain points with disciplined scaling.

Mitesh, can you share what motivated you to transition from Lambda, a successful AI compute cloud provider deeply integrated with Nvidia, to Positron AI, which builds custom AI inference-focused silicon and a technology platform?
When I was at Lambda, we built one of the largest GPU inference and training clouds. That gave me a front-row seat to both the promise and the limits of scaling on general-purpose GPUs. Customers were running real production workloads, and what stood out was that inference — not training — was quickly becoming the dominant cost center.
We kept seeing the same problem: GPUs are incredible for training, but once you move to deployment, most of their silicon sits idle. Memory bandwidth becomes the bottleneck, power bills skyrocket, and customers pay for racks of capacity they don’t actually use.
That was the motivation for Positron. We didn’t want to optimize around benchmarks or theoretical FLOPs. We wanted to design silicon and a platform that solved the real-world pain of deploying AI at scale. By focusing on inference efficiency, we could deliver better economics, higher utilization, and a much more sustainable infrastructure model.
In short, Positron is the company I wished existed when I was at Lambda.
Could you explain how Positron’s hardware achieves such significant improvements in performance per dollar and performance per watt compared to Nvidia’s GPUs? What innovations in architecture or manufacturing enable this?
The gains come from focus. GPUs are designed to do everything — graphics, training, inference — which makes them incredibly flexible, but also inefficient for any one task. We built Positron from the ground up for one workload: transformer inference.
That led to two core innovations. First, our architecture maximizes memory bandwidth utilization. On GPUs, you typically see ~30% utilization because the compute and memory subsystems aren’t balanced for inference. We routinely hit over 90%, which means you’re actually getting the performance you’re paying for.
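To make the utilization point concrete, here is a minimal back-of-envelope sketch; every number in it is an illustrative assumption, not a published Positron or Nvidia spec. When single-stream decoding is memory-bandwidth-bound, tokens per second are roughly the achieved bandwidth divided by the bytes of weights streamed per token, so utilization scales throughput almost linearly.

```python
# Back-of-envelope estimate of bandwidth-bound decode throughput.
# Every number here is an illustrative assumption, not a measured Positron or GPU spec.

def decode_tokens_per_sec(peak_bw_gb_s: float, utilization: float,
                          params_billions: float, bytes_per_param: float) -> float:
    """Tokens/sec when decoding is memory-bandwidth-bound: generating one token
    streams (roughly) every model weight from memory once. Ignores batching,
    KV-cache traffic, and compute limits."""
    achieved_bw = peak_bw_gb_s * 1e9 * utilization          # bytes/sec actually delivered
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return achieved_bw / bytes_per_token

PEAK_BW_GB_S = 3350   # assumed peak memory bandwidth, GB/s
MODEL_B = 70          # assumed 70B-parameter transformer
BYTES = 2             # FP16/BF16 weights

low = decode_tokens_per_sec(PEAK_BW_GB_S, 0.30, MODEL_B, BYTES)   # ~30% utilization
high = decode_tokens_per_sec(PEAK_BW_GB_S, 0.90, MODEL_B, BYTES)  # ~90% utilization
print(f"~30% utilization: {low:.1f} tokens/s per device")
print(f"~90% utilization: {high:.1f} tokens/s per device ({high / low:.1f}x)")
```

Under these assumptions, tripling utilization triples tokens per second per device, which is why the gap between ~30% and >90% matters so much for deployment economics.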
Second, we designed the system to support multi-model concurrency. Instead of a 1:1 mapping between a GPU and a single large model, our hardware can host many models simultaneously on a single card. That lets enterprises consolidate racks, cut power draw, and increase density without rewriting their software stack.
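A rough way to see what multi-model concurrency buys is to treat consolidation as a memory-packing question: one accelerator per model versus packing all the weights onto shared cards. The card capacity and model footprints below are hypothetical, chosen only to show the arithmetic, not Positron's actual figures.

```python
# Toy memory-packing view of multi-model concurrency.
# Card capacity and model footprints are hypothetical, chosen only to show the consolidation math.

import math

CARD_MEMORY_GB = 512      # assumed usable memory per inference card
model_footprints_gb = {   # assumed serving fleet: model -> weight footprint in GB
    "chat-70b": 140,
    "code-34b": 68,
    "agent-13b": 26,
    "embed-7b": 14,
    "rerank-7b": 14,
}

dedicated_devices = len(model_footprints_gb)             # 1:1 model-to-accelerator mapping
total_gb = sum(model_footprints_gb.values())
shared_cards = math.ceil(total_gb / CARD_MEMORY_GB)      # models co-resident on shared cards

print(f"1:1 mapping: {dedicated_devices} accelerators")
print(f"Shared cards: {shared_cards} ({total_gb} GB of weights vs {CARD_MEMORY_GB} GB per card)")
```

Real deployments also have to budget KV-cache memory, scheduling, and latency headroom, but this packing argument is the core of why concurrency lets racks be consolidated.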
On the manufacturing side, we also prioritized capital efficiency. We proved this architecture on FPGA to get it into customers’ hands quickly, and now we’re taping out an ASIC. And importantly, everything is built in the U.S., which reduces geopolitical risk and tariffs. The result is >90% memory bandwidth utilization and 3-3.5× better performance-per-dollar with up to 66% lower power versus Nvidia’s current GPUs, based on our own measurements of real inference workloads.
What are the main market segments or customer types Positron is targeting with its custom AI inference hardware?
Right now, we’re focused on customers who are feeling the inference crunch most acutely. That includes enterprises deploying large-scale conversational AI, inference-as-a-service providers who need to serve millions of queries cost-effectively, and companies rolling out generative code and agent workloads in production.
In practice, that looks like Fortune 500 enterprises running internal chatbots, infrastructure providers who want to offer inference capacity as a service, and fast-growing AI companies that can’t afford GPU economics at scale. Our early deployments include players like Cloudflare and Parasail, who saw the need to consolidate models, cut power draw, and improve economics without rewriting their software stack.
Long-term, the market is massive. Every company that deploys AI — from financial services to healthcare to government — will face the same challenge: how to deliver intelligence at scale, sustainably. We see Positron as the infrastructure layer that makes that possible.
What are the biggest challenges you see in disrupting Nvidia’s dominance in AI hardware, and how does Positron plan to address them?
Nvidia has done an extraordinary job building not just chips, but an ecosystem. CUDA, developer tooling, and sheer scale give them an enormous advantage — and we’re not naïve about that.
But the reality is that AI infrastructure needs are fragmenting. Training giant frontier models and running inference at scale are very different problems. GPUs will remain the default for training, but they’re increasingly inefficient for deployment. That’s the wedge we’re focused on.
The challenges are threefold:
1. Compatibility — customers don’t want to rewrite their code. We designed our platform to drop into existing workflows with no software changes.
2. Trust — enterprises need to see real performance gains in production. That’s why we built on FPGA first, proved efficiency with early customers, and only then moved to ASIC.
3. Scale — Nvidia’s volumes drive down cost. We address that by being capital efficient and manufacturing in the U.S., where policy tailwinds like the CHIPS Act align with customer demand for secure, domestic supply chains.
Our strategy isn’t to replace GPUs everywhere. Rather, it’s to give enterprises a better option where GPUs are weakest, which is inference, and to do it in a way that immediately improves their economics.
Could you describe any pivotal moments or breakthroughs in Positron’s product development that confirmed you were on the right path?
One of the pivotal moments was when we first hit over 90% memory bandwidth utilization on a real transformer model. For years, I’d watched customers at Lambda burn capital on racks of GPUs that rarely got above 30%. The day our prototype consistently ran at triple that efficiency, it was clear we were solving a real bottleneck in workloads customers actually cared about.
Another turning point was our decision to go FPGA-first. It wasn’t the obvious choice, but it let us prove performance and get systems into production years faster than waiting for an ASIC. Early customers were able to test their own models, validate the economics, and give us direct feedback. That validation loop is what gave us the conviction that we were on the right path.
And when some of the most sophisticated infrastructure players — names like Cloudflare — deployed us in production, that was the ultimate proof. If companies operating at that scale were willing to bet on a young platform, it told us we were building something indispensable.

Positron recently closed a $51.6 million Series A round led by Valor Equity Partners, DFJ Growth, and Atreides Management. How did you approach fundraising for this round, and what were the key factors that attracted these investors?
Our approach to the Series A was very focused. We weren’t trying to raise on a vision alone — we had real performance data, customer deployments, and a capital-efficient path to scale. That combination made the story resonate.
The key factor that attracted investors like Valor, DFJ Growth, and Atreides was the shift we were seeing in the market. Training was capturing headlines, but inference was quietly becoming the bigger economic challenge. Our architecture — with >90% memory bandwidth utilization, multi-model concurrency, and U.S.-based manufacturing — was purpose-built to solve that.
The other factor was discipline. We built our first production system on just $12.5 million, and proved it in live workloads. Investors knew we could deliver results without burning through hundreds of millions. In today’s market, that kind of efficiency matters as much as the technology.
Ultimately, they saw what we see: inference as the main event in AI infrastructure. And Positron is positioned to define how it scales.
What are the key metrics or milestones you focus on to measure Positron’s success during its early growth phase?
Early on, the only metrics that matter are whether customers are actually running us in production and what efficiency gains they’re seeing. If someone can consolidate racks or cut their power bill because of Positron, that’s the real impact.
On the technical side, we track utilization — memory bandwidth, performance-per-watt — because those translate directly into dollars saved for customers. And from a company perspective, I’ve always been disciplined about capital efficiency. We built and shipped our first product for $12.5M. If we can keep proving out technology with that kind of discipline, it sets us up to scale in the right way.
Looking ahead, how do you see the AI hardware and technology landscape evolving in the next five years, especially with advancements in inference workloads?
Five years from now, inference is going to be the main pressure point. Training will still matter, but the workloads that keep growing are the ones that run constantly — chat, code generation, agents. That’s where the cost and energy use pile up.
The hardware market will adapt. GPUs will keep their role in training, but deployment will shift to purpose-built systems. What customers will care about is utilization: how many models you can run per rack, how many tokens you can process per watt. The biggest constraint will be energy. Data centers can’t keep scaling power and cooling at the pace demand is growing. The companies that solve for that will define the next era of AI infrastructure.
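As a hedged illustration of how tokens-per-watt turns into the energy constraint Mitesh describes, here is a quick sketch; the daily token volume, throughput, power draw, and electricity price are all assumptions made up for the example, and the 66% figure simply mirrors the power claim quoted earlier as an input.

```python
# Rough energy-cost comparison for serving a fixed daily token volume.
# Throughput, power draw, token volume, and electricity price are illustrative assumptions.

TOKENS_PER_DAY = 5e9     # assumed daily token volume for a deployment
USD_PER_KWH = 0.10       # assumed electricity price

def daily_energy_cost_usd(tokens_per_sec_per_system: float, watts_per_system: float) -> float:
    """Electricity cost to serve TOKENS_PER_DAY given per-system throughput and power draw."""
    systems_needed = TOKENS_PER_DAY / (tokens_per_sec_per_system * 86_400)
    kwh_per_day = systems_needed * watts_per_system * 24 / 1_000
    return kwh_per_day * USD_PER_KWH

baseline  = daily_energy_cost_usd(tokens_per_sec_per_system=10_000, watts_per_system=10_000)
efficient = daily_energy_cost_usd(tokens_per_sec_per_system=10_000, watts_per_system=3_400)

print(f"Baseline system:        ${baseline:,.0f}/day in electricity")
print(f"66%-lower-power system: ${efficient:,.0f}/day ({1 - efficient / baseline:.0%} lower)")
```

At the same throughput, cutting power draw cuts the electricity bill proportionally, and at data-center scale that difference compounds across thousands of systems running around the clock.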
What advice would you give to tech entrepreneurs aiming to innovate in the AI hardware and infrastructure ecosystem?
AI hardware looks glamorous from the outside, but it’s brutally hard. My advice is to stay close to customer pain. Don’t chase benchmarks or theoretical gains — figure out what’s actually blocking deployment and solve that with focus.
Second, be disciplined about capital. Hardware takes real dollars to build, but raising huge rounds too early can push you into building the wrong thing at the wrong scale. We built our first production system on $12.5M because proving the architecture mattered more than raising for show.
And finally, be patient with the ecosystem. Nvidia built CUDA, tooling, and trust over decades. If you want to innovate here, you need the same long-term view. Start with something customers can drop into their stack today, earn trust, and build from there.
