Gimlet Labs Raises $80M to Unlock AI's Hidden Hardware Capacity with First Multi-Silicon Inference Cloud
A Stanford-founded startup claims it can make AI inference 3–10x faster at the same cost by running workloads simultaneously across CPUs, GPUs, and SRAM chips—unlocking idle compute that sits unused 70–85% of the time.
The Multi-Silicon Revolution: How Gimlet Labs Is Solving AI's Most Expensive Bottleneck
Every time an AI model generates a response, predicts an outcome, or processes a document, it burns through specialized hardware in a surprisingly inefficient way. According to a new startup that just closed an $80 million Series A round, the world's largest AI deployments are wasting hundreds of billions of dollars in idle compute—and the fix turns out to be elegant software, not more hardware.
The Problem: AI Workloads Are Hardware Misfits
Gimlet Labs, founded by Stanford adjunct professor Zain Asgar and his co-founders Michelle Nguyen, Omid Azizi, and Natalie Serrino, emerged from stealth in October 2025 with a striking claim: data centers are using their deployed hardware only 15 to 30 percent of the time, even during peak AI workloads.
The reason is architectural. Agentic AI systems—the kind that chain together multiple reasoning steps, tool calls, and memory lookups—don't have uniform compute needs. As lead investor Tim Tully of Menlo Ventures explained in his funding blog post: "A single agent may chain together multiple steps, and each requires different hardware: Inference is compute-bound; decode is memory-bound; and tool calls are network-bound."
No single chip type handles all three optimally. Traditional GPUs excel at compute-heavy inference but waste resources on memory-bound or network-bound operations. The result is expensive, powerful hardware sitting idle while AI applications wait for the right bottleneck to clear.
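The orchestration idea described above can be sketched as a simple routing table: classify each step of an agentic request by its dominant bottleneck, then dispatch it to the silicon best suited for that bottleneck. The step names, hardware labels, and mappings below are illustrative assumptions for the sake of the sketch, not Gimlet's actual API or scheduling logic.

```python
# Hypothetical sketch of bottleneck-aware routing. Names are invented
# for illustration; Gimlet's real scheduler is not public.

# Each phase of an agentic request has a different dominant bottleneck.
STEP_BOTTLENECK = {
    "prefill": "compute",    # prompt processing is compute-bound
    "decode": "memory",      # token generation is memory-bandwidth-bound
    "tool_call": "network",  # external API calls are network-bound
}

# Map each bottleneck to the hardware class best suited for it.
HARDWARE_POOL = {
    "compute": "gpu",        # dense matmul throughput
    "memory": "sram_system", # high-bandwidth on-chip memory
    "network": "cpu",        # cheap cores for I/O-heavy waiting
}

def route(steps):
    """Assign each agent step to a hardware pool by its bottleneck."""
    return [(s, HARDWARE_POOL[STEP_BOTTLENECK[s]]) for s in steps]

plan = route(["prefill", "decode", "decode", "tool_call", "decode"])
for step, hw in plan:
    print(f"{step} -> {hw}")
```

The point of the sketch is that a single request touches all three pools, so a homogeneous GPU fleet necessarily idles whichever resource the current step does not stress.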
"We basically run across whatever different hardware that's available," said Asgar. "Our goal was basically to try to figure out how you can get AI workloads to be 10x more efficient than ever, today."
The Solution: Orchestrating Heterogeneous Silicon
Gimlet Labs has built what it calls the world's first and only multi-silicon inference cloud—a software layer that can simultaneously slice and distribute an AI workload across CPUs, GPUs, high-memory SRAM systems, and purpose-built AI accelerators. The company has already inked partnerships with NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix.
The platform works either as a software deployment within a customer's own infrastructure or through Gimlet Cloud, its managed API offering. The target customer is not individual developers—it's the frontier model labs and hyperscale cloud providers with enormous, heterogeneous fleets of compute.
Gimlet claims its platform delivers 3x to 10x inference speedups for the same cost and power budget. It can even split underlying model weights so that different portions of a single model run across different chip architectures, matching hardware strengths to computational demands.
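Splitting a model's weights across different chip architectures amounts to a partitioning problem: assign consecutive layers to devices, in pipeline order, subject to each device's memory capacity. The greedy partitioner below is a minimal sketch of that idea under made-up layer sizes and device capacities; it is not Gimlet's actual placement algorithm.

```python
# Illustrative sketch of splitting model weights across a heterogeneous
# fleet. Layer sizes, device names, and capacities are assumptions.

def partition_layers(layer_sizes_gb, devices):
    """Greedily assign consecutive layers to devices until each fills up.

    devices: list of (name, capacity_gb) tuples, in pipeline order.
    Returns {device_name: [layer indices assigned to it]}.
    """
    assignment = {name: [] for name, _ in devices}
    dev_iter = iter(devices)
    name, free = next(dev_iter)
    for i, size in enumerate(layer_sizes_gb):
        while size > free:
            name, free = next(dev_iter)  # current device is full; advance
        assignment[name].append(i)
        free -= size
    return assignment

# A 12-layer model at 2 GB per layer: early layers land on a 16 GB GPU,
# the remainder on a 16 GB SRAM system.
layers = [2.0] * 12
fleet = [("gpu_0", 16.0), ("sram_0", 16.0)]
print(partition_layers(layers, fleet))
# gpu_0 receives layers 0-7; sram_0 receives layers 8-11
```

A production placer would also weigh each device's compute and bandwidth profile, not just capacity, so that memory-bound layers land on high-bandwidth silicon.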
Already Generating Revenue—and Doubling Its Customers
The company publicly launched in October 2025 with eight-figure revenues out of the gate—at least $10 million—an unusually strong position for a newly emerged startup. Asgar told TechCrunch that Gimlet's customer base has more than doubled in the four months since launch, and now includes "a major model maker and an extremely large cloud computing company," though he declined to name them.
The $80 million Series A was led by Menlo Ventures, with the capital earmarked for expanding the team and scaling out Gimlet Cloud to meet what the company calls record demand. The four co-founders previously worked together at Pixie, an open-source Kubernetes observability startup that was acquired, giving the team a track record of building infrastructure-layer companies.
The Bigger Picture: A $7 Trillion Compute Race
If the AI infrastructure buildout continues at its current pace, McKinsey estimates data center spending will total nearly $7 trillion by 2030. Gimlet's pitch—that you can extract dramatically more value from existing hardware through smarter orchestration software—is a compelling counter-narrative to the dominant "just buy more GPUs" approach.
As aging GPUs get redeployed and new chip architectures enter the market, the heterogeneous fleet will only grow. The question Gimlet is betting on: will enterprises pay for sophisticated software to extract efficiency from that messy pile of silicon, rather than continuously purchasing homogeneous upgrades?
Based on early traction, at least some of the world's largest AI operators believe the answer is yes.