Where 8080 makes intelligence feel instant.
8080 core
Intelligence should be everywhere, but inference is still too slow and too expensive. Models are good enough; the bottleneck is GPU economics. 8080 is the first inference cloud built around inference ASICs (not GPUs), delivering massive speed and cost gains by burning models into silicon.
We co-locate inference with CPUs, GPUs, storage, and code sandboxes so agent workflows stay on-rack, minimizing latency to end users.
| Metric | Target | Notes |
|---|---|---|
| Latency | <50ms | Set an SLA that matters to the user experience. |
| Throughput | XXk TPS | Replace with the customer's real volume. |
| Cost | Xx lower | Define the savings you are aiming for. |
| Region coverage | Global | List priority regions for deployment. |
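To make a <50ms target concrete, here is a minimal latency-budget sketch. Every per-stage number below is a hypothetical placeholder, not a measured 8080 figure; the point is that the SLA only holds if the sum of all stages fits inside it.

```python
# Hypothetical per-stage latency budget for one request (milliseconds).
# All stage values are illustrative placeholders.
SLA_MS = 50

stages = {
    "network_to_rack": 8,   # placeholder round trip to the nearest region
    "prefill": 13,          # placeholder prompt-processing time
    "generate": 25,         # placeholder time for the output tokens
    "overhead": 2,          # placeholder queueing / serialization
}

total_ms = sum(stages.values())
assert total_ms <= SLA_MS, f"budget exceeded: {total_ms}ms"
print(f"end-to-end: {total_ms}ms (SLA {SLA_MS}ms)")  # prints: end-to-end: 48ms (SLA 50ms)
```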
Use cases
The highest-leverage opportunities pair low-latency requirements with high-volume workloads, where ASIC inference outperforms GPUs.
- Describe the most valuable workflow you want to accelerate.
- Add another inference-heavy workflow.
- Note where rapid iteration would unlock new revenue or growth.
- Describe a low-latency workflow that benefits from on-rack compute.
About 8080
8080 is built to deliver inference that feels instantaneous, economical, and dependable in production.
- Fastest and cheapest possible inference, designed to make intelligence feel instant everywhere.
- Mixed compute co-located with inference for real-time voice, media, and agent workflows, deployed via the 8080 SDK.
- Built around a new generation of inference ASICs, delivering massive speed and power-efficiency gains over general-purpose GPUs.
- Drop-in integration with existing inference services: migrate workloads without rewriting product logic or orchestration.
- CLI-first workflows with logs, tracing, and fast iteration loops so teams can ship and debug production inference quickly.
8080 is structured as a hardware + software + ops stack optimized for two things: lowest possible latency and lowest possible cost.
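The drop-in claim above can be illustrated with a minimal sketch. The actual 8080 API surface is not specified here, so this assumes an OpenAI-compatible chat-completions endpoint (a common drop-in pattern for inference providers); the endpoint URL and model name are hypothetical placeholders.

```python
import json

# Hypothetical drop-in migration: if the provider exposes an
# OpenAI-compatible endpoint (an assumption; the real 8080 API surface
# is not documented here), switching providers is a base-URL and
# model-name change, with the request payload left untouched.
ENDPOINT = "https://api.8080.example/v1/chat/completions"  # hypothetical URL

def build_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize a chat-completions payload; product logic stays unchanged."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

# Same payload builder, different deployment target:
body = build_request("8b-class", "Summarize the release notes.")  # hypothetical model id
```

Because only the endpoint and model identifier change, orchestration code that already speaks this request shape would not need rewriting.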
Model availability
Availability and throughput targets for production deployment.
| Model class | Availability | Prefill speed (tok/s) | Generate speed (tok/s) | Cost |
|---|---|---|---|---|
| 8B class model | Available now | 160k | 20k | 80% cheaper |
| 30B class model | April 2026 | 200k | 25k | 80% cheaper |
| 235B class model | Summer 2026 | 400k | 25k | 80% cheaper |
Next step
8080 is launching in Q1 2026 with early access for partners ready to scale real-time inference. We can reserve capacity, co-design deployment regions, and map the fastest path from pilot to production.