
AI Inference Economics: Corespan Announces Inference System with 8 to 12 NVIDIA RTX 5090 GPUs

Corespan’s PRU 2500 pools 8 to 12 NVIDIA RTX 5090 GPUs in a high-density, interconnect-attached system built to improve utilization and scale AI inference efficiently.

Bill Koss, CEO and President of Corespan Systems

A New Approach to GPU Density

Most AI infrastructure still ties GPUs to fixed server chassis, which can leave expensive accelerator capacity sitting idle. The Corespan PRU 2500 changes that model by concentrating 8 to 12 NVIDIA RTX 5090 GPUs into a shared resource unit that can be attached to standard servers on demand.

Built for High-Utilization Inference

The PRU 2500 is a high-density PCIe Gen 5 chassis designed to serve as a shared accelerator pool rather than a traditional server. By disaggregating GPUs from the host, operators can allocate resources more dynamically across workloads and tenants, helping push utilization higher while improving cost efficiency.

Dense Performance in a Compact Footprint

With support for up to 12 RTX 5090 GPUs, the PRU 2500 delivers substantial compute and memory bandwidth in a single chassis. This density makes it well suited for containerized inference, multi-tenant GPU services, and distributed AI workloads that need more performance without adding more full servers.

Liquid Cooling for Rack-Scale Deployment

To support this level of density, the PRU 2500 uses a hybrid liquid-cooling design built to manage roughly 7 kW of thermal load. Direct-to-chip cooling helps maintain stable performance under sustained demand while enabling a far more compact GPU footprint than conventional air-cooled systems.

Interconnect-Attached and Kubernetes-Ready

The PRU 2500 connects to one or more host servers through Corespan’s photonic interconnect and FIC 2500 Fabric Interface Card. Remote GPUs appear as local PCIe devices to the operating system and NVIDIA software stack, allowing operators to use familiar tools such as Docker, Kubernetes, NVIDIA Container Toolkit, and standard device plugins.
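Because pooled GPUs surface to the host as local PCIe devices, scheduling against them looks like any other GPU workload in Kubernetes. As an illustrative sketch only (the pod name, container image, and GPU count below are placeholder values, not Corespan-specific), a workload could request GPUs from the pool through the standard `nvidia.com/gpu` resource exposed by the NVIDIA device plugin:

```yaml
# Hypothetical pod spec: requests two GPUs from the shared pool
# using the standard NVIDIA device plugin resource name.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker            # placeholder name
spec:
  runtimeClassName: nvidia          # provided by the NVIDIA Container Toolkit / GPU Operator
  containers:
  - name: inference
    image: nvcr.io/nvidia/tritonserver:24.08-py3   # example inference image
    resources:
      limits:
        nvidia.com/gpu: 2           # GPUs allocated from the PRU 2500 pool
```

Nothing in this manifest is Corespan-specific; that is the point — the scheduler and device plugin see ordinary NVIDIA GPUs, whether they sit in the host chassis or in the attached pool.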

Flexible Deployment Options

Organizations can deploy the PRU 2500 with a single host or across dual-host configurations, depending on performance, resiliency, and tenant-isolation requirements. This flexibility allows operators to partition GPU resources dynamically and reassign them quickly as workload demand changes.

Why It Matters for Operators

For GPU-as-a-Service providers and enterprise AI teams, the PRU 2500 offers a more efficient infrastructure model. It reduces the need for redundant CPUs, NICs, power supplies, and rack space, while enabling shared GPU pools, elastic multi-tenancy, and better capital efficiency.

Available Now

The PRU 2500 with NVIDIA RTX 5090 GPUs is available in 8-, 10-, and 12-GPU configurations. It is designed for organizations building high-density, Kubernetes-native AI inference environments that need more flexibility, better utilization, and a cleaner path to scaling GPU infrastructure.

