Moving a Distributed GPU Service into TEEs

Technical Log - March 2026

Tags: GPU, Security, CUDA, TEE

Summary

Traditional GPU computing has a blind spot for data confidentiality: unlike CPUs, GPUs historically lacked hardware-backed Trusted Execution Environments (TEEs). That changed with NVIDIA's Hopper architecture and GPU Confidential Computing. This log covers the practical experience of moving a distributed GPU service into a TEE: why we needed it, how the hardware isolation actually works, what changed in practice, and the performance impact we observed.

Why We Needed Confidential GPU Computing

In our project, a privacy-focused blockchain, users entrust us with sensitive data that must remain private even from us as a service operator. To build true trust, we wanted a setup where not even we could peek into customer data or intermediate results.

TEEs make this possible by isolating computations and ensuring data remains accessible only to trusted software inside a verified enclave. With GPU TEEs, we can offload cryptographic workloads and heavy computation while maintaining this isolation. It also eliminates a major attack surface because data is never visible in plaintext to the host or hypervisor.

How GPU Memory Security Works

NVIDIA's GPU Confidential Computing (GPU-CC), introduced with the Hopper architecture, finally extends TEEs to GPUs. It is not a one-to-one mirror of CPU-based TEE features since the GPU world has its own design constraints and threat model.

Here are the key mechanisms that make GPU TEEs work, summarizing NVIDIA's public documentation:

- A confidential VM (CVM), backed by a CPU TEE such as AMD SEV-SNP or Intel TDX, hosts the GPU driver and the application; the hypervisor cannot read CVM memory.
- All data moving between the CVM and the GPU passes through encrypted bounce buffers in shared memory, so it crosses the PCIe bus only as ciphertext.
- With confidential computing mode enabled, GPU firmware blocks out-of-band access to GPU memory (for example, direct BAR mappings from the host), establishing a protected region for user data.
- Hardware-rooted attestation lets a verifier confirm the GPU's identity, firmware, and confidential-computing state before any secrets are released to it.

These combined mechanisms provide strong isolation while keeping overhead low enough for heavy compute workloads.
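In CC mode, data crossing PCIe is encrypted by the driver inside the CVM and decrypted on the GPU, staged through "bounce buffers" in unprotected shared memory. The toy model below illustrates that flow conceptually; the SHA-256 counter-mode keystream is a stand-in for the real AES-GCM encryption, and none of this is actual driver code:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode) -- a stand-in for the
    # authenticated encryption the real driver/GPU pair uses.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def cvm_encrypt(key: bytes, data: bytes) -> bytes:
    # Inside the CVM: encrypt before data leaves protected memory.
    # XOR with a keystream is symmetric, so the same call "decrypts".
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# The host/hypervisor only ever sees ciphertext in the bounce buffer.
key = b"session key (illustrative)"
payload = b"sensitive intermediate results"
bounce_buffer = cvm_encrypt(key, payload)   # staged in unprotected memory
assert bounce_buffer != payload             # plaintext never visible on the bus
gpu_view = cvm_encrypt(key, bounce_buffer)  # GPU-side decrypt
assert gpu_view == payload
```

The point of the sketch is the data path, not the cipher: plaintext exists only inside the CVM and inside the GPU's protected memory.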

Making Your Environment TEE-Ready

Not every setup can run confidential workloads today. You will need a compatible CPU (with SEV-SNP or TDX support) and a Hopper or newer GPU. NVIDIA provides a Secure AI Compatibility Matrix to check whether your hardware and software stack supports confidential computing.

Some quick sanity checks:

nvidia-smi conf-compute -grs
# Should output: Confidential Compute GPUs Ready state: ready

nvidia-smi conf-compute -f
# Should output: CC status: ON
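A small helper (hypothetical, but close to what we used) can automate these checks by running nvidia-smi and matching the expected strings from the outputs above:

```python
import subprocess

READY_MARKER = "Confidential Compute GPUs Ready state: ready"
CC_ON_MARKER = "CC status: ON"

def parse_cc_ready(output: str) -> bool:
    # True when `nvidia-smi conf-compute -grs` reports the GPU ready.
    return READY_MARKER in output

def parse_cc_enabled(output: str) -> bool:
    # True when `nvidia-smi conf-compute -f` reports CC mode on.
    return CC_ON_MARKER in output

def check_gpu_tee_ready() -> bool:
    # Runs both sanity checks; requires a Hopper-or-newer GPU and a
    # confidential-computing-capable driver on the machine.
    grs = subprocess.run(["nvidia-smi", "conf-compute", "-grs"],
                         capture_output=True, text=True).stdout
    flag = subprocess.run(["nvidia-smi", "conf-compute", "-f"],
                          capture_output=True, text=True).stdout
    return parse_cc_ready(grs) and parse_cc_enabled(flag)
```

Wiring this into service startup gives an early, loud failure when a node is not actually in CC mode.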

To verify security, perform attestation, the process of cryptographically confirming that the GPU and its drivers are genuine and secure. NVIDIA's Attestation SDK supports both local and remote verification. It can attest not only individual GPUs but also the NVSwitch fabric in multi-GPU systems. For example:

python3 /gpu-attestation/nv_attestation_sdk/tests/SmallGPUTest.py

Once attested, you can run your workloads normally inside the CVM. For most use cases, no code changes are needed.
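The pattern we followed is "attest first, then run": sensitive work starts only after attestation succeeds. A minimal sketch of that gate, with a pluggable `attest` callable standing in for the NVIDIA Attestation SDK call (the SDK's actual API is not reproduced here):

```python
from typing import Callable

class AttestationError(RuntimeError):
    """Raised when the GPU fails to attest."""

def run_confidential(workload: Callable[[], object],
                     attest: Callable[[], bool]) -> object:
    # Gate: refuse to touch sensitive data unless the GPU attests cleanly.
    # In production, `attest` would invoke the NVIDIA Attestation SDK
    # (as in the SmallGPUTest flow above) and verify the returned evidence.
    if not attest():
        raise AttestationError("GPU attestation failed; refusing to run")
    return workload()

# Usage with a stub verifier, for illustration only:
result = run_confidential(lambda: "proof-generated", attest=lambda: True)
```

Keeping the verifier pluggable also makes the gate easy to test without GPU hardware.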

What Changes in Practice

From a developer's perspective, moving workloads into TEEs was surprisingly smooth. Our existing core compute worked without modification, but we had to work around several limitations.

A few caveats:

- MIG is not supported while confidential computing mode is enabled.
- Early CC support was limited to a single GPU per CVM, ruling out multi-GPU instances.
- Developer tooling inside the CVM is restricted; some profiling and debugging features are unavailable in CC mode.
- Encrypted bounce-buffer transfers add overhead to host-GPU IO.

For our distributed use case, this meant redesigning the service: with neither MIG nor multi-GPU instances available, we spread the load across several machines and split each GPU's resources using MPS alone.
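CUDA MPS caps a client's share of the GPU's SMs through the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable. A sketch of how per-worker shares can be derived; the worker names and weights below are illustrative, not our actual configuration:

```python
import os

def mps_thread_percentages(weights: dict) -> dict:
    # Split a GPU's SMs across workers proportionally to integer weights.
    # Each value becomes that worker's CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
    total = sum(weights.values())
    return {name: max(1, round(100 * w / total)) for name, w in weights.items()}

# Illustrative split across three cryptographic workloads.
shares = mps_thread_percentages({"ntt": 2, "msm": 2, "hash": 1})

def launch_env(worker: str) -> dict:
    # Environment for spawning a worker process under the MPS control daemon.
    env = dict(os.environ)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(shares[worker])
    return env
```

Note that MPS enforces softer isolation than MIG: it partitions compute capacity, not memory or fault domains, which is part of the trade-off of losing MIG in CC mode.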

Performance Observations

We benchmarked our cryptographic workloads, mainly NTT, MSM, and ZK-friendly hashing, inside the GPU TEE. The overhead was minimal, typically under 5%, mostly due to encrypted transfers over PCIe. Compute throughput itself remained virtually identical. Minimizing IO and parallelizing compute with data transfers is key to keeping the overhead low.
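A back-of-the-envelope model shows why overlapping transfers with compute keeps overhead low: with double buffering, the encrypted copy of batch i+1 hides behind the kernel for batch i. The timings below are illustrative, not our benchmark figures:

```python
def serial_time(t_io: float, t_compute: float, batches: int) -> float:
    # Transfer then compute, one batch at a time: IO cost is fully exposed.
    return batches * (t_io + t_compute)

def pipelined_time(t_io: float, t_compute: float, batches: int) -> float:
    # Double-buffered: after the first copy, each step takes the larger of
    # the two phases; only the first copy and last kernel stick out.
    return t_io + (batches - 1) * max(t_io, t_compute) + t_compute

# Illustrative numbers: 1 ms encrypted copy, 10 ms kernel, 8 batches.
serial = serial_time(1, 10, 8)      # 88 ms, 10% over pure compute (80 ms)
overlapped = pipelined_time(1, 10, 8)  # 81 ms, 1.25% over pure compute
```

When kernels dominate the copies, as in our NTT/MSM workloads, pipelining shrinks the encrypted-transfer overhead to roughly the cost of the first copy.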

This made the trade-off very appealing: hardware-enforced confidentiality with minimal runtime penalty.

What This Unlocks

With both CPU and GPU attestation, we can offer users a strong guarantee that their data stays private, verified cryptographically at runtime.

Of course, transparency still matters. If the code is not open source, users cannot verify what happens inside the enclave. However, TEEs do prevent operators from reading or tampering with user data from outside the enclave, and attestation lets users verify that the expected software is running.

Resources and Further Reading

If you want to explore deeper, these are excellent starting points:

- NVIDIA's GPU Confidential Computing documentation and deployment guides
- The nvtrust repository on GitHub, home of the NVIDIA Attestation SDK and its examples
- NVIDIA's Hopper architecture whitepaper, which describes the hardware behind CC mode