Moving a Distributed GPU Service into TEEs

Technical Log - March 2026

Tags: GPU, Security, CUDA, TEE

Summary

Traditional GPU computing has a blind spot for data confidentiality: unlike CPUs, GPUs historically lacked hardware-backed Trusted Execution Environments (TEEs). That changed with NVIDIA's Hopper architecture and GPU Confidential Computing. This log covers the practical experience of moving a distributed GPU service into a TEE: why we needed it, how the hardware isolation actually works, what changed in practice, and the performance impact we observed.

Why We Needed Confidential GPU Computing

In our project, a privacy-focused blockchain, users entrust us with sensitive data that must remain private even from us as a service operator. To build true trust, we wanted a setup where not even we could peek into customer data or intermediate results.

TEEs make this possible by isolating computations and ensuring data remains accessible only to trusted software inside a verified enclave. With GPU TEEs, we can offload cryptographic workloads and heavy computation while maintaining this isolation. It also eliminates a major attack surface because data is never visible in plaintext to the host or hypervisor.

How GPU Memory Security Works

NVIDIA's GPU Confidential Computing (GPU-CC), introduced with the Hopper architecture, finally extends TEEs to GPUs. It is not a one-to-one mirror of CPU-based TEE features since the GPU world has its own design constraints and threat model.

Here are the key mechanisms that make GPU TEEs work, summarizing NVIDIA's public documentation:

- A confidential VM (CVM), backed by a CPU TEE such as AMD SEV-SNP or Intel TDX, hosts the GPU driver and the application; the hypervisor cannot read CVM memory.
- All data moving between the CVM and the GPU passes through encrypted bounce buffers in shared memory, so it crosses the PCIe bus only as ciphertext.
- With confidential computing mode enabled, GPU firmware blocks out-of-band access to GPU memory (for example, direct BAR mappings from the host), establishing a protected region for user data.
- Hardware-rooted attestation lets a verifier confirm the GPU's identity, firmware, and confidential-computing state before any secrets are released to it.

These combined mechanisms provide strong isolation while keeping overhead low enough for heavy compute workloads.
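In CC mode, data crossing PCIe is encrypted by the driver inside the CVM and decrypted on the GPU, staged through "bounce buffers" in unprotected shared memory. The toy model below illustrates that flow conceptually; the SHA-256 counter-mode keystream is a stand-in for the real AES-GCM encryption, and none of this is actual driver code:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode) -- a stand-in for the
    # authenticated encryption the real driver/GPU pair uses.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def cvm_encrypt(key: bytes, data: bytes) -> bytes:
    # Inside the CVM: encrypt before data leaves protected memory.
    # XOR with a keystream is symmetric, so the same call "decrypts".
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# The host/hypervisor only ever sees ciphertext in the bounce buffer.
key = b"session key (illustrative)"
payload = b"sensitive intermediate results"
bounce_buffer = cvm_encrypt(key, payload)   # staged in unprotected memory
assert bounce_buffer != payload             # plaintext never visible on the bus
gpu_view = cvm_encrypt(key, bounce_buffer)  # GPU-side decrypt
assert gpu_view == payload
```

The point of the sketch is the data path, not the cipher: plaintext exists only inside the CVM and inside the GPU's protected memory.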

Making Your Environment TEE-Ready

Not every setup can run confidential workloads today. You will need a compatible CPU (with SEV-SNP or TDX support) and a Hopper or newer GPU. NVIDIA provides a Secure AI Compatibility Matrix to check whether your hardware and software stack supports confidential computing.

Some quick sanity checks:

nvidia-smi conf-compute -grs
# Should output: Confidential Compute GPUs Ready state: ready

nvidia-smi conf-compute -f
# Should output: CC status: ON
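A small helper (hypothetical, but close to what we used) can automate these checks by running nvidia-smi and matching the expected strings from the outputs above:

```python
import subprocess

READY_MARKER = "Confidential Compute GPUs Ready state: ready"
CC_ON_MARKER = "CC status: ON"

def parse_cc_ready(output: str) -> bool:
    # True when `nvidia-smi conf-compute -grs` reports the GPU ready.
    return READY_MARKER in output

def parse_cc_enabled(output: str) -> bool:
    # True when `nvidia-smi conf-compute -f` reports CC mode on.
    return CC_ON_MARKER in output

def check_gpu_tee_ready() -> bool:
    # Runs both sanity checks; requires a Hopper-or-newer GPU and a
    # confidential-computing-capable driver on the machine.
    grs = subprocess.run(["nvidia-smi", "conf-compute", "-grs"],
                         capture_output=True, text=True).stdout
    flag = subprocess.run(["nvidia-smi", "conf-compute", "-f"],
                          capture_output=True, text=True).stdout
    return parse_cc_ready(grs) and parse_cc_enabled(flag)
```

Wiring this into service startup gives an early, loud failure when a node is not actually in CC mode.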

To verify security, perform attestation, the process of cryptographically confirming that the GPU and its drivers are genuine and secure. NVIDIA's Attestation SDK supports both local and remote verification. It can attest not only individual GPUs but also the NVSwitch fabric in multi-GPU systems. For example:

python3 /gpu-attestation/nv_attestation_sdk/tests/SmallGPUTest.py

Once attested, you can run your workloads normally inside the CVM. For most use cases, no code changes are needed.
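The pattern we followed is "attest first, then run": sensitive work starts only after attestation succeeds. A minimal sketch of that gate, with a pluggable `attest` callable standing in for the NVIDIA Attestation SDK call (the SDK's actual API is not reproduced here):

```python
from typing import Callable

class AttestationError(RuntimeError):
    """Raised when the GPU fails to attest."""

def run_confidential(workload: Callable[[], object],
                     attest: Callable[[], bool]) -> object:
    # Gate: refuse to touch sensitive data unless the GPU attests cleanly.
    # In production, `attest` would invoke the NVIDIA Attestation SDK
    # (as in the SmallGPUTest flow above) and verify the returned evidence.
    if not attest():
        raise AttestationError("GPU attestation failed; refusing to run")
    return workload()

# Usage with a stub verifier, for illustration only:
result = run_confidential(lambda: "proof-generated", attest=lambda: True)
```

Keeping the verifier pluggable also makes the gate easy to test without GPU hardware.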

What Changes in Practice

From a developer's perspective, moving workloads into TEEs was surprisingly smooth. Our existing core compute worked without modification, but we had to work around several limitations.

A few caveats:

- MIG is not supported while confidential computing mode is enabled.
- Early CC support was limited to a single GPU per CVM, ruling out multi-GPU instances.
- Developer tooling inside the CVM is restricted; some profiling and debugging features are unavailable in CC mode.
- Encrypted bounce-buffer transfers add overhead to host-GPU IO.

For our distributed use case, this meant redesigning the service: with neither MIG nor multi-GPU instances available, we spread the load across several machines and split each GPU's resources using MPS alone.
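CUDA MPS caps a client's share of the GPU's SMs through the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable. A sketch of how per-worker shares can be derived; the worker names and weights below are illustrative, not our actual configuration:

```python
import os

def mps_thread_percentages(weights: dict) -> dict:
    # Split a GPU's SMs across workers proportionally to integer weights.
    # Each value becomes that worker's CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
    total = sum(weights.values())
    return {name: max(1, round(100 * w / total)) for name, w in weights.items()}

# Illustrative split across three cryptographic workloads.
shares = mps_thread_percentages({"ntt": 2, "msm": 2, "hash": 1})

def launch_env(worker: str) -> dict:
    # Environment for spawning a worker process under the MPS control daemon.
    env = dict(os.environ)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(shares[worker])
    return env
```

Note that MPS enforces softer isolation than MIG: it partitions compute capacity, not memory or fault domains, which is part of the trade-off of losing MIG in CC mode.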

Performance Observations

We benchmarked our cryptographic workloads, mainly NTT, MSM, and ZK-friendly hashing, inside the GPU TEE. The overhead was minimal, typically under 5%, mostly due to encrypted transfers over PCIe. Compute throughput itself remained virtually identical. Minimizing IO and parallelizing compute with data transfers is key to keeping the overhead low.
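A back-of-the-envelope model shows why overlapping transfers with compute keeps overhead low: with double buffering, the encrypted copy of batch i+1 hides behind the kernel for batch i. The timings below are illustrative, not our benchmark figures:

```python
def serial_time(t_io: float, t_compute: float, batches: int) -> float:
    # Transfer then compute, one batch at a time: IO cost is fully exposed.
    return batches * (t_io + t_compute)

def pipelined_time(t_io: float, t_compute: float, batches: int) -> float:
    # Double-buffered: after the first copy, each step takes the larger of
    # the two phases; only the first copy and last kernel stick out.
    return t_io + (batches - 1) * max(t_io, t_compute) + t_compute

# Illustrative numbers: 1 ms encrypted copy, 10 ms kernel, 8 batches.
serial = serial_time(1, 10, 8)      # 88 ms, 10% over pure compute (80 ms)
overlapped = pipelined_time(1, 10, 8)  # 81 ms, 1.25% over pure compute
```

When kernels dominate the copies, as in our NTT/MSM workloads, pipelining shrinks the encrypted-transfer overhead to roughly the cost of the first copy.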

This made the trade-off very appealing: hardware-enforced confidentiality with minimal runtime penalty.

What This Unlocks

With both CPU and GPU attestation, we can offer users a strong guarantee that their data stays private, verified cryptographically at runtime.

Of course, transparency still matters. If the code is not open source, users cannot verify what happens inside the enclave. However, TEEs do prevent operators from reading or tampering with user data from outside the enclave, and attestation lets users verify that the expected software is running.

Resources and Further Reading

If you want to explore deeper, these are excellent starting points:

- NVIDIA's GPU Confidential Computing documentation and deployment guides
- The nvtrust repository on GitHub, home of the NVIDIA Attestation SDK and its examples
- NVIDIA's Hopper architecture whitepaper, which describes the hardware behind CC mode