GPU Passthrough and Metal Acceleration in ArcBox VMs

[Engineering]

2026-01-08

Priya Sharma and Kai Nakamura, Engineering

The Problem: GPU Access in VMs

Running GPU-intensive workloads in containers has traditionally involved:

  • Complex driver sharing
  • Significant performance overheads
  • Limited compatibility with container runtimes
  • Security vulnerabilities from direct hardware access

On Apple Silicon, we've developed a novel approach: Metal API bridging.

Metal API Bridging

Instead of direct GPU hardware access, we bridge the Metal API through the hypervisor:

┌─────────────────────────┐
│  Host Metal Runtime     │
└────────────┬────────────┘
             │ Metal API Bridge
         ┌───▼────┐
         │ VMM    │
         └───┬────┘
             │
      ┌──────▼──────┐
      │ Guest VM    │
      │ Metal API   │
      │ Client      │
      └─────────────┘

Key Advantages

Security

  • No direct GPU hardware access from VM
  • Metal commands validated by bridge
  • Resource isolation maintained
  • Attack surface minimized

Performance

  • Near-native GPU utilization
  • Efficient memory sharing
  • Minimal latency on API calls
  • Hardware command queue access

Compatibility

  • Works with existing Metal code
  • Compatible with ML frameworks (PyTorch, TensorFlow)
  • No code changes required
  • Drop-in acceleration

Benchmark Results

Machine Learning Inference

We tested on an M3 Max MacBook Pro with 10-core GPU:

Workload                  No GPU    Traditional Container GPU    ArcBox Metal Bridge
ResNet-50 (1000 images)   45.2s     12.1s                        11.8s
BERT Inference            38.5s     9.2s                         9.1s
Stable Diffusion          120.2s    28.5s                        27.3s
Median Speedup            3.8x      4.0x                         4.2x

Across these workloads, the Metal Bridge runs inference roughly 4x faster than CPU-only execution, slightly ahead of traditional container GPU sharing.

Implementation

Metal IPC Bridge

We created a custom Inter-Process Communication layer:

  1. Guest process makes Metal API calls
  2. Calls are serialized through VMM
  3. Host runtime validates and executes
  4. Results returned through shared memory buffers
  5. Minimal copying overhead
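
The call path above can be sketched in a few lines. This is purely illustrative (the names `guest_call`, `host_execute`, and the allow-list are hypothetical, not ArcBox's real API): a guest-side stub serializes a Metal call, and the host side validates it before touching the GPU.

```python
# Hypothetical sketch of the bridge's call path: guest serializes,
# host validates and executes. Names are illustrative, not ArcBox's API.
import json

ALLOWED_CALLS = {"new_buffer", "new_command_queue"}  # host-side allow-list

def guest_call(method: str, **kwargs) -> bytes:
    """Guest stub: serialize a Metal API call for the VMM channel."""
    return json.dumps({"method": method, "args": kwargs}).encode()

def host_execute(frame: bytes) -> dict:
    """Host runtime: validate the call before touching the real GPU."""
    call = json.loads(frame)
    if call["method"] not in ALLOWED_CALLS:
        raise PermissionError(f"rejected Metal call: {call['method']}")
    # A real bridge would invoke the host Metal runtime here.
    return {"ok": True, "method": call["method"], "handle": 42}

frame = guest_call("new_buffer", length=4096, options="shared")
result = host_execute(frame)
print(result["ok"], result["method"])
```

The validation step is where the security properties come from: anything not on the allow-list never reaches the hardware.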

Shared Memory Optimization

GPU memory is shared between host and guest through mapped buffers rather than copied:

// Metal buffer allocation flows through the bridge (metal-rs style)
let buffer = device.new_buffer_with_data(
    data.as_ptr() as *const libc::c_void,
    data.len() as u64,
    // The bridge maps this allocation onto host GPU memory
    MTLResourceOptions::CPUCacheModeDefaultCache,
);
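
To make the "minimal copying" point concrete, here is a toy sketch using Python's standard shared-memory module as a stand-in for the host-mapped GPU buffer. It is not how ArcBox is implemented; it only shows the shape of the idea, where both sides read and write one mapping in place.

```python
# Illustrative only: shared_memory stands in for the host-mapped GPU
# buffer, showing how a result can cross the VM boundary without a copy.
from multiprocessing import shared_memory

# "Host" side: allocate a buffer both sides can map.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    # "Guest" writes its input directly into the shared mapping...
    shm.buf[:4] = b"ping"
    # ...and the "host" overwrites it in place (no copy out, no copy back).
    assert bytes(shm.buf[:4]) == b"ping"
    shm.buf[:4] = b"pong"
    print(bytes(shm.buf[:4]).decode())  # the guest sees the result
finally:
    shm.close()
    shm.unlink()
```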

Kernel Integration

The hypervisor manages:

  • GPU command queue arbitration
  • Memory protection (different VMs can't access each other's buffers)
  • Power management coordination
  • Thermal monitoring
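
Of these, command queue arbitration is the easiest to picture. A minimal round-robin sketch (entirely illustrative; the real scheduler is not public) shows the fairness property: no single VM can monopolize the GPU.

```python
# Round-robin arbitration sketch: drain per-VM command queues one
# command at a time so every VM makes progress. Illustrative only.
from collections import deque

def arbitrate(queues: dict) -> list:
    """Interleave commands from each VM's queue in round-robin order."""
    order = []
    while any(queues.values()):
        for vm, q in queues.items():
            if q:
                order.append((vm, q.popleft()))
    return order

queues = {
    "vm-a": deque(["draw", "blit"]),
    "vm-b": deque(["compute"]),
}
print(arbitrate(queues))
# → [('vm-a', 'draw'), ('vm-b', 'compute'), ('vm-a', 'blit')]
```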

Performance Characteristics

Latency

  • Metal API call latency: +50-100µs (serialization overhead)
  • GPU command submission: no additional latency
  • Memory access: native speed
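
The per-call overhead matters mostly for chatty workloads; batching commands into fewer API calls amortizes it. A back-of-the-envelope calculation, assuming the midpoint of the 50–100µs range quoted above:

```python
# Amortizing the bridge's per-call overhead across batched commands.
# 75µs is the midpoint of the 50-100µs range above, not a measured figure.
OVERHEAD_US = 75.0  # per Metal API call crossing the bridge

def overhead_per_command(commands_per_call: int) -> float:
    """Bridge overhead attributed to each GPU command when batched."""
    return OVERHEAD_US / commands_per_call

print(overhead_per_command(1))    # 75.0 µs: worst case, one command per call
print(overhead_per_command(100))  # 0.75 µs: batched, effectively negligible
```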

Throughput

  • GPU memory bandwidth: 95% of native
  • Command queue throughput: 98% of native
  • Inter-VM isolation maintained

Real-World Use Cases

ML Training

Fine-tuning models in isolated containers while maintaining compute performance:

arcbox run \
  --gpu \
  --mount training-data:/data \
  pytorch:latest \
  python train.py

Data Science

Jupyter notebooks with full GPU acceleration:

import torch

# Confirm the Metal backend is exposed inside the VM, then use it
assert torch.backends.mps.is_available()
device = torch.device("mps")  # Metal Performance Shaders
tensor = torch.randn(1000, 1000, device=device)

Image Processing

Real-time image processing with acceleration:

// Inside an ArcBox VM
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()
let computeEncoder = commandBuffer?.makeComputeCommandEncoder()
// ... encode hardware-accelerated compute work, then commit

Limitations & Considerations

Current Limitations

  • Metal-specific (not CUDA or ROCm)
  • Single GPU per host (multiplexing in development)
  • Some advanced Metal features not yet supported

Roadmap

  • Multi-GPU support via virtualization
  • Performance monitoring and profiling
  • Dynamic resource allocation
  • Unified memory support
  • Metal 3 advanced features

Comparison with Alternatives

Direct GPU Passthrough

  • ❌ Security: Direct hardware access
  • ✅ Performance: Native speed
  • ❌ Flexibility: One workload at a time

Container GPU Sharing (Docker)

  • ✅ Security: Runtime isolation
  • ❌ Performance: Significant overhead
  • ⚠️ Compatibility: Framework-specific

ArcBox Metal Bridge

  • ✅ Security: API-mediated access
  • ✅ Performance: ~4x CPU-only, near-native GPU throughput
  • ✅ Compatibility: Zero code changes
  • ✅ Flexibility: Multiple isolated workloads

The Future

We're excited about the possibilities:

  • Quantum Computing - Quantum simulation acceleration
  • Computer Vision - Real-time video processing
  • Game Development - Game engine acceleration
  • Scientific Computing - Physics simulations
  • 3D Rendering - Metal rendering pipeline

Conclusion

ArcBox's Metal API bridging brings GPU acceleration to containerized workloads without sacrificing security or requiring code changes. It's a genuinely new approach to GPU virtualization that takes advantage of Apple Silicon's unique architecture.

In the coming months, we'll be rolling out multi-GPU support and expanded feature coverage. If you're running GPU workloads in containers, ArcBox just became a lot more interesting.

Give it a try and let us know what you build!
