GPU & Deep Learning

Modified: 2026-05-14

Bertha has NVIDIA GPUs available for deep learning and GPU-accelerated computing. This page covers how to use them from Python (PyTorch, TensorFlow) and R (torch).

Tip: The key thing to know

You do not need to install CUDA yourself. PyTorch, TensorFlow, and R’s torch package all bundle their own CUDA runtime libraries. Just install the framework and it works.

Checking GPU availability

Before getting started, verify the GPUs are accessible:

# Quick check — shows GPU model, driver version, and current usage
nvidia-smi

# Or use the bertha dashboard
bertha

You can also monitor GPU usage interactively with nvtop (pre-installed).
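
If you'd rather check programmatically, nvidia-smi has a machine-readable query mode that is easy to call from Python. A minimal sketch (the query fields shown are standard nvidia-smi options):

import subprocess

# Ask nvidia-smi for a CSV summary of each GPU
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,memory.used,memory.total,utilization.gpu",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    print(line)  # e.g. "NVIDIA RTX 6000 Ada, 1024 MiB, 49140 MiB, 3 %"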

PyTorch

Installation

Use uv to install PyTorch with GPU support:

# In a project
uv add torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Or ad-hoc
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

The cu121 suffix means PyTorch comes bundled with CUDA 12.1 runtime libraries. This is the recommended variant — it works regardless of which system CUDA version (if any) is installed.
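
You can confirm which variant you got: wheels from the cu121 index carry a matching suffix in the version string.

import torch
print(torch.__version__)  # e.g. "2.3.1+cu121" for the GPU build; no suffix (or "+cpu") otherwise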

Verify GPU access

import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("GPU name:", torch.cuda.get_device_name(0))

If torch.cuda.is_available() returns False, see Troubleshooting below.
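
Once the check passes, using the GPU is just a matter of placing tensors and modules on the cuda device. A minimal sketch (with a CPU fallback so it runs anywhere):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 8, device=device)      # create a tensor directly on the device
layer = torch.nn.Linear(8, 2).to(device)  # move a module's parameters there too
y = layer(x)                              # the computation runs on the GPU
print(y.device)                           # e.g. "cuda:0"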

TensorFlow

Installation

# Install TensorFlow with bundled CUDA runtime
uv pip install "tensorflow[and-cuda]"

The [and-cuda] extra includes CUDA runtime libraries, so no system CUDA is needed.

Verify GPU access

import tensorflow as tf
print("GPU available:", tf.config.list_physical_devices('GPU'))

R torch

The R torch package also bundles its own CUDA libraries:

install.packages("torch")
torch::torch_is_installed()
torch::cuda_is_available()

If CUDA is not detected, torch may need a specific CUDA toolkit version available on the system. Contact the admin if you run into issues.

GPU memory management

Bertha’s GPUs are shared between users. Be mindful of memory usage:

# PyTorch — check current memory usage
import torch
print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"Cached: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")

# Free unused cached memory
torch.cuda.empty_cache()
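
One subtlety: empty_cache() can only release cached blocks that nothing in Python references anymore, so drop your references to large tensors first. A minimal sketch:

import torch

if torch.cuda.is_available():
    big = torch.empty(4096, 4096, device="cuda")  # ~64 MB of GPU memory
    del big                    # drop the last Python reference
    torch.cuda.empty_cache()   # now the cached block can actually be returned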

# TensorFlow — allow memory growth instead of grabbing all GPU memory
import tensorflow as tf
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

Note: Monitoring

Use nvtop or the Bertha dashboard to check GPU utilization and memory usage before starting large jobs. If someone else is using most of the GPU memory, coordinate or wait.

Troubleshooting

torch.cuda.is_available() returns False

  1. Check that the NVIDIA driver is loaded: run nvidia-smi. If this fails, the system may need a reboot after a driver update.

  2. Make sure you installed the CUDA-enabled variant of PyTorch (with cu121 or similar in the index URL). Depending on your platform and index configuration, a plain pip install torch can give you a CPU-only build.

  3. Verify inside Python:

    import torch
    print(torch.version.cuda)  # Should show e.g. "12.1", not None

Out of memory errors

  • Check who’s using the GPU: nvidia-smi or bertha -d
  • Free cached memory: torch.cuda.empty_cache()
  • Reduce batch size in your training loop
  • Use mixed precision training (torch.amp) to roughly halve activation memory; see the sketch below
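
A minimal mixed-precision sketch using the torch.amp API from recent PyTorch releases (older versions spell it torch.cuda.amp); the model and data here are toy placeholders:

import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.amp.GradScaler("cuda")  # rescales the loss so fp16 gradients don't underflow

inputs = torch.randn(64, 512, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.amp.autocast("cuda"):       # forward pass runs in mixed precision
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)

scaler.scale(loss).backward()          # backward on the scaled loss
scaler.step(optimizer)                 # unscales gradients, then steps
scaler.update()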

TensorFlow not finding GPU

# Check what TensorFlow sees
import tensorflow as tf
print(tf.config.list_physical_devices())

If no GPU is listed, ensure you installed tensorflow[and-cuda] (not just tensorflow).
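
To confirm the installed wheel is actually a CUDA build, you can also inspect TensorFlow's build info (the exact keys can vary between versions):

import tensorflow as tf

info = tf.sysconfig.get_build_info()
print(info.get("is_cuda_build"))  # True for a CUDA-enabled wheel
print(info.get("cuda_version"))   # e.g. "12.3"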


System Administration

The rest of this page covers driver installation, CUDA toolkit management, and system maintenance. Regular users don’t need this section.

NVIDIA driver installation

# Check GPU and recommended driver
lspci | grep -i nvidia
ubuntu-drivers devices

# Install recommended driver
sudo apt install nvidia-driver-545

# Reboot required after installation
sudo reboot

Driver updates and reboots

Newer NVIDIA drivers remain compatible with CUDA runtimes built for older drivers, so updating the driver does not break existing PyTorch/TensorFlow installations. However, after a driver update via apt, GPUs will not be detected until the next reboot (the old kernel module is still loaded).

Best practice: time system updates with planned reboots. There is no need to pin driver versions as long as updates include a reboot.

Unattended upgrades

To prevent NVIDIA driver updates from happening automatically (which would leave GPUs unusable until reboot), NVIDIA and CUDA packages are blacklisted in /etc/apt/apt.conf.d/50unattended-upgrades. This means driver updates must be applied manually.

CUDA toolkit management

Most users don’t need a system CUDA installation — frameworks bundle their own. System CUDA is only needed for:

  • Custom CUDA C/C++ code that requires nvcc
  • R torch when it can’t find a compatible bundled CUDA version

If needed, multiple CUDA toolkit versions can be installed side-by-side:

# Add NVIDIA CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update

# Install specific toolkit versions (no driver)
sudo apt install cuda-toolkit-11-8
sudo apt install cuda-toolkit-12-1

# These install to /usr/local/cuda-11.8/, /usr/local/cuda-12.1/, etc.

Environment modules can be used to switch between versions:

module load cuda/12.1
nvcc --version
Note

The effort to provide multiple CUDA versions via environment modules has been largely superseded by frameworks bundling their own CUDA runtime. Module-based CUDA is maintained for edge cases but is not the primary approach.

Driver → CUDA compatibility reference

NVIDIA Driver    Supports CUDA Runtimes    PyTorch Options
520+             CUDA 11.8+                cu118
530+             CUDA 12.1+                cu118, cu121
545+             CUDA 12.3+                cu118, cu121, cu126
