CUDA Out of Memory

Strategies

  1. Batch Size Adjustment

    • Reduce the batch size to fit the available GPU memory.
    • Smaller batches require less activation memory, so each forward and backward pass fits within the GPU's capacity.
    • Trade-off: Smaller batch sizes may lead to noisier gradients and slower convergence.
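
    A minimal sketch of this adjustment, assuming PyTorch: the random TensorDataset below is only a placeholder, and the single knob being turned is the batch_size passed to the DataLoader.

      import torch
      from torch.utils.data import DataLoader, TensorDataset

      # Placeholder dataset of random images; in practice this is the real training set.
      train_dataset = TensorDataset(torch.randn(256, 3, 64, 64),
                                    torch.randint(0, 10, (256,)))

      # Reduced from, say, 64 to 16 until each training step fits in GPU memory.
      batch_size = 16
      train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
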
  2. Gradient Accumulation

    • Split a large batch into smaller "micro-batches" that fit into memory.
    • Perform forward and backward passes for each micro-batch, accumulating gradients.
    • Update the model's weights after accumulating gradients for the desired batch size.
    • This allows training with effectively larger batch sizes without exceeding memory limits.
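
    A minimal sketch of gradient accumulation, assuming PyTorch; the tiny model, optimizer, random data, and accumulation_steps value are all placeholders for illustration.

      import torch
      import torch.nn as nn
      from torch.utils.data import DataLoader, TensorDataset

      model = nn.Linear(20, 2)
      criterion = nn.CrossEntropyLoss()
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

      # Micro-batches of 8 that individually fit in memory.
      micro_batch_loader = DataLoader(
          TensorDataset(torch.randn(64, 20), torch.randint(0, 2, (64,))),
          batch_size=8,
      )
      accumulation_steps = 4  # 4 micro-batches of 8 ~ one logical batch of 32

      optimizer.zero_grad()
      for step, (inputs, targets) in enumerate(micro_batch_loader):
          loss = criterion(model(inputs), targets)
          # Scale the loss so the accumulated gradient matches the average over the logical batch.
          (loss / accumulation_steps).backward()
          if (step + 1) % accumulation_steps == 0:
              optimizer.step()       # one weight update per logical batch
              optimizer.zero_grad()  # clear the accumulated gradients
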
  3. Model Simplification

    • Reduce the size of the model by using fewer layers or smaller layer dimensions.
    • Use model pruning or quantization to make the model lighter.
    • Consider switching to a more memory-efficient architecture or using a pre-trained model with fewer parameters.
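
    A minimal sketch of the "fewer layers, smaller dimensions" idea, assuming PyTorch; the layer sizes are illustrative. For the pruning and quantization bullets, PyTorch provides utilities in torch.nn.utils.prune and torch.quantization.

      import torch.nn as nn

      class SmallNet(nn.Module):
          """A deliberately small classifier: one narrow hidden layer instead of a deep stack."""

          def __init__(self, in_features=784, hidden=128, num_classes=10):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(in_features, hidden),   # smaller hidden dimension
                  nn.ReLU(),
                  nn.Linear(hidden, num_classes),   # fewer layers overall
              )

          def forward(self, x):
              return self.net(x)

      model = SmallNet()
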
  4. Data Preprocessing

    • Resize or crop input images to smaller dimensions to reduce memory usage.
    • Normalize data and streamline the loading pipeline (e.g., DataLoader workers, pinned memory) so preprocessing stays on the CPU instead of adding to GPU memory pressure.
    • Use mixed precision training (with torch.cuda.amp) to reduce the memory footprint of computations.
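
    A minimal sketch of the mixed precision bullet, assuming PyTorch's torch.cuda.amp; the tiny model and random batch are placeholders, and the enabled flag simply keeps the code runnable on CPU-only machines.

      import torch
      import torch.nn as nn

      device = "cuda" if torch.cuda.is_available() else "cpu"
      use_amp = device == "cuda"

      model = nn.Linear(20, 2).to(device)
      criterion = nn.CrossEntropyLoss()
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

      inputs = torch.randn(8, 20, device=device)
      targets = torch.randint(0, 2, (8,), device=device)

      optimizer.zero_grad()
      with torch.cuda.amp.autocast(enabled=use_amp):  # run the forward pass in half precision
          loss = criterion(model(inputs), targets)
      scaler.scale(loss).backward()   # scale the loss to avoid float16 gradient underflow
      scaler.step(optimizer)          # unscale gradients, then step the optimizer
      scaler.update()                 # adjust the scale factor for the next iteration
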
  5. Memory Cleanup

    • Release cached but unreferenced GPU memory with torch.cuda.empty_cache(); it cannot free tensors that are still referenced.
    • Ensure no unnecessary tensors stay alive: delete references (e.g., with del) and avoid holding onto loss tensors, which keep the whole computation graph in memory.
    • Use gradient checkpointing to discard intermediate activations during the forward pass and recompute them during the backward pass, saving memory at the cost of extra computation.
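
    A minimal sketch combining the cleanup and checkpointing bullets, assuming PyTorch; the two-block model and tensor sizes are illustrative, and use_reentrant=False assumes a reasonably recent PyTorch version.

      import torch
      import torch.nn as nn
      from torch.utils.checkpoint import checkpoint

      block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
      block2 = nn.Linear(1024, 10)
      x = torch.randn(32, 1024, requires_grad=True)

      # Gradient checkpointing: block1's activations are not stored during the forward
      # pass; they are recomputed when backward() reaches this segment.
      hidden = checkpoint(block1, x, use_reentrant=False)
      out = block2(hidden)
      out.sum().backward()

      # Drop references so the tensors become eligible for freeing, then return the
      # allocator's cached (now unreferenced) blocks to the driver.
      del hidden, out
      if torch.cuda.is_available():
          torch.cuda.empty_cache()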