
Containers

To run LLMs easily anywhere from laptops and desktops to the cloud, we will be using Apptainer (formerly Singularity), a container platform that has been widely adopted by the community for running HPC workloads both locally and in HPC data centers.

Benefits of Apptainer

Apptainer has quite a few benefits such as:

  • Mobility of Compute: It supports computational mobility through a single-file SIF container format.
  • Integration over Isolation: It emphasizes integration over isolation, utilizing GPUs, high-speed networks, and parallel filesystems by default. In particular, it provides native support for running application containers with NVIDIA's CUDA or AMD's ROCm, enabling easy access to GPU-enabled machine learning frameworks like PyTorch regardless of the host operating system, provided the host has a CUDA/ROCm driver and library installation.
  • Compatibility with Docker: It is 100% OCI compatible and aims for maximum compatibility with Docker.
  • Designed for Security: It allows unprivileged users to run containers while preventing privilege escalation within the container; a user is the same user inside and outside the container.
  • Verifiable Reproducibility and Security: It ensures trustworthy reproducibility and security through cryptographic signatures and an immutable container image format (see the signing example after this list).
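
As a small illustration of the last point, a SIF image can be signed and verified with a PGP keypair. lolcow.sif below is just a placeholder image name:

# create a signing keypair (one-time setup)
apptainer key newpair

# sign the image and verify the signature
apptainer sign lolcow.sif
apptainer verify lolcow.sif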

Installing Apptainer

The following guide assumes a Linux distribution. We have also tested this rigorously on Windows WSL2 (Ubuntu), where it works perfectly, so if you are on a Windows machine this guide applies too. To this day we remain pleasantly surprised by the sheer investment and improvement Microsoft has made to bring WSL2 to a state where it is almost as good as a bare-metal Linux installation.

Install Apptainer

All you need is to install Apptainer to leverage this repository: it lets you work in containers with multiple environments (CPU/GPU), with any packages and OS, independent of your host (local) machine.
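
On Ubuntu (including WSL2 Ubuntu), one straightforward route is the project's PPA; for other distributions, follow the official Apptainer installation documentation. A sketch:

sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer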

Install NVIDIA libraries

We advise using the Lambda Stack to manage your drivers reliably. It's a single bash script that allows you to install and upgrade your NVIDIA drivers.

Check CUDA toolkit installation via nvcc -V
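
For example, on the host:

# driver and GPU visibility
nvidia-smi

# CUDA toolkit version
nvcc -V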

Set NVIDIA Container CLI Apptainer Paths (Deprecated)

Not necessary in Apptainer, only in Singularity.

To get nvidia-container-cli path

which nvidia-container-cli

To set

sudo singularity config global --set "nvidia-container-cli path" "/usr/bin/nvidia-container-cli"

To check

sudo singularity config global --get "nvidia-container-cli path"

How to Use Apptainer in CPU/GPU Mode

Option A: Transparent Image Container Workflow

This is the recommended way to build Apptainer containers, as it is transparent and offers the greatest reproducibility: the container is built from a definition file that fully describes its contents.
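
For context, a definition file is a plain-text recipe that fully describes the image. A minimal hypothetical example (not one of the actual recipes in this repository) could look like:

Bootstrap: docker
From: nvidia/cuda:12.1.0-runtime-ubuntu22.04

%post
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install torch

%environment
    export LC_ALL=C

%runscript
    exec python3 "$@"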

1. Build container (.sif)

To build and test the Apptainer image container, simply run the following command in any of the container folders:

bash build_sif.sh
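
Conceptually, such a build script wraps an apptainer build call of roughly this form (exact filenames vary per folder):

apptainer build apptainer_container_${VER}.sif apptainer_container_${VER}.def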

2. Run container

For CPU

apptainer shell apptainer_container_${VER}.sif

For GPU using NVIDIA Container CLI

apptainer shell --nv --nvccli apptainer_container_${VER}.sif

A quick way to test that everything is working is to run nvidia-smi once you have shelled into the container.
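
You can also run the same check non-interactively, without keeping a shell open:

apptainer exec --nv --nvccli apptainer_container_${VER}.sif nvidia-smi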

Notes on GPU Availability

If you do not have a GPU on your host or local machine, the image will still be built and can be shared with and used by machines that have GPUs. The CUDA test will simply fail on a local machine without a GPU, printing an error that no driver is found; there is nothing to worry about.

Option B: Blackbox

This is not recommended for building production Apptainer containers, but it is useful for experimentation to determine the right configuration to put into your definition file for Option A.

Note: replace ${VER} with whatever version is in the specific folder.

1. Build container (.sif)

To build and test the Apptainer image container, simply run the following command in any of the container folders:

bash build_sif.sh

2. Convert container to mutable sandbox

apptainer build --sandbox apptainer_container_${VER}_sandbox apptainer_container_${VER}.sif

3. Shell into writable sandbox

In this step you can install new packages and make persistent changes to the container for experimentation (see the example below).

apptainer shell --writable apptainer_container_${VER}_sandbox
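
For example, inside the writable sandbox shell you might add a Python package (the package name is purely illustrative; system-level apt installs may additionally require --fakeroot):

Apptainer> pip install rich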

4. Convert container to immutable image (.sif)

apptainer build apptainer_container_${VER}.sif apptainer_container_${VER}_sandbox

Basic Apptainer Commands Cheatsheet

This provides a list of useful commands.

# test apptainer
apptainer version

# pull the lolcow container from Docker Hub
apptainer pull docker://sylabsio/lolcow

# build (more flexible than pull; can also build from local definition files)
apptainer build lolcow.sif docker://sylabsio/lolcow

# run option 1: shell into the container
apptainer shell lolcow.sif
cowsay moo

# run option 2: execute a command without shelling into the container
apptainer exec lolcow.sif cowsay moo

# clean the build cache
apptainer cache clean

Available Containers & Definition Files

We maintain a collection of Apptainer recipes in this repository; feel free to clone or download it locally.

GPU Containers

GPU containers can be found in ./containers/gpu when you clone the above repository.

Ollama Workloads

Ollama (Mistral 7b) Workloads
  • Go into container folder: cd ./containers/gpu/ollama
  • Run 1st session apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama serve
  • Run 2nd session (another window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama run mistral
    • You can now chat with the Mistral model from your shell, or with any other model you can pull from the Ollama website

Model Choice

This runs the Mistral model as an example. You can run any other model by swapping out the mistral reference above for any model in Ollama's library.
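
If you prefer to hit Ollama's HTTP API directly instead of the interactive prompt, a quick test from another session (assuming Ollama's default port 11434) looks like this:

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'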

Ollama (Gemma:2b / Gemma:7b) Workloads
  • Go into container folder: cd ./containers/gpu/ollama
  • Run 1st session apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama serve
  • Run 2nd session (another window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama run gemma:7b or ollama run gemma:2b
    • You can now chat with the Gemma model from your shell, or with any other model you can pull from the Ollama website

Ollama (LLaVA 7b-v1.6) Workloads
  • Go into container folder: cd ./containers/gpu/ollama
  • Run 1st session apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama serve
  • Run 2nd session (another window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama run llava:7b-v1.6
    • You can now chat with the multimodal LLaVA model from your shell, or with any other model you can pull from the Ollama website

LlamaIndex Workloads

  • Go into container folder: cd ./containers/gpu/llamaindex
  • Run 1st session (first window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama serve
  • Run 2nd session (second window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • ollama run mistral
  • Run 3rd session (third window) apptainer shell --nv --nvccli apptainer_container_0.1.sif
    • python
    • from llama_index.llms import Ollama
    • llm = Ollama(model="mistral")
    • response = llm.complete("What is Singapore")
    • print(response)

Model Choice

This runs the Mistral model as an example. You can run any other model by swapping out the mistral reference above for any model in Ollama's library.

CPU Containers

CPU containers can be found in ./containers/cpu when you clone the above repository.

Math Workloads

  • Go into container folder: cd ./containers/cpu/math
    • Run apptainer shell apptainer_container_0.2.sif

Summary

We introduced Apptainer as a way to run CPU or GPU workloads that can scale from your desktop to on-prem HPC data centers or to cloud providers like Azure, AWS, and GCP. We then showed how to run LLMs offline locally within Apptainer using the Gemma 2b/7b (Google) and Mistral 7b (Mistral AI) models. With a portable container in which to run our code and models, we can now move on to diving deep into LLMs, Multimodal Language Models, and Retrieval Augmented Generation (RAG).
