Workflows

Train SmolVLA

Training runs outside this Docker Compose stack, on a machine or cloud instance with a GPU. The config file is gpu-server/ar4-train/configs/smolvla_ar4.yaml.

Option A — Local GPU

With lerobot installed in your Python environment:

lerobot-train \
  --config_path gpu-server/ar4-train/configs/smolvla_ar4.yaml \
  --dataset.repo_id local/ar4_pick_place

Checkpoints are written to data/checkpoints/smolvla_ar4/ every 10 000 steps (the save_freq setting).
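Before starting a long run, it can help to confirm that the config file exists and that a GPU driver is visible. The following is a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH; the preflight function is a hypothetical helper and is not part of lerobot.

```shell
# Hypothetical helper (not part of lerobot): sanity checks before training.
preflight() {
  config="$1"
  # The path here is the same one you pass to --config_path.
  if [ ! -f "$config" ]; then
    echo "missing config: $config" >&2
    return 1
  fi
  # Assumption: NVIDIA GPU. nvidia-smi is installed with the NVIDIA driver.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "warning: nvidia-smi not found; is a GPU driver installed?" >&2
  fi
  return 0
}

# Usage: run the checks first, and start lerobot-train only if they pass.
# preflight gpu-server/ar4-train/configs/smolvla_ar4.yaml
```

A missing config stops the run early with a non-zero exit status. A missing nvidia-smi only prints a warning, since some GPU setups do not provide it.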

Option B — Remote GPU server

The gpu-server/ar4-train/ directory contains everything you need for training — you don't need the full repo on the GPU machine.

  1. Get the training files on the server:

    # On the GPU server — shallow clone, then go straight to the training directory
    git clone --depth 1 https://github.com/aegean-ai/ar4-physical-ai.git
    cd ar4-physical-ai/gpu-server/ar4-train

  2. Sync your dataset to the server:

    # From your local machine
    rsync -avz data/datasets/ user@gpu-server:~/ar4-physical-ai/gpu-server/ar4-train/data/datasets/

  3. Train on the server:

    # On the GPU server
    pip install lerobot
    lerobot-train \
      --config_path configs/smolvla_ar4.yaml \
      --dataset.repo_id local/ar4_pick_place

    Checkpoints save to data/checkpoints/smolvla_ar4/ under the training directory.

  4. Copy the checkpoint back locally:

    # From your local machine
    rsync -avz user@gpu-server:~/ar4-physical-ai/gpu-server/ar4-train/data/checkpoints/ data/checkpoints/
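Steps 2 and 4 use the same host and remote path, so you may prefer to keep them in one place. The following is a sketch only: GPU_HOST and REMOTE_DIR are placeholders to replace with your own values, and run, push_dataset, and pull_checkpoints are hypothetical helper names, not part of this repo.

```shell
# Placeholders: replace with your own SSH user and host.
GPU_HOST='user@gpu-server'
# Matches the clone location from step 1; the ~ is expanded on the server.
REMOTE_DIR='~/ar4-physical-ai/gpu-server/ar4-train'

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 2: copy the local dataset to the server.
push_dataset() {
  run rsync -avz data/datasets/ "$GPU_HOST:$REMOTE_DIR/data/datasets/"
}

# Step 4: copy checkpoints from the server back to this machine.
pull_checkpoints() {
  run rsync -avz "$GPU_HOST:$REMOTE_DIR/data/checkpoints/" data/checkpoints/
}

# Usage, from your local machine:
# DRY_RUN=1 push_dataset    # show the rsync command without running it
# push_dataset              # run it
```

Running with DRY_RUN=1 first lets you check the host and paths before any data is transferred.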

Key training config options

Open gpu-server/ar4-train/configs/smolvla_ar4.yaml to adjust:

Field                  Default  Description
steps                  50000    Total training steps
batch_size             32       Reduce if you get OOM errors
save_freq              10000    How often to write a checkpoint
freeze_vision_encoder  true     Keep VLM backbone frozen (faster, usually better for small datasets)
wandb.enable           false    Set to true to log to Weights & Biases
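To see how steps and save_freq interact, you can list the step counts at which a checkpoint would be expected. This sketch assumes a checkpoint is written each time the step count reaches a multiple of save_freq; checkpoint_steps is a hypothetical helper, not a lerobot command.

```shell
# Defaults listed above.
steps=50000
save_freq=10000

# Assumption: one checkpoint at every multiple of save_freq up to the total.
checkpoint_steps() {
  total="$1"
  freq="$2"
  s="$freq"
  while [ "$s" -le "$total" ]; do
    echo "$s"
    s=$((s + freq))
  done
}

checkpoint_steps "$steps" "$save_freq"
```

With the defaults this prints five step counts, from 10000 to 50000. A smaller save_freq gives more frequent checkpoints but uses more disk space.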
