Train SmolVLA

Training runs outside this Docker Compose stack, so you need a machine or cloud instance with a GPU. The config file is gpu-server/ar4-train/configs/smolvla_ar4.yaml (configs/smolvla_ar4.yaml relative to the training directory).

Option A — Local GPU

With lerobot installed in your Python environment:

lerobot-train \
  --config_path gpu-server/ar4-train/configs/smolvla_ar4.yaml \
  --dataset.repo_id local/ar4_pick_place

Checkpoints are written to data/checkpoints/smolvla_ar4/ every 10 000 steps (the default `save_freq`).
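Before starting a long run, it can help to confirm that the config file and the required commands are in place. The sketch below is one way to do that. `preflight` is a hypothetical helper, not part of lerobot or this repo, and the `nvidia-smi` check in the usage comment assumes an NVIDIA GPU.

```shell
# Hypothetical helper, not part of lerobot or this repo.
# Usage: preflight <config-file> [required-command ...]
preflight() {
  config="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi
  status=0
  # The config file must exist before lerobot-train can read it.
  if [ ! -f "$config" ]; then
    echo "missing config file: $config" >&2
    status=1
  fi
  # Every command listed after the config path must be on PATH.
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "command not found: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call (assumes an NVIDIA GPU, hence nvidia-smi):
# preflight gpu-server/ar4-train/configs/smolvla_ar4.yaml lerobot-train nvidia-smi
```

The function returns non-zero if anything is missing, so it can gate the training command with `&&`.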

Option B — Remote GPU server

The gpu-server/ar4-train/ directory contains everything needed for training, so the rest of the repo is not required on the GPU machine.

Get the training files on the server:

# On the GPU server: shallow clone, then go straight to the training directory
git clone --depth 1 https://github.com/aegean-ai/ar4-physical-ai.git
cd ar4-physical-ai/gpu-server/ar4-train

Sync your dataset to the server:

# From your local machine
rsync -avz data/datasets/ user@gpu-server:~/ar4-physical-ai/gpu-server/ar4-train/data/datasets/

Train on the server:

# On the GPU server
pip install lerobot
lerobot-train \
  --config_path configs/smolvla_ar4.yaml \
  --dataset.repo_id local/ar4_pick_place

Checkpoints save to data/checkpoints/smolvla_ar4/ under the training directory.

Copy the checkpoint back locally:

rsync -avz user@gpu-server:~/ar4-physical-ai/gpu-server/ar4-train/data/checkpoints/ data/checkpoints/
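After copying the checkpoint back, a quick sanity check is to confirm that the local directory actually contains files. The sketch below makes no assumption about the file names inside a checkpoint and only counts regular files. `checkpoint_ready` is a hypothetical helper, not part of lerobot or this repo.

```shell
# Hypothetical helper, not part of lerobot or this repo.
# Succeeds only if the directory exists and holds at least one regular file.
checkpoint_ready() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Count regular files at any depth; tr strips the padding some wc versions add.
  count=$(find "$dir" -type f | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no files under: $dir" >&2
    return 1
  fi
  echo "$count file(s) under $dir"
}

# Example call from the local repo root:
# checkpoint_ready data/checkpoints/smolvla_ar4
```

A zero file count usually means the rsync source path was wrong or training had not reached its first checkpoint yet.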

Key training config options

Open gpu-server/ar4-train/configs/smolvla_ar4.yaml to adjust:

Field	Default	Description
`steps`	`50000`	Total training steps
`batch_size`	`32`	Reduce if you get OOM errors
`save_freq`	`10000`	How often to write a checkpoint
`freeze_vision_encoder`	`true`	Keep the VLM's vision encoder frozen (faster, usually better for small datasets)
`wandb.enable`	`false`	Set to `true` to log to Weights & Biases
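To see how `steps` and `save_freq` interact, the sketch below lists the steps at which a checkpoint would be written, assuming one checkpoint at every exact multiple of `save_freq`. `checkpoint_steps` is a hypothetical helper for illustration only. It does not read the YAML file and does not model any extra checkpoint the trainer may write at the final step.

```shell
# Default values copied from the table above.
steps=50000
save_freq=10000

# Hypothetical helper: print every multiple of save_freq up to and including steps.
checkpoint_steps() {
  total="$1"
  freq="$2"
  step="$freq"
  while [ "$step" -le "$total" ]; do
    echo "$step"
    step=$((step + freq))
  done
}

# With the defaults this prints 10000, 20000, 30000, 40000 and 50000, one per line.
checkpoint_steps "$steps" "$save_freq"
```

Lowering `save_freq` gives more restore points at the cost of more disk space under data/checkpoints/.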
