Architecture
AR4 Physical-AI is a VLA (Vision-Language-Action) platform layered on top of the AR4 ROS driver and LeRobot. The key architectural decisions are:
- LeRobot-native — uses lerobot-ros as the bridge between ROS 2 and LeRobot for recording, training, and inference (see Research: LeRobot Integration)
- Submodule for driver — upstream `ar4_ros_driver` stays independently updatable
- Docker-first — multi-stage GPU containers (base → overlay → dev) with docker-compose orchestration
- Simulation-first — physics-enabled Gazebo world with gravity, contact properties, and graspable objects for policy development
- Zenoh middleware — decouples non-ROS components from DDS for future inference pipelines
- LeRobot dataset format — episodes recorded via `lerobot-record`, stored in LeRobot v3.0 format
System Overview
LeRobot Integration
The platform uses lerobot-ros, written by the same author as the AR4 ROS driver. It provides a LeRobot Robot plugin (AnninAR4) that bridges ROS 2 topics to LeRobot's recording, training, and inference pipelines.
See Research: LeRobot Integration for the full investigation and rationale.
Data flow
- Recording — `lerobot-record` calls `ROS2Robot.get_observation()` (subscribes to `/joint_states`) and `ROS2Robot.send_action()` (publishes to MoveIt Servo or the trajectory controller). Episodes are stored in LeRobot v3.0 dataset format.
- Training — `lerobot-train` trains an ACT, Diffusion Policy, or VLA model from the recorded dataset.
- Inference — `lerobot-evaluate` runs the trained policy through the same `ROS2Robot` interface, back into Gazebo or real hardware.
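Recording and inference both run the same loop: read an observation, choose an action, send it, and repeat at a fixed rate. The sketch below shows that loop in isolation. A hypothetical in-memory `FakeAR4Robot` stands in for `ROS2Robot`, so it runs without ROS 2. The following are illustrative assumptions and are not the lerobot-ros API:

- the joint names
- the `"<joint>.pos"` key style
- the toy policy
- the `run_episode` helper

```python
"""Minimal sketch of the observe -> act -> send loop behind recording and inference.

Hypothetical throughout: FakeAR4Robot replaces the ROS 2 bridge so the
control flow can run without ROS; it is not the lerobot-ros API.
"""
import time
from dataclasses import dataclass, field

# Hypothetical joint names: 6 arm joints + gripper.
JOINT_NAMES = ["joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper_joint"]


@dataclass
class FakeAR4Robot:
    """Stand-in for ROS2Robot: joint state in, joint commands out."""
    positions: list = field(default_factory=lambda: [0.0] * len(JOINT_NAMES))
    sent: list = field(default_factory=list)

    def get_observation(self) -> dict:
        # The real bridge would read the latest /joint_states message here.
        return {f"{name}.pos": pos for name, pos in zip(JOINT_NAMES, self.positions)}

    def send_action(self, action: dict) -> dict:
        # The real bridge would publish to MoveIt Servo or a trajectory controller;
        # this fake jumps straight to the commanded positions.
        self.positions = [action[f"{name}.pos"] for name in JOINT_NAMES]
        self.sent.append(action)
        return action


def run_episode(robot, policy, num_steps: int, fps: float = 30.0) -> list:
    """Observe, choose an action, send it; pace each step to 1/fps seconds."""
    period = 1.0 / fps
    frames = []
    for _ in range(num_steps):
        start = time.monotonic()
        obs = robot.get_observation()
        action = policy(obs)
        robot.send_action(action)
        frames.append({"observation": obs, "action": action})
        # Sleep off whatever remains of this control period.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
    return frames


def nudge_policy(obs: dict) -> dict:
    """Toy policy: command every joint 0.01 rad past its current position."""
    return {key: value + 0.01 for key, value in obs.items()}


if __name__ == "__main__":
    robot = FakeAR4Robot()
    frames = run_episode(robot, nudge_policy, num_steps=5, fps=100.0)
    print(len(frames), round(robot.positions[0], 2))
```

In the actual pipeline the action source differs by phase. During recording a teleoperator supplies actions. During inference the trained policy does. The robot-facing side stays the same `get_observation()` / `send_action()` pair.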
VLA backend progression
| Phase | Backend | Notes |
|---|---|---|
| v1 | LeRobot ACT | Imitation learning baseline, trained on AR4 teleop data |
| v1.5 | Cross-embodiment transfer | Fine-tune SO-101 policies on AR4 data |
| v2 | Pi0 / GR00T N1.5 | Foundation VLA models, zero-shot or fine-tuned |
Simulation
The annin_ar4_gazebo package (from the upstream vendor submodule) provides the Gazebo Harmonic simulation. Launch with:
Or via Docker:
Docker Infrastructure
All services run in Docker containers orchestrated by docker-compose.yaml. The Dockerfile uses a multi-stage build (base → overlay → dev), and each service runs from one of those stages or from an upstream image:
| Service | Image | Purpose |
|---|---|---|
| sim-tabletop | overlay | Gazebo GUI simulation |
| sim-tabletop-headless | overlay | Server-only Gazebo (CI, headless) |
| moveit | overlay | MoveIt2 motion planning + RViz2 |
| hardware | base | Real AR4 hardware driver (calibrates on startup) |
| moveit-hardware | base | MoveIt2 + RViz2 for real hardware (auto-starts hardware) |
| foxglove-bridge | overlay | WebSocket bridge for Foxglove Studio (:8765) |
| zenoh-router | eclipse/zenoh | Central Zenoh broker with in-memory storage (:7447) |
| zenoh-bridge | overlay | DDS-to-Zenoh bridge for cross-container pub/sub |
| dev | dev | VS Code devcontainer with source mounts |
Zenoh Middleware
Zenoh provides a lightweight pub/sub transport layer that decouples non-ROS components (like the Optuna PID tuner) from the DDS discovery mesh. This avoids requiring ROS 2 in every container.
CycloneDDS Configuration
This host has multiple network interfaces (Docker bridges, Tailscale, Cloudflare WARP) that confuse CycloneDDS multicast discovery. Two configuration files restrict CycloneDDS to the loopback interface:
- `docker/cyclonedds.xml` (bundled at `/etc/ros/cyclonedds.xml`) — sim and hardware containers
- `zenoh/cyclonedds-bridge.xml` — zenoh-bridge (omits the `SharedMemory` element, which the bridge's bundled CycloneDDS does not support)
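The exact contents of these two files are not reproduced here. As a hedged sketch, the Python below generates a loopback-only configuration from standard CycloneDDS elements (assuming version 0.10 or later). It does three things:

- binds to a single `NetworkInterface`
- disables multicast
- uses unicast discovery against `localhost`

The `loopback_cyclonedds_xml` helper, the `MaxAutoParticipantIndex` value, and the `shared_memory` switch are illustrative assumptions. They are not copied from the repository files.

```python
"""Hedged sketch: build a loopback-only CycloneDDS config.

Assumption: element choices follow CycloneDDS >= 0.10; they are not copied
from docker/cyclonedds.xml or zenoh/cyclonedds-bridge.xml.
"""
import xml.etree.ElementTree as ET

NS = "https://cdds.io/config"


def loopback_cyclonedds_xml(
    interface: str = "lo",
    shared_memory: bool = False,
    max_participants: int = 32,  # hypothetical; each ROS 2 process uses one index
) -> str:
    """Return CycloneDDS XML that keeps DDS traffic on a single interface."""
    root = ET.Element("CycloneDDS", {"xmlns": NS})
    domain = ET.SubElement(root, "Domain", {"Id": "any"})

    general = ET.SubElement(domain, "General")
    interfaces = ET.SubElement(general, "Interfaces")
    # Bind only to loopback so Docker bridges, Tailscale and WARP are ignored.
    ET.SubElement(interfaces, "NetworkInterface", {"name": interface})
    # Multicast discovery is what the extra interfaces confuse, so turn it off.
    ET.SubElement(general, "AllowMulticast").text = "false"

    discovery = ET.SubElement(domain, "Discovery")
    peers = ET.SubElement(discovery, "Peers")
    # Without multicast, participants find each other by unicast to localhost.
    ET.SubElement(peers, "Peer", {"address": "localhost"})
    ET.SubElement(discovery, "ParticipantIndex").text = "auto"
    ET.SubElement(discovery, "MaxAutoParticipantIndex").text = str(max_participants)

    if shared_memory:
        # Left out of the zenoh-bridge variant, whose bundled CycloneDDS rejects it.
        shm = ET.SubElement(domain, "SharedMemory")
        ET.SubElement(shm, "Enable").text = "true"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(loopback_cyclonedds_xml())
```

A ROS 2 process reads such a file under two conditions. `RMW_IMPLEMENTATION` must be `rmw_cyclonedds_cpp`, and `CYCLONEDDS_URI` must point at the file (for example `file:///etc/ros/cyclonedds.xml`).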
Episode Recording
Episodes are recorded using `lerobot-record` through the lerobot-ros bridge. Each frame captures:
- Joint positions (6 DOF + gripper) as `observation.state`
- Camera images as `observation.images.{camera_name}`
- Joint action commands as `action`
- Natural language task instruction
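To make the per-frame schema concrete, here is a minimal sketch that assembles one frame dictionary from the keys listed above. The following are hypothetical:

- the camera name `wrist`
- the joint ordering
- the placeholder nested-list image
- the `task` key for the instruction
- the `build_frame` helper

A real pipeline would more likely carry NumPy arrays and leave MP4 encoding of the images to LeRobot's dataset writer.

```python
"""Sketch of one recorded frame using the documented feature keys.

Hypothetical: the build_frame helper, the "task" key name, the camera name,
and the joint order (joint_1..joint_6, then gripper).
"""
from typing import Any, Dict, Sequence

NUM_ARM_JOINTS = 6
STATE_DIM = NUM_ARM_JOINTS + 1  # 6 DOF + gripper


def build_frame(
    joint_positions: Sequence[float],
    images: Dict[str, Any],
    commanded: Sequence[float],
    task: str,
) -> dict:
    """Return a flat dict keyed the way the recorded features are named."""
    if len(joint_positions) != STATE_DIM or len(commanded) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values (6 DOF + gripper)")
    frame = {
        "observation.state": [float(q) for q in joint_positions],
        "action": [float(q) for q in commanded],
        "task": task,  # natural-language instruction (key name assumed)
    }
    # One image entry per camera, e.g. observation.images.wrist.
    for camera_name, image in images.items():
        frame[f"observation.images.{camera_name}"] = image
    return frame


if __name__ == "__main__":
    tiny_image = [[[0, 0, 0]] * 4] * 3  # 3x4 RGB placeholder
    frame = build_frame([0.0] * 7, {"wrist": tiny_image}, [0.1] * 7, "pick up the red cube")
    print(sorted(frame))
```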
Datasets are stored in LeRobot v3.0 format (Parquet + MP4) and can be pushed to HuggingFace Hub.
Repository Structure
External dependencies (pip-installed, not in this repo):