3we: An Open Platform for Sim-to-Real Embodied AI Research
Authors: 3we Contributors
Abstract. Embodied AI research requires tight integration between simulation environments and physical robots, yet current platforms impose significant barriers: high hardware costs ($5,000+), complex middleware stacks, and fundamental gaps between simulated and real-world behavior. We present 3we, an open-source robot platform that addresses these challenges through: (1) an AI-First Python API that eliminates direct ROS2 interaction, reducing typical task implementations from 60+ lines to about five; (2) reproducible open hardware under CERN-OHL-P v2 with a bill of materials of approximately $300; (3) a Backend Abstraction Layer that enforces identical data formats and coordinate frames across four backends (Mock, Gazebo, Isaac Sim, and physical hardware); (4) four Gymnasium-compatible environments (Navigation, Exploration, ObjectNav, VLN) plus a multi-agent cooperative environment; (5) a Hardware Abstraction Layer supporting heterogeneous robot platforms; and (6) trajectory recording with LeRobot/HuggingFace Hub interoperability. Three standardized benchmark tasks (PointNav, ObjectNav, Exploration) with a public leaderboard enable reproducible evaluation. By dramatically lowering the cost and complexity of sim-to-real research, 3we aims to democratize embodied AI development for the broader research community.
1. Introduction
Embodied AI stands at an inflection point. The emergence of Vision-Language Models (VLMs) and Vision-Language-Action (VLA) models has demonstrated that foundation model capabilities can extend beyond text and images into physical action spaces [1, 2]. However, translating these advances into real-world robotic systems remains prohibitively difficult for most research groups.
Three primary barriers impede progress. First, research-grade mobile robot platforms typically cost $5,000-$25,000, restricting experimentation to well-funded laboratories. Second, the standard middleware stack (ROS2 + Nav2 + custom drivers) presents a steep learning curve orthogonal to AI research itself, requiring months of systems engineering before any learning experiment can begin. Third, the sim-to-real transfer gap remains poorly standardized: policies trained in simulation frequently fail on hardware due to undocumented differences in observation spaces, action semantics, and timing behavior between simulators and physical systems.
We introduce 3we (pronounced "three-we"), an open platform designed to eliminate these barriers. Our contributions are:
- An AI-First Python API (the threewe package) that abstracts ROS2 entirely, exposing perception, navigation, and AI model integration through a minimal async interface.
- Open hardware under the CERN Open Hardware Licence (Permissive v2), with a complete bill of materials costing approximately $300 and requiring no custom PCB fabrication.
- A Backend Abstraction Layer with a formally specified consistency contract, ensuring that code written for simulation executes identically on physical hardware with zero modification.
- Standardized benchmarks with public leaderboard infrastructure for reproducible evaluation of navigation and exploration policies.
The remainder of this paper is organized as follows: Section 2 discusses related work; Section 3 presents the system architecture; Section 4 details the Python API design; Section 5 describes the sim-to-real consistency mechanism; Section 6 introduces the benchmark suite; Section 7 covers the hardware design; Section 8 discusses the data ecosystem; and Section 9 concludes with future directions.
2. Related Work
TurtleBot Series. The TurtleBot family [3] has served as the de facto educational mobile robot platform for over a decade. However, TurtleBot3 is limited to differential drive kinematics, costs approximately $600-$1,500, ships without AI acceleration hardware, and requires direct ROS2 programming for all tasks. The 3we platform provides omnidirectional motion via mecanum wheels, 13 TOPS of edge AI compute, and a Python API that hides ROS2 complexity.
NVIDIA Isaac Sim. Isaac Sim [4] provides high-fidelity physics simulation with GPU-accelerated parallelism for reinforcement learning. While powerful for training, it is proprietary, requires substantial GPU resources (RTX 3070+), and provides no direct path to affordable physical hardware. 3we treats Isaac Sim as one of several interchangeable backends rather than a standalone platform.
HuggingFace LeRobot. LeRobot [5] establishes an excellent data ecosystem for robot learning, providing standardized dataset formats (Parquet + MP4) and pre-trained VLA model checkpoints. However, its mobile robot support is limited, with primary focus on manipulation tasks. 3we provides direct LeRobot format compatibility through its trajectory recorder while adding navigation-specific capabilities (SLAM, Nav2, exploration).
Habitat and AI2-THOR. These simulation platforms [6, 7] enable large-scale embodied AI experimentation in photorealistic environments but provide no deployment path to physical robots. Policies trained in these environments require separate engineering effort for real-world transfer. 3we bridges this gap through its consistency contract.
Genesis World. Genesis [8] offers generative simulation with differentiable physics, representing a promising future direction. As the ecosystem matures, 3we's modular backend architecture allows it to be integrated as an additional simulation backend.
3. System Architecture
The 3we platform employs a three-layer architecture separating user-facing AI research code from middleware and hardware concerns:
+----------------------------------------------------------+
| Layer 3: User Application |
| ┌─────────────────────────────────────────────────────┐ |
| │ from threewe import Robot │ |
| │ robot.get_camera_image() / robot.move_to(x,y) │ |
| │ robot.execute_instruction("find the red cup") │ |
| └─────────────────────────────────────────────────────┘ |
+----------------------------------------------------------+
| Layer 2: Backend Abstraction Layer |
| ┌────────┐ ┌──────────┐ ┌──────────────┐ ┌────────────────────┐ |
| │ Mock │ │ Gazebo │ │ Isaac Sim │ │ Real Hardware │ |
| │(No-dep)│ │ (CPU,CI) │ │ (GPU,Train) │ │ (Pi5+ESP32+ROS) │ |
| └────────┘ └──────────┘ └──────────────┘ └────────────────────┘ |
+----------------------------------------------------------+
| Layer 1: Physical / Simulated Substrate |
| ┌──────────────────────────────────────────────────────┐|
| │ ROS2 Jazzy · Nav2 · micro-ROS · Gazebo Harmonic │|
| │ ESP32-S3 · Mecanum Drive · Hailo-8L · SLAM Toolbox │|
| └──────────────────────────────────────────────────────┘|
+----------------------------------------------------------+
Hardware Platform. The physical robot comprises a Raspberry Pi 5 (8GB) as the main compute unit, a Hailo-8L M.2 accelerator providing 13 TOPS of neural network inference, an ESP32-S3 microcontroller running micro-ROS for real-time motor control, four 65mm mecanum wheels enabling omnidirectional motion, HC-SR04 ultrasonic sensors for basic obstacle avoidance (with optional LD06 LiDAR for SLAM), and a USB fisheye camera (170° FoV). Total weight is approximately 2.5 kg with a maximum linear velocity of 1.2 m/s (see Section 7.3).
Software Stack. The runtime environment uses ROS2 Jazzy with Nav2 for path planning, SLAM Toolbox for map building, and micro-ROS on the ESP32-S3 for 50 Hz motor control loops. Critically, none of these components are exposed to the end user; the threewe Python package mediates all interaction through the Backend Abstraction Layer.
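To make this concrete, switching substrates requires changing only a constructor argument. A minimal sketch follows; note that only the "gazebo" backend string appears verbatim elsewhere in this paper, so the other backend strings and the get_pose accessor are assumptions based on the API surface described in Sections 4 and 5:
import asyncio
from threewe import Robot

async def main():
    # Identical code runs on every backend; only the string changes:
    # "mock" (no dependencies), "gazebo" (CPU sim), "isaac" (GPU sim),
    # "real" (physical robot). Strings other than "gazebo" are assumptions.
    async with Robot(backend="mock") as robot:
        pose = robot.get_pose()  # same (x, y, theta) convention everywhere
        print(pose)

asyncio.run(main())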
4. AI-First Python API
4.1 Design Philosophy
The threewe package (v1.0.0) is designed so that AI researchers never need to learn ROS2 concepts (nodes, topics, services, actions) to conduct embodied AI experiments. The API provides:
- Synchronous sensor access (images, LiDAR, pose, IMU)
- Async navigation primitives (move_to, explore, follow_path)
- Native VLM/VLA integration (execute_instruction, VLARunner)
- Gymnasium-compatible environments for RL training
4.2 Code Comparison
A typical navigation task illustrates the complexity reduction:
Traditional ROS2 (60+ lines):
import rclpy
from rclpy.node import Node
from nav2_simple_commander.robot_navigator import BasicNavigator
from geometry_msgs.msg import PoseStamped
# ... 60+ lines of node setup, spin, callback management
3we API (5 lines):
from threewe import Robot
async with Robot(backend="gazebo") as robot:
    image = robot.get_camera_image()  # (480, 640, 3) uint8
    result = await robot.move_to(x=2.0, y=1.0)
    print(f"Success: {result.success}")
4.3 Gymnasium Integration
The platform registers standard Gymnasium environments for seamless integration with any RL library (Stable-Baselines3, CleanRL, RLlib):
import gymnasium

env = gymnasium.make("3we/Navigation-v1")
obs, info = env.reset()
action = env.action_space.sample()  # placeholder for a policy's output
obs, reward, terminated, truncated, info = env.step(action)
Four environments of increasing difficulty are provided: Navigation-v1 (point-to-point), Exploration-v1 (frontier-based coverage), ObjectNav-v1 (semantic goal navigation), and VLN-v1 (vision-language navigation with instruction-following). A MultiAgentEnv is also available for cooperative multi-robot exploration.
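Because every environment exposes the standard Gymnasium interface, off-the-shelf trainers work unchanged. A minimal training sketch with Stable-Baselines3 follows; "MultiInputPolicy" is an assumption that the observation space is a dict (e.g., image plus pose), so substitute "CnnPolicy" or "MlpPolicy" for pure image or vector observations:
import gymnasium
from stable_baselines3 import PPO

env = gymnasium.make("3we/Navigation-v1")
model = PPO("MultiInputPolicy", env, verbose=1)  # PPO per [12]
model.learn(total_timesteps=100_000)
model.save("ppo_3we_nav")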
4.4 VLM/VLA Native Support
For foundation model research, the API provides first-class integration:
# VLM-driven navigation (one line)
result = await robot.execute_instruction("find the red object")
# VLA model deployment
from threewe.ai import VLARunner

runner = VLARunner.from_pretrained("lerobot/act_3we_nav")
image = robot.get_camera_image()  # current observation
action = runner.predict(image, instruction="go to the kitchen")
5. Sim-to-Real Consistency
5.1 Consistency Contract
The Backend Abstraction Layer (BackendBase) enforces a formal contract across all backends:
| Data Type | Format | Convention |
|---|---|---|
| RGB Image | (H, W, 3) uint8 | BGR color order |
| Depth | (H, W) float32 | Meters, invalid = 0.0 |
| LiDAR | (N,) float32 | Meters, uniform angular sampling |
| Pose | (x, y, theta) | Right-hand, X-forward, Y-left, radians |
| Velocity | (vx, vy, omega) | m/s linear, rad/s angular |
This contract is verified at the interface level: any backend implementation that violates these specifications fails type checking and runtime assertions.
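The runtime assertions implied by this contract could take roughly the following form (a hypothetical sketch; validate_rgb and validate_scan are illustrative names, not part of the threewe API):
import numpy as np

def validate_rgb(image: np.ndarray) -> None:
    """Check the RGB-image clause of the consistency contract."""
    assert image.ndim == 3 and image.shape[2] == 3, "expected (H, W, 3)"
    assert image.dtype == np.uint8, "expected uint8 pixels"
    # BGR channel order cannot be verified structurally; it is a semantic
    # obligation each backend implementation must uphold.

def validate_scan(ranges: np.ndarray) -> None:
    """Check the LiDAR clause: a flat float32 array of ranges in meters."""
    assert ranges.ndim == 1, "expected (N,) range array"
    assert ranges.dtype == np.float32, "expected float32 meters"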
5.2 Domain Randomization
The simulation backends support configurable domain randomization to improve transfer robustness (a configuration sketch follows this list):
- Physics: mass scaling (0.9-1.1x), friction variation (0.8-1.2x), motor torque noise
- Visual: texture randomization, lighting intensity variation (0.5-1.5x), color jitter
- Sensor: LiDAR/IMU/camera noise scaling, odometry slip simulation
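These ranges might be grouped into a configuration object along the following lines (a sketch; the class name, field names, and noise values are illustrative assumptions, with the ranges taken from the list above):
from dataclasses import dataclass

@dataclass
class DomainRandomizationConfig:
    # Physics: multiplicative factors sampled per episode
    mass_scale: tuple = (0.9, 1.1)
    friction_scale: tuple = (0.8, 1.2)
    motor_torque_noise_std: float = 0.05    # illustrative value
    # Visual
    light_intensity_scale: tuple = (0.5, 1.5)
    randomize_textures: bool = True
    # Sensor
    lidar_noise_std_m: float = 0.01         # illustrative value
    odometry_slip_fraction: float = 0.02    # illustrative value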
5.3 Transfer Validation Protocol
To quantify the sim-to-real gap, the platform provides a standardized protocol:
- Train a policy in simulation with domain randomization
- Evaluate on 50 episodes across 3 simulation seeds
- Deploy identical code on physical hardware
- Evaluate on 10 matched real-world episodes
- Report the transfer ratio SR_real / SR_sim, computed as sketched below
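The reporting step is a simple ratio of success rates, for example (the episode outcomes here are illustrative):
def success_rate(outcomes):
    """Fraction of episodes that satisfied the goal condition."""
    return sum(outcomes) / len(outcomes)

sim_outcomes = [True] * 42 + [False] * 8   # 50 simulation episodes
real_outcomes = [True] * 7 + [False] * 3   # 10 matched real-world episodes

transfer_ratio = success_rate(real_outcomes) / success_rate(sim_outcomes)
print(f"Transfer ratio: {transfer_ratio:.2f}")  # 0.70 / 0.84 = 0.83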
6. Benchmarks
6.1 Task Definitions
Three standardized benchmark tasks are provided:
| Task | Metric | Success Condition | Scenes |
|---|---|---|---|
| PointNav | SPL, Success Rate | Within 0.5m of goal | 7 environments |
| ObjectNav | SPL, Success Rate | Object detected and approached | 4 environments |
| Exploration | Coverage (%) | Percentage of navigable area mapped | 3 environments |
6.2 Metrics
We adopt standard embodied AI metrics:
- Success Rate (SR): Fraction of episodes reaching the goal condition.
- SPL (Success weighted by Path Length): SPL = (1/N) * sum(S_i * L_i / max(P_i, L_i)), where S_i indicates success, L_i is the shortest-path distance to the goal, and P_i is the path length actually traveled, penalizing inefficient paths [9] (a computation sketch follows this list).
- Coverage: Ratio of explored cells to total navigable cells in the occupancy grid.
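SPL can be computed directly from per-episode logs; the following is a straightforward transcription of the formula above:
import numpy as np

def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al. [9]).

    successes: S_i in {0, 1}; shortest_lengths: L_i, geodesic distance
    from start to goal; path_lengths: P_i, distance actually traveled.
    """
    s = np.asarray(successes, dtype=float)
    l_opt = np.asarray(shortest_lengths, dtype=float)
    p_act = np.asarray(path_lengths, dtype=float)
    return float(np.mean(s * l_opt / np.maximum(p_act, l_opt)))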
6.3 Reproducibility
Each benchmark run records the random seed, domain randomization config, hardware profile, backend version, and episode-level trajectories. The public leaderboard accepts submissions via the project CLI; an illustrative run manifest is sketched below.
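The manifest might look as follows (the field names and values are assumptions, not the actual submission schema):
run_manifest = {
    "seed": 42,
    "domain_randomization": {"mass_scale": (0.9, 1.1), "friction_scale": (0.8, 1.2)},
    "hardware_profile": "3we-standard-v1",   # placeholder profile name
    "backend": {"name": "gazebo", "version": "1.0.0"},
    "trajectories": "episodes/run_0042/",    # episode-level trajectory logs
}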
7. Hardware Design
7.1 Open Hardware License
All hardware designs (schematics, PCB layouts, mechanical drawings) are released under the CERN Open Hardware Licence - Permissive v2 (CERN-OHL-P), allowing unrestricted commercial and academic use.
7.2 Bill of Materials
The standard configuration costs approximately $300 (prices converted from CNY):
| Component | Specification | Cost (approx.) |
|---|---|---|
| Compute | Raspberry Pi 5 8GB | $82 |
| AI Accelerator | Hailo-8L M.2 (13 TOPS) | $80 |
| AI HAT+ | M.2 carrier for Pi 5 | $27 |
| MCU | ESP32-S3 (micro-ROS) | $5 |
| Camera | USB Fisheye 170deg 1080P | $12 |
| Motors (x4) | N20 Gear Motor 1:50 w/ encoder | $7 each |
| Wheels (x4) | 65mm Mecanum (PU rollers) | $5 each |
| Battery | 2S 18650 packs (x2) | $10 each |
| Structure + misc | Mounts, cables, hub, cooling | $25 |
| Total | | ~$300 |
7.3 Key Specifications
| Parameter | Value |
|---|---|
| Dimensions | ~250 x 220 x 150 mm |
| Weight | ~2.5 kg |
| Drive type | 4-wheel mecanum (omnidirectional) |
| Max velocity | 1.2 m/s linear, 3.0 rad/s angular |
| AI compute | 13 TOPS (Hailo-8L) + Pi 5 CPU |
| SLAM | 2D LiDAR, SLAM Toolbox |
| Control loop | 50 Hz (ESP32 micro-ROS) |
| Communication | Wi-Fi (DTLS encrypted) |
| Battery life | ~2 hours continuous operation |
8. Data Ecosystem
8.1 Trajectory Recording
The TrajectoryRecorder class captures observation-action pairs during teleoperation or autonomous execution at the full control rate:
from threewe.data import TrajectoryRecorder

recorder = TrajectoryRecorder(robot)  # robot: an active Robot instance (Section 4)
recorder.start_episode()
# ... robot operates, via teleoperation or autonomous execution ...
recorder.end_episode()
recorder.export("dataset_v1", format="lerobot")
Each timestep records: RGB image, pose (x, y, theta), velocity, commanded action, and optionally LiDAR and depth data.
8.2 LeRobot Format Compatibility
Exported datasets follow the LeRobot specification (Parquet metadata + MP4 video streams), enabling direct use with the LeRobot training pipeline and distribution via the HuggingFace Hub.
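A dataset exported this way and pushed to the Hub can be loaded back with the lerobot library, for example (assuming a recent lerobot release; the import path has varied across versions, and the repo id is a placeholder):
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("username/3we_dataset_v1")  # placeholder repo id
print(dataset.num_episodes, len(dataset))
sample = dataset[0]  # dict of tensors: observation keys, action, timestamps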
8.3 VLA Deployment Pipeline
The complete pipeline from data collection to model deployment:
- Collect demonstrations via teleoperation or VLM-guided execution
- Export to LeRobot format and upload to HuggingFace Hub (upload sketch below)
- Train VLA model using LeRobot training scripts
- Deploy the trained model on-device via VLARunner.from_pretrained()
- Evaluate on standardized benchmarks
The Hailo-8L accelerator enables on-device VLA inference at approximately 10 Hz for typical model architectures (ACT, Diffusion Policy).
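The Hub upload step can use the standard huggingface_hub client directly (the repo id is a placeholder):
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="dataset_v1",           # output of recorder.export(...)
    repo_id="username/3we_dataset_v1",  # placeholder Hub dataset repo
    repo_type="dataset",
)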
9. Conclusion and Future Work
We have presented 3we, an open platform that addresses the three primary barriers to embodied AI research: cost, complexity, and sim-to-real consistency. The AI-First Python API reduces implementation effort by an order of magnitude compared to direct ROS2 programming. The open hardware design enables reproduction at approximately $300. The Backend Abstraction Layer with its formal consistency contract ensures that code developed in simulation transfers to physical hardware without modification.
Future directions include: (1) Isaac Sim backend completion for GPU-parallel RL training with thousands of concurrent environments; (2) a Hardware Abstraction Layer (HAL) enabling the threewe API to drive third-party robot platforms (TurtleBot, Unitree, custom hardware); (3) multi-robot fleet coordination for cooperative exploration and task allocation; and (4) integration with emerging generative simulation platforms (Genesis) for automatic environment generation.
We invite the robotics and AI research community to adopt, reproduce, and extend the platform. All source code, hardware designs, and documentation are available at https://github.com/3we-org/3we-robot-platform under permissive open-source licenses.
References
[1] A. Brohan et al., "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control," in Proc. CoRL, 2023.
[2] OpenAI, "GPT-4o: Multimodal Reasoning Across Text, Vision, and Audio," OpenAI Technical Report, 2024.
[3] R. Amsters and P. Slaets, "TurtleBot3: A Mobile Robot Platform for Education and Research," in IROS Workshop on Open Source Robotics, 2020.
[4] NVIDIA, "Isaac Sim: Scalable Robotics Simulation for Synthetic Data Generation and Robot Learning," NVIDIA Developer Documentation, 2023.
[5] C. Cadene et al., "LeRobot: State-of-the-art Machine Learning for Real-World Robotics," HuggingFace, 2024.
[6] M. Savva et al., "Habitat: A Platform for Embodied AI Research," in Proc. ICCV, 2019.
[7] E. Kolve et al., "AI2-THOR: An Interactive 3D Environment for Visual AI," arXiv:1712.05474, 2017.
[8] Genesis Contributors, "Genesis: A Generative and Universal Physics Engine for Robotics and Beyond," arXiv:2409.18051, 2024.
[9] P. Anderson et al., "On Evaluation of Embodied Navigation Agents," arXiv:1807.06757, 2018.
[10] S. Macenski et al., "The Marathon 2: A Navigation System," in Proc. IROS, 2020. (Nav2)
[11] M. Towers et al., "Gymnasium: A Standard Interface for Reinforcement Learning Environments," arXiv:2407.17032, 2024.
[12] J. Schulman et al., "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017.
Citation
If you use the 3we platform in your research, please cite: