Group 5 — Monday 6–7:30pm · TidyBot2 with bimanual WX250s arms, RealSense depth camera, and a Gemini-powered natural language interface.
Household and laboratory environments contain objects that constantly need to be retrieved, reorganized, or moved on demand. A robot that can understand natural spoken commands and autonomously execute multi-step manipulation tasks—locating, navigating to, and grasping objects—would be a meaningful real-world collaborator.
We built a fully integrated pipeline on the TidyBot2 platform: a holonomic mobile base with two 6-DOF WX250s arms, an Intel RealSense D435 depth camera on a pan-tilt mount, and an onboard Intel NUC running ROS2 Humble.
TidyBot2 with bimanual arms, pan-tilt RealSense camera, and test objects
User says "retrieve the banana." Robot detects the object with YOLO, navigates to it, grasps it with the right arm, and returns to the starting position while holding the object.
User says "pick up the banana and place it in the bin." Robot detects, navigates, grasps, then drops the object 25 cm to its right—into a bowl or bin positioned beside the robot.
Robot detects a red pillow using an HSV-based color detector, navigates to it, and performs a coordinated two-arm grasp to pick it up — demonstrating bimanual manipulation for large or deformable objects a single arm cannot reliably handle.
The system is composed of specialized ROS2 nodes orchestrated by task-specific coordinators. Each node has a single responsibility; the coordinator sequences them via topics and state transitions.
| Node | File | Role |
|---|---|---|
| `nlp_interface_node.py` | `tidybot_control/` | Voice/text interface — Gemini parsing, command confirmation, and runtime object targeting |
| `task1_coordinator.py` | `tidybot_bringup/scripts/` | Task 1 state machine — detect → navigate → pick up → return to start |
| `task2_coordinator.py` | `tidybot_bringup/scripts/` | Task 2 state machine — detect → navigate → pick up → drop in bowl/bin |
| `coordinator_node_task3.py` | `tidybot_bringup/scripts/` | Task 3 state machine — detect → navigate → (redetect) → bimanual pick up |
| `detect_object_real.py` | `tidybot_bringup/scripts/` | YOLOv11 + RealSense depth → 3D object pose in base_link (Tasks 1 & 2) |
| `detect_object_real_task3.py` | `tidybot_bringup/scripts/` | HSV color segmentation for red pillow detection (Task 3) |
| `navigate_to_object.py` | `tidybot_bringup/scripts/` | Proportional controller → 0.4 m standoff + 0.15 m lateral offset (Tasks 1 & 2) |
| `navigate_to_object_task3.py` | `tidybot_bringup/scripts/` | Proportional controller → 0.3 m standoff, no lateral offset (Task 3) |
| `task1_pickup.py` | `tidybot_bringup/scripts/` | Task 1 arm — approach → descend → grasp → lift (holds object) |
| `task2_pickup.py` | `tidybot_bringup/scripts/` | Task 2 arm — approach → descend → grasp → lift → drop 25 cm right |
| `pickup_task3.py` | `tidybot_bringup/scripts/` | Task 3 bimanual — R_APPROACH → R_DESCEND → R_GRASP → L_APPROACH → L_DESCEND → L_GRASP → LIFT → ROTATE → RELEASE |
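As an illustration of how a coordinator sequences these nodes, here is a minimal sketch using the Task 1 state names; the real node adds topic plumbing and timers, and the function name is hypothetical:

```python
# Nominal Task 1 sequence; any failure jumps to FAILED, which e-stops the base.
TRANSITIONS = {
    "IDLE": "DETECT",
    "DETECT": "NAVIGATE",
    "NAVIGATE": "PICKUP",
    "PICKUP": "RETURN",
    "RETURN": "DONE",
    "DONE": "IDLE",  # the real coordinator dwells 2 s before resetting
}

def next_state(state, failed=False):
    """Advance the coordinator one step (sketch, not the actual node logic)."""
    if failed:
        return "FAILED"  # zero-velocity e-stop, then immediate reset
    return TRANSITIONS.get(state, "IDLE")
```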
Task 1 — Object Retrieval
Task 2 — Pick and Place
Task 3 — Bimanual Pillow Retrieval
Waits for a voice command or manual trigger on /coordinator/start.
Tasks 1 & 2: accumulates 3 confident YOLO detections (≥0.35 confidence) and averages their x/y into a stable nav goal. Task 3: accumulates 15 HSV pose samples and averages them before locking the navigation target.
Drives to standoff position and aligns yaw; tilts camera down on arrival.
A 3-second settling delay lets stale perception data clear before pickup.
Camera sweep across 6 pan-tilt positions for close-range re-detection of the pillow.
Triggers the task-specific pickup node. Task 1 holds; Task 2 drops 25 cm right; Task 3 bimanual grasp + rotate.
Drives back to the saved start position (0, 0).
DONE returns to IDLE after 2s. FAILED publishes zero-velocity e-stop and resets immediately.
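The goal-locking step above can be sketched as a small helper; the function name and the `(x, y, confidence)` sample format are assumptions:

```python
def accumulate_goal(samples, needed=3, min_conf=0.35):
    """Average the first `needed` confident (x, y, conf) detections into a
    stable navigation goal; return None until enough have arrived."""
    confident = [(x, y) for (x, y, c) in samples if c >= min_conf]
    if len(confident) < needed:
        return None  # keep waiting for more detections
    xs, ys = zip(*confident[:needed])
    return (sum(xs) / needed, sum(ys) / needed)
```

Averaging over several frames damps per-frame detection jitter before the goal is committed to the navigator.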
Tasks 1 & 2: YOLOv11n on RGB frames, median depth patch, back-projected to base_link via TF.
Task 3: HSV color segmentation for red (dual hue bands 0–10 & 170–180), largest contour above 500 px.
YOLOv11 real-time detection on TidyBot2 camera feed
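The depth step for Tasks 1 & 2 amounts to taking the median of a small depth patch around the detection and back-projecting through the pinhole model. A sketch, with placeholder intrinsics and the TF transform to base_link omitted:

```python
import statistics

def backproject(u, v, depth_patch, fx, fy, cx, cy):
    """Pixel (u, v) plus a patch of depth readings -> 3D point in the camera frame.

    Taking the median over the patch rejects depth speckle and holes.
    """
    z = statistics.median(depth_patch)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)
```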
Proportional controller at 50 Hz. Tasks 1 & 2: 0.4 m standoff + 0.15 m lateral offset. Task 3: 0.3 m standoff, no lateral offset. Stop-and-rotate when misaligned by more than 60°.
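The stop-and-rotate behavior can be sketched as follows; the gains and command interface are assumptions, and the real node runs this loop at 50 Hz on odometry:

```python
import math

def nav_cmd(dx, dy, yaw, k_lin=0.5, k_ang=1.0, rotate_thresh_deg=60.0):
    """Proportional (v, w) command toward a goal offset (dx, dy).

    If the heading error exceeds the threshold, stop and rotate in place;
    otherwise drive forward at a speed proportional to the remaining distance.
    """
    # Wrap heading error into [-pi, pi]
    heading_err = (math.atan2(dy, dx) - yaw + math.pi) % (2 * math.pi) - math.pi
    if abs(heading_err) > math.radians(rotate_thresh_deg):
        return 0.0, k_ang * heading_err  # rotate in place first
    return k_lin * math.hypot(dx, dy), k_ang * heading_err
```

Rotating in place when badly misaligned prevents the base from carving wide arcs past the standoff point.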
Task 1: APPROACH → DESCEND → GRASP → LIFT (holds object).
Task 2: Same + DROP 25 cm right into bowl/bin.
Task 3: R_APPROACH → R_DESCEND → R_GRASP → L_APPROACH → L_DESCEND → L_GRASP → LIFT → ROTATE 90° → RELEASE.
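The bimanual sequence is a fixed phase progression; a minimal sketch of stepping through it (the helper name is hypothetical):

```python
TASK3_PHASES = (
    "R_APPROACH", "R_DESCEND", "R_GRASP",  # right arm secures one side
    "L_APPROACH", "L_DESCEND", "L_GRASP",  # then the left arm mirrors it
    "LIFT", "ROTATE", "RELEASE",           # coordinated lift, 90° rotate, release
)

def advance(phase):
    """Return the phase after `phase`, or None once RELEASE completes."""
    i = TASK3_PHASES.index(phase)
    return TASK3_PHASES[i + 1] if i + 1 < len(TASK3_PHASES) else None
```

Grasping one side at a time avoids coordinating two simultaneous descents onto a deformable object.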
Records audio, transcribes via SpeechRecognition, passes text to Google Gemini.
Returns structured JSON {action, object, target} to dynamically configure the detector.
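Validating the model's reply before it reconfigures the detector might look like this; a sketch in which the prompt, the Gemini call itself, and the node's actual field handling are not shown:

```python
import json

def parse_command(reply_text):
    """Parse a JSON reply into {action, object, target}.

    Raises ValueError when required fields are missing, so the interface
    can ask the user to repeat instead of sending a bad goal downstream.
    """
    cmd = json.loads(reply_text)
    if not {"action", "object"} <= set(cmd):
        raise ValueError(f"incomplete command: {cmd}")
    cmd.setdefault("target", None)  # target is optional (e.g. Task 1 has none)
    return cmd
```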
Task 1 — Voice-commanded object retrieval on real hardware
Task 2 — Sequential pick-and-place on real hardware
Task 3 — Bimanual pillow retrieval on real hardware
Navigation in MuJoCo simulation
Camera Sweep RViz Visualization
Robot Navigation: Return to Origin
Group 5 — Monday 6–7:30pm · TA: Giuse Pham
- `task1_pickup.py` and `task2_pickup.py` end-to-end, including the full pickup pipelines (approach, descend, grasp, lift, drop)
- `detect_object_real.py` — object poses flow directly from YOLO into IK-based grasp targets
- `task1_coordinator.py` and `task2_coordinator.py` state machines orchestrating the full detect → navigate → pick → return/drop pipeline
- `task2_pickup.py` for robust close-range re-detection before grasping
- `navigate_to_object.py` with standoff positioning, lateral offset, and coordinate transforms for arm-reachable approach
- (`coordinator_node.py`) orchestrating the full detect → navigate → pick end-to-end flow
- `test_bimanual.py` for dual-arm testing
- `safe_movement.py` and test utilities for safer motion and debugging
- Structured command output (`{action, object, target}`) through multi-turn conversation context
- Task 3 perception (`detect_object_real_task3.py`) with improved 3D pose estimation and a hardcoded-pose toggle for reliable hardware runs
- Task 3 navigation (`navigate_to_object_task3.py`) with three targeted control fixes to prevent overshoot: stop-and-rotate when heading error exceeds 60°, a proportional speed ramp in the final 0.2 m to avoid coasting past the standoff, and a tighter heading gate (30° vs 45°) to correct yaw earlier before forward motion; also added a post-arrival face_object alignment phase and fixed standalone mode for immediate pose acceptance
- Task 3 bimanual pickup (`pickup_task3.py`) with hardcoded grasp positions for consistent two-arm grasping
- A `skip_redetect` parameter, plus 3 critical pipeline bugs resolved before hardware testing

All code is open-source and available on GitHub.
github.com/jameszcheng/collaborative-robotics-2026-group5
ROS2 Humble workspace with MuJoCo simulation, full coordinator pipeline, perception, navigation, and manipulation nodes.
- `scripts/task1_coordinator.py` — Task 1 state machine
- `scripts/task2_coordinator.py` — Task 2 state machine
- `scripts/coordinator_node_task3.py` — Task 3 state machine
- `scripts/detect_object_real.py` — YOLOv11 perception
- `scripts/detect_object_real_task3.py` — HSV pillow detector
- `scripts/navigate_to_object.py` — Navigation (Tasks 1 & 2)
- `scripts/navigate_to_object_task3.py` — Navigation (Task 3)
- `scripts/task1_pickup.py` — Task 1 arm (hold)
- `scripts/task2_pickup.py` — Task 2 arm (drop)
- `scripts/pickup_task3.py` — Task 3 bimanual pickup

```shell
# Build
cd ros2_ws && source /opt/ros/humble/setup.bash && colcon build

# Launch robot + task pipeline
source setup_env.bash
ros2 launch tidybot_bringup real.launch.py use_planner:=true   # Terminal 1
ros2 launch tidybot_bringup task1.launch.py                    # Terminal 2

# Manual trigger (no voice needed)
ros2 topic pub /coordinator/start std_msgs/String "data: banana" --once
```