System Overview
A two-part teleoperation system that turns any webcam into a remote control for a small robot car. On the PC side, MediaPipe tracks the hand in each frame, a TFLite classifier maps the landmark coordinates to one of 10 gestures, and the matching command is sent over a TCP socket. On the robot side, an ESP32 hosts its own Wi-Fi access point (192.168.4.1:12345), listens for incoming commands, drives two DC motors through an H-bridge, and mirrors the active command on a 128×64 OLED for feedback. The whole pipeline runs in real time on a laptop with no cloud dependencies.
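As a rough illustration of the PC-to-robot link described above, here is a minimal sketch of the sending side. The address and port come from the overview. The command names, the newline-terminated ASCII wire format, and the helper names (`COMMANDS`, `encode_command`, `send_command`, `connect`) are assumptions made for this example and may not match the actual protocol.

```python
import socket

ROBOT_ADDR = ("192.168.4.1", 12345)  # ESP32 access point address and port

# Hypothetical gesture-to-command table. The project recognizes 10 gestures;
# their names and wire encoding are not specified, so these are placeholders.
COMMANDS = {0: "STOP", 1: "FORWARD", 2: "BACKWARD", 3: "LEFT", 4: "RIGHT"}


def encode_command(gesture_id: int) -> bytes:
    """Return a newline-terminated ASCII command; unknown gestures map to STOP."""
    return (COMMANDS.get(gesture_id, "STOP") + "\n").encode("ascii")


def send_command(sock: socket.socket, gesture_id: int) -> None:
    """Send one command over an already connected TCP socket."""
    sock.sendall(encode_command(gesture_id))


def connect(addr=ROBOT_ADDR, timeout: float = 2.0) -> socket.socket:
    """Open the TCP connection to the robot (requires joining its Wi-Fi AP)."""
    sock = socket.create_connection(addr, timeout=timeout)
    # Disable Nagle's algorithm so short commands are not held back for batching.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Falling back to STOP for an unrecognized gesture is a defensive choice for this sketch: a misclassified frame should not keep the motors running.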
Tech Stack
Hand Tracking: MediaPipe runs locally on any webcam
10 Gestures: TFLite classifier on landmark coordinates
Wi-Fi Tele-op: ESP32 access point + TCP control loop
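A classifier that works on landmark coordinates usually benefits from features that do not depend on where the hand sits in the frame or how large it appears. The sketch below shows one common way to do that, assuming the input is a list of (x, y) pairs with the wrist first (MediaPipe hand landmark 0). The function name `normalize_landmarks` and this particular normalization are illustrative assumptions, not necessarily the preprocessing used in this project.

```python
from typing import List, Sequence, Tuple


def normalize_landmarks(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Flatten (x, y) landmarks into wrist-relative, scale-normalized features."""
    if not points:
        raise ValueError("expected at least one landmark")
    wrist_x, wrist_y = points[0]  # landmark 0 is the wrist in MediaPipe Hands
    # Translate so the wrist is the origin (removes dependence on hand position).
    rel = [(x - wrist_x, y - wrist_y) for x, y in points]
    # Divide by the largest absolute offset (removes dependence on hand size).
    # If every point coincides with the wrist, fall back to 1.0 to avoid dividing by zero.
    scale = max(max(abs(dx), abs(dy)) for dx, dy in rel) or 1.0
    return [value / scale for dx, dy in rel for value in (dx, dy)]
```

With 21 landmarks per hand, this yields a 42-element feature vector with every value in the range [-1, 1].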
Engineering Features
Trained a TFLite gesture classifier over MediaPipe hand-landmark coordinates to recognize 10 distinct hand gestures.
Built ESP32 firmware that hosts a Wi-Fi access point and drives two DC motors from TCP commands received over Wi-Fi.
Added a 128×64 OLED on the robot showing the currently active command for at-a-glance feedback.
End-to-end pipeline runs in real time on a laptop webcam with no cloud dependencies.
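Per-frame classification can flicker between gestures, so a real-time pipeline like this one often filters predictions before sending them to the robot. The following is a minimal sketch of one such filter, assuming integer gesture labels. The class `GestureDebouncer` and its `window` parameter are hypothetical names for this example, and the source does not state whether the project applies any smoothing.

```python
from collections import deque
from typing import Deque, Optional


class GestureDebouncer:
    """Emit a gesture only after it has been seen in `window` consecutive frames."""

    def __init__(self, window: int = 5) -> None:
        self.recent: Deque[int] = deque(maxlen=window)
        self.active: Optional[int] = None  # last gesture that was emitted

    def update(self, label: int) -> Optional[int]:
        """Feed one per-frame prediction; return a label only when it should be sent."""
        self.recent.append(label)
        window_full = len(self.recent) == self.recent.maxlen
        stable = window_full and len(set(self.recent)) == 1
        if stable and label != self.active:
            self.active = label
            return label
        return None  # still unstable, or unchanged since the last emitted command
```

Sending only on a stable change also keeps TCP traffic low, since an unchanged gesture produces no new packets. The trade-off is added latency of up to `window` frames before a new gesture takes effect.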
