Legged Locomotion in Challenging Terrains
using Egocentric Vision

Ananye Agarwal*1    Ashish Kumar*2    Jitendra Malik2    Deepak Pathak1
1Carnegie Mellon University    2UC Berkeley
CoRL 2022 Oral Presentation


Animals are capable of precise and agile locomotion using vision. Replicating this ability has been a long-standing goal in robotics. The traditional approach has been to decompose this problem into elevation mapping and foothold planning phases. Elevation mapping, however, is susceptible to failure and large noise artifacts, requires specialized hardware, and is biologically implausible. In this paper, we present the first end-to-end locomotion system capable of traversing stairs, curbs, stepping stones, and gaps. We show this result on a medium-sized quadruped robot using a single front-facing depth camera. The small size of the robot necessitates discovering specialized gait patterns not seen elsewhere. Because the camera is egocentric, the policy must remember past information to estimate the terrain under its hind feet. We train our policy in simulation. Training has two phases. In the first, we train a policy using reinforcement learning with a cheap-to-compute variant of the depth image. In the second, we use supervised learning to distill it into the final policy, which operates on depth. The resulting policy transfers to the real world without any fine-tuning and can traverse a large variety of terrain while being robust to perturbations like pushes, slippery surfaces, and rocky terrain.
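The two-phase recipe can be illustrated with a toy sketch. This is not our training code: the linear "teacher" merely stands in for a policy already trained with reinforcement learning on the cheap input, the linear "camera" stands in for depth rendering, and every name and dimension below (`make_toy_world`, `distill_student`, `STATE_DIM`, and so on) is hypothetical. It shows only the phase 2 idea, in which a student that sees depth is fit by supervised regression to reproduce the teacher's actions.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
STATE_DIM, DEPTH_DIM, ACTION_DIM = 8, 32, 4


def make_toy_world(seed=0):
    """Create a stand-in teacher and a stand-in depth renderer (both linear)."""
    rng = np.random.default_rng(seed)
    teacher_w = rng.normal(size=(STATE_DIM, ACTION_DIM))  # pretend phase-1 RL result
    camera_m = rng.normal(size=(STATE_DIM, DEPTH_DIM))    # pretend depth rendering
    return teacher_w, camera_m


def teacher_policy(cheap_obs, teacher_w):
    """Teacher acts on the cheap-to-compute observation."""
    return cheap_obs @ teacher_w


def render_depth(cheap_obs, camera_m, rng, noise=0.01):
    """Produce noisy 'depth' inputs for the same underlying terrain."""
    clean = cheap_obs @ camera_m
    return clean + noise * rng.normal(size=clean.shape)


def distill_student(teacher_w, camera_m, n_samples=2000, seed=1):
    """Phase 2 (toy): regress teacher actions from depth with least squares."""
    rng = np.random.default_rng(seed)
    cheap_obs = rng.normal(size=(n_samples, STATE_DIM))
    targets = teacher_policy(cheap_obs, teacher_w)          # supervised labels
    depth = render_depth(cheap_obs, camera_m, rng)          # student inputs
    inputs = np.hstack([depth, np.ones((n_samples, 1))])    # append a bias column
    student_w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return student_w


def student_policy(depth, student_w):
    """Student acts on depth only; it never sees the cheap observation."""
    inputs = np.hstack([depth, np.ones((depth.shape[0], 1))])
    return inputs @ student_w


teacher_w, camera_m = make_toy_world()
student_w = distill_student(teacher_w, camera_m)
```

In this sketch the student is fit in closed form. A learned policy would instead be a neural network trained by gradient descent, but the supervision signal, teacher actions on matching observations, plays the same role.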

Stepping Stones and Gaps

The robot is able to step across bar stools arranged in various configurations and adapt its step size to cross large gaps. Since there is no vision near the hind feet, the robot must remember the location of the bar stools and place its hind feet accordingly.
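To see why memory is needed, consider a deliberately simplified 1-D picture in which the robot advances one terrain cell per step and the camera only sees a cell some distance ahead of the hind feet. The sketch below is an explicit delay buffer, not the learned policy, which must acquire this behavior implicitly. The class `TerrainMemory` and its parameters are hypothetical names introduced for illustration.

```python
from collections import deque


class TerrainMemory:
    """Toy 1-D memory: keeps terrain seen ahead until the hind feet reach it."""

    def __init__(self, lookahead_cells, body_length_cells):
        # Number of steps between seeing a cell and the hind feet standing on it.
        self.delay = lookahead_cells + body_length_cells
        # Old observations fall off the left once they are behind the hind feet.
        self.buffer = deque(maxlen=self.delay + 1)

    def observe(self, height_seen_ahead):
        """Store the newest observation; call once per one-cell step forward."""
        self.buffer.append(height_seen_ahead)

    def height_under_hind_feet(self):
        """Return the remembered height under the hind feet, or None if unseen."""
        if len(self.buffer) <= self.delay:
            return None
        return self.buffer[0]


# Usage: walk over a strip of terrain where 0.0 marks a gap between stools.
terrain = [0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
memory = TerrainMemory(lookahead_cells=2, body_length_cells=1)
estimates = []
for t in range(len(terrain) - memory.delay):
    memory.observe(terrain[t + memory.delay])  # camera sees `delay` cells ahead
    estimates.append(memory.height_under_hind_feet())
```

Until the buffer has filled, the estimate is `None`, which mirrors the fact that terrain never seen by the front camera cannot be recalled.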

Stairs and Curbs

The robot is able to climb stairs up to 24 cm high and only 30 cm wide, and curbs up to 26 cm high. Our policy generalizes to different stairs and curbs under a variety of lighting conditions. On out-of-distribution stairs and curbs, the robot initially gets stuck but is eventually able to traverse them using an emergent climbing behavior.

Unstructured Terrain

The robot can traverse unstructured terrain that does not fall into any of the categories it was trained on. This shows the generalization capabilities of our system.

Locomotion in the Dark

The depth camera projects a pattern using infrared light and is able to accurately estimate depth even when there is little to no ambient light.

More Videos


The trained policy is robust to large forces (a 5 kg weight thrown from a large height) and slippery surfaces (water poured on a plastic sheet).

Failure Cases

The robot falls from a high curb because the drop is too large and its extent cannot be seen from the front camera. There is no top-down camera, and the front camera points straight ahead.

The rear leg steps into a gap due to an error in retrieving the position of the stool from memory (the camera can only see stools in front of the robot). This is the only failed run we observed among 16 runs on different stool configurations.