Reinforcement Learning (RL) has recently been introduced to biped robot control to improve balancing performance against complex external disturbances that cannot be modelled. However, existing model-based and model-free RL approaches both demand substantial computational resources and converge slowly when applied to high-degree-of-freedom biped robots.
A new RL algorithm can combine the complementary advantages of model-based and model-free RL to reduce convergence time while retaining asymptotic stability. In the proposed algorithm, model-based RL is used in an offline training phase to obtain a rough model of the system, and model-free RL is then used to fine-tune the policy. Simulations were conducted with a NAO biped robot on a rotating platform, and the proposed algorithm was shown to maintain the robot's balance.
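The two-phase idea above (a model-based warm start followed by model-free fine-tuning) can be sketched on a toy one-dimensional "tilt" system. This is an illustrative assumption, not the paper's actual algorithm: the linear dynamics, the least-squares model fit, and the random-search fine-tuner are all stand-ins for the real components.

```python
# Hypothetical sketch: model-based warm start + model-free fine-tuning
# on a toy unstable 1-D system  x_{t+1} = A*x_t + B*u_t + noise,
# with true A = 1.1 (unstable) and B = 0.5. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE = 1.1, 0.5

def step(x, u):
    """One step of the 'real' system (unknown to the learner)."""
    return A_TRUE * x + B_TRUE * u + 0.01 * rng.standard_normal()

# --- Phase 1: model-based offline training (fit a rough linear model) ---
xs = rng.uniform(-1, 1, 200)
us = rng.uniform(-1, 1, 200)
nxt = np.array([step(x, u) for x, u in zip(xs, us)])
(A_hat, B_hat), *_ = np.linalg.lstsq(
    np.column_stack([xs, us]), nxt, rcond=None)

# Initial feedback gain from the rough model: u = -k*x cancels
# the *estimated* dynamics (closed-loop pole A_hat - B_hat*k = 0).
k = A_hat / B_hat

def rollout_cost(k, horizon=30):
    """Average squared tilt under u = -k*x on the real system."""
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        x = step(x, -k * x)
        cost += x * x
    return cost / horizon

# --- Phase 2: model-free fine-tuning (random-search hill climbing) ---
best = rollout_cost(k)
for _ in range(100):
    cand = k + 0.1 * rng.standard_normal()
    c = rollout_cost(cand)
    if c < best:
        k, best = cand, c

print(f"gain k = {k:.3f}, closed-loop pole = {A_TRUE - B_TRUE * k:.3f}")
```

The model-based phase supplies a stabilising policy cheaply, so the model-free phase starts from a near-optimal gain instead of searching from scratch, which is the convergence-time saving the proposed algorithm targets.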
Personnel - Ao Xi (PhD student)