TY - JOUR
T1 - Reinforcement learning for hexapod robot trajectory control
T2 - a study of Q-learning and SARSA algorithms
AU - Benyoucef, Ahmed
AU - Zennir, Youcef
AU - Belatreche, Ammar
AU - Silva, Manuel F.
AU - Benghanem, Mohamed
PY - 2025/10/25
Y1 - 2025/10/25
N2 - Hexapod robots, with their six-legged design, excel in stability and adaptability on challenging terrain but pose significant control challenges due to their high degrees of freedom. While reinforcement learning (RL) has been explored for robot navigation, few studies have systematically compared on-policy and off-policy methods for multi-legged locomotion. This work presents a comparative study of SARSA and Q-Learning for trajectory control of a simulated hexapod robot, focusing on the influence of learning rate (α), discount factor (γ), and eligibility trace (λ). The evaluation spans eight initial poses, with performance measured through lateral deviation (Ey), orientation error (Eθ), and iteration count. Results show that Q-Learning generally achieves faster convergence and greater stability, particularly with higher γ and λ values, while SARSA can achieve competitive accuracy with careful parameter tuning. The findings demonstrate that eligibility traces substantially improve learning precision and provide practical guidelines for robust RL-based control in multi-legged robotic systems.
KW - Hexapod robot
KW - Learning parameters
KW - Q-learning
KW - Reinforcement learning
KW - SARSA
KW - Trajectory control
UR - https://www.scopus.com/pages/publications/105019790942
DO - 10.1007/s41315-025-00496-6
M3 - Article
AN - SCOPUS:105019790942
SN - 2366-5971
JO - International Journal of Intelligent Robotics and Applications
JF - International Journal of Intelligent Robotics and Applications
ER -