Combining off-policy and on-policy reinforcement learning for dynamic control of nonlinear systems

Ahmed, Hani Hazza A.; Fabri, Simon G.; Bugeja, Marvin K.; Camilleri, Kenneth P.

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/138993

Title:	Combining off-policy and on-policy reinforcement learning for dynamic control of nonlinear systems
Authors:	Ahmed, Hani Hazza A.; Fabri, Simon G.; Bugeja, Marvin K.; Camilleri, Kenneth P.
Keywords:	Reinforcement learning; Machine learning; Algorithms -- Mathematical models; Nonlinear systems; Python (Computer program language)
Issue Date:	2025-10
Publisher:	SCITEVENTS
Citation:	Ahmed, H. H. A., Fabri, S. G., Bugeja, M. K., & Camilleri, K. P. (2025, October). Combining off-policy and on-policy reinforcement learning for dynamic control of nonlinear systems. ICINCO 2025 - 22nd International Conference on Informatics in Control, Automation and Robotics, Marbella, Spain. 387-394.
Abstract:	This paper introduces QARSA, a novel reinforcement learning algorithm that combines the strengths of off-policy and on-policy methods, specifically Q-learning and SARSA, for the dynamic control of nonlinear systems. Designed to leverage the sample efficiency of off-policy learning while preserving the stability and lower variance of on-policy approaches, QARSA aims to offer a balanced and robust learning framework. The algorithm is evaluated on the CartPole-v1 simulation environment using the OpenAI Gym framework, with performance compared against standalone Q-learning and SARSA implementations. The comparison is based on three critical metrics: average reward, stability, and sample efficiency. Experimental results show that QARSA outperforms both Q-learning and SARSA, achieving higher average reward, greater stability and sample efficiency, and improved consistency in learned policies. These results demonstrate QARSA’s effectiveness in environments where maximizing long-term performance while maintaining learning stability is crucial. The study provides valuable insights for the design of hybrid reinforcement learning algorithms for continuous control tasks.
URI:	https://www.um.edu.mt/library/oar/handle/123456789/138993
Appears in Collections:	Scholarly Works - FacEngSCE

Files in This Item:

File	Description	Size	Format
Combining off policy and on policy reinforcement learning for dynamic control of nonlinear systems 2025.pdf		501.01 kB	Adobe PDF
