Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/141844
Title: Multi‐UAV path planning and obstacle avoidance by using deep reinforcement learning
Authors: Farrugia, Sean
Keywords: Drone aircraft
Drone aircraft -- Control systems
Reinforcement learning
Neural networks (Computer science)
Robots -- Control systems
Drone aircraft -- Industrial applications
Issue Date: 2025
Citation: Farrugia, S. (2025). Multi‐UAV path planning and obstacle avoidance by using deep reinforcement learning (Master’s dissertation).
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly applied in areas such as search and rescue, surveillance, and industrial monitoring. However, efficient navigation in unknown environments poses significant challenges due to the dynamic nature of obstacles and limited prior knowledge of the environment. Moreover, existing solutions tend to grow considerably more complex as the number of coordinated UAVs increases, since coordination demands more communication between the vehicles. In this thesis we propose a multi-UAV navigation system that uses Deep Reinforcement Learning (DRL) models trained specifically to guide UAVs through an unknown, dynamic environment. A challenge noted in the literature is that model performance progressively deteriorates as the number of controlled UAVs grows. Rather than training a multi-agent system, as most prior work has done, we keep the complexity constant regardless of the number of UAVs in use. This was achieved by first training a DRL model to navigate a single UAV towards any number of given checkpoints in an unknown environment, using the sensors mounted on the UAV. A separate algorithm was then used to distribute the required checkpoints efficiently among the available UAVs, keeping both the distance travelled and the time elapsed as low as possible. Finally, the two procedures were combined by deploying an instance of the trained DRL model on each UAV, allowing the UAVs to traverse the paths generated by the second algorithm. Since reinforcement learning requires a large number of training episodes before it adapts to a given environment, the experiments for this project were conducted in AirSim, a highly realistic UAV simulator whose features make it straightforward to transfer behaviour learned in the digital environment to a real drone. In these experiments, the Proximal Policy Optimization (PPO) algorithm performed best among the algorithms tested for navigating a single UAV in the simulated environment. The results also showed that continuous actions outperformed discrete actions for UAV navigation, yielding smoother paths and shorter flight times. Two local search techniques, Tabu Search and Ant Colony Optimization, were used to distribute the checkpoints between the UAVs. These findings suggest that deep reinforcement learning in UAV navigation systems can effectively mitigate the challenges posed by dynamic and unknown environments, even when the underlying model was not trained to control multiple agents.
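
To make the single-UAV training stage concrete, here is a minimal sketch: a continuous-action Gymnasium environment standing in for the AirSim interface, trained with the PPO implementation from Stable-Baselines3. The CheckpointEnv class, the observation layout, and the reward placeholders are illustrative assumptions and not the dissertation's actual code.

    # Minimal sketch, not the author's implementation: a continuous-action
    # environment for flying one UAV to its next checkpoint, trained with PPO.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO

    class CheckpointEnv(gym.Env):
        """Hypothetical single-UAV task: reach the next checkpoint while
        avoiding obstacles reported by onboard range sensors."""

        def __init__(self, num_sensors=8):
            super().__init__()
            # Continuous velocity commands (vx, vy, vz); the abstract reports
            # continuous actions outperforming discrete ones.
            self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
            # Observation: relative vector to the next checkpoint plus sensor ranges.
            self.observation_space = spaces.Box(-np.inf, np.inf,
                                                shape=(3 + num_sensors,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            # A real implementation would reset the AirSim episode here.
            return np.zeros(self.observation_space.shape, dtype=np.float32), {}

        def step(self, action):
            # A real implementation would send the velocity command to AirSim,
            # read the sensors, reward progress toward the checkpoint, and
            # penalise collisions; here everything is a placeholder.
            obs = np.zeros(self.observation_space.shape, dtype=np.float32)
            return obs, 0.0, False, False, {}

    model = PPO("MlpPolicy", CheckpointEnv(), verbose=1)
    model.learn(total_timesteps=100_000)
    model.save("ppo_uav_checkpoint")

Because the trained policy depends only on its own UAV's sensors and next checkpoint, the same saved model can be loaded once per UAV at deployment time, which is what keeps the approach's complexity independent of fleet size.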
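The checkpoint-distribution stage can be sketched in the same hedged spirit. The Tabu Search below assigns checkpoints to UAVs so as to minimise the longest single-UAV route; the nearest-neighbour cost model, the move neighbourhood, and the tabu rule are assumptions chosen for illustration rather than the dissertation's exact formulation.

    # Minimal Tabu Search sketch: assign checkpoints to UAVs, minimising the
    # longest route so no single drone dominates the mission time.
    import math

    def route_length(start, pts):
        # Greedy nearest-neighbour ordering as a cheap route-cost estimate.
        remaining, pos, total = list(pts), start, 0.0
        while remaining:
            nxt = min(remaining, key=lambda p: math.dist(pos, p))
            total += math.dist(pos, nxt)
            remaining.remove(nxt)
            pos = nxt
        return total

    def cost(assign, starts, checkpoints):
        # Makespan objective: the slowest UAV determines mission time.
        return max(route_length(starts[u],
                                [checkpoints[i] for i, a in enumerate(assign) if a == u])
                   for u in range(len(starts)))

    def tabu_search(checkpoints, starts, iters=200, tabu_len=15):
        n, k = len(checkpoints), len(starts)
        assign = [i % k for i in range(n)]            # round-robin initial assignment
        best, best_cost = assign[:], cost(assign, starts, checkpoints)
        tabu = []
        for _ in range(iters):
            # Neighbourhood: reassign one checkpoint to a different UAV.
            moves = [(i, u) for i in range(n) for u in range(k)
                     if u != assign[i] and (i, u) not in tabu]
            if not moves:
                break
            def move_cost(m):
                trial = assign[:]
                trial[m[0]] = m[1]
                return cost(trial, starts, checkpoints)
            i, u = min(moves, key=move_cost)
            tabu.append((i, assign[i]))               # forbid undoing this move for a while
            tabu = tabu[-tabu_len:]
            assign[i] = u
            c = cost(assign, starts, checkpoints)
            if c < best_cost:
                best, best_cost = assign[:], c
        return best, best_cost

For example, tabu_search([(1, 2), (8, 3), (5, 5)], starts=[(0, 0), (10, 0)]) returns a list mapping each checkpoint index to a UAV index, together with the resulting makespan; each UAV then flies its assigned checkpoints using its own instance of the trained policy.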
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/141844
Appears in Collections: Dissertations - FacICT - 2025
Dissertations - FacICTAI - 2025

Files in This Item:
File: 2519ICTICS520000013648_1.PDF
Size: 12.43 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.