Self‐play learning for two‐player score-based board games

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/115272

Title:	Self‐play learning for two‐player score-based board games
Authors:	Cutajar, Cristina (2023)
Keywords:	Board games; Reinforcement learning
Issue Date:	2023
Citation:	Cutajar, C. (2023). Self‐play learning for two‐player score-based board games (Bachelor's dissertation).
Abstract:	Jaipur is a two‐player score‐based board game in which players take turns and the player with the most points at the end of the game wins. The aim of this work is to implement a reinforcement learning algorithm on this board game. The game contains multiple factors which make self‐play learning challenging, such as being partially observable, having stochastic actions, and having a very large action space of 25,469 possible actions. Two other challenging factors are that the game contains both immediate and long‐term rewards, and that players can adopt different strategies because the game is adversarial. Research was performed on how reinforcement learning was tackled in similar games, as there is no publicly available research on how such a problem was solved with Jaipur. In that literature, the DQN algorithm was applied to various Atari games, on which it achieved state‐of‐the‐art results. Moreover, Actor‐Critic algorithms such as PPO and A2C were applied to multi‐player board games, where they achieved satisfactory results. Techniques such as removing invalid actions or representing multiple actions as a single action were used to handle large action spaces. This work presents an implementation of the rules of Jaipur along with a reinforcement learning environment containing the appropriate observation and action spaces. An action mask was also included in the environment to mask out the invalid actions based on the state of the game. A reinforcement learning library was used to apply the PPO, A2C, DQN and DDQN algorithms to the environment. Each algorithm’s performance was evaluated quantitatively against typical Jaipur scores, and qualitatively by checking which actions each agent was selecting. All the algorithms obtained good quantitative results; however, the PPO, DQN and DDQN algorithms obtained the best qualitative results.
Description:	B.Sc. IT (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/115272
Appears in Collections:	Dissertations - FacICT - 2023; Dissertations - FacICTAI - 2023
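
Illustrative note on the action mask mentioned in the abstract: the sketch below shows one common way invalid actions can be masked out, both for a policy that samples from probabilities (as in PPO or A2C) and for a value-based agent that picks the best action (as in DQN or DDQN). It is a minimal sketch under stated assumptions, not the dissertation's implementation. The function names masked_softmax and greedy_action, the list-based representation, and the three-action example are all hypothetical; the record does not say how the mask is built or which library applies it.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    Hypothetical helper: `logits` are raw policy scores, `mask` holds
    True for actions that are valid in the current game state.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(score for score, ok in zip(logits, mask) if ok)
    weights = [
        math.exp(score - peak) if ok else 0.0
        for score, ok in zip(logits, mask)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_action(q_values, mask):
    """Pick the highest-valued action among the valid ones only."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")
    return max(valid, key=lambda i: q_values[i])


if __name__ == "__main__":
    # Toy example with three actions, the second one invalid.
    mask = [True, False, True]
    print(masked_softmax([1.0, 2.0, 3.0], mask))
    print(greedy_action([0.5, 9.0, 0.1], mask))
```

With a mask like this, an agent never assigns probability to, or selects, a move that the rules forbid in the current state, which is one way a large action space such as the 25,469 actions reported here can be kept tractable.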

Files in This Item:

File	Description	Size	Format
2308ICTICT390900014918_1.PDF (Restricted Access)		5 MB	Adobe PDF