Learning circuit placement techniques through reinforcement learning with adaptive rewards

Vassallo, Luke; Bajada, Josef

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/131782

Title:	Learning circuit placement techniques through reinforcement learning with adaptive rewards
Authors:	Vassallo, Luke; Bajada, Josef
Keywords:	Printed circuits -- Design and construction; Reinforcement learning -- Mathematical models; Simulated annealing (Mathematics); Electronic circuit design -- Automation; Markov processes
Issue Date:	2024-03
Publisher:	Institute of Electrical and Electronics Engineers
Citation:	Vassallo, L., & Bajada, J. (2024, March). Learning Circuit Placement Techniques Through Reinforcement Learning with Adaptive Rewards. Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE), IEEE. Valencia, Spain. 1-6.
Abstract:	Placement is the initial step of Printed Circuit Board (PCB) physical design and demands considerable time and domain expertise. Placement quality impacts the performance of subsequent tasks, and the generation of an optimal placement is known to be, at the very least, NP-complete. While stochastic optimisation and analytic techniques have had some success, they often lack the intuitive understanding of human engineers. In this study, we propose a novel end-to-end Machine Learning (ML) approach to learn fundamental placement techniques and use experience to optimise PCB layouts efficiently. To achieve this, we formulate the PCB placement problem as a Markov Decision Process (MDP) and use Reinforcement Learning (RL) to learn general placement techniques. The agent-driven data collection process generates highly diverse and consistent data points sufficient for learning general policies without expert knowledge under the guidance of an adaptive reward signal. Compared to state-of-the-art simulated annealing approaches on unseen circuits, the resulting policies trained with TD3 and SAC, on average, yield 17% and 21% reduction in post-routing wirelength. Qualitative analysis shows that the policies learn fundamental placement techniques and demonstrate an understanding of the underlying problem dynamics. Collectively, they demonstrate emergent collaborative or competitive behaviours and faster placement convergence, sometimes exceeding an order of magnitude.
URI:	https://www.um.edu.mt/library/oar/handle/123456789/131782
Appears in Collections:	Scholarly Works - FacICTAI

Files in This Item:

File	Description	Size	Format
Learning circuit placement techniques through reinforcement learning with adaptive rewards 2024.pdf (Restricted Access)		768.27 kB	Adobe PDF
