Energy-Efficient Neuromorphic Computing Systems
Embargo End Date: 2024-03-09
Advisors: Salama, Khaled N.
Committee Members: Eltawil, Ahmed; Keyes, David E.; Fahmy, Suhaib A.
Program: Electrical and Computer Engineering
KAUST Department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Access Restrictions: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation will become available to the public after the expiration of the embargo on 2024-03-09.
Abstract: Neuromorphic computing has emerged as a new and promising computing paradigm that emulates how the human brain processes information. Its underlying spiking neural networks (SNNs) are known for their higher energy efficiency compared with artificial neural networks (ANNs). Neuromorphic systems enable highly parallel computation and alleviate memory-bandwidth bottlenecks, so hardware performance can scale with ever-increasing model complexity. Inefficiency in neuromorphic system design generally stems from redundant parameters, unoptimized models, a lack of computing parallelism, and sequential training algorithms. This dissertation addresses these problems and proposes effective solutions to them.
Over-parameterization and redundant computation are common problems in neural networks. In the first stage of this dissertation, we introduce strategies for pruning both neurons and weights during the training of an unsupervised SNN, guided by its neural dynamics and firing activity. Both the neuron- and weight-pruning methods are shown to compress the network effectively while preserving good classification performance.
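As an illustration only, the Python sketch below shows one way firing activity can drive neuron pruning. It assumes a simple leaky integrate-and-fire (LIF) neuron model and a hypothetical pruning criterion (remove every neuron whose firing rate falls below `min_rate`). The functions `lif_spike_counts` and `prune_by_firing` and their parameters are illustrative names, not the dissertation's actual method.

```python
import numpy as np

def lif_spike_counts(inputs, weights, v_th=1.0, decay=0.9):
    """Simulate one layer of leaky integrate-and-fire (LIF) neurons.

    inputs:  (T, n_in) binary spike trains
    weights: (n_in, n_out) synaptic weights
    Returns the number of output spikes emitted by each neuron over T steps.
    (Assumed model: multiplicative leak, hard threshold, reset to zero.)
    """
    n_out = weights.shape[1]
    v = np.zeros(n_out)                  # membrane potentials
    counts = np.zeros(n_out, dtype=int)  # spikes emitted per neuron
    for x_t in inputs:
        v = decay * v + x_t @ weights    # leak, then integrate input current
        fired = v >= v_th
        counts += fired                  # bool -> int: one spike per firing neuron
        v[fired] = 0.0                   # reset the neurons that spiked
    return counts

def prune_by_firing(weights, counts, min_rate, n_steps):
    """Hypothetical criterion: zero all incoming weights of neurons whose
    firing rate (spikes per time step) is below min_rate."""
    rates = counts / n_steps
    keep = rates >= min_rate             # one flag per output neuron (column)
    return weights * keep, keep          # broadcasting zeroes pruned columns
```

In a training loop, the spike counts would be accumulated over a batch of samples before calling `prune_by_firing`, so that a neuron is removed only if it stays nearly silent across many inputs rather than for a single sample.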
In the second stage of this dissertation, we optimize neuromorphic systems from both the algorithmic and the hardware perspectives. At the software level, the network model is optimized with a biological hyperparameter optimization strategy, yielding a hardware-friendly network configuration. Different computational methods are analyzed to guide the hardware implementation, which features distributed neural memory and a parallel memory organization. Compared with a previous study, the proposed system achieves a more than 300× improvement in training speed and a 180× reduction in energy.
An efficient on-chip training algorithm is also essential for low-energy processing. In the third stage, we turn to the design of neuromorphic systems that support local training, introducing a spatially local backpropagation algorithm. The proposed digital architecture exploits spike sparsity, computing parallelism, and parallel training. At the same accuracy level, the design achieves 3.2× lower energy and 1.8× lower latency than an ANN. The spatially local training mechanism is further extended into the temporal dimension using a training algorithm based on Backpropagation Through Time (BPTT). Local training in the two dimensions works synergistically to improve algorithmic performance and significantly reduces computational cost compared with the standard method: by 89.94% in GPU memory, 10.79% in memory access, and 99.64% in MAC operations.
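The general idea of spatially local training can be sketched as follows, under stated assumptions: each layer is given its own hypothetical local linear readout and is updated only from that readout's cross-entropy loss, so no error signal propagates between layers. For brevity the sketch uses non-spiking tanh units instead of spiking neurons, and it is not the dissertation's algorithm; `LocalLayer` and `train_local` are illustrative names.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class LocalLayer:
    """A hidden layer trained only by its own local classifier loss (assumed design)."""

    def __init__(self, n_in, n_out, n_classes, lr=0.1, seed=0):
        r = np.random.default_rng(seed)
        self.W = r.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out))        # layer weights
        self.C = r.normal(0.0, 1.0 / np.sqrt(n_out), (n_out, n_classes))  # local readout
        self.lr = lr

    def forward(self, x):
        return np.tanh(x @ self.W)

    def local_step(self, x, y_onehot):
        """One gradient step on the local mean cross-entropy loss.

        x is treated as a constant: no gradient is sent back to earlier layers.
        Returns this layer's output (computed before the update) and the loss.
        """
        h = self.forward(x)
        p = softmax(h @ self.C)
        loss = -np.mean(np.log(p[y_onehot.astype(bool)] + 1e-12))
        d_logits = (p - y_onehot) / len(x)    # gradient of mean cross-entropy w.r.t. logits
        d_h = d_logits @ self.C.T             # uses the readout before it is updated
        d_pre = d_h * (1.0 - h ** 2)          # derivative of tanh
        self.C -= self.lr * h.T @ d_logits
        self.W -= self.lr * x.T @ d_pre
        return h, loss

def train_local(layers, x, y_onehot, epochs):
    """Full-batch local training; returns per-epoch lists of per-layer losses."""
    history = []
    for _ in range(epochs):
        h, epoch_losses = x, []
        for layer in layers:
            h, loss = layer.local_step(h, y_onehot)  # each layer learns from its own loss
            epoch_losses.append(loss)
        history.append(epoch_losses)
    return history
```

Because each update depends only on the layer's own input and local readout, no layer has to wait for a global backward pass through the whole network; this is the kind of property that parallel, on-chip training schemes build on.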