20th AIAI 2024, 27–30 June 2024, Corfu, Greece

Lightweight Inference by Neural Network Pruning: Accuracy, Time and Comparison

Ilias Paralikas, Sotirios Spantideas, Anastasios Giannopoulos, Panagiotis Trakadas

Abstract:

  This paper addresses the application of neural networks on resource-constrained edge devices. The goal is to achieve a speedup in both inference and training time with minimal accuracy loss. More specifically, it highlights the need to compress current models, which are mostly developed with access to more resources than the device on which the model will eventually run. With the recent advances in the Internet of Things (IoT), the number of devices has risen and is expected to keep rising. Not only are these devices computationally limited, but their capabilities are neither homogeneous nor predictable at the time a model is developed, since new devices can be added at any time. This creates the need to quickly and efficiently produce models that fit each device's specifications. Transfer learning is a very efficient method in terms of training time, but it confines the user to the dimensionality of the pretrained model. Pruning is used as a way to overcome this obstacle and carry knowledge over to a variety of models that differ in size. The aim of this paper is to serve as an introduction to pruning as a concept and as a template for further research, to quantify the efficiency of a variety of methods, and to expose some of its limitations. Pruning was performed on a telecommunications anomaly dataset and the results were compared against a baseline with respect to speed and accuracy.
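  The abstract does not fix a single pruning recipe, but the core idea can be illustrated with a minimal sketch of magnitude-based (L1) unstructured pruning using PyTorch's built-in pruning utilities. The network architecture, layer sizes, and the 50% sparsity level below are illustrative assumptions, not values taken from the paper.

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Hypothetical stand-in for a model trained on the anomaly-detection
    # task; the layer sizes are illustrative, not taken from the paper.
    model = nn.Sequential(
        nn.Linear(64, 128),
        nn.ReLU(),
        nn.Linear(128, 2),
    )

    # Magnitude-based (L1) unstructured pruning: zero out the 50% of
    # weights with the smallest absolute value in each linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")  # bake the zeros into the tensor

    # Report the resulting per-layer sparsity.
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            zeros = (param == 0).float().mean().item()
            print(f"{name}: {zeros:.1%} of weights pruned")

  Note that unstructured pruning of this kind only zeroes individual weights, leaving tensor shapes unchanged; shrinking the actual dimensionality of the model, the property the abstract relies on for fitting heterogeneous devices, requires structured pruning that removes whole neurons or channels.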
