hh.se Publications
Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models
Rezk, Nesma M. (Halmstad University, School of Information Technology). ORCID iD: 0000-0002-4674-3809
Nordström, Tomas (Umeå University, Umeå, Sweden). ORCID iD: 0000-0002-0562-2082
Ul-Abdin, Zain (Halmstad University, School of Information Technology). ORCID iD: 0000-0002-4932-4036
2022 (English). In: Information, E-ISSN 2078-2489, Vol. 13, no 4, article id 176. Article in journal (Refereed). Published.
Abstract [en]

Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is a growing interest in running RNNs on edge devices to support these applications. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization shrinks the size and the computational needs of such models by decreasing the precision of weights and activations. Further, the delta networks method increases the sparsity in activation vectors by exploiting the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in the error by skipping quantization of selected paths. In addition, we show that quantizing the activation vectors in RNNs to integer precision leads to considerable sparsity when the delta networks method is applied. We then propose a method for increasing the sparsity in the activation vectors while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compressed the four models by more than 85%, with error increases of 0.6, 0, 2.1, and 0.2 percentage points, respectively. By applying the delta networks method to the quantized models, more than 50% of the operations can be eliminated, in most cases with only a minor increase in the error. Comparing the four models under quantization and the delta networks method, we found that compressed LSTM-based models are the best choice under low-error-rate constraints, compressed SRU-based models are the smallest in size and suitable when higher error rates are acceptable, and compressed LiGRU-based models eliminate the largest number of operations. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
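
A minimal sketch of the delta-networks step described in the abstract, written in NumPy: input components whose change since the last propagated value stays below a threshold contribute nothing, so the matching weight columns are neither fetched nor multiplied. The names, shapes, and the bare matrix-vector form are illustrative assumptions; the paper applies the scheme inside complete LSTM-, GRU-, LiGRU-, and SRU-based layers.

    import numpy as np

    def delta_matvec(W, x_t, x_ref, y_prev, threshold):
        # Change relative to the last value that was actually propagated.
        delta = x_t - x_ref
        # Components whose change exceeds the threshold are worth updating.
        active = np.abs(delta) > threshold
        # Only the active columns of W contribute; the skipped columns are the
        # eliminated multiply-accumulates and weight fetches.
        y_t = y_prev + W[:, active] @ delta[active]
        # Inactive components keep their old reference value, so small changes
        # accumulate until they eventually cross the threshold.
        x_ref_next = np.where(active, x_t, x_ref)
        eliminated = 1.0 - active.mean()   # fraction of columns skipped this step
        return y_t, x_ref_next, eliminated

With integer-quantized activations, many deltas are exactly zero even at a zero threshold, which matches the abstract's observation that quantization alone already yields considerable sparsity; raising the threshold then trades a small error increase for a larger share of eliminated operations.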

Place, publisher, year, edition, pages
Basel: MDPI, 2022. Vol. 13, no 4, article id 176
Keywords [en]
delta networks, edge devices, quantization, recurrent neural network
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-46752
DOI: 10.3390/info13040176
ISI: 000786262400001
PubMedID: 34789458
Scopus ID: 2-s2.0-85128393517
OAI: oai:DiVA.org:hh-46752
DiVA, id: diva2:1655340
Funder
ELLIIT - The Linköping-Lund Initiative on IT and Mobile Communications
Available from: 2022-05-02 Created: 2022-05-02 Last updated: 2022-11-23 Bibliographically approved
In thesis
1. Deep Learning on the Edge: A Flexible Multi-level Optimization Approach
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Recent advances in Deep Learning (DL) research have been adopted in a wide variety of applications, including autonomous driving, AI in health care, and smart homes. In parallel, research in high-performance embedded computing has resulted in advanced hardware platforms that offer enhanced performance and energy efficiency for demanding computations. However, the high computational and memory demands of DL models remain a challenge for embedded computing. Algorithmic optimizations can be used to reduce the computational and memory requirements of DL models, and hardware implementations and architectures can be tuned to support the requirements of DL applications. This thesis identifies that insufficient coordination between hardware implementations and model optimizations limits the efficiency of the resulting implementations. In addition, the implementation methods themselves suffer from poor flexibility in adapting to changes in the model and application constraints. The overarching theme of this thesis is to study and propose methods for the efficient and flexible implementation of DL models on embedded platforms. The work in this thesis bridges the gap between the algorithmic optimizations of DL models and the hardware-specific optimizations of embedded platforms, and investigates the features that need support from DL domain-specific architectures. In addition, a method for multi-objective quantization of DL models is proposed to address both the model error and platform performance metrics. Post-training optimization techniques are employed to facilitate the multi-objective optimization of the models because they do not require retraining after model optimization. This thesis also reviews the optimization methods that are known to have been applied to improve the implementation efficiency of DL models; it highlights the most fruitful optimizations found in existing, highly efficient implementations and applies them in the proposed methods. A method for mapping Convolutional Neural Networks (CNNs) onto Epiphany, a manycore architecture, is proposed and evaluated. A method for post-training quantization and approximation of RNN models is also proposed and evaluated on four RNN models. The proposed quantization method is used in a hardware-aware multi-objective optimization of RNN models to be deployed on the SiLago and Bit Fusion architectures.
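
The hardware-aware multi-objective optimization mentioned in the thesis abstract can be pictured as keeping only quantization configurations that are not dominated in both model error and a platform cost metric. The Python sketch below is an illustrative Pareto filter over (error, cost) pairs; the candidate tuples, the cost metric, and all names are assumptions made for illustration, not the thesis's actual tooling.

    from typing import Any, List, Tuple

    def pareto_front(candidates: List[Tuple[float, float, Any]]) -> List[Tuple[float, float, Any]]:
        # Each candidate is (error, cost, config), e.g. one post-training
        # bit-width assignment evaluated for model error and a platform cost
        # such as latency or energy (illustrative only).
        front = []
        for err, cost, cfg in candidates:
            dominated = any(
                e <= err and c <= cost and (e < err or c < cost)
                for e, c, _ in candidates
            )
            if not dominated:
                front.append((err, cost, cfg))
        return front

    # Hypothetical (error %, relative cost, config) points:
    print(pareto_front([(20.1, 1.00, "fp32"), (20.7, 0.15, "int8"), (23.0, 0.20, "int4")]))
    # The int4 point is dominated by int8 (worse error and higher cost) and is dropped.

Because the optimization is post-training, every candidate configuration can be evaluated without retraining, which is what makes sweeping many configurations practical.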

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2022. p. 65
Series
Halmstad University Dissertations ; 95
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-48680 (URN)
978-91-89587-03-8 (ISBN)
978-91-89587-02-1 (ISBN)
Public defence
2022-12-15, Halda, Visionen, Kristian IV:s väg 3, Halmstad, 11:58 (English)
Note

To be supplemented with the LibrisID when available.

Available from: 2022-11-24 Created: 2022-11-23 Last updated: 2022-11-24 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
PubMed
Scopus
