A Conjugate Residual Solver with Kernel Fusion for massive MIMO Detection
Halmstad University, School of Information Technology, Center for Applied Intelligent Systems Research (CAISR).
2023 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis compares a GPU implementation of the Conjugate Residual method built as a sequence of generic library kernels against implementations of the method with custom kernels. The comparison exposes the performance gains of a key optimization strategy for memory-bound operations, kernel fusion, whose purpose is to reuse already-processed data efficiently.
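The data-reuse idea behind kernel fusion can be illustrated with a minimal CPU-side sketch in plain Python. It is not the thesis's GPU code: the function names, the choice of which vector updates to fuse, and the load-counting model (one count per element read from an array) are assumptions made only for illustration.

```python
def unfused_update(x, r, p, Ap, alpha):
    """Three separate passes, mimicking three generic library kernels."""
    n = len(x)
    x_new = [x[i] + alpha * p[i] for i in range(n)]   # pass 1: loads x, p
    r_new = [r[i] - alpha * Ap[i] for i in range(n)]  # pass 2: loads r, Ap
    rr = 0.0
    for i in range(n):                                # pass 3: reloads r_new
        rr += r_new[i] * r_new[i]
    return x_new, r_new, rr, 5 * n                    # 5n element loads in total


def fused_update(x, r, p, Ap, alpha):
    """One pass: each updated residual entry is reused while still local."""
    n = len(x)
    x_new = [0.0] * n
    r_new = [0.0] * n
    rr = 0.0
    for i in range(n):
        xi = x[i] + alpha * p[i]
        ri = r[i] - alpha * Ap[i]
        x_new[i] = xi
        r_new[i] = ri
        rr += ri * ri          # no second trip to memory for r_new[i]
    return x_new, r_new, rr, 4 * n                    # 4n element loads in total


# Hypothetical input data, chosen only to exercise both variants.
x = [0.0, 0.0, 0.0, 0.0]
r = [1.0, 2.0, 3.0, 4.0]
p = list(r)
Ap = [2.0, 1.0, 0.5, 0.25]
print(unfused_update(x, r, p, Ap, 0.5)[3], fused_update(x, r, p, Ap, 0.5)[3])
```

Both variants compute the same values; the fused one simply avoids re-reading the residual, which is the kind of saving that matters when an operation is memory-bound.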

For massive MIMO, the iterative solver is employed at the linear detection stage to overcome the computational bottleneck of the matrix inversion required in the equalization process, which is O(n³) for direct solvers. The thesis gives a detailed analysis of how one more of the Krylov subspace methods that is feasible for massive MIMO can be implemented on a GPU as a unified kernel.
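As a rough illustration of how an iterative solver replaces the explicit inversion, the following plain-Python sketch runs the textbook Conjugate Residual recurrence on an MMSE-style system. The system form A = HᴴH + σ²I, b = Hᴴy is the standard MMSE formulation and is assumed here rather than taken from the thesis; the function names, the iteration count, and the tolerance are likewise hypothetical.

```python
def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Hermitian inner product u^H v (works for real and complex entries)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def mmse_system(H, y, noise_var):
    """Build A = H^H H + noise_var * I and b = H^H y (assumed MMSE form)."""
    rows, cols = len(H), len(H[0])
    A = [[sum(H[k][i].conjugate() * H[k][j] for k in range(rows))
          + (noise_var if i == j else 0.0) for j in range(cols)]
         for i in range(cols)]
    b = [sum(H[k][i].conjugate() * y[k] for k in range(rows))
         for i in range(cols)]
    return A, b


def conjugate_residual(A, b, iterations=3, tol=1e-12):
    """Conjugate Residual for Hermitian A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # r = b - A x with x = 0
    p = list(r)
    Ar = matvec(A, r)
    Ap = list(Ar)
    rAr = dot(r, Ar)
    for _ in range(iterations):
        if abs(dot(r, r)) ** 0.5 <= tol:   # already converged
            break
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(A, r)       # the only matrix-vector product per iteration
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]
        rAr = rAr_new
    return x
```

Each iteration costs one matrix-vector product plus a handful of vector updates and inner products, so a small fixed number of iterations avoids the cubic cost of a direct solve.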

Further, the thesis shows that kernel fusion can improve execution performance not only when the input data consists of large matrices and vectors, as in scientific computing, but also in the case of massive MIMO, and possibly similar cases, where the input data is a large number of small matrices and vectors that must be processed in parallel.

In more detail, the work focuses on the small number of iterations the solver requires to achieve a close enough approximation of the exact solution in the case of massive MIMO, and on the case where the number of users matches the size of a warp. Two different approaches are proposed and tested that allow the algorithm to be fully unrolled and all the separate kernels to be gradually fused into a single one, ending in a top-down hardcoded implementation.

To overcome the algorithm's main computational burden, the matrix-vector product, further optimization techniques are proposed and tested to achieve high efficiency and high parallelism, including two ways of utilizing the fast on-chip memories: preloading the matrix in shared memory and preloading the vector in shared memory.

Place, publisher, year, edition, pages
2023.
Keywords [en]
MIMO, massive MIMO, GPU, CUDA, Software Defined Radio, SDR, MMSE, ZF, zero-forcing, parallel detection, iterative methods, conjugate residual, parallel computing, kernel fusion
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-50350
OAI: oai:DiVA.org:hh-50350
DiVA, id: diva2:1751227
Subject / course
Computer science and engineering
Educational program
Master's Programme in Embedded and Intelligent Systems, 120 credits
Available from: 2023-04-15. Created: 2023-04-17. Last updated: 2023-04-18. Bibliographically approved.

Open Access in DiVA

fulltext (1292 kB)
File information
File name: FULLTEXT02.pdf
File size: 1292 kB
Checksum (SHA-512):
d42733d7e519e78e8a86882f7c2018ffbb7a16b17215dedce1c538bc4421914410eb062a9915e621d111b4ff1b92d3dc4265a1a0d4c2f75fd4e1e178e3cfbad2
Type: fulltext
Mimetype: application/pdf
