Department of Electronics Engineering, Suwon Science College, 288 Seja-ro, Jeongnam-myun, Hwaseong-si, Gyeonggi-do, Rep. of Korea
* Corresponding author


Nowadays, GPU processors are widely used for general-purpose parallel computing applications. In GPU programming, the thread and block configuration is one of the most important decisions, as it increases parallelism and hides instruction latency. In many cases, however, there is not enough parallelism available to hide all latencies, and the dominant latencies are often caused by global memory accesses. To reduce the number of such accesses, shared memory, which resides on chip and is much faster than global memory, is used instead. The performance of the proposed thread configurations is evaluated on an NVIDIA GTX 960 GPU. The experimental results show that the best configuration improves performance by a factor of 7.3 over the worst configuration in the experiment. Observations on shared-memory performance relative to that of global memory are also discussed.
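As a minimal sketch of the technique the abstract describes (not code from the paper itself), the CUDA kernel below stages matrix tiles in shared memory so that each element is read from global memory once per tile rather than once per multiply, and the grid and block dimensions passed at launch are exactly the thread/block configuration being tuned. All identifiers (matMulTiled, TILE, n) and the tile size of 16 are illustrative assumptions.

    // Hypothetical sketch: tiled matrix multiply using shared memory to
    // reduce global-memory accesses. Assumes n is a multiple of TILE.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define TILE 16  // threads per block dimension; one choice among many

    __global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
        __shared__ float sA[TILE][TILE];  // on-chip tiles: far lower latency
        __shared__ float sB[TILE][TILE];  // than repeated global-memory loads

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < n / TILE; ++t) {
            // Each thread loads one element of each tile from global memory.
            sA[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
            sB[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
            __syncthreads();  // tile fully loaded before use

            for (int k = 0; k < TILE; ++k)
                acc += sA[threadIdx.y][k] * sB[k][threadIdx.x];
            __syncthreads();  // done with this tile before overwriting it
        }
        C[row * n + col] = acc;
    }

    int main() {
        const int n = 512;
        size_t bytes = n * n * sizeof(float);
        float *A, *B, *C;
        cudaMallocManaged(&A, bytes);
        cudaMallocManaged(&B, bytes);
        cudaMallocManaged(&C, bytes);
        for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

        // Thread/block configuration: the decision the paper evaluates.
        dim3 block(TILE, TILE);
        dim3 grid(n / TILE, n / TILE);
        matMulTiled<<<grid, block>>>(A, B, C, n);
        cudaDeviceSynchronize();

        printf("C[0] = %f (expect %f)\n", C[0], 2.0f * n);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

With a 16 by 16 block, each block holds 256 threads, and each global-memory element is reused TILE times from shared memory; other tile sizes trade occupancy against shared-memory usage, which is the kind of configuration trade-off the abstract refers to.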
