FP32 vs FP64

1/6/2024

Over the past several years, graphics processing unit (GPU) technology has rapidly gained ground in scientific computing due to its outstanding performance and programmability. GPU implementation of Monte Carlo radiation transport for dose calculations has been reported by many investigators. The majority of these studies adopted the single-precision floating-point format because GPUs deliver higher peak floating-point operations per second (FLOPS) in single precision than in double precision. It has been known that calculation using single precision is more prone to numerical round-off errors, especially when a single tally is accumulated 'atomically' and repeatedly by thousands of GPU threads. To mitigate this problem, the least intrusive solution in theory is to replace the single-precision atomic-add tally function with a double-precision version. However, the complexity lies in the fact that some GPUs (Nvidia GPUs prior to the Pascal generation and all current AMD GPUs) do not readily offer such a double-precision function at the hardware level, and that software emulation is too slow to use if not optimized properly. This paper discusses several atomic-add tally methods with reduced numerical errors used throughout ARCHER development. The original software-based compare-and-swap method (CAS) was shown to be inefficient due to high intra-warp thread contention, whereas the improved software-based warp-aggregated method (WAG) and Kahan summation method (KAS) eliminated the thread contention and performed very well on Kepler and Maxwell GPUs, being more than 13 times faster than CAS in our tests.
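The round-off problem and the Kahan remedy described above can be illustrated with a small sketch. It is not ARCHER code: it runs sequentially on the CPU, emulates fp32 by rounding through the standard `struct` module, and models neither atomics nor thread contention. The helper names (`f32`, `naive_sum_f32`, `kahan_sum_f32`) and the deposit value of 0.1 are assumptions chosen for illustration only.

```python
import struct

def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value (emulation helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Accumulate in fp32, rounding after every addition, like a plain fp32 tally."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum_f32(values):
    """Kahan (compensated) summation with every intermediate rounded to fp32."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)           # apply the correction to the new deposit
        t = f32(total + y)               # this addition loses low-order bits of y
        comp = f32(f32(t - total) - y)   # recover what was lost
        total = t
    return total

# Hypothetical tally: the same small deposit scored many times.
n = 200_000
deposit = f32(0.1)
deposits = [deposit] * n

# The product is exact in fp64 here, so it serves as the reference value.
reference = deposit * n

naive_err = abs(naive_sum_f32(deposits) - reference)
kahan_err = abs(kahan_sum_f32(deposits) - reference)

print("reference :", reference)
print("naive fp32 error :", naive_err)
print("Kahan fp32 error :", kahan_err)
```

Once the running total is much larger than a single deposit, each plain fp32 addition discards part of the deposit, and the loss grows with the number of additions. The compensated version carries that loss forward in `comp`, which is why it can stay close to the reference without a double-precision accumulator.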
Resource Type: Journal Article: Accepted Manuscript
Journal Name: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Additional Journal Information: Journal Volume: 476; Journal Issue: 2243; Journal ID: ISSN 1364-5021
Publisher: The Royal Society Publishing
Publication Date: Wed Nov 25
Country of Publication: United States
Language: English
Research Org.: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States), Computer Science and Mathematics Division; Univ. of Tennessee, Knoxville, TN (United States), Dept. of Electrical Engineering and Computer Science; NVIDIA, Santa Clara, CA (United States)
Sponsoring Org.: USDOE Office of Science (SC)
OSTI Identifier: 1787013
Grant/Contract Number: EP/P020720/1; Department of Energy 17-SC-20-SC; NVIDIA
Subject: 97 MATHEMATICS AND COMPUTING; half precision arithmetic; mixed precision solvers; LU factorization; iterative refinement; GMRES; GPU

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4× to 5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
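The idea of factorizing in low precision and recovering FP64 accuracy afterwards can be sketched on a tiny system. The code below is a simplified stand-in, not the paper's implementation: it uses plain iterative refinement rather than the GMRES-based variant named in the abstract, emulates FP16 on the CPU by rounding through the standard `struct` module instead of using Tensor Cores, and skips pivoting and scaling. The function names (`to_low`, `lu_low`, `lu_solve`, `refine`) and the 3×3 diagonally dominant test matrix are assumptions made for illustration.

```python
import struct

def to_low(x, fmt="e"):
    """Round an fp64 value to lower precision ('e' = fp16, 'f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def lu_low(A, fmt="e"):
    """In-place-style LU (no pivoting) with every stored value rounded to low precision."""
    n = len(A)
    LU = [[to_low(v, fmt) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = to_low(LU[i][k] / LU[k][k], fmt)      # multiplier l_ik
            for j in range(k + 1, n):
                prod = to_low(LU[i][k] * LU[k][j], fmt)
                LU[i][j] = to_low(LU[i][j] - prod, fmt)      # Schur update
    return LU

def lu_solve(LU, b):
    """Forward and back substitution in fp64 using the low-precision factors."""
    n = len(b)
    x = list(b)
    for i in range(n):                      # L y = b (unit lower triangle)
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):            # U x = y
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

def refine(A, b, fmt="e", iters=10):
    """Solve Ax = b with low-precision LU, then correct using fp64 residuals."""
    LU = lu_low(A, fmt)
    x = lu_solve(LU, b)
    for _ in range(iters):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve(LU, r)                 # reuse the cheap factorization
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Hypothetical well-conditioned test problem with a known solution.
A = [[4.1, 1.3, 0.7],
     [1.3, 5.2, 1.1],
     [0.7, 1.1, 6.3]]
x_true = [1.0, -2.0, 3.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

def max_err(x):
    return max(abs(xi - ti) for xi, ti in zip(x, x_true))

err_unrefined = max_err(refine(A, b, iters=0))
err_refined = max_err(refine(A, b, iters=10))
print("error without refinement:", err_unrefined)
print("error after refinement  :", err_refined)
```

The factorization is the expensive step, so doing it once in low precision and reusing it for every correction is where the speed-up would come from on hardware with fast FP16 units. The sketch converges because the test matrix is well conditioned. Harder problems are where the GMRES preconditioning and the scaling to avoid FP16 overflow mentioned in the abstract become relevant.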
