泥瓦匠 posted on 2025-3-23 11:44:25
Multi-workgroup Tiling to Improve the Locality of Explicit One-Step Methods for ODE Systems with Limited Access Distance on GPUs
…e locality of memory references important. We exploit the limited access distance, which is a property of a large class of right-hand-side functions, to enable hexagonal or trapezoidal tiling across the stages of the ODE method. Since previous work showed that the traditional approach of launching o…
monopoly posted on 2025-3-23 15:21:33
Structure-Aware Calculation of Many-Electron Wave Function Overlaps on Multicore Processors
…electron wave function overlaps, yielding a considerable reduction of the theoretical cost. The resulting enhanced algorithm is embarrassingly parallel, and our comparison against the (embarrassingly parallel version of the) original algorithm, on a computer node with 40 physical cores, shows acceleratio…
babble posted on 2025-3-23 19:41:00
http://reply.papertrans.cn/75/7411/741025/741025_13.png
dendrites posted on 2025-3-23 22:13:11
High Performance Tensor–Vector Multiplication on Shared-Memory Systems
…ntation of this bandwidth-bound operation. Here, we investigate its efficient shared-memory implementations. Upon carefully analyzing the design space, we implement a number of alternatives using OpenMP and compare them experimentally. Experimental results on up to 8 socket systems show near peak p…
烦扰 posted on 2025-3-24 02:43:24
Efficient Modular Squaring in Binary Fields on CPU Supporting AVX and GPU
…bit-slicing methodology with a view to maximizing the advantage of single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT) execution patterns. The developed implementation of modular squaring was adjusted to testing for the irreducibility of binary polynomials of some particular forms.
GLUT posted on 2025-3-24 09:16:36
Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils
…an be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers … in LAPACK prevent overflow by dynamically scaling the eigenvectors. These subroutines are scalar, sequential codes that compute the eigenvectors one by one. In this paper, we…
使更活跃 posted on 2025-3-24 14:30:03
http://reply.papertrans.cn/75/7411/741025/741025_17.png
interior posted on 2025-3-24 18:39:07
http://reply.papertrans.cn/75/7411/741025/741025_18.png
Incisor posted on 2025-3-24 20:18:38
http://reply.papertrans.cn/75/7411/741025/741025_19.png
Brochure posted on 2025-3-25 02:45:04
Parallel Performance of an Iterative Solver Based on the Golub-Kahan Bidiagonalization
…ture. We focus in particular on our recent implementation of the algorithm using the parallel numerical library PETSc. Since the algorithm is a nested solver, we investigate different choices for parallel inner solvers and show its strong scalability for two Stokes test problems. The algorithm is fo…