泥瓦匠 posted on 2025-3-23 11:44:25

Multi-workgroup Tiling to Improve the Locality of Explicit One-Step Methods for ODE Systems with Limited Access Distance on GPUs
...e locality of memory references important. We exploit the limited access distance, which is a property of a large class of right-hand-side functions, to enable hexagonal or trapezoidal tiling across the stages of the ODE method. Since previous work showed that the traditional approach of launching o...

monopoly posted on 2025-3-23 15:21:33

Structure-Aware Calculation of Many-Electron Wave Function Overlaps on Multicore Processors
...electron wave function overlaps, yielding a considerable reduction of the theoretical cost. The resulting enhanced algorithm is embarrassingly parallel, and our comparison against the (embarrassingly parallel version of the) original algorithm, on a computer node with 40 physical cores, shows acceleratio...

babble posted on 2025-3-23 19:41:00

http://reply.papertrans.cn/75/7411/741025/741025_13.png

dendrites posted on 2025-3-23 22:13:11

High Performance Tensor–Vector Multiplication on Shared-Memory Systems
...ntation of this bandwidth-bound operation. Here, we investigate its efficient shared-memory implementations. Upon carefully analyzing the design space, we implement a number of alternatives using OpenMP and compare them experimentally. Experimental results on up to 8 socket systems show near peak p...
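
For readers unfamiliar with the operation named in this abstract, here is a minimal sketch of a dense tensor-vector product along one mode, in plain sequential Python. It is not the OpenMP implementation the paper describes. The function name `tensor_times_vector`, the flat row-major storage, and the `dims`/`k` parameters are assumptions made for illustration only.

```python
from math import prod


def tensor_times_vector(x, dims, k, v):
    """Contract mode k of a dense tensor with the vector v.

    x    : flat list holding the tensor in row-major order (assumed layout)
    dims : tuple of mode sizes, so len(x) == prod(dims)
    k    : index of the mode to contract
    v    : vector of length dims[k]
    Returns a flat row-major list of the result, whose shape is dims
    with mode k removed.
    """
    if len(x) != prod(dims):
        raise ValueError("x does not match dims")
    if len(v) != dims[k]:
        raise ValueError("v must have length dims[k]")

    left = prod(dims[:k])        # number of index combinations before mode k
    n = dims[k]                  # size of the contracted mode
    right = prod(dims[k + 1:])   # number of index combinations after mode k

    out = [0] * (left * right)
    for l in range(left):
        for j in range(n):
            base = (l * n + j) * right
            vj = v[j]
            # Innermost loop walks contiguous memory: each tensor entry is
            # read exactly once, which is why the operation is bandwidth-bound.
            for r in range(right):
                out[l * right + r] += x[base + r] * vj
    return out
```

For a 2 x 3 x 2 tensor holding 0..11, `tensor_times_vector(list(range(12)), (2, 3, 2), 1, [1, 1, 1])` sums over the middle mode and returns `[6, 9, 24, 27]`.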

烦扰 posted on 2025-3-24 02:43:24

Efficient Modular Squaring in Binary Fields on CPU Supporting AVX and GPU
...bit-slicing methodology with a view to maximizing the advantage of single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT) execution patterns. The developed implementation of modular squaring was adjusted to testing for the irreducibility of binary polynomials of some particular forms.
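
As background for this abstract, here is a minimal scalar sketch of modular squaring in a binary field. It is not the bit-sliced AVX or GPU implementation the paper describes. It relies only on the standard fact that squaring a polynomial over GF(2) moves coefficient i to position 2i, followed by reduction modulo the field polynomial. Polynomials are encoded as Python integers (bit i is the coefficient of x^i); the function names and the example modulus 0x11B are assumptions chosen for illustration.

```python
def spread_bits(a):
    """Square a GF(2) polynomial: bit i of a moves to bit 2*i."""
    result = 0
    i = 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)
        a >>= 1
        i += 1
    return result


def reduce_mod(a, f):
    """Reduce polynomial a modulo the nonzero polynomial f over GF(2)."""
    if f == 0:
        raise ValueError("modulus must be nonzero")
    deg_f = f.bit_length() - 1
    # Cancel the leading term of a with a shifted copy of f until deg(a) < deg(f).
    while a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def mod_square(a, f):
    """Return a*a mod f, with a and f as GF(2) polynomials encoded as ints."""
    return reduce_mod(spread_bits(a), f)


# Example modulus (assumption): x^8 + x^4 + x^3 + x + 1, i.e. 0x11B.
F = 0x11B
print(hex(mod_square(0x10, F)))  # x^4 squared is x^8, which reduces to 0x1b
```

No carries or cross terms appear in the squaring step, which is what makes the operation a natural fit for bit-level parallel layouts such as the bit-slicing mentioned in the abstract.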

GLUT posted on 2025-3-24 09:16:36

Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils
...an be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers [...] in LAPACK prevent overflow by dynamically scaling the eigenvectors. These subroutines are scalar and sequential codes which compute the eigenvectors one by one. In this paper, we...

使更活跃 posted on 2025-3-24 14:30:03

http://reply.papertrans.cn/75/7411/741025/741025_17.png

interior posted on 2025-3-24 18:39:07

http://reply.papertrans.cn/75/7411/741025/741025_18.png

Incisor posted on 2025-3-24 20:18:38

http://reply.papertrans.cn/75/7411/741025/741025_19.png

Brochure posted on 2025-3-25 02:45:04

Parallel Performance of an Iterative Solver Based on the Golub-Kahan Bidiagonalization
...ture. We focus in particular on our recent implementation of the algorithm using the parallel numerical library PETSc. Since the algorithm is a nested solver, we investigate different choices for parallel inner solvers and show its strong scalability for two Stokes test problems. The algorithm is fo...
View full version: Titlebook: Parallel Processing and Applied Mathematics; 13th International C Roman Wyrzykowski, Ewa Deelman, Konrad Karczewski Conference proceedings 20