代替 posted on 2025-3-27 00:43:37
前奏曲 posted on 2025-3-27 03:05:43
Arrhythmia posted on 2025-3-27 05:49:31
SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
… SIMD AVX2. AVX2 requires changing the memory access pattern so that four consecutive 64-bit elements can be read at once. In our previous research, DD-SpMV in CRS using AVX2 required non-contiguous memory loads, separate processing of the remainder, and a horizontal summation of the four elements in the AVX2 register.
谄媚于人 posted on 2025-3-27 10:50:09
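The DD-SpMV abstract above names three AVX2 overheads: gathered (non-contiguous) loads of x, a remainder loop, and a horizontal sum of the four register lanes. The following is a minimal scalar sketch, not the authors' kernel: it assumes A holds plain doubles and x, y are double-double (DD) values stored as (hi, lo) tuples, uses the standard TwoSum/TwoProd error-free transformations, and simulates the four AVX2 lanes with four Python accumulators. All function names are hypothetical.

```python
# Hypothetical sketch: CRS SpMV with double-double (DD) accumulation, where
# LANES scalar accumulators stand in for the four 64-bit lanes of an AVX2 register.

LANES = 4  # an AVX2 register holds four 64-bit doubles


def two_sum(a, b):
    """Knuth's TwoSum: returns s, e with s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Renormalize a (hi, lo) pair; exact when |a| >= |b| or a == 0."""
    s = a + b
    return s, b - (s - a)


def split(a):
    """Dekker's split of a double into high and low halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's TwoProd: returns p, e with p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Add two DD numbers given as (hi, lo) tuples."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul_d(x, b):
    """Multiply a DD number x by a plain double b."""
    p, e = two_prod(x[0], b)
    e += x[1] * b
    return quick_two_sum(p, e)


def dd_spmv_crs_4lane(val, col_ind, row_ptr, x):
    """y = A x with A in CRS (double values) and x, y in DD."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        acc = [(0.0, 0.0)] * LANES  # one DD accumulator per lane
        k = start
        # Main loop: four nonzeros per step. x[col_ind[k + lane]] is a gather,
        # i.e. the non-contiguous load that AVX2 cannot do as one plain load.
        while k + LANES <= end:
            for lane in range(LANES):
                prod = dd_mul_d(x[col_ind[k + lane]], val[k + lane])
                acc[lane] = dd_add(acc[lane], prod)
            k += LANES
        # Remainder loop for the last (end - start) % 4 nonzeros of the row.
        while k < end:
            acc[0] = dd_add(acc[0], dd_mul_d(x[col_ind[k]], val[k]))
            k += 1
        # Horizontal summation of the four lane accumulators.
        s = acc[0]
        for lane in range(1, LANES):
            s = dd_add(s, acc[lane])
        y.append(s)
    return y
```

As a quick check of why DD is worth the cost: for a single row of five ones and x = (1, 1e-20, -1, 0, 0), plain double summation returns 0.0, while `dd_spmv_crs_4lane([1.0] * 5, [0, 1, 2, 3, 4], [0, 5], ...)` keeps the 1e-20 in the high word of the result.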
Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations
…from finite volume discretization, we evaluate and optimize the performance of Conjugate Gradient (CG) routines designed for manycore accelerators and compare them against an industrial CPU-based implementation. We also investigate how recent advances in preconditioning, such as iterative Incomplete C…
febrile posted on 2025-3-27 16:41:29
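For readers who want the CG routine from the abstract above in concrete form, here is a minimal preconditioned CG sketch in plain Python with the matrix in CRS form. It assumes a symmetric positive definite matrix and uses a simple Jacobi (diagonal) preconditioner as a stand-in; it is not the iterative incomplete-Cholesky preconditioner the abstract refers to, and the function names are hypothetical.

```python
def csr_matvec(val, col, ptr, x):
    """y = A x for A stored in CRS/CSR form (val, col, ptr)."""
    return [sum(val[k] * x[col[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def pcg_jacobi(val, col, ptr, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for SPD A with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    diag = [1.0] * n
    for i in range(n):
        for k in range(ptr[i], ptr[i + 1]):
            if col[k] == i:
                diag[i] = val[k]
    x = [0.0] * n
    r = list(b)                               # r0 = b - A x0 with x0 = 0
    z = [ri / di for ri, di in zip(r, diag)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:  # relative residual test
            return x, it
        Ap = csr_matvec(val, col, ptr, p)     # the SpMV that dominates cost
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: 1D Poisson matrix [-1, 2, -1] in CRS form, right-hand side from a known x.
n = 5
val, col, ptr = [], [], [0]
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            val.append(2.0 if i == j else -1.0)
            col.append(j)
    ptr.append(len(val))
b = csr_matvec(val, col, ptr, [1.0, 2.0, 3.0, 4.0, 5.0])
x, iters = pcg_jacobi(val, col, ptr, b)
```

Each iteration needs one SpMV, two dot products and three vector updates, which is why GPU ports of CG focus on those kernels and on preconditioners that parallelize well.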
Performance Analysis of SA-AMG Method by Setting Extracted Near-Kernel Vectors
… by generating small matrices from the original matrix problem. However, the convergence of the method can be further improved by using near-kernel vectors. Our research investigates the effectiveness of using multiple near-kernel vectors and finds the near-kernel vectors that are most important f…
Ptosis posted on 2025-3-27 21:22:01
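To make "generating small matrices" and "multiple near-kernel vectors" in the SA-AMG abstract above concrete, here is a hedged sketch of one standard smoothed-aggregation coarsening step, not the paper's code. It assumes fixed aggregates of three consecutive nodes on a 1D Poisson matrix, builds a tentative prolongator T by orthonormalizing the near-kernel vectors on each aggregate, smooths it with one damped-Jacobi sweep (omega = 2/3, a common choice for this matrix), and forms the Galerkin coarse matrix P^T A P. Helper names are hypothetical, and dense lists replace the sparse storage a real solver would use.

```python
# Hypothetical sketch of one smoothed-aggregation (SA) coarsening step.


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def poisson_1d(n):
    """Tridiagonal [-1, 2, -1] test matrix."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def tentative_prolongator(near_kernel, aggregates, n):
    """Restrict each near-kernel vector to each aggregate and orthonormalize
    (modified Gram-Schmidt): len(near_kernel) coarse unknowns per aggregate."""
    m = len(near_kernel)
    T = [[0.0] * (len(aggregates) * m) for _ in range(n)]
    for a, nodes in enumerate(aggregates):
        basis = []
        for vec in near_kernel:
            v = [float(vec[i]) for i in nodes]
            for q in basis:
                c = sum(vi * qi for vi, qi in zip(v, q))
                v = [vi - c * qi for vi, qi in zip(v, q)]
            # Assumes the vectors are linearly independent on this aggregate.
            norm = sum(vi * vi for vi in v) ** 0.5
            basis.append([vi / norm for vi in v])
        for k, q in enumerate(basis):
            for local, i in enumerate(nodes):
                T[i][a * m + k] = q[local]
    return T


def smoothed_prolongator(A, T, omega=2.0 / 3.0):
    """P = (I - omega * D^{-1} A) T: one damped-Jacobi sweep applied to T."""
    n = len(A)
    S = [[(1.0 if i == j else 0.0) - omega * A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    return matmul(S, T)


def coarse_matrix(A, P):
    """Galerkin coarse operator A_c = P^T A P."""
    return matmul(transpose(P), matmul(A, P))


# Usage: one near-kernel vector (constants) vs. two (constants and a linear ramp).
n = 9
A = poisson_1d(n)
aggs = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
ones = [1.0] * n
linear = [float(i) for i in range(n)]
T2 = tentative_prolongator([ones, linear], aggs, n)
Ac1 = coarse_matrix(A, smoothed_prolongator(A, tentative_prolongator([ones], aggs, n)))
Ac2 = coarse_matrix(A, smoothed_prolongator(A, T2))
```

With one near-kernel vector the 9 x 9 problem shrinks to a 3 x 3 coarse matrix; with two vectors each aggregate carries two coarse unknowns, so the coarse matrix is 6 x 6. That is the cost side of the trade-off against better convergence that the abstract studies.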
矛盾 posted on 2025-3-28 01:34:50
HPC on the Intel Xeon Phi: Homomorphic Word Searching
…homomorphic encryption makes it possible to produce a cryptogram that encrypts the result of applying any function to some values, even when the input values are encrypted and without access to the private key. For example, it is possible to check whether any word in a set of encrypted words matches a plaintext refer…
Integrate posted on 2025-3-28 05:34:45
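The word-search example in the abstract above can be illustrated with a much weaker primitive than the encryption the paper describes. This sketch swaps in Paillier encryption, which is only additively homomorphic, so it is not the paper's scheme. The server, without the private key, turns each ciphertext E(m) into E(k * (m - w)) for the plaintext reference w and a random unit k, and the key holder decrypts: zero means a match. The Mersenne-prime key, the byte encoding of words, and every function name are assumptions for illustration only, and the parameters are far too small to be secure.

```python
import math
import secrets

# Toy key material: two Mersenne primes (2**31 - 1 and 2**61 - 1).
# Far too small for real use; chosen only so the example runs instantly.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid with generator g = n + 1
    return n, (lam, mu)


def random_unit(n):
    """Random r in [1, n-1] with gcd(r, n) == 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(n, m):
    n2 = n * n
    return pow(n + 1, m, n2) * pow(random_unit(n), n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n


def encode(word, n):
    """Hypothetical encoding: the UTF-8 bytes of the word read as an integer."""
    m = int.from_bytes(word.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("word too long for this toy modulus")
    return m


def blinded_difference(n, c, reference):
    """Server side, no private key: from E(m) compute E(k * (m - w))."""
    n2 = n * n
    w = encode(reference, n)
    diff = c * pow(n + 1, (-w) % n, n2) % n2  # multiplying by g^(-w) subtracts w
    return pow(diff, random_unit(n), n2)      # a random unit k hides m - w


def search(n, priv, encrypted_words, reference):
    """Key holder decrypts each blinded difference; zero means a match."""
    return [decrypt(n, priv, blinded_difference(n, c, reference)) == 0
            for c in encrypted_words]


n, priv = keygen()
enc = [encrypt(n, encode(w, n)) for w in ["apple", "xeon", "phi"]]
print(search(n, priv, enc, "xeon"))  # [False, True, False]
```

Every comparison costs several modular exponentiations on large integers and the words are independent, which is the kind of embarrassingly parallel workload a manycore chip such as the Xeon Phi is suited to.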
A Data Parallel Algorithm for Seismic Raytracing
… a 3D earth model to sensors used in seismic experiments. An iterative data-parallel algorithm based on the Bellman-Ford-Moore (BFM) algorithm is formulated for seismic tomography. Performance is demonstrated for OpenMP on multicore processors and OpenCL on GPUs.
别炫耀 posted on 2025-3-28 09:29:06
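A minimal sketch of what "iterative data-parallel BFM" in the raytracing abstract above can look like, under stated assumptions: a 2D grid stands in for the 3D earth model, each cell is a graph node with 4-neighbour edges, and an edge's travel time is the grid spacing times the mean slowness of its two cells. Each sweep recomputes every node's travel time from the previous sweep's values only (Jacobi style), so the per-node updates are independent; an OpenMP loop or OpenCL work-item per node would play the role of the list comprehension. Names are hypothetical, and a 4-neighbour stencil overestimates travel times along diagonal paths.

```python
import math


def grid_edges(nx, ny, slowness, h=1.0):
    """Hypothetical 2D stand-in for an earth model: one node per cell,
    4-neighbour edges, travel time = h * mean slowness of the two cells."""
    edges = []
    for iy in range(ny):
        for ix in range(nx):
            a = iy * nx + ix
            for jx, jy in ((ix + 1, iy), (ix, iy + 1)):
                if jx < nx and jy < ny:
                    b = jy * nx + jx
                    w = h * 0.5 * (slowness[a] + slowness[b])
                    edges.append((a, b, w))
                    edges.append((b, a, w))
    return edges


def bfm_data_parallel(num_nodes, edges, source):
    """Jacobi-style Bellman-Ford: every sweep updates all nodes from the previous
    sweep's travel times, so the per-node updates can run in parallel."""
    incoming = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        incoming[v].append((u, w))
    t = [math.inf] * num_nodes
    t[source] = 0.0
    for sweep in range(num_nodes):
        new_t = [min([t[v]] + [t[u] + w for u, w in incoming[v]])
                 for v in range(num_nodes)]
        if new_t == t:  # converged: no travel time improved in this sweep
            return t, sweep
        t = new_t
    return t, num_nodes


# Usage: uniform slowness on a 3 x 3 grid, source in the corner cell 0.
times, sweeps = bfm_data_parallel(9, grid_edges(3, 3, [1.0] * 9), 0)
```

The sweep count grows with the number of edges on the longest shortest path rather than with the node count, which is what makes the fully parallel formulation competitive despite doing redundant relaxations.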
法律的瑕疵 posted on 2025-3-28 13:09:38
On the Acceleration of Graph500: Characterizing PCIe Overheads with Multi-GPUs
… In order to maximize performance-per-dollar, systems are now being deployed with multiple GPUs in the same node. However, multiple GPUs exacerbate PCIe overheads by inflicting additional data-movement penalties when moving non-local data. In this paper, we first evaluate the PCIe…
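Graph500's core kernel is breadth-first search. As a rough illustration of the "non-local data" in the abstract above, this sketch runs a level-synchronous BFS and counts how many scanned edges point to a vertex owned by another GPU. It assumes a hypothetical cyclic partition (vertex v lives on GPU v % num_gpus) and models no actual transfers or timings; it only shows where cross-device traffic comes from.

```python
def bfs_remote_edges(adj, source, num_gpus):
    """Level-synchronous BFS that counts how many scanned edges stay on the
    owning GPU and how many point to a vertex owned by another GPU."""
    owner = [v % num_gpus for v in range(len(adj))]  # hypothetical cyclic partition
    parent = {source: source}
    frontier = [source]
    local = remote = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if owner[u] == owner[v]:
                    local += 1
                else:
                    remote += 1  # this visit would have to reach another GPU
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, local, remote


# Usage: the path graph 0-1-2-3 as adjacency lists.
adj = [[1], [0, 2], [1, 3], [2]]
parent1, local1, remote1 = bfs_remote_edges(adj, 0, 1)
parent2, local2, remote2 = bfs_remote_edges(adj, 0, 2)
```

On one GPU all six edge scans are local; with two GPUs and the cyclic partition, every scan crosses devices. The partitioning choice therefore directly controls how much data must move between GPUs.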