condemn posted on 2025-3-28 15:15:29
Task-Based Sparse Hybrid Linear Solver for Distributed Memory Heterogeneous Architectures
…umerical, scientific libraries have been ported on such architectures. In this paper, we propose to extend a sparse hybrid solver for handling distributed memory heterogeneous platforms. As in the original solver, we perform a domain decomposition and associate one subdomain with one MPI process. Ho…

羞辱 posted on 2025-3-28 22:26:47
Automatic Generation of OpenCL Code for ARM Architectures
…SoC) makes necessary a very specific knowledge of their hardware in order to harness their full potential. OpenCL is a well known standard for cross-platform usage of accelerator devices. We follow an annotation-based approach for solving the problem of high development cost of OpenCL programming fo…

defendant posted on 2025-3-29 00:49:05
Workflow Performance Profiles: Development and Analysis
…ameter sweep manner, collecting performance information about each workflow task, and analysis of the collected data with statistical learning methods. The main goal of this work is to increase the understanding about the performance of studied workflows in a systematic and predictable way. The eval…

ACRID posted on 2025-3-29 04:34:35
A Data-Parallel ILUPACK for Sparse General and Symmetric Indefinite Linear Systems
…number of iterative solvers have been developed, among which ILUPACK integrates an inverse-based multilevel ILU preconditioner with appealing numerical properties. In this paper, we enhance the computational performance of ILUPACK by off-loading the execution of several key computational kernels to…

bonnet posted on 2025-3-29 09:07:52
http://reply.papertrans.cn/32/3166/316537/316537_45.png

广大 posted on 2025-3-29 14:42:38
A Context-Aware Primitive for Nested Recursive Parallelism
…ilizing the concept of tasks have been widely adapted. However, the provided abstract task creation and synchronization interfaces force corresponding implementations to focus their attention to individual task creation and synchronization points – unaware of their relation to each other – thereby l…

monopoly posted on 2025-3-29 16:16:11
Achieving High Parallel Efficiency on Modern Processors for X-Ray Scattering Data Analysis
…e-data (SIMD) parallelisms. The former is typically available through multiple compute cores and the latter through long vector units. In this paper, we consider several compute kernels of a real-world scientific application, X-ray scattering data analysis, to demonstrate and analyze high performanc…

atrophy posted on 2025-3-29 20:55:09
Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal S…
…ues of parallel software engineering. One of the most promising approaches consists in abstracting an application as a directed acyclic graph (DAG) of tasks. While this approach has been popularized for shared memory environments by the OpenMP 4.0 standard where dependencies between tasks are automa…

ODIUM posted on 2025-3-30 00:34:40
http://reply.papertrans.cn/32/3166/316537/316537_49.png

Aura231 posted on 2025-3-30 05:45:46
http://reply.papertrans.cn/32/3166/316537/316537_50.png