拖网 posted on 2025-3-28 15:26:35
http://reply.papertrans.cn/24/2336/233539/233539_41.png

包租车船 posted on 2025-3-28 20:18:31
Language-Extension-Based Vectorizing Compiling Scheme on SDR-DSP
We use LEVCS to vectorize five benchmark kernels: Fast Fourier Transform (FFT), Finite Impulse Response filter (FIR), Infinite Impulse Response filter (IIR), dot product (Dotprod), and sum of vectors (vecsum). Experimental results show that LEVCS is functionally correct and achieves speedups of 2.883× to 8.074× compared to TI DSPs.

fulcrum posted on 2025-3-29 00:26:42
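For the LEVCS abstract, the Dotprod kernel is the easiest place to see what a vectorizing compiler does to a loop. The sketch below is a hypothetical illustration, not LEVCS output: it assumes a 4-lane vector unit (`VLEN` is an assumption; the SDR-DSP width is not given) and models each vector multiply-accumulate as an inner loop over lanes, followed by a horizontal reduction and a scalar tail loop.

```python
# Hypothetical vector width; the real SDR-DSP lane count is not stated in the abstract.
VLEN = 4


def dotprod_scalar(a, b):
    """Reference scalar loop: one multiply-accumulate per iteration."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dotprod_vectorized(a, b, vlen=VLEN):
    """Strip-mined dot product: lane-wise MACs, then a horizontal reduction.

    Each outer iteration models one vector multiply-accumulate over `vlen`
    lanes; elements left over after the last full vector are handled by a
    scalar tail loop, as a vectorizing compiler would typically emit.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    n = len(a)
    main = n - n % vlen                  # elements covered by full vectors
    lanes = [0] * vlen                   # one partial sum per lane (a vector register)
    for i in range(0, main, vlen):
        for lane in range(vlen):         # conceptually a single SIMD MAC instruction
            lanes[lane] += a[i + lane] * b[i + lane]
    acc = sum(lanes)                     # horizontal reduction across lanes
    for i in range(main, n):             # scalar remainder (tail) loop
        acc += a[i] * b[i]
    return acc


print(dotprod_vectorized([1, 2, 3, 4, 5], [1, 1, 1, 1, 2]))
```

Splitting the sum into per-lane partial sums reorders the additions. That is exact for integer or fixed-point data, as is common on DSPs, but can change rounding for floating point, which is why compilers usually need explicit permission to vectorize floating-point reductions.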
A Dynamic Multi-precision Fixed-Point Data Quantization Strategy for Convolutional Neural Network
…2% to 5.9% at most, compared with the previous static quantization strategy, when 8/4-bit quantization is used. When 16-bit quantization is used, our strategy introduces only 0.03% accuracy loss while halving the memory footprint and bandwidth requirement compared with a 32-bit floating-point implementation.

Bucket posted on 2025-3-29 05:01:12
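For the quantization abstract, one common reading of "dynamic fixed-point" is that each tensor (for example, each layer's weights) gets its own fractional length, sized from that tensor's value range, and "multi-precision" lets layers use different bit widths. The sketch below assumes exactly that rule; the paper's actual selection procedure is not given in the snippet, and the layer names and bit widths are hypothetical.

```python
import math


def choose_frac_bits(values, bits):
    """Pick the fractional length for a signed `bits`-bit fixed-point format.

    The integer part is sized to cover the largest magnitude in this tensor;
    the remaining bits (after one sign bit) go to the fraction.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        return bits - 1
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)  # bits left of the binary point
    return bits - 1 - int_bits                              # one bit reserved for the sign


def quantize(values, bits, frac_bits):
    """Round to the nearest representable value and saturate to the signed range."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(v * scale))) / scale for v in values]


# Hypothetical per-layer weights and bit widths (mixed 8-bit and 4-bit).
weights = {"conv1": [0.5, -1.25, 0.03], "fc": [3.7, -2.1, 0.4]}
bitwidths = {"conv1": 8, "fc": 4}

for name, w in weights.items():
    fb = choose_frac_bits(w, bitwidths[name])
    print(name, fb, quantize(w, bitwidths[name], fb))
```

A static scheme would fix one format for every layer; choosing the format per tensor is what lets small-range layers keep more fractional bits at the same storage width.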
http://reply.papertrans.cn/24/2336/233539/233539_44.png

Fulminate posted on 2025-3-29 07:22:03
http://reply.papertrans.cn/24/2336/233539/233539_45.png

光明正大 posted on 2025-3-29 12:33:50
Monaural Speech Separation on Many Integrated Core Architecture
…architecture to meet the real-time requirements of speech separation. This approach exploits parallelism using OpenMP and offloads the computation-intensive matrix operations to a MIC coprocessor. The experimental results confirm the efficiency of our implementation of monaural speech separation on the MIC architecture.

地名表 posted on 2025-3-29 18:49:30
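For the MIC speech-separation abstract, the usual OpenMP pattern for a matrix product is a `parallel for` over output rows, with each thread computing an independent block. Python has no OpenMP, so the sketch below swaps in `concurrent.futures.ThreadPoolExecutor` purely to show the same row-block decomposition; it is not the paper's code, and the `workers` count is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def _rows_times(a_rows, b):
    """Multiply a block of rows of A by B (one thread's share of the work)."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a_rows]


def parallel_matmul(a, b, workers=4):
    """Row-partitioned matrix product, mirroring an OpenMP `parallel for` over rows."""
    if not a:
        return []
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    chunk = max(1, -(-len(a) // workers))            # ceiling division: rows per block
    blocks = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(_rows_times, blocks, [b] * len(blocks))  # map keeps block order
    return [row for part in parts for row in part]


print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because rows of the result do not depend on each other, no synchronization is needed beyond collecting the blocks in order. In CPython the interpreter lock means this sketch shows the decomposition, not a real speedup; a C/OpenMP build on the coprocessor is where the parallel gain would come from.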
Single/Double Precision Floating-Point Division and Square Root Unit Based on SRT-8 Algorithm
…hide the latency of the look-up table, fast addend generation was used to shorten the critical path, and "on-the-fly" conversion was employed to save area. Experimental results show that the proposed design achieves low latency and low hardware overhead.

大气层 posted on 2025-3-29 23:45:04
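For the SRT-8 abstract, on-the-fly conversion turns the signed (redundant) quotient digits produced each iteration into an ordinary binary quotient without a final carry-propagate adder. The sketch below is the textbook technique in integer form, assuming radix 8 and a digit set within [-7, 7]; the paper's exact digit set and register widths are not given in the snippet.

```python
RADIX = 8  # SRT-8: one radix-8 quotient digit (3 bits) per iteration


def on_the_fly(digits, r=RADIX):
    """Convert signed quotient digits (most significant first) to an integer.

    Keeps two registers: Q (value so far) and QM = Q - 1. Each step only
    shifts one of them left by one digit and appends a digit in [0, r-1]
    (a wiring/concatenation operation in hardware), so no carry-propagate
    addition is needed.
    """
    q, qm = 0, -1
    for d in digits:
        if not -(r - 1) <= d <= r - 1:
            raise ValueError("digit outside the redundant digit set")
        new_q = q * r + d if d >= 0 else qm * r + (r + d)
        new_qm = q * r + (d - 1) if d > 0 else qm * r + (r - 1 + d)
        q, qm = new_q, new_qm
    return q


print(on_the_fly([1, -2, 3]))  # 1*64 - 2*8 + 3
```

The negative-digit case is what makes QM worthwhile: instead of subtracting from Q (which would ripple a borrow), the converter selects the already-decremented QM and appends the positive digit r + d.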
A Methodology for Performance Verification of Microprocessors
… and RTL simulation-based benchmarks are applied at the core level, while prototyping and counter-based performance analysis systems are built at the system level. An example is given to demonstrate the application and effectiveness of the proposed methodology.

朴素 posted on 2025-3-30 02:50:33
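For the performance-verification abstract, counter-based analysis generally means turning raw event counts into rate metrics and comparing them with what a performance model predicts. The sketch below is hypothetical throughout: the `Counters` fields, the metric names, and the 5% tolerance are assumptions for illustration, not the paper's system.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical raw performance-counter readings for one benchmark run."""
    cycles: int
    instructions: int
    l1d_misses: int
    l1d_accesses: int


def derived_metrics(c: Counters) -> dict:
    """Turn raw counts into rate metrics that are comparable across runs."""
    return {
        "ipc": c.instructions / c.cycles,
        "l1d_miss_rate": c.l1d_misses / c.l1d_accesses if c.l1d_accesses else 0.0,
    }


def check_against_model(measured: dict, expected: dict, tolerance: float = 0.05) -> list:
    """Return metric names whose relative deviation from the model exceeds `tolerance`.

    A large gap means either a performance bug in the design or an inaccurate
    model; both cases need review.
    """
    flagged = []
    for name, want in expected.items():
        got = measured[name]
        if want and abs(got - want) / abs(want) > tolerance:
            flagged.append(name)
    return flagged


m = derived_metrics(Counters(cycles=1000, instructions=1500, l1d_misses=30, l1d_accesses=600))
print(m, check_against_model(m, {"ipc": 1.6, "l1d_miss_rate": 0.05}))
```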
http://reply.papertrans.cn/24/2336/233539/233539_49.png

草率男 posted on 2025-3-30 05:18:52
A New DVFS Algorithm Design for Multi-core Processor Chip
…Experimental results show that, compared with the traditional single-threshold algorithm, dual-threshold adaptive DVFS saves more power without obvious performance degradation. Most benchmarks retain more than 90% of their original performance, while power savings reach up to 35%.
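For the DVFS abstract, the difference between one and two thresholds can be shown with a simple utilization-driven governor. The sketch below assumes fixed thresholds (0.3 and 0.8 are hypothetical) and five voltage/frequency levels; the "adaptive" threshold adjustment in the paper is not described in the snippet and is not modeled here.

```python
def dual_threshold_step(level, utilization, low=0.3, high=0.8, max_level=4):
    """One DVFS decision with two thresholds (threshold values are hypothetical).

    Raise the V/F level above `high`, lower it below `low`, and hold it in
    between. The dead band between the thresholds damps level oscillation.
    """
    if utilization > high:
        return min(level + 1, max_level)
    if utilization < low:
        return max(level - 1, 0)
    return level


def single_threshold_step(level, utilization, threshold=0.5, max_level=4):
    """Baseline: every sample moves the level up or down around one threshold."""
    if utilization > threshold:
        return min(level + 1, max_level)
    return max(level - 1, 0)


def run(step, trace, start=2):
    """Apply a governor step function to a utilization trace; return the levels chosen."""
    levels, level = [], start
    for u in trace:
        level = step(level, u)
        levels.append(level)
    return levels


trace = [0.6, 0.45, 0.55, 0.5, 0.6]
print(run(single_threshold_step, trace))  # toggles between levels
print(run(dual_threshold_step, trace))    # holds steady inside the dead band
```

One plausible motivation for the second threshold is this hysteresis: a moderately loaded core no longer flips between frequency levels on every sample, which wastes transition energy and time.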