GTC On-Demand

Developer - Algorithms
Optimized LU-decomposition with Full Pivot for Small Batched Matrices
Ian Wainwright (High Performance Consulting)

The goal of this session is to show various optimization techniques and their actual performance benefits for matrices roughly one warp's width in size along one dimension. Whereas a very large matrix can be mapped to one multi-block kernel, and a very small matrix can be mapped to one thread, matrices that lie in between, such as 32x32, require different mapping techniques. We will look at the performance benefits of warp-width mapping using warp shuffle, of mapping multiple matrices to one warp, of preferring L1 cache over shared memory, and of aggressive loop unrolling.
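
For orientation, the following is a minimal sequential C++ sketch of LU decomposition with full pivoting applied to each matrix in a small batch. It is a CPU reference only and is not the speaker's GPU implementation: the names FullPivotLU, lu_full_pivot, and lu_full_pivot_batched are hypothetical, and the row-major layout and double precision are assumptions made for illustration. The warp-level techniques named in the abstract concern how this work is mapped onto GPU threads and are not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Factorization of one n x n matrix A such that P * A * Q = L * U.
// (Hypothetical type, for illustration only.)
struct FullPivotLU {
    std::vector<double> lu;             // row-major; unit-lower L below the diagonal, U on and above it
    std::vector<std::size_t> row_perm;  // row_perm[i] = original row now stored at position i
    std::vector<std::size_t> col_perm;  // col_perm[j] = original column now stored at position j
    bool singular = false;              // set when the remaining submatrix is exactly zero
};

// In-place LU with full pivoting on a copy of the row-major matrix `a`.
FullPivotLU lu_full_pivot(std::vector<double> a, std::size_t n) {
    assert(a.size() == n * n);
    FullPivotLU r;
    r.row_perm.resize(n);
    r.col_perm.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        r.row_perm[i] = i;
        r.col_perm[i] = i;
    }

    for (std::size_t k = 0; k < n; ++k) {
        // Full pivoting: search the whole trailing submatrix for the largest magnitude.
        std::size_t pr = k, pc = k;
        double best = std::fabs(a[k * n + k]);
        for (std::size_t i = k; i < n; ++i) {
            for (std::size_t j = k; j < n; ++j) {
                const double v = std::fabs(a[i * n + j]);
                if (v > best) { best = v; pr = i; pc = j; }
            }
        }
        if (best == 0.0) { r.singular = true; break; }

        // Move the pivot to position (k, k) by swapping full rows and full columns.
        if (pr != k) {
            for (std::size_t j = 0; j < n; ++j) std::swap(a[k * n + j], a[pr * n + j]);
            std::swap(r.row_perm[k], r.row_perm[pr]);
        }
        if (pc != k) {
            for (std::size_t i = 0; i < n; ++i) std::swap(a[i * n + k], a[i * n + pc]);
            std::swap(r.col_perm[k], r.col_perm[pc]);
        }

        // Eliminate below the pivot; the multipliers are stored in place as L.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    r.lu = std::move(a);
    return r;
}

// Factor every matrix in the batch independently.
std::vector<FullPivotLU> lu_full_pivot_batched(
        const std::vector<std::vector<double>>& batch, std::size_t n) {
    std::vector<FullPivotLU> out;
    out.reserve(batch.size());
    for (const auto& m : batch) out.push_back(lu_full_pivot(m, n));
    return out;
}
```

Because each matrix in the batch is factored independently, the outer loop in lu_full_pivot_batched is the part that a batched GPU version would run in parallel. The pivot search over the trailing submatrix is the step that distinguishes full pivoting from partial pivoting, which searches only one column.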

Keywords:
Developer - Algorithms, GTC 2013 - ID S3069
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2017 NVIDIA Corporation