GTC ON-DEMAND

 
Abstract:
We will present optimizations that increase the performance of overlap-and-save calculations of linear convolution using a shared-memory FFT. The overlap-and-save method is used when a long signal must be convolved with many filters. We'll explain how we implemented a custom FFT that uses shared memory to eliminate most of the device memory transfers normally required when calculating a convolution. We'll show how we achieved a significant impact for certain problem sizes.
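As a rough illustration of the overlap-and-save method named in this abstract, the following is a minimal CPU sketch in plain Python. It is not the presenters' shared-memory GPU implementation: the function names (`fft`, `overlap_save`, `direct_convolve`), the default `fft_size`, and the simple recursive radix-2 FFT are all assumptions made for the example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(signal, taps, fft_size=8):
    """Linear convolution of `signal` with `taps` using overlap-and-save."""
    m = len(taps)
    step = fft_size - (m - 1)  # number of valid output samples per segment
    assert step > 0, "fft_size must exceed the filter length minus one"
    # The filter spectrum is computed once and reused for every segment.
    h_spec = fft(list(taps) + [0.0] * (fft_size - m))
    out_len = len(signal) + m - 1
    # m-1 leading zeros act as history for the first segment; the tail is
    # zero-padded so that every segment holds fft_size samples.
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * (fft_size + m)
    result = []
    pos = 0
    while len(result) < out_len:
        segment = padded[pos:pos + fft_size]
        product = [a * b for a, b in zip(fft(segment), h_spec)]
        y = fft(product, inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around,
        # so they are discarded and only the remaining samples are saved.
        result.extend(v.real / fft_size for v in y[m - 1:])
        pos += step  # consecutive segments overlap by m-1 samples
    return result[:out_len]


def direct_convolve(signal, taps):
    """Reference time-domain linear convolution, used only for checking."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(taps):
            out[i + k] += s * h
    return out
```

With many filters, the forward FFT of each segment can be computed once and multiplied by each filter's spectrum in turn, which is the situation the abstract describes.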
 
Topics:
Performance Optimization, Accelerated Data Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9352
 
Abstract:
The Square Kilometre Array (SKA) is a planned next-generation radio telescope. It will be used to answer fundamental questions such as: What are dark energy and dark matter? Is Einstein's theory of general relativity correct? And are we alone in the universe? To answer such questions, the telescope must collect vast amounts of data. This data needs to pass through complex signal processing pipelines for science products to be extracted. This talk will introduce the SKA and the science it aims to achieve, and discuss how GPUs can be used to achieve this. We'll discuss current advances in the AstroAccelerate software package, which is GPU-enabled and written in CUDA. AstroAccelerate focuses on enabling real-time processing of time-domain radio-astronomy data.
 
Topics:
Astronomy & Astrophysics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9286
 
Abstract:
We present our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and, as such, a well-established algorithm. We have implemented the polyphase filter on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell), on the Intel Xeon CPU, and on the Xeon Phi (Knights Corner) platform. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/texture cache, and the second uses shared memory. We present our results in terms of the number of samples that can be processed per second.
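For readers unfamiliar with the algorithm, here is a minimal sketch of a polyphase filterbank in its conventional form: a tapped, windowed sum followed by a DFT across channels. It is plain Python on the CPU and does not reproduce the poster's GPU, Xeon, or Xeon Phi implementations; the function names, the argument layout, and the naive DFT are assumptions made for the example.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform, enough for a small sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def polyphase_filterbank(samples, coeffs, n_channels):
    """Return one spectrum per output step.

    `coeffs` holds n_taps * n_channels window coefficients. Each output
    spectrum combines n_taps consecutive blocks of n_channels samples, so
    every input block is reused by up to n_taps neighbouring spectra.
    """
    n_taps = len(coeffs) // n_channels
    n_spectra = len(samples) // n_channels - n_taps + 1
    spectra = []
    for s in range(n_spectra):
        filtered = []
        for c in range(n_channels):
            acc = 0.0
            for t in range(n_taps):
                # Tap t reads the sample n_channels * t positions further on.
                idx = (s + t) * n_channels + c
                acc += coeffs[t * n_channels + c] * samples[idx]
            filtered.append(acc)
        spectra.append(dft(filtered))
    return spectra
```

The inner loop over taps shows where the data reuse mentioned in the abstract comes from: consecutive spectra read mostly the same input blocks, which is what a cache or shared-memory scheme can exploit.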
 
Topics:
Astronomy & Astrophysics, Signal and Audio Processing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2016
Session ID:
P6281