The Support Vector Machine (SVM) is a fundamental machine learning algorithm that is effective for many classification problems but has a high computational cost. Moreover, to obtain the best results for a given problem, the SVM meta-parameters need to be tuned, which requires numerous SVM executions and a very long total execution time. We have developed a semi-automatic solution based on OpenACC that allows the use of multiple GPUs for fast and efficient SVM meta-parameter tuning. We present our results on several handwritten digit classification problems.
CUDA-ready clusters enable developers to:
- Focus on coding, not on maintaining infrastructure (drivers, configs) and toolchains (compilers, libraries)
- Routinely keep pace with innovation, from the latest GPU hardware to the CUDA toolkit itself
- Cross-develop with confidence and ease: maintain, and shift between, highly customized CUDA development environments
- Exercise their preference in programming GPUs: choose CUDA, OpenCL, or OpenACC and combine appropriately (with, for example, the Message Passing Interface, MPI)
- Exploit the convergence of HPC and Big Data Analytics: make simultaneous use of HPC and Hadoop services in GPU applications
- Make use of private and public clouds: create a CUDA-ready cluster in a cloud or extend an on-site CUDA infrastructure into a cloud
In this webinar, participants will learn how Bright Cluster Manager provisions, monitors, and manages CUDA-ready clusters for developer advantage. Case studies will be used to illustrate all six advantages for Bright developers. Specific attention will be given to:
- Cross-developing under CUDA 6.0 and CUDA 6.5 with Kepler-architecture GPUs (e.g., the NVIDIA Tesla K80 GPU accelerator)
- The challenges and opportunities for making use of private (using OpenStack) and public (using Amazon Web Services) clouds in GPU applications
In this session you will learn how to program multi-GPU systems or GPU clusters using the Message Passing Interface (MPI) together with OpenACC or CUDA. The session starts with a quick introduction to MPI and how it can be combined with OpenACC or CUDA, then covers advanced topics such as CUDA-aware MPI and overlapping communication with computation to hide communication times. The latest improvements in CUDA-aware MPI, the Multi-Process Service (MPS, aka Hyper-Q for MPI), and MPI support in the NVIDIA performance analysis tools are also covered.
Learn about the latest developments in the MVAPICH2 library, which simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. These optimizations are integrated transparently under the standard MPI API for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using GPUDirect RDMA, use of the fast GDRCOPY library, a framework for MPI datatype processing using CUDA kernels, and more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. The impact on performance of processor affinity to the GPU and the network will also be presented.
This panel will discuss the current state of GPU programming using compiler directives, such as OpenACC and OpenMP. This session is a forum for discussing both the successes and shortcomings of using compiler directives to program GPUs. The panel will include users, speakers from compiler and tools vendors, and representatives of open source efforts to support directives. Session participants are encouraged to participate in the discussions of this panel.
This session will showcase the results of the inaugural GPU Hackathon held at the Oak Ridge Leadership Computing Facility. The event hosted six teams paired with mentors over a week during which applications were ported to GPUs using OpenACC directives. The talk will describe the progress of each team from beginning to end, as well as details about their implementations. Best practices, lessons learned, and anecdotes from mentors who participated in this training event will be shared.
In this webinar, we will bring CUDA into a compute-intensive application using Allinea tools. First, we will look at Allinea Performance Reports, a tool for analyzing an existing application and determining whether it is appropriate for GPUs. If it is, profiling the application is critical to identify the most compute-intensive code regions that need to be replaced with CUDA (or OpenACC) implementations. But as the code is being reworked, errors can be introduced. To resolve these profiling and debugging challenges, professional tools such as Allinea Forge are necessary to produce correct, working, high-performance GPU-accelerated code with a minimum level of effort. During this technical session, an Allinea expert will illustrate how Allinea Performance Reports and Allinea Forge can help modernize applications.
R is a free software environment that provides a programming language and built-in libraries of mathematical operations for statistics, data analysis, machine learning, and much more. In this talk, I will give an overview of using GPUs in R, focusing on three topics. First, I will introduce accelerating R computations with CUDA libraries, including applying a drop-in library (nvblas) with zero coding effort and a step-by-step guide to calling CUDA-accelerated libraries such as cuFFT. Second, I will show how to accelerate legacy code with directives (OpenACC) and how to write your own CUDA algorithms in R. Third, I will illustrate how to use CUDA tools such as nvprof, cuda-memcheck, and cuda-gdb with R. Finally, I will present CUDA-accelerated results for several R benchmarks.
This webinar will serve as an introductory tutorial for anyone interested in accelerated computing using compiler directives. Participants will learn about OpenACC and a proven process for accelerating applications using compiler directives. No prior GPU or parallel programming experience is required to attend this webinar, but the ability to read and understand C, C++, and/or Fortran code is needed.
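For readers who have not seen directive-based programming, the following is a minimal sketch of what an OpenACC-annotated C loop can look like. It is not taken from the webinar: the function name `saxpy` and the choice of data clauses are assumptions made for illustration only.

```c
/* Illustrative sketch only: a SAXPY loop (y = a*x + y) annotated with
 * an OpenACC directive. The function name and data clauses are
 * assumptions for this example, not material from the webinar. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* "parallel loop" asks the compiler to run the loop iterations in
     * parallel on the accelerator. copyin() moves x to the device;
     * copy() moves y to the device and back to the host afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the directive and runs the loop serially on the host, so the same source serves both cases. OpenACC support is typically enabled with a compiler flag, for example `-acc` with the NVIDIA HPC compilers or `-fopenacc` with GCC.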