Modern graphics processing units, or GPUs, herald the democratization of parallel computing. Today's GPUs not only render video game frames but also accelerate astrophysics, video transcoding, image processing, protein folding, seismic exploration, computational finance, radio astronomy, heart surgery, self-driving cars - the list goes on and on. It is imperative that we teach students parallel computing: they will inherit a world in which there exists no other kind. Meanwhile, the world of education is being shaken up by massive open online courses, or MOOCs, that offer a democratization of education. Universities and companies suddenly offer high-quality courses over the internet - for free! - to anybody in the world. John Owens (UC Davis) and David Luebke (NVIDIA) have been teaching a MOOC focused on GPU computing. The Udacity course has over 40,000 registered students from over 130 countries. This session will present their experience and thoughts on GPUs, MOOCs, and parallel computing education.
The rapid expansion of massively parallel computing, from smartphones to supercomputers, means we must improve and expand pedagogy in this field. CUDA is quickly becoming the go-to platform for teaching parallel programming at over 600 universities worldwide. Join us at this session to hear from university faculty and industry professionals actively teaching CUDA across a wide spectrum of audiences, and learn what methods and materials work best for them. An "open-mic" Q&A session will follow brief presentations from each speaker, so come share your thoughts on the trends and needs of education for massively parallel computing.
In this study, we investigate the use of a programmable graphics processing unit (GPU) as an embedded processor for real-time recognition of speed limit signs on the road. The input to our system is a video sequence of the road taken from a moving vehicle. We process this video in real time and determine whether any speed limit signs are present in the scene; if so, we recognize and output the number indicated by the sign. The main goal of the recognition system is to operate in real time on a resource-constrained embedded system. Therefore, we first examine the merits and demerits of mapping algorithms often used for speed limit recognition onto the GPU. Through this process, we find techniques that benefit significantly from the GPU architecture and eliminate algorithms that do not map efficiently to it. We then implement and analyze two sign detection schemes: one feature-based, one template-based. From the results of our experiments, we draw several important conclusions about the trade-off between recognition rates and performance. We also estimate the amount of hardware resources needed to perform the recognition in real time.
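As an illustration only, the following is a minimal sketch of what a template-based detection step can look like. The abstract does not say which matching score the study uses, so the sum of absolute differences (SAD) here is an assumption, and the names `sad`, `best_match`, `frame`, and `template` are hypothetical. The sketch is plain, sequential Python rather than GPU code; it is meant to show why this kind of search maps naturally to a GPU, since every candidate window is scored independently of the others.

```python
def sad(frame, template, top, left):
    """Sum of absolute differences between the template and one frame window."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def best_match(frame, template):
    """Return (score, top, left) for the window that best matches the template."""
    rows = len(frame) - len(template) + 1
    cols = len(frame[0]) - len(template[0]) + 1
    # Each window is scored independently of the others, so on a GPU every
    # (top, left) position could be handled by its own thread.
    return min(
        (sad(frame, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )


# Hypothetical 4x5 grayscale frame that contains the 2x2 template exactly once.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

print(best_match(frame, template))  # prints (0, 1, 2)
```

A score of 0 means an exact match; a real detector would more likely accept any window whose score falls below a chosen threshold, and would search at several template scales.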