The NVIDIA GPU architecture is becoming an attractive hardware target for complex automotive applications. We implemented the same automotive application on several different hardware targets and measured the maximum frame rate and the effective CPU load on each. This paper shows how real-time applications such as pedestrian detection and driving assistance benefit from a massively parallel "central" architecture like GPU/CUDA. Real-time performance and zero-delay transfers can be achieved with a fully asynchronous implementation. The same approach can multiply application performance by the number of GPU devices present in the embedded system, at a reasonable power consumption.