Industry trends in the race to exascale point toward clusters built from chips with hundreds to thousands of cores in the coming years. Programming these systems is challenging because their architecture is heterogeneous, so novel programming models that ease this task are necessary. In this talk we present a case study: the simulation and visualization of crowds. We analyze and compare two programming models, OmpSs and CUDA, and show that OmpSs lets us exploit all available resources by combining CPU and GPU execution while handling memory management, scheduling, communication, and synchronization automatically. We present experimental results obtained on the Barcelona Supercomputing Center GPU cluster and describe several modes used to visualize the results.