In this talk, I will highlight the main research challenges facing the field of activity detection in untrimmed videos, as well as the deep-learning-based methods developed at KAUST to address them. Massive amounts of video data need to be processed to extract relevant semantic information, most of which concerns human activities (i.e., single-human, human-to-human, and human-to-object interactions). Although this problem arises in many real-world applications (e.g., video surveillance, large-scale video summarization, and ad placement on video platforms), automated vision solutions have been hindered by several challenges, including the lack of large-scale datasets for learning and the need for real-time processing. I will show how deep learning can be used to tackle these challenges.