My research focuses on making lecture videos more navigable, interactive, and accessible through multimodal analysis. This means building systems that jointly process the different streams of information present in a video: the visuals from slides or a blackboard, the spoken transcript, and the text extracted from slides using OCR.
To bring these ideas to life, I designed and built an interactive application that automatically structures video content and enhances the user experience by linking these different information sources.
T. Seng, A. Carlier, T. Forgione, V. Charvillat, W.T. Ooi
International Conference on Document Analysis and Recognition, 2024
Travis Seng
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Doctoral Symposium, 2022
Arthur Renaudeau, Travis Seng, Axel Carlier, Fabien Pierre, François Lauze, Jean-François Aujol, Jean-Denis Durou
25th International Conference on Pattern Recognition (ICPR), 2020