My research focuses on making lecture videos more navigable, interactive, and accessible through multimodal analysis. This means building systems that simultaneously process and understand the different streams of information in a video: the visuals from slides or a blackboard, the text of the spoken transcript, and the on-slide text extracted with OCR.
To bring these ideas to life, I designed and built an interactive application that automatically structures video content and enhances the user experience by linking these different information sources.
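As a purely illustrative sketch (not the actual system), linking modalities can start from something as simple as matching each spoken transcript segment against the OCR'd text of each slide; all function names and example data below are made up for illustration:

```python
# Hypothetical illustration: link transcript segments to slides by
# word overlap between spoken text and OCR'd slide text.

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().split())

def link_transcript_to_slides(transcript_segments, slide_ocr_texts):
    """For each transcript segment, pick the index of the slide whose
    OCR text shares the most words with it (Jaccard overlap)."""
    links = []
    for segment in transcript_segments:
        seg_tokens = tokenize(segment)
        best_slide, best_score = None, 0.0
        for idx, slide_text in enumerate(slide_ocr_texts):
            slide_tokens = tokenize(slide_text)
            union = seg_tokens | slide_tokens
            score = len(seg_tokens & slide_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_slide, best_score = idx, score
        links.append(best_slide)
    return links

slides = ["gradient descent update rule", "convolutional neural networks"]
spoken = ["now we apply the gradient descent update",
          "convolutional networks use shared weights"]
print(link_transcript_to_slides(spoken, slides))  # [0, 1]
```

A real system would need much more robust alignment (timing cues, OCR error tolerance, semantic rather than lexical matching), but the sketch shows the core idea of tying what is said to what is shown.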
T. Seng, A. Carlier, W.T. Ooi.
ACM Multimedia 2025, Open Source Track, 2025
T. Seng, A. Carlier, T. Forgione, V. Charvillat, W.T. Ooi.
International Conference on Document Analysis and Recognition, 2024
T. Seng.
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Doctoral Symposium, 2022
A. Renaudeau, T. Seng, A. Carlier, F. Pierre, F. Lauze, J.-F. Aujol, J.-D. Durou.
25th International Conference on Pattern Recognition (ICPR), 2020
Habit tracker that turns progress into randomized point drops you redeem for your own rewards
AI-powered travel assistant for exploring China with intelligent location and restaurant recommendations
Proof of concept: runs Whisper in the browser with transformers.js; select words in the transcript to instantly cut a video segment, entirely client-side
Real-time multiplayer quiz game built while learning JS, React Native, and websockets