Elahe Vahdani

I am a computer vision researcher specializing in video understanding. I received my Ph.D. in Computer Science from The Graduate Center, The City University of New York, in Fall 2023, advised by Professor Yingli Tian. My Ph.D. thesis, 'Deep Learning-Based Human Action Understanding in Videos,' reflects the primary focus of my research. Before my doctoral studies, I earned a Bachelor's degree in Mathematics from Sharif University of Technology.

Email  /  Google Scholar  /  GitHub  /  LinkedIn


  • I defended my Ph.D. thesis in Computer Science at The City University of New York.

  • Our paper, 'Multi-Modal Multi-Channel American Sign Language Recognition', was accepted by the International Journal of Artificial Intelligence and Robotics Research, IJAIR 2023.

  • Our paper, 'Deep Learning-based Action Detection in Untrimmed Videos: A Survey', was accepted by the IEEE Transactions on Pattern Analysis and Machine Intelligence, TPAMI 2022.

  • Our paper, 'Cross-Modal Center Loss for 3D Cross-Modal Retrieval', was accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021.

  • I completed an internship at Dataminr as a Research Science Intern in Fall 2021.

  • I completed an internship at Expedia Group as a Data Science Intern in Summer 2021.

  • I received the M.Phil degree in Computer Science from The City University of New York.

Research Projects

Below is a summary of my research projects across several areas of computer vision, including action detection in untrimmed videos, sign language understanding, cross-modal retrieval, and multi-camera vehicle tracking and re-identification.

Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani, Yingli Tian
TPAMI, 2022

An extensive overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos with different supervision levels.

Cross-Modal Center Loss for 3D Cross-Modal Retrieval
Longlong Jing*, Elahe Vahdani*, Jiaxing Tan, Yingli Tian
CVPR, 2021

A novel cross-modal framework designed to map representations from various modalities, such as images, meshes, and point clouds, into a unified feature space.

Recognizing American Sign Language Nonmanual Signal Grammar Errors in Continuous Videos
Elahe Vahdani, Longlong Jing, Yingli Tian, Matt Huenerfauth
ICPR, 2020

We developed an educational tool that enables sign language students to automatically process their signing video assignments and receive immediate feedback on their fluency. This tool utilizes deep learning algorithms for the detection of grammatically important elements in continuous signing videos.

Multi-Modal Multi-Channel American Sign Language Recognition
Elahe Vahdani, Longlong Jing, Yingli Tian, Matt Huenerfauth
IJAIR, 2023
PDF / Project Page / Dataset

A multi-modal, multi-channel framework for the real-time recognition of American Sign Language (ASL) signs from RGB-D videos.

An Isolated-Signing RGBD Dataset of 100 American Sign Language Signs Produced by Fluent ASL Signers
Saad Hassan, Larwan Berke, Elahe Vahdani, Longlong Jing, Yingli Tian, Matt Huenerfauth
LREC, 2020
PDF / Project Page / Dataset

We collected a new dataset of color and depth videos, recorded with a Kinect v2 sensor, of fluent American Sign Language (ASL) signers performing sequences of 100 ASL signs.

Multi-camera Vehicle Tracking and Re-identification on AI City Challenge 2019
Yucheng Chen, Longlong Jing, Elahe Vahdani, Ling Zhang, Mingyi He, Yingli Tian
CVPR AI City Workshop, 2019
PDF / Slides / Poster

Our team's solutions for the image-based vehicle re-identification track and the multi-camera vehicle tracking track were featured in the AI City Challenge 2019. Our proposed framework significantly outperformed the state-of-the-art vehicle ReID method at the time, achieving a 16.3% improvement on the VeRi dataset.

Gathering Information in Sensor Networks for Synchronized Freshness
Elahe Vahdani, Amotz Bar-Noy, Matthew P. Johnson, Tarek Abdelzaher

An approximation algorithm for the NP-hard optimization problem of scheduling a set of n given jobs, each with specific deadlines, using a minimum number of channels in a sensor network.