Elahe Vahdani

I'm a Ph.D. student in the Media Lab, Dept. of Computer Science, at The City University of New York, advised by Professor Ying-Li Tian. My current research focuses on Video Understanding, Multimodal Understanding, and Action Detection. I received my Bachelor's degree from Sharif University of Technology.

Email  /  Google Scholar  /  Curriculum Vitae  /  LinkedIn


  • Our paper "Deep Learning-based Action Detection in Untrimmed Videos: A Survey" was accepted to TPAMI 2022.

  • Excited to join Dataminr as a Research Science Intern for Fall 2021.

  • Our paper "Cross-modal Center Loss" was accepted to CVPR 2021!

  • Excited to join Expedia Group as a Data Science Intern for Summer 2021.

  • Received the M.Phil. (Master of Philosophy) degree in Computer Science from The City University of New York.

  • Advanced to Ph.D. candidacy.


I'm a Ph.D. candidate in computer vision, and my research interests are video understanding, multimodal understanding, and action detection. I have also worked on self-supervised feature learning, facial expression analysis, object detection with cross-modality bridging, and vehicle re-identification.

Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani*, Yingli Tian
TPAMI, 2022

An extensive overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos with different supervision levels.

Cross-modal Center Loss
Longlong Jing*, Elahe Vahdani*, Jiaxing Tan, Yingli Tian
CVPR 2021

We propose a novel cross-modal center loss that maps representations of different modalities (e.g., images, meshes, point clouds) into a common feature space.

Recognizing American Sign Language Nonmanual Signal Grammar Errors in Continuous Videos
Elahe Vahdani*, Longlong Jing*, Yingli Tian, Matt Huenerfauth
ICPR, 2020

We designed an educational tool that automatically processes sign language students' signing video assignments and provides immediate feedback on the fluency of their signing. The framework uses deep learning algorithms to temporally detect grammatically important elements in continuous signing videos and to check their correspondence across multiple modalities, such as facial expressions, head movements, and hand gestures.

Recognizing American Sign Language Manual Signs from RGB-D Videos
Longlong Jing*, Elahe Vahdani*, Yingli Tian, Matt Huenerfauth
Under Review, 2020
PDF / Project Page / Dataset

We propose a 3D ConvNet based multi-stream framework to recognize American Sign Language (ASL) manual signs in real-time from RGB-D videos.

An Isolated-Signing RGBD Dataset of 100 American Sign Language Signs Produced by Fluent ASL Signers
Saad Hassan, Larwan Berke, Elahe Vahdani, Longlong Jing, Yingli Tian, Matt Huenerfauth
LREC, 2020
PDF / Project Page / Dataset

We collected a new dataset of color and depth videos, recorded with a Kinect v2 sensor, of fluent American Sign Language (ASL) signers performing sequences of 100 ASL signs.

Multi-camera Vehicle Tracking and Re-identification on AI City Challenge 2019
Yucheng Chen, Longlong Jing, Elahe Vahdani, Ling Zhang, Mingyi He, Yingli Tian
CVPR AI City Workshop, 2019
PDF / Slides / Poster

We present our solutions to the image-based vehicle re-identification track and the multi-camera vehicle tracking track of the AI City Challenge 2019 (AIC2019). Our proposed framework outperforms the state-of-the-art vehicle ReID method by 16.3% on the VeRi dataset.

Gathering Information in Sensor Networks for Synchronized Freshness
Elahe Vahdani, Amotz Bar-Noy, Matthew P. Johnson, Tarek Abdelzaher

We designed an approximation algorithm for the problem of scheduling a set of n jobs, each with its own deadline, on a minimum number of channels in a sensor network. We proved the problem is NP-hard and provided an O(log n)-approximation algorithm.