Yao Teng - 滕尧

About Me [ CV ]

I am a third-year Ph.D. student at The University of Hong Kong, supervised by Xihui Liu.

I was an M.Sc. student in the MCG Group, Department of Computer Science and Technology, Nanjing University, supervised by Prof. Limin Wang, from 2020 to 2023.

Previously, I obtained a B.Sc. degree in Software Engineering from Xidian University in 2020.

My research focuses on image/video generation and object detection.

Internship Experience

2025.3 - 2026.2, ByteDance. Working on Video VAE and Text-to-Video Generation.
2026.2 - present, Tencent. Working on World Model.

Research [ Google Scholar ]

	Adaptive 1D Video Diffusion Autoencoder Yao Teng, Minxuan Lin, Xian Liu, Shuai Wang, Xiao Yang and Xihui Liu Under review. Diffusion-based video generation on a one-dimensional video diffusion tokenizer. [ Paper ]
	SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation Yao Teng, Zhihuan Jiang, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li and Xihui Liu Under review. Improvement over SJD in speed. [ Paper ]
	Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation Yao Teng, Fuyun Wang, Xian Liu, Zhekai Chen, Han Shi, Yu Wang, Zhenguo Li, Weiyang Liu, Difan Zou, and Xihui Liu NeurIPS 2025. Unifying denoising and autoregressive decoding. [ Paper ]
	Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li and Xihui Liu ICLR 2025. The first training-free acceleration method for T2I AR models. [ Paper ] [ Code ]
	DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li and Xihui Liu arXiv preprint 2024. A Mamba-based diffusion model for image generation. [ Paper ] [ Code ]
	Drag-A-Video: Non-rigid Video Editing with Point-based Interaction Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li and Xihui Liu arXiv preprint 2023. The "drag" in video editing. [ Paper ]
	StageInteractor: Query-based Object Detector with Cross-stage Interaction Yao Teng, Haisong Liu, Sheng Guo and Limin Wang ICCV 2023. New label assignment and feature interaction for query-based object detection. [ Paper ] [ Code ]
	Logit Normalization for Long-tail Object Detection Liang Zhao, Yao Teng and Limin Wang International Journal of Computer Vision. A test-time batch normalization to calibrate logits for long-tail object detection. [ Paper ] [ Code ]
	Structured Sparse R-CNN for Direct Scene Graph Generation Yao Teng and Limin Wang CVPR 2022. The first one-stage end-to-end scene graph generation framework without an extra object detector at inference. [ Paper ] [ Code ] [ Zhihu ]
	Target Adaptive Context Aggregation for Video Scene Graph Generation Yao Teng, Limin Wang, Zhifeng Li and Gangshan Wu ICCV 2021. The first framework unifying frame-level and video-level scene graph generation. [ Paper ] [ Code ]

Selected Awards

National Scholarship for Postgraduates, Nanjing University, 2021
Pacemaker to Outstanding Graduate Student, Nanjing University, 2021
1st Prize, Scholarship for Postgraduate Students, Nanjing University, 2021
2nd Award, University Scholarship, Xidian University, 2016-2018
Silver Medal, The 2019 ICPC China Shaanxi Provincial Programming Contest, Xidian University, 2019
