About me
My name is Junhao Cheng. I am an undergraduate student at Sun Yat-sen University (SYSU), where I conduct research at HCP Lab under the supervision of Prof. Xiaodan Liang (梁小丹). I am currently interning at Tencent PCG ARC Lab. My research interests lie in interactive and generative AI. I currently focus on designing novel applications for image/video generation and other downstream tasks so that AI better serves humans.
I am seeking PhD/MPhil opportunities and am open to discussions and collaborations. If you are interested in my work or would like to collaborate, please feel free to email me (howe4884@outlook.com).
News
- 2024.10: One first-author paper accepted by Energy (JCR Q1).
- 2024.06: Released AutoStudio (400+ stars) for comic book generation.
- 2024.05: One second-author paper accepted by ACL 2024.
- 2024.04: Released TheaterGen for benchmarking multi-turn image generation.
Internships
- 2024.06 - now, Tencent PCG ARC Lab, Shenzhen.
- 2023.02 - 2024.06, Lenovo, Research Institute, Shenzhen.
- 2023.08 - 2024.02, Pengcheng Laboratory, Shenzhen.
- 2023.03 - 2023.08, Chinese Institute of Brain Research (CIBR), Liu Lab, Beijing.
Publications

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang*
- We propose a training-free multi-agent framework called AutoStudio. This framework stands out for its ability to maintain multi-subject consistency in on-the-fly multi-turn interactions with users, enabling it to accomplish various tasks such as open-ended story/manga book generation and multi-turn editing.

VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models
Qingxing Cao, Junhao Cheng, Xiaodan Liang*, Liang Lin*
- To investigate how LVLMs hallucinate when given a long, misleading textual history, we propose VisDiaHalBench, a novel visual dialogue hallucination evaluation benchmark. It consists of samples with five-turn questions about an edited image and its original version. The benchmark has been publicly released.

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang*
- We propose TheaterGen, a training-free framework that uses a large language model to drive a text-to-image generation model, addressing semantic consistency and contextual consistency in multi-turn image generation.

Integrating Domain Knowledge into Transformer for Short-Term Wind Power Forecasting
Junhao Cheng, Xing Luo*, Zhi Jin*
- We propose DKFormer, a forecasting model that integrates domain knowledge through three constraint modules covering the data pre-processing, model training, and forecasting stages.
Education
2021.09 - now, Undergraduate.
School of Intelligent Systems Engineering, Sun Yat-sen University (SYSU), Guangdong.