🧍‍♂️ About me

My name is Junhao Cheng (程钧豪). I am an undergraduate student at Sun Yat-sen University (SYSU), where I conduct research at HCP Lab under the supervision of Prof. Xiaodan Liang (梁小丹). I am currently interning at Tencent PCG ARC Lab. My research interests lie in interactive and generative AI. I currently focus on designing novel applications for image/video generation and other downstream tasks so that AI better serves humans.

👋👋👋 I am seeking PhD/MPhil opportunities and am open to discussions and collaborations. If you are interested in my work, please feel free to email me (howe4884@outlook.com).

🔥 News

  • 2024.10: 🎉🎉 One first-author paper was accepted by Energy (JCR Q1).
  • 2024.06: 🎉🎉 Released AutoStudio (400+ stars ✨) for comic book generation.
  • 2024.05: 🎉🎉 One second-author paper was accepted by ACL 2024.
  • 2024.04: 🎉🎉 Released TheaterGen for benchmarking multi-turn image generation.

💻 Internships

πŸ“ Publications

arXiv

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang*

[Project] [Code] [Paper]

  • We propose a training-free multi-agent framework called AutoStudio. This framework stands out for its ability to maintain multi-subject consistency in on-the-fly multi-turn interactions with users, enabling it to accomplish various tasks such as open-ended story/manga book generation and multi-turn editing.

ACL 2024

VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models
Qingxing Cao, Junhao Cheng, Xiaodan Liang*, Liang Lin*

[Project] [Code] [Paper]

  • To investigate the hallucination problem of LVLMs when given a long, misleading textual history, we propose VisDiaHalBench, a novel visual dialogue hallucination evaluation benchmark. The benchmark consists of samples with five-turn questions about an edited image and its original version. The benchmark is publicly released.

arXiv

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang*

[Project] [Code] [Paper]

  • We propose TheaterGen, a training-free framework that uses a large language model to drive a text-to-image generation model, addressing semantic consistency and contextual consistency in multi-turn image generation tasks.

Energy

Integrating Domain Knowledge into Transformer for Short-Term Wind Power Forecasting
Junhao Cheng, Xing Luo*, Zhi Jin*

[Project] [Paper]

  • We propose DKFormer, a forecasting model that integrates domain knowledge through three constraint modules that play key roles in the data pre-processing, model training, and forecasting stages.

📖 Education

2021.09 - present, Undergraduate.

School of Intelligent Systems Engineering, Sun Yat-sen University (SYSU), Guangdong.