Poe2CLP: phrase-level attention and cross-modal semantic alignment for poem generate Chinese landscape paintings

XL Peng and TY Sun and QY Hu and ZG Sun and N Xu and JY Peng, NPJ HERITAGE SCIENCE, 13, 656 (2025).

DOI: 10.1038/s40494-025-02238-0

Generating traditional Chinese landscape paintings from classical poetry is a challenging cross-modal task due to the condensed semantics and esthetic abstraction of poetic language. Existing text-to-image models struggle to interpret classical Chinese syntax and reproduce ink-wash artistic styles. We propose Poe2CLP, a phrase-level attention and cross-modal alignment framework that dynamically captures composite semantic units in poems and adaptively fuses global mood with local imagery. Built upon a LoRA-enhanced diffusion model and trained on Poegraph, a new dataset of 5200 poem-painting pairs, Poe2CLP outperforms state-of-the-art methods in FID (130.95), CLIP-T (40.75), and CLIP Style Score (0.348). Our approach advances the digital interpretation of East Asian poetic-visual traditions. The dataset and code are publicly available.
