SDM'25 Tutorial: Integrating Textual and Graph Data: Advancing Knowledge Discovery with Semantic and Structural Insights

Bowen Jin, Yu Zhang, Yunyi Zhang, Jiawei Han
Computer Science Department, University of Illinois at Urbana-Champaign
Time: May 1st, 2025.

Abstract

Graphs and textual data are both fundamental in the realm of data mining, each with its own unique features that often necessitate specialized modeling techniques. Traditionally, technologies for graph mining and text mining have been developed independently. However, in many real-world applications, these two data modalities frequently coexist and complement each other. For example, in e-commerce platforms, user-product interaction graphs and product descriptions provide different but complementary perspectives on product characteristics. Likewise, in scientific research, citation networks, author information, and the textual content of papers collectively shape the understanding of a paper’s impact.

In this tutorial, we will focus on recent advances in graph mining techniques that harness the power of Large Language Models (LLMs), as well as improvements in text mining methods through the integration of graph structure information. Our goal is to present a cohesive framework illustrating how the synergy between graph and text modalities can lead to richer insights and more profound knowledge discovery. The key topics covered will include:

(1) An introduction to how graph and text data are interlinked in real-world applications and how models like graph neural networks and large language models are designed to capture signals from both modalities;

(2) Network mining with language models: methods that utilize language models for representation learning on graphs and pretraining language models with graph data;

(3) Text mining with structural information: approaches to text classification, document retrieval, and question answering, where graph structures serve as auxiliary information;

(4) Progressing towards an integrated paradigm for mining both semantics and structural information in a unified framework.
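To make topic (2) concrete, here is a minimal sketch of one way language models and graph structure can be combined for representation learning: text embeddings serve as node features, and a single GCN-style propagation step mixes in neighborhood structure. The random vectors below stand in for actual language-model embeddings, and the linear map is untrained; this is an illustrative toy, not any specific method from the tutorial.

```python
import numpy as np

# Toy graph: 4 nodes (e.g., papers) connected by citation edges.
# In practice, x would come from a language-model encoder applied to
# each node's text; random vectors stand in for those embeddings here.
rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
x = rng.normal(size=(num_nodes, dim))      # stand-in "text" embeddings
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]   # citation links

# Symmetric adjacency matrix with self-loops.
adj = np.eye(num_nodes)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

# One GCN-style step: each node averages the embeddings of itself and
# its neighbors, then applies a linear map (untrained here) and a ReLU.
deg = adj.sum(axis=1, keepdims=True)
w = rng.normal(size=(dim, dim))
h = np.maximum((adj @ x / deg) @ w, 0.0)   # structure-aware embeddings
print(h.shape)  # (4, 8)
```

After this step, each row of `h` reflects both a node's own text and its neighbors' text, which is the basic intuition behind the LLM-on-graph methods the tutorial surveys.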

This comprehensive exploration will provide insights into how combining graph structures and textual data can enhance the capabilities of modern machine learning techniques. The tutorial will run for 2 hours.

Slides

Presenters

Bowen Jin, Ph.D. student, Computer Science, UIUC. His research lies at the intersection of large generative models (e.g., large language models and diffusion models), multimodal data, and information networks. His current research interests are LLM agents, reasoning, and reinforcement learning. He received the Apple PhD Fellowship (2024) and the Maxine Pao Memorial Fellowship (2024). He has published numerous research papers at KDD, WWW, ICLR, ACL, and SIGIR.

Yu Zhang, Assistant Professor, Computer Science & Engineering, TAMU. He received his Ph.D. from UIUC. His research focuses on natural language processing with structured knowledge and its applications in scientific literature understanding. He is the recipient of the UIUC Dissertation Completion Fellowship and the Yunni & Maxine Pao Memorial Fellowship. He has published over 40 conference and journal papers, given 8 conference tutorials, and served as an Area Chair for KDD, ACL, and NeurIPS.

Yunyi Zhang, Ph.D. student, Computer Science, UIUC. His research focuses on weakly supervised text mining, text classification, and taxonomy construction. He has published numerous research papers at KDD, WWW, WSDM, ACL, and EMNLP and has delivered tutorials at EDBT'23, KDD'23, and KDD'24.

Jiawei Han, Michael Aiken Chair Professor, Computer Science, UIUC. His research encompasses data mining, text mining, data warehousing, and information network analysis, with over 800 research publications. He is a Fellow of ACM and IEEE and has received numerous prominent awards, including the ACM SIGKDD Innovation Award (2004) and the IEEE Computer Society W. Wallace McDowell Award (2009). He has delivered 50+ conference tutorials and keynote speeches.