WSDM'24 Tutorial: Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery

Bowen Jin, Yu Zhang, Sha Li, Jiawei Han
Computer Science Department, University of Illinois at Urbana-Champaign
Time: Mar 4th, 2024.

Abstract

Graphs and texts play crucial roles in data mining, each possessing unique characteristics that often require distinct modeling methods. Technologies for mining graph data and text data are usually designed separately. In practice, however, data often blends both modalities, and the information in each frequently complements the other. For instance, in e-commerce data, the product-user graph and product descriptions provide distinct insights into product features. Similarly, in scientific literature, the citation graph, author information, and the content of papers collectively contribute to modeling the impact of a paper.

In this tutorial, our emphasis will be on exploring the latest advancements in graph mining techniques that leverage the capabilities of Pre-trained Language Models (PLMs), as well as the enhancement of text mining methods through the incorporation of graph structure information. We will present an organized picture of how graphs and texts can mutually benefit each other and lead to deeper knowledge discovery, with the following outline:

(1) An introduction to how graphs and texts are intertwined in real-life data and how graph neural networks and pre-trained language models are designed to capture signals from the graph and text modalities;

(2) Graph construction from text: constructing sentence-level graphs, event graphs, reasoning graphs, and knowledge graphs from text;

(3) Network mining with language models: language model-based methods for representation learning on graphs, and language model pretraining on graphs;

(4) Text mining with structure information: text classification, literature retrieval, and question answering with graph structure as auxiliary information;

(5) Towards an integrated semantics and structure mining paradigm.

Slides

Introduction [Slides]
Part I: Introducing network structure into text corpus [Slides]
Part II: Graph Mining with LLMs [Slides]
Part III: Text Mining with Structured Information [Slides]
Summary [Slides]

Presenters

Bowen Jin, Ph.D. student, Computer Science, UIUC. His research focuses on mining text data and graph data. He has numerous research publications in KDD, WWW, ICLR, ACL, and SIGIR.

Yu Zhang, Ph.D. student, Computer Science, UIUC. His research focuses on weakly supervised text mining with structural information. He received the Dissertation Completion Fellowship (2023), the Yunni and Maxine Pao Memorial Fellowship (2022), and WWW Best Poster Award Honorable Mention (2018).

Sha Li, Ph.D. student, Computer Science, UIUC. Her research focuses on information extraction and procedure understanding. She has numerous publications in ACL, EMNLP, NAACL, and WWW.

Jiawei Han, Michael Aiken Chair Professor, Computer Science, UIUC. His research areas encompass data mining, text mining, data warehousing, and information network analysis, with over 800 research publications. He is a Fellow of ACM and a Fellow of IEEE, and has received numerous prominent awards, including the ACM SIGKDD Innovation Award (2004) and the IEEE Computer Society W. Wallace McDowell Award (2009). He has delivered 50+ conference tutorials or keynote speeches.