CAREER: Graph Information System: Deciphering Complex Networks, funded by
NSF CAREER IIS-0954125.
Project Summary
Graphs and networks are ubiquitous, encoding
complex relationships ranging from chemical bonds to social interactions. Hidden
in these networks are the answers to many important questions in biology,
business, and sociology. To analyze complex networks, users must
master sophisticated computing and programming skills, which is a pain
point for many scientists and engineers.
This project aims to advance the
state of the art by developing a general graph information system that
addresses the needs of searching and mining complex networks. Real-life
networks are complex, not only having topological structures, but also
containing heterogeneous contents and attributes associated with nodes and
edges. The mixture of structures and contents raises two challenges that require
new solutions for smarter and faster graph analysis. First, new types of
graph search and mining operations, such as graph aggregation, graph
association, and graph pattern mining, are emerging. Second, when graphs become
complex and large, most existing graph mining algorithms do not scale well.
This project addresses these challenges and performs a comprehensive study of a
general graph information system. The proposed system includes three major
components: complex graph search, graph pattern mining, and graph indexing. It
covers emerging structure queries in social, biological, and information
networks, new graph mining operators such as graph summarization and
association, and innovative indexing methodologies, e.g., differential graph
index.
This research is tightly integrated with education through student
mentoring and curriculum development. Publications, software and course
materials resulting from this project are disseminated on this website.
Graduated Students:
Nan Li (Data Scientist, oDesk, Apple, now Facebook),
Arijit Khan (PostDoc, ETH),
Shengqi Yang (Research Scientist, Facebook),
Bo Zong (Research Scientist, NEC-Labs),
Honglei Liu (Research Scientist, Facebook),
Yu Su (Assistant Professor, Ohio State Univ.)
Undergraduate Students:
Bruce Liu (Pasadena Community College/UCI)
Tutorials
- Scalable Construction and Querying of Massive Knowledge Bases (Tutorial),
by X. Ren, Y. Su, P. Szekely, X. Yan.
WWW'18 (Proc. of the International Conference on World Wide Web), 2018 [website][slides1][slides2][slides3]
- Construction and Querying of Large-scale Knowledge Bases (Tutorial),
by X. Ren, Y. Su, X. Yan.
CIKM'17 (Proc. of the ACM International Conference on Information and Knowledge Management), 2017 [website][slides]
Curriculum Development
- Graph Processing and Mining was taught in CS290D - Advanced Data Mining (Winter, 2010) [schedule] (multiple lectures)
- Knowledge Base Question Answering was covered in our recent CS291K: Deep Learning for Text Mining and Understanding (Spring, 2017) [schedule] (one lecture)
Publications
- DialSQL: Dialogue Based Structured Query Generation,
by I. Gur, S. Yavuz, Y. Su, X. Yan,
ACL'18 (Proc. of the Annual Meeting of the Association for Computational Linguistics, 2018) [pdf]
- Active Learning of Functional Networks from Spike Trains,
by H. Liu and B. Wu,
SDM'17 (SIAM Int. Conf. on Data Mining), 2017 [pdf]
- Discovering Enterprise Concepts Using Spreadsheet Tables,
by K. Li, Y. He, and K. Ganjam.
KDD'17 (Proc. of the 23rd ACM Int. Conf. on Knowledge Discovery and Data Mining), 2017 [pdf]
- On Generating Characteristic-rich Question Sets for QA Evaluation,
by Y. Su, H. Sun, B. Sadler, M. Srivatsa, I. Gur, Z. Yan, and X. Yan,
EMNLP'16 (Proc. of the 2016 Conf. on Empirical Methods in Natural Language Processing), 2016 [pdf]
- Fast Motif Discovery in Short Sequences,
by H. Liu, F. Han, H. Zhou, X. Yan, K. Kosik,
ICDE'16 (Proc. of Int. Conf. on Data Engineering), 2016. [pdf]
- Entity Disambiguation with Linkless Knowledge Bases,
by Y. Li, S. Tan, H. Sun, J. Han, D. Roth and X. Yan,
WWW'16 (Proc. of the 25th Int. World Wide Web Conference), 2016. [pdf]
- Behavior Query Discovery in System-Generated Temporal Graphs,
by B. Zong, X. Xiao, Z. Li, Z. Wu, Z. Qian, X. Yan, A. Singh, and G. Jiang,
VLDB'16 (Proc. of the 42nd Int. Conf. on Very Large Databases), 2016. [pdf]
- Query-Based Outlier Detection in Heterogeneous Information Networks,
by H. Zhuang, J. Zhang, G. Brova, J. Tang, H. Cam, X. Yan, and J. Han,
EDBT'15 (18th International Conference on Extending Database Technology), 2015 [pdf]
- Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks,
by H. Zhuang, J. Zhang, G. Brova, J. Tang, H. Cam, X. Yan, and J. Han,
ICDM'14 (Proc. 2014 Int. Conf. on Data Mining), Dec 2014. [pdf]
- Towards Scalable Critical Alert Mining,
by B. Zong, Y. Wu, J. Song, A. Singh, H. Cam, J. Han and X. Yan,
KDD'14 (Proc. of the 20th ACM Int. Conf. on Knowledge Discovery and Data Mining), Aug 2014. [pdf]
- SLQ: A User-friendly Graph Querying System,
by S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, X. Yan,
SIGMOD'14 (Proc. 2014 Int. Conf. on Management of Data) (demo paper), 2014. [pdf] [demo]
- Schemaless and Structureless Graph Querying,
by S. Yang, Y. Wu, H. Sun, X. Yan,
VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
- A Probabilistic Approach to Uncovering Attributed Graph Anomalies,
by N. Li, H. Sun, K. Chipman, J. George, X. Yan,
SDM'14 (Proc. 2014 SIAM Int. Conf. on Data Mining), 2014. [pdf]
- Cloud Service Placement via Subgraph Matching,
by B. Zong, R. Raghavendra, M. Srivatsa, X. Yan, A. Singh, and K.-W. Lee,
ICDE'14 (Proc. 2014 Int. Conf. on Data Engineering), 2014 [pdf]
- Summarizing Answer Graphs Induced by Keyword Queries,
by Y. Wu, S. Yang, M. Srivatsa, A. Iyengar, X. Yan,
VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
- Noise-Resistant Bicluster Recognition,
by H. Sun, G. Miao, X. Yan,
ICDM'13 (Proc. 2013 IEEE Int. Conf. on Data Mining), Dec 2013. [pdf] [software release]
- Mining Evidences for Named Entity Disambiguation,
by Y. Li, C. Wang, F. Han, J. Han, D. Roth, and X. Yan,
KDD'13 (Proc. of the 19th Int. Conf. on Knowledge Discovery and Data Mining), Aug 2013. [pdf]
- Memory Efficient Minimum Substring Partitioning,
by Y. Li, P. Kamousi, F. Han, S. Yang, X. Yan, S. Suri,
VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
- NeMa: Fast Graph Search with Label Similarity,
by A. Khan, Y. Wu, C. Aggarwal, X. Yan,
VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
- Ontology-based Subgraph Querying,
by Y. Wu, S. Yang, X. Yan,
ICDE'13 (Proc. 2013 Int. Conf. on Data Engineering), Apr 2013. [pdf] [poster] (Best Poster Award)
- Neighborhood Based Fast Graph Search in Large Networks,
by A. Khan, N. Li, Z. Guan, X. Yan, S. Chakraborty, and S. Tao,
SIGMOD'11 (Proc. 2011 Int. Conf. on Management of Data), June 2011 [pdf]
- Content-Aware Resolution Sequence Mining for Ticket Routing,
by P. Sun, S. Tao, X. Yan, N. Anerousis, Y. Chen,
BPM'10 (The 8th Int. Conf. on Business Process Management), Sep. 2010 [pdf]
Dissertations
2013 Nan Li, Ph.D., "Uncovering Anomalous Patterns in Large Attributed Graphs."
2013 Arijit Khan, Ph.D., "Towards Querying and Mining of Large-Scale Networks."
2015 Shengqi Yang, Ph.D., "Fast Search in Large Scale Knowledge Graphs."
2015 Bo Zong, Ph.D., "Towards Mining and Managing Large-Scale Temporal Graphs."
2017 Honglei Liu, Ph.D., "Multi-level Knowledge Extraction from Sequence Data."
2018 Yu Su, Ph.D., "Bridging the Gap between Human and Data with AI."