Graph Information System

Xifeng Yan,  University of California at Santa Barbara
Project Summary

CAREER: Graph Information System: Deciphering Complex Networks, funded by NSF CAREER award IIS-0954125.

Graphs and networks are ubiquitous, encoding complex relationships ranging from chemical bonds to social interactions. Hidden in these networks are the answers to many important questions in biology, business, and sociology. Analyzing complex networks, however, currently requires users to master sophisticated computing and programming skills, which is a pain point for many scientists and engineers.

This project aims to advance the state of the art by developing a general graph information system that addresses the needs of searching and mining complex networks. Real-life networks are complex: beyond their topological structure, they carry heterogeneous contents and attributes associated with nodes and edges. This mixture of structure and content raises two challenges that require new solutions for smarter and faster graph analysis. First, new types of graph search and mining operations are emerging, such as graph aggregation, graph association, and graph pattern mining. Second, as graphs become large and complex, most existing graph mining algorithms do not scale well. This project addresses these challenges through a comprehensive study of a general graph information system. The proposed system includes three major components: complex graph search, graph pattern mining, and graph indexing. It covers emerging structure queries in social, biological, and information networks; new graph mining operators such as graph summarization and association; and innovative indexing methodologies, e.g., differential graph index.

This research is tightly integrated with education through student mentoring and curriculum development. Publications, software, and course materials resulting from this project are disseminated on this website.

Graduated Students: Nan Li (Data Scientist, oDesk, Apple, now Facebook), Arijit Khan (PostDoc, ETH), Shengqi Yang (Research Scientist, Facebook), Bo Zong (Research Scientist, NEC-Labs), Honglei Liu (Research Scientist, Facebook), Yu Su (Assistant Professor, Ohio State Univ.)

Undergraduate Students: Bruce Liu (Pasadena Community College/UCI)


Tutorials

  1. Scalable Construction and Querying of Massive Knowledge Bases (Tutorial),
    by X. Ren, Y. Su, P. Szekely, X. Yan.
    WWW'18 (Proc. of the International Conference on World Wide Web), 2018 [website][slides1][slides2][slides3]
  2. Construction and Querying of Large-scale Knowledge Bases (Tutorial),
    by X. Ren, Y. Su, X. Yan.
    CIKM'17 (Proc. of the ACM International Conference on Information and Knowledge Management), 2017 [website][slides]

Curriculum Development

  1. Graph Processing and Mining was taught in CS290D - Advanced Data Mining (Winter, 2010) [schedule] (multiple lectures)
  2. Knowledge Base Question Answering was covered in our recent CS291K: Deep Learning for Text Mining and Understanding (Spring, 2017) [schedule] (one lecture)


Publications

  1. DialSQL: Dialogue Based Structured Query Generation,
    by I. Gur, S. Yavuz, Y. Su, X. Yan,
    ACL'18 (Proc. of the Annual Meeting of the Association for Computational Linguistics), 2018 [pdf]
  2. Active Learning of Functional Networks from Spike Trains,
    by H. Liu and B. Wu,
    SDM'17 (SIAM Int. Conf. on Data Mining), 2017. [pdf]
  3. Discovering Enterprise Concepts Using Spreadsheet Tables,
    by K. Li, Y. He, and K. Ganjam,
    KDD'17 (Proc. of the 23rd ACM Int. Conf. on Knowledge Discovery and Data Mining), 2017 [pdf]
  4. On Generating Characteristic-rich Question Sets for QA Evaluation,
    by Y. Su, H. Sun, B. Sadler, M. Srivatsa, I. Gur, Z. Yan, and X. Yan,
    EMNLP'16 (Proc. of the 2016 Conf. on Empirical Methods in Natural Language Processing), 2016 [pdf]
  5. Fast Motif Discovery in Short Sequences,
    by H. Liu, F. Han, H. Zhou, X. Yan, K. Kosik,
    ICDE'16 (Proc. of Int. Conf. on Data Engineering), 2016. [pdf]
  6. Entity Disambiguation with Linkless Knowledge Bases,
    by Y. Li, S. Tan, H. Sun, J. Han, D. Roth and X. Yan,
    WWW'16 (Proc. of the 25th Int. World Wide Web Conference), 2016. [pdf]
  7. Behavior Query Discovery in System-Generated Temporal Graphs,
    by B. Zong, X. Xiao, Z. Li, Z. Wu, Z. Qian, X. Yan, A. Singh, and G. Jiang,
    VLDB'16 (Proc. of the 42nd Int. Conf. on Very Large Databases), 2016. [pdf]
  8. Query-Based Outlier Detection in Heterogeneous Information Networks,
    by H. Zhuang, J. Zhang, G. Brova, J. Tang, H. Cam, X. Yan, and J. Han,
    EDBT'15 (18th International Conference on Extending Database Technology), 2015 [pdf]
  9. Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks,
    by H. Zhuang, J. Zhang, G. Brova, J. Tang, H. Cam, X. Yan, and J. Han,
    ICDM'14 (Proc. 2014 Int. Conf. on Data Mining), Dec 2014. [pdf]
  10. Towards Scalable Critical Alert Mining,
    by B. Zong, Y. Wu, J. Song, A. Singh, H. Cam, J. Han and X. Yan,
    KDD'14 (Proc. of the 20th ACM Int. Conf. on Knowledge Discovery and Data Mining), Aug 2014. [pdf]
  11. SLQ: A User-friendly Graph Querying System,
    by S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, X. Yan,
    SIGMOD'14 (Proc. 2014 Int. Conf. on Management of Data) (demo paper), 2014. [pdf] [demo]
  12. Schemaless and Structureless Graph Querying,
    by S. Yang, Y. Wu, H. Sun, X. Yan,
    VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
  13. A Probabilistic Approach to Uncovering Attributed Graph Anomalies,
    by N. Li, H. Sun, K. Chipman, J. George, X. Yan,
    SDM'14 (Proc. 2014 SIAM Int. Conf. on Data Mining), 2014. [pdf]
  14. Cloud Service Placement via Subgraph Matching,
    by B. Zong, R. Raghavendra, M. Srivatsa, X. Yan, A. Singh, and K.-W. Lee,
    ICDE'14 (Proc. 2014 Int. Conf. on Data Engineering), 2014 [pdf]
  15. Summarizing Answer Graphs Induced by Keyword Queries,
    by Y. Wu, S. Yang, M. Srivatsa, A. Iyengar, X. Yan,
    VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
  16. Noise-Resistant Bicluster Recognition,
    by H. Sun, G. Miao, X. Yan,
    ICDM'13 (Proc. 2013 IEEE Int. Conf. on Data Mining), Dec 2013. [pdf] [software release]
  17. Mining Evidences for Named Entity Disambiguation,
    by Y. Li, C. Wang, F. Han, J. Han, D. Roth, and X. Yan,
    KDD'13 (Proc. of the 19th Int. Conf. on Knowledge Discovery and Data Mining), Aug 2013. [pdf]
  18. Memory Efficient Minimum Substring Partitioning,
    by Y. Li, P. Kamousi, F. Han, S. Yang, X. Yan, S. Suri,
    VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
  19. NeMa: Fast Graph Search with Label Similarity,
    by A. Khan, Y. Wu, C. Aggarwal, X. Yan,
    VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
  20. Ontology-based Subgraph Querying,
    by Y. Wu, S. Yang, X. Yan,
    ICDE'13 (Proc. 2013 Int. Conf. on Data Engineering), Apr 2013. [pdf] [poster] (Best Poster Award)
  21. Neighborhood Based Fast Graph Search in Large Networks,
    by A. Khan, N. Li, Z. Guan, X. Yan, S. Chakraborty, and S. Tao,
    SIGMOD'11 (Proc. 2011 Int. Conf. on Management of Data), June 2011 [pdf]
  22. Content-Aware Resolution Sequence Mining for Ticket Routing,
    by P. Sun, S. Tao, X. Yan, N. Anerousis, Y. Chen,
    BPM'10 (The 8th Int. Conf. on Business Process Management), Sep. 2010 [pdf]


Ph.D. Theses

2013 Nan Li, Ph.D., "Uncovering Anomalous Patterns in Large Attributed Graphs."
2013 Arijit Khan, Ph.D., "Towards Querying and Mining of Large-Scale Networks."
2015 Shengqi Yang, Ph.D., "Fast Search in Large Scale Knowledge Graphs."
2015 Bo Zong, Ph.D., "Towards Mining and Managing Large-Scale Temporal Graphs."
2017 Honglei Liu, Ph.D., "Multi-level Knowledge Extraction from Sequence Data."
2018 Yu Su, Ph.D., "Bridging the Gap between Human and Data with AI."