293S Projects


This page will be revised with more project ideas. Slides based on this page are available here.

Requirements

You may study algorithmic solutions or system-performance issues for search.

Project Timelines

When choosing your own project, you can build on existing work, such as recent technical papers published in top-rated information retrieval/search or related conferences (SIGIR, WWW, WSDM, ACL, EMNLP).

Tips for the class presentation:

Computing resource:

Past project reports

Some project ideas written for the 2021 CS293 course:

  1. Document retrieval based on neural impact scores using an inverted index.

  2. Develop a simple multi-threaded key-value store in C++/C, backed by Linux files, to serve concurrent requests for contextual document embeddings. The goal is low-latency access to each document's embedding representation with reasonable concurrency. Such a system can be used for document re-ranking, as in Composite Re-Ranking for Efficient Document Search with BERT (WSDM 2022).

  3. Efficient C++ implementation, without a GPU, of online re-ranking with ColBERT. Open-source BERT C++ code using Intel MKL is available here (the Makefile may need a small change).

  4. Document Re-ranking based on Pyserini
    With Pyserini, you can retrieve a list of top document IDs for each test query. You can then gather the corresponding documents, build new features for each one (such as neural text features or knowledge entity features), and rerank the top documents for each query.

    You do not have to build an inverted index (which takes time to program). You can use Pyserini to retrieve the text of the top documents and, for each query, build the necessary features for those documents in memory.

    Instructions on how to run Pyserini on Expanse can be found here.

  5. Retrieval with an inverted index is classified as sparse retrieval. Dense retrieval is another approach that has received attention recently with the advancement of BERT.
    Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval, ICLR 2021. GitHub: https://github.com/microsoft/ANCE
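For idea 1, the query-processing step can be sketched as follows, assuming each posting already stores a precomputed integer impact score (in learned sparse models such as DeepImpact, a neural model produces these weights offline; the toy values here are made up). A document's score is the sum of the impacts of the distinct query terms it contains, accumulated term-at-a-time. The names `build_impact_index` and `impact_search` are hypothetical.

```python
import heapq
from collections import defaultdict


def build_impact_index(doc_impacts):
    """Invert {doc_id: {term: impact}} into {term: [(doc_id, impact), ...]}."""
    index = defaultdict(list)
    for doc_id, term_impacts in doc_impacts.items():
        for term, impact in term_impacts.items():
            index[term].append((doc_id, impact))
    return index


def impact_search(index, query_terms, k=10):
    """Score = sum of stored impacts of the distinct query terms a document
    contains (term-at-a-time accumulation); return top-k (doc_id, score)."""
    scores = defaultdict(int)
    for term in set(query_terms):
        for doc_id, impact in index.get(term, ()):
            scores[doc_id] += impact
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy impact weights (made up); a neural model would predict these offline.
index = build_impact_index({
    "d1": {"neural": 5, "search": 3},
    "d2": {"search": 7},
    "d3": {"index": 2},
})
print(impact_search(index, ["neural", "search"], k=2))  # [('d1', 8), ('d2', 7)]
```

Term-at-a-time accumulation keeps the sketch short; a full engine would traverse compressed posting lists and would likely add dynamic pruning.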
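For idea 2, one possible on-disk layout is sketched below as a Linux-only Python prototype. The project itself asks for C++/C, where pread(2) has the same positioned-read behavior. Embeddings are stored as fixed-size float32 records, so the record for dense document id i starts at byte i * RECORD.size. Lookups use `os.pread`, which reads at an explicit offset without moving a shared file position, so several threads can share one descriptor. The dimension `DIM`, the file name, and the names `write_store` and `EmbeddingStore` are hypothetical.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

DIM = 4  # hypothetical; a BERT-base embedding would have 768 dimensions
RECORD = struct.Struct(f"<{DIM}f")  # one record = DIM little-endian float32 values


def write_store(path, embeddings):
    """Write embeddings (a list indexed by dense doc id) as fixed-size records."""
    with open(path, "wb") as f:
        for vec in embeddings:
            f.write(RECORD.pack(*vec))


class EmbeddingStore:
    """Read-only store: the record for doc id i starts at byte i * RECORD.size."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)

    def get(self, doc_id):
        # os.pread reads at an explicit offset and leaves the shared file
        # position untouched, so concurrent threads can use the same fd.
        data = os.pread(self.fd, RECORD.size, doc_id * RECORD.size)
        if len(data) != RECORD.size:
            raise KeyError(doc_id)  # id past the end of the file
        return RECORD.unpack(data)

    def close(self):
        os.close(self.fd)


# Demo: 100 toy embeddings served to a pool of 8 worker threads.
path = os.path.join(tempfile.mkdtemp(), "embeddings.bin")
write_store(path, [[float(i + j) for j in range(DIM)] for i in range(100)])
store = EmbeddingStore(path)
with ThreadPoolExecutor(max_workers=8) as pool:
    vecs = list(pool.map(store.get, [3, 42, 99]))
print(vecs[0])  # (3.0, 4.0, 5.0, 6.0)
store.close()
```

Fixed-size records avoid a separate offset table. Variable-length entries, such as per-token embeddings, would need one.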
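For idea 3, the operation a CPU implementation must make fast is ColBERT's late-interaction (MaxSim) score. Each query token embedding is matched against its most similar document token embedding, and these maxima are summed. The pure-Python sketch below assumes the token embeddings have already been computed and L2-normalized, so dot products are cosine similarities. The names `maxsim_score` and `rerank` and the toy vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """ColBERT late interaction: for every query token embedding take its best
    dot product over all document token embeddings, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs, candidates):
    """candidates: {doc_id: [token embedding, ...]} -> doc ids, best first."""
    return sorted(candidates,
                  key=lambda doc_id: maxsim_score(query_vecs, candidates[doc_id]),
                  reverse=True)


# Toy 2-d unit vectors standing in for normalized BERT token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "d1": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    "d2": [[0.0, 1.0], [1.0, 0.0]],  # matches both query tokens
}
print(rerank(query, candidates))  # ['d2', 'd1']
```

In C++, the nested loops are a matrix product of the query and document embedding matrices followed by a row-wise max. The product can be handed to a BLAS GEMM call such as `cblas_sgemm` in Intel MKL.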
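For idea 4, the reranking step can be sketched as a weighted combination of the first-stage score and new per-document features. The candidates are assumed to be (doc_id, first_stage_score, text) triples collected from Pyserini's first-stage hits, which expose `docid` and `score`, together with the retrieved document text. The two features (`term_overlap`, `log_length`) and the weights are hypothetical placeholders. A real project would substitute neural text or knowledge-entity features and learn the weights.

```python
import math


def term_overlap(query, text):
    """Hypothetical feature: fraction of distinct query terms found in the text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def log_length(text):
    """Hypothetical feature: log(1 + number of whitespace tokens)."""
    return math.log1p(len(text.split()))


# Hypothetical weights for [first-stage score, term overlap, log length];
# in practice these would be learned, e.g. with a learning-to-rank method.
WEIGHTS = (1.0, 2.0, -0.1)


def rerank(query, candidates, weights=WEIGHTS):
    """candidates: [(doc_id, first_stage_score, text), ...] -> doc ids, best first."""
    def combined(candidate):
        _, first_stage, text = candidate
        features = (first_stage, term_overlap(query, text), log_length(text))
        return sum(w * f for w, f in zip(weights, features))

    return [doc_id for doc_id, _, _ in sorted(candidates, key=combined, reverse=True)]


candidates = [
    ("a", 5.0, "bm25 prefers this lexical match"),
    ("b", 4.8, "neural ranking model for documents"),
]
print(rerank("neural ranking", candidates))  # ['b', 'a']
```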
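For idea 5, the sketch below shows exact inner-product dense retrieval over precomputed document embeddings, plus ANCE-style hard-negative selection. In ANCE, the negatives for a training query are drawn from the top documents retrieved with the current encoder's embeddings, excluding those labeled relevant. This sketch simply takes the highest-ranked non-relevant ones instead of sampling. Brute-force search stands in for the approximate nearest neighbor index, and the periodic index refresh and training loop are omitted. All names and vectors are hypothetical.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dense_search(query_vec, doc_vecs, k):
    """Exact maximum-inner-product search over {doc_id: embedding};
    an approximate nearest neighbor index would replace this at scale."""
    return heapq.nlargest(k, doc_vecs, key=lambda doc_id: dot(query_vec, doc_vecs[doc_id]))


def hard_negatives(query_vec, doc_vecs, positives, n):
    """ANCE-style negatives: top-ranked documents under the current encoder
    that are not in the set of labeled positives (taken greedily, not sampled)."""
    ranked = dense_search(query_vec, doc_vecs, n + len(positives))
    return [doc_id for doc_id in ranked if doc_id not in positives][:n]


doc_vecs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.5, 0.5]}
query_vec = [1.0, 0.0]
print(dense_search(query_vec, doc_vecs, 2))            # ['d1', 'd2']
print(hard_negatives(query_vec, doc_vecs, {"d1"}, 2))  # ['d2', 'd4']
```

Retrieving n + len(positives) candidates guarantees at least n non-relevant documents are available, provided the collection is large enough.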