Intelligent Writing and Text Generation

  • Developing controllable and interpretable methods for effective text generation.
  • Xiaomingbot: an intelligent news-writing robot. [Demo]
  • Bayesian sampling methods for controllable text generation (CGMH, MHA, TSMH): explicitly controlling language generation with various constraints.
  • VAE with hierarchical latent priors: solving the training problem for VAEs with mixtures of exponential-family distributions. Check out DEMVAE.
  • Training better data-to-text generation with both data-text pairs and additional raw text: check out the Variational Template Machine, which learns infinitely many templates for generation in the latent space.
  • One embedding is not enough to represent a word! Bayesian Softmax improves text generation.
  • Applications in advertising systems: generating bidwords for sponsored search, and news headline editing.
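As a minimal illustration of the constrained Metropolis-Hastings idea behind CGMH-style samplers, the toy sketch below replaces one word at a time and only accepts proposals that keep a required keyword. The `fluency` scorer is a hypothetical stand-in: a real system would score candidates with a language model.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]

def fluency(sentence):
    # Stand-in scorer (NOT a real language model): penalize
    # immediate word repetition so the chain prefers varied text.
    score = 1.0
    for a, b in zip(sentence, sentence[1:]):
        if a == b:
            score *= 0.1
    return score

def satisfies(sentence, keyword):
    # Hard lexical constraint: the keyword must appear in the sentence.
    return keyword in sentence

def mh_step(sentence, keyword):
    """One Metropolis-Hastings move: propose replacing a random word,
    reject proposals that violate the hard constraint, otherwise accept
    with probability min(1, p(new) / p(old))."""
    i = random.randrange(len(sentence))
    proposal = list(sentence)
    proposal[i] = random.choice(VOCAB)
    if not satisfies(proposal, keyword):
        return sentence
    accept = min(1.0, fluency(proposal) / fluency(sentence))
    return proposal if random.random() < accept else sentence

sentence = ["the", "cat", "cat", "on", "the", "mat"]
for _ in range(200):
    sentence = mh_step(sentence, keyword="cat")
print(sentence)
```

CGMH additionally uses insertion and deletion moves; this sketch keeps only word replacement to stay short.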

Multilingual Machine Translation

How to develop a unified model that can translate many language pairs well? Existing neural machine translation relies on rich parallel bilingual corpora, which are not readily available for many non-English language pairs.

  • Do pre-trained monolingual language models such as BERT/GPT benefit bilingual neural machine translation? Check out the CTNMT paper.
  • Can we build a universal pre-trained neural model that improves translation for any language pair, even if the languages do not occur in the pre-training corpus? The mRASP and mRASP2 papers try to answer this question. Read the blog post here.
  • Building a model like a human who knows two languages: integrating the translation capabilities in both directions, as well as the ability to compose sentences in both languages. Check out the Mirror Generative Neural Machine Translation paper.
  • Prune-tune: a method that continually learns multiple translation domains. It improves domain-specific translation successively without degrading on the general domain, avoiding the common catastrophic-forgetting problem. Check out the Prune-tune paper. [project page]
  • Learning language-specific sub-networks is possible for multilingual neural machine translation, and it improves zero-shot translation.
  • Graformer: connecting pre-trained BERT and GPT with a small bridge module to boost multilingual machine translation. It further enables easy exploitation of models pre-trained on monolingual corpora in multiple languages.
  • CIAT: designing small adapter sub-networks for multilingual machine translation. Should the adapter be serial or parallel to the main backbone network? This study finds that parallel adapters work better at countering interference among languages.
  • How to achieve top performance in multiple language directions in WMT20 (Chinese-English, German-English, French-German, English-Khmer, English-Pashto)? Check out our experience in this report, and this report.
  • The algorithms are deployed in production: check out VolcTrans, which serves hundreds of millions of translation requests daily in 55 languages.
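To illustrate the serial-versus-parallel adapter question studied in CIAT, here is a toy NumPy sketch; the layer shapes and the linear/bottleneck layers are illustrative assumptions, not the actual architecture. A serial adapter transforms the backbone output, while a parallel adapter is a side branch on the backbone input whose output is added to the backbone output.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(x, W):
    # Frozen "backbone" layer of the multilingual model (toy linear layer).
    return np.tanh(x @ W)

def adapter(x, A, B):
    # Bottleneck adapter: project down, apply ReLU, project back up.
    return np.maximum(x @ A, 0.0) @ B

d, bottleneck = 8, 2
x = rng.normal(size=(4, d))
W = rng.normal(size=(d, d))
A = rng.normal(size=(d, bottleneck)) * 0.1
B = rng.normal(size=(bottleneck, d)) * 0.1

# Serial adapter: a residual transform applied to the backbone output.
serial_out = backbone(x, W) + adapter(backbone(x, W), A, B)

# Parallel adapter: a side branch on the backbone *input*, added to the
# backbone output, leaving the main computation path untouched.
parallel_out = backbone(x, W) + adapter(x, A, B)

print(serial_out.shape, parallel_out.shape)
```

In the parallel variant, a per-language adapter can be swapped in without modifying what the shared backbone computes, which is one intuition for why it interferes less across languages.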

Speech-to-text Translation

Can we build a single unified model that takes voice input in one language and outputs a translation in another language? Existing systems are cascaded, combining an ASR system with an MT system. This project aims to build a real working system that achieves this in an end-to-end fashion. The challenge is twofold: the model must translate from the source language to the target language, and it must convert from one modality (audio) to the other (text). In addition, existing open datasets for speech translation are limited, usually a few hundred hours of audio, far less than what is available for machine translation (e.g., 4 million sentence pairs for English-German).
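The two system designs can be contrasted with a toy sketch; the lookup-table "models" below are purely illustrative stand-ins for real ASR, MT, and speech-translation models.

```python
# Toy stand-in "models" (lookup tables) just to show the two interfaces.
ASR = {"[audio:hello]": "hello"}      # audio -> source-language text
MT = {"hello": "bonjour"}             # source text -> target text
ST = {"[audio:hello]": "bonjour"}     # audio -> target text directly

def asr(audio): return ASR[audio]
def mt(text): return MT[text]
def st_model(audio): return ST[audio]

def cascaded_st(audio):
    # Cascade: transcribe first, then translate the transcript;
    # any ASR error propagates into the translation step.
    return mt(asr(audio))

def end_to_end_st(audio):
    # End-to-end: a single model maps audio directly to target text,
    # with no intermediate transcript to get wrong.
    return st_model(audio)

print(cascaded_st("[audio:hello]"))    # prints "bonjour"
print(end_to_end_st("[audio:hello]"))  # prints "bonjour"
```

Both interfaces produce the same output here, but only the cascade depends on an intermediate transcript, which is the error-propagation point the end-to-end approach removes.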

  • Can we utilize the additional transcription text in the source language to train a better encoder for speech translation? Inspired by the human listen, understand, and translate process, we propose LUT, which uses triplets of source audio, source text, and target text, together with a pre-trained BERT model, to train a better end-to-end speech-to-text translation system. Check out the LUT paper. [project page]
  • Can we utilize additional large parallel bilingual sentence pairs from machine translation to enhance speech translation? An idea based on consecutive decoding can achieve this. Check out the COSTT paper. [project page]
  • Studies of the human brain reveal a common region responsible for both text and speech processing. Can a neural network map text and speech inputs into the same semantic space? Check out Chimera, a speech-to-text translation model that exploits the notion of a shared semantic space to further improve speech translation.
  • Training techniques such as progressive multitask training improve speech translation. XSTNet obtains state-of-the-art translation performance on the MuST-C dataset.
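A progressive multitask schedule can be sketched as a simple epoch-to-tasks mapping: start from text-only translation, then add the speech tasks. The stages and epoch boundaries below are illustrative assumptions, not XSTNet's actual configuration.

```python
def progressive_schedule(epoch):
    """Hypothetical progressive multitask schedule for speech translation.

    Warm up on text-to-text MT (cheap, abundant data), then add ASR so
    the encoder learns the audio modality, and finally train all tasks,
    including end-to-end speech translation (ST), jointly.
    """
    if epoch < 5:
        return ["mt"]
    if epoch < 10:
        return ["mt", "asr"]
    return ["mt", "asr", "st"]

# A training loop would sample a batch from each active task per epoch:
for epoch in [0, 7, 12]:
    print(epoch, progressive_schedule(epoch))
```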

AI-powered Drug Discovery

The purpose of this project is to use AI and machine learning to power the whole drug development process: discovery, testing, trial validation, and manufacturing.

  • Finding novel and diverse molecules that are effective in terms of multiple chemical properties and target proteins. Check out the MARS paper.
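The general flavor of annealed, multi-objective MCMC search can be sketched over toy molecule strings. This is not MARS itself (which uses learned, graph-based editing proposals over real molecules); the property scorers and edit moves below are hypothetical stand-ins.

```python
import math
import random

random.seed(0)

# Toy stand-ins for chemical property scorers.
def prop_a(mol): return mol.count("C") / max(len(mol), 1)
def prop_b(mol): return 1.0 if "O" in mol else 0.0

def score(mol):
    # Multi-objective score: sum of the individual property scores.
    return prop_a(mol) + prop_b(mol)

def propose(mol):
    # Edit move: add, delete, or replace one "atom" character.
    atoms = "CNO"
    i = random.randrange(len(mol))
    op = random.choice(["add", "delete", "replace"])
    if op == "add":
        return mol[:i] + random.choice(atoms) + mol[i:]
    if op == "delete" and len(mol) > 1:
        return mol[:i] + mol[i + 1:]
    return mol[:i] + random.choice(atoms) + mol[i + 1:]

def anneal(mol, steps=300, temp=1.0):
    # Annealed MCMC: accept worse candidates with a probability that
    # shrinks as the temperature decays toward zero.
    for t in range(steps):
        cand = propose(mol)
        delta = score(cand) - score(mol)
        T = temp * (1 - t / steps) + 1e-6
        if delta >= 0 or random.random() < math.exp(delta / T):
            mol = cand
    return mol

best = anneal("CCN")
print(best, round(score(best), 2))
```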


Open-source Tools

  • LightSeq: a high-performance training and inference library for Transformer models. It is widely used for machine translation, text generation, visual recognition, and more. With its custom CUDA implementation, it achieves a 10x speed-up over the original TensorFlow seq2seq package and is faster than other implementations.
  • NeurST: a toolbox with readily available models for neural machine translation and speech-to-text translation.
  • BLOG: a probabilistic programming language for machine learning
  • Swift: a compiler for the probabilistic programming language BLOG.
  • DynaMMo: a learning toolbox for multi-dimensional co-evolving time series. [github page]
  • CLDS: complex-valued linear dynamical system
  • PLiF: time-shift-invariant feature extraction for time series
  • BoLeRO: human motion capture occlusion recovery
  • paralearn: a parallel algorithm for learning Markov models and linear dynamical systems (i.e. Kalman filter)
  • MLDS: learning dynamical model for tensor time series


Datasets

  • TTNews: a dataset for Chinese document summarization, with 50,000 news articles paired with summaries for training and 4,000 news articles for testing. [Task description] [Training data] [Testing data and evaluation script] [Reports from NLPCC2017 and NLPCC2018]
  • CNewSum: an extended version of TTNews for Chinese document summarization, with 304,307 documents and human-written summaries, plus additional adequacy-level and deducibility-level labels. [Project URL]
  • MLGSum: a multilingual text summarization corpus with 1.2 million articles in 12 languages. Average length per article is 570 words. [Project URL] [Data]

Past Projects

Probabilistic programming languages and Bayesian inference

Time series learning

Parallel Learning for Sequential Models

Network analysis

  • social network and social media analysis
  • CDEM: fly embryo gene pattern mining. (finished)