EMNLP 2019 Tutorial: Discreteness in Neural Natural Language Processing

Lili Mou, Hao Zhou, and Lei Li
November 4, 2019.


This tutorial provides a comprehensive guide to handling discreteness in neural NLP.

As a gentle start, we will briefly introduce the background of deep learning-based NLP, where we point out the ubiquitous discreteness of natural language and the challenges it poses for neural information processing. In particular, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, and demonstrate NLP applications.



Content and Outline

  1. Tutorial Introduction
    • The role of distributed representation in deep learning
    • Ubiquitous discreteness in natural language processing
    • Challenges of dealing with discreteness in neural NLP
  2. Discrete Input Space
    • Examples of discrete input space
    • Embedding discrete input as distributed vectors
    • Incorporating discrete structures into neural architectures
  3. Discrete Latent Space
    • Definitions & Examples
    • General techniques
      • Maximum likelihood estimation
      • Reinforcement learning
      • Gumbel-softmax
      • Step-by-step Attention
    • Case studies
      • Weakly supervised semantic parsing
      • Unsupervised syntactic parsing
  4. Discrete Output Space
    • Examples of discrete output space
    • Challenges and Solutions of Discrete Output Space
      • From Continuous Outputs to Discrete Outputs: Embedding Matching by Softmax
      • Non-differentiable: Difficult for non-MLE training (e.g., GAN)
        • RL for Generation
        • Gumbel Softmax for Generation
      • Exponential Search Space
        • Hard for Global Inference
        • Hard for Constrained Decoding
    • Case Study
      • Kernelized Bayesian Softmax
      • SeqGAN
      • Constrained Sentence Generation with CGMH
  5. Conclusion and Take Away
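As an illustration of one technique named in the outline (Gumbel-softmax, listed under both the latent and output spaces), the following is a minimal sketch and is not taken from the tutorial materials. It assumes unnormalized log-probabilities as input; the function name `gumbel_softmax_sample` and its parameters are hypothetical. The idea is to perturb the logits with Gumbel noise and apply a temperature-scaled softmax, so that a hard categorical choice is replaced by a continuous vector that approaches one-hot as the temperature decreases.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return one relaxed sample (a probability vector) for a categorical variable.

    logits: list of unnormalized log-probabilities (assumed input format).
    temperature: positive float; smaller values give vectors closer to one-hot.
    rng: a random.Random instance, passed in for reproducibility.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # guard against log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise by inverse transform
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    largest = max(noisy)
    exps = [math.exp(v - largest) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [math.log(p) for p in (0.1, 0.6, 0.3)]
soft = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax_sample(logits, temperature=0.1, rng=rng)
print(soft)   # a smooth distribution over the three categories
print(sharp)  # typically much closer to a one-hot vector
```

Taking the argmax of such a sample recovers an exact draw from the underlying categorical distribution (the Gumbel-max trick), while the softmax output itself is differentiable with respect to the logits, which is what makes the relaxation usable in gradient-based training.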

Tutorial Presenters

Lili Mou (University of Waterloo)

Lili Mou is currently a postdoctoral fellow at the University of Waterloo. He received his BS and PhD degrees in 2012 and 2017, respectively, from the School of EECS, Peking University. His research interests include deep learning applied to natural language processing as well as programming language processing. He is currently focusing on neural-symbolic approaches and generative models for NLP. He has publications at top conferences and journals such as AAAI, ACL, CIKM, COLING, EMNLP, ICML, IJCAI, INTERSPEECH, and TACL. He has also published a monograph with Springer.

Hao Zhou (ByteDance AI Lab)

Hao Zhou is a researcher at ByteDance AI Lab. His research interests are machine learning and its applications to natural language processing, including syntactic parsing, machine translation, and text generation. He currently focuses on deep generative models for NLP. He received his Ph.D. degree in 2017 from Nanjing University. He has publications in prestigious conferences and journals, including ACL, EMNLP, NIPS, AAAI, TACL, and JAIR.

Lei Li (ByteDance AI Lab)

Dr. Lei Li is Director of ByteDance AI Lab. He received his B.S. in Computer Science and Engineering from Shanghai Jiao Tong University (ACM class) and his Ph.D. in Computer Science from Carnegie Mellon University. His dissertation work on fast algorithms for mining co-evolving time series was awarded the ACM KDD best dissertation award (runner-up). His recent work on AI writing received a second-class award of the Wu Wenjun AI Prize. Before ByteDance, he worked at Baidu’s Institute of Deep Learning in Silicon Valley as a Principal Research Scientist, and before that he was a postdoctoral researcher in the EECS department of UC Berkeley. He has served on the program committees of ICML 2014, ECML/PKDD 2014/2015, SDM 2013/2014, IJCAI 2011/2013/2016/2019, KDD 2015/2016, and EMNLP 2018; as KDD Cup 2017 co-chair; as KDD 2018 hands-on tutorial co-chair; as a senior PC member for AAAI 2019; and as a lecturer at the 2014 summer school on Probabilistic Programming for Advancing Machine Learning. He has published over 40 technical papers and holds 3 US patents.