Xinyun Chen, UC Berkeley: “Deep Learning for Program Synthesis from Diverse Specifications”

Position: PhD Candidate

Current Institution: UC Berkeley

Abstract: Deep Learning for Program Synthesis from Diverse Specifications

Deep neural networks have achieved remarkable success in natural language processing, especially with the advancement of pre-training techniques. Moreover, recent work shows that, by training on a large-scale code corpus, these language models can sometimes even generate moderately complicated code from text descriptions. However, I would like to argue that, despite the remarkable success of these pre-trained models, large-scale training does not automatically result in the capability of complex reasoning beyond text pattern matching. My main research focus is neural program synthesis, where I propose deep learning techniques to synthesize programs from diverse specifications, including input-output examples, natural language descriptions, and reference programs. My research has two themes. First, I design neural network architectures that model the syntactic and semantic information of programs, which improves the complexity and generalizability of the synthesized programs. By modeling the structure of the input specification and the output code, we have developed neural program synthesizers with superior performance in various application domains, including spreadsheet formula prediction, code completion with natural language context, code translation, and code optimization. In terms of modeling code semantics, I design execution-guided techniques for program synthesis from input-output examples, and we demonstrate that utilizing and representing partial program execution significantly improves program synthesis performance. Second, I design neural-symbolic techniques for improving the generalizability and reasoning capability of neural networks beyond traditional program synthesis applications. In particular, standard neural networks struggle to perform tasks that require logical reasoning, numerical calculation, and compositional generalization to out-of-distribution data. By integrating a symbolic reasoning module that synthesizes and executes programs for the task of interest, our neural-symbolic models demonstrate superior compositional reasoning ability for solving a variety of language understanding problems.


Xinyun Chen is a Ph.D. candidate at UC Berkeley, working with Prof. Dawn Song. Her research lies at the intersection of deep learning, programming languages, and security. Her recent research focuses on deep learning for program synthesis, neural-symbolic reasoning, and adversarial machine learning. Her research aims to tackle the grand challenges of increasing the accessibility of programming to general users and enhancing the security and trustworthiness of machine learning models. She received the Facebook Fellowship in 2020 and was selected for Rising Stars in EECS in 2020.