Synthetic Programming Elicitation

This is the supplementary material for Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages. For the code and installation instructions, please see the GitHub repository.

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, toolchains for legacy languages, and formal verification frameworks. Inspired by an HCI technique called natural program elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce synthetic programming elicitation and compilation (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.
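To make the repair-then-compile idea above concrete, here is a minimal toy sketch in Python. It is not the SPEAC implementation. The intermediate language (a handful of Python AST node types), the repair strategy (replacing out-of-language statements with a placeholder), the names `ALLOWED`, `HOLE`, `in_language`, `repair`, and `compile_to_target`, and the target syntax are all assumptions made up for illustration. The target syntax in particular is invented and is not UCLID5.

```python
import ast

# Toy intermediate language: single-target assignments over names, constants,
# and +/-. This node list is an assumption for illustration only.
ALLOWED = (ast.Assign, ast.Name, ast.Constant, ast.BinOp,
           ast.Add, ast.Sub, ast.Load, ast.Store)
HOLE = "__hole__"  # hypothetical placeholder for code that was removed


def in_language(stmt: ast.stmt) -> bool:
    """True if every node of the statement belongs to the toy language."""
    if isinstance(stmt, ast.Assign) and len(stmt.targets) != 1:
        return False
    return all(isinstance(node, ALLOWED) for node in ast.walk(stmt))


def repair(source: str) -> list:
    """Keep in-language statements; replace every other statement with a hole."""
    repaired = []
    for stmt in ast.parse(source).body:
        if in_language(stmt):
            repaired.append(stmt)
        else:
            repaired.append(ast.Expr(value=ast.Name(id=HOLE, ctx=ast.Load())))
    return repaired


def compile_to_target(stmts: list) -> str:
    """Translate repaired statements into an invented target syntax."""
    lines = []
    for stmt in stmts:
        if isinstance(stmt, ast.Assign):
            target = ast.unparse(stmt.targets[0])
            value = ast.unparse(stmt.value)
            lines.append(f"{target}' = {value};")
        else:
            lines.append("// hole: to be filled in a later step")
    return "\n".join(lines)


if __name__ == "__main__":
    # The call to foo() lies outside the toy language, so it becomes a hole.
    llm_output = "x = a + 1\ny = foo(x)\n"
    print(compile_to_target(repair(llm_output)))
```

The sketch requires Python 3.9 or later for `ast.unparse`. The point of the design is that the compiler only ever sees programs that are inside the intermediate language, so its output is syntactically valid by construction. Anything the model wrote outside the language is isolated rather than passed through.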

Data

The challenge problems are in the docs/data folder of the GitHub repository, and the raw outputs of our experiments are in the docs/results folder.

For example, consider the problem Lee Seshia Ex. 3-13. The model generated by our approach using GPT-3.5-turbo is here and the full interaction with the LLM that produced this output is here. We graded this output 4/5 because it is almost perfectly correct. One small mistake is that it declares the pedestrian input signal as a local variable.