LLM4GW
LLM4GW is the first comprehensive study to assess how effective Large Language Models (LLMs) are for tasks related to GitHub workflows. While LLMs have shown effectiveness in software development tasks like coding and testing, GitHub workflows are distinct from regular code in terms of structure, semantics, and security properties.
We curated a dataset of around 400,000 workflows based on the ARGUS dataset, generated prompts with varying levels of detail, and fine-tuned three state-of-the-art LLMs: GPT-3.5, CodeLlama, and StarChat. We evaluated these LLMs, both off-the-shelf and fine-tuned, on five workflow-related tasks: workflow generation, detection of two kinds of defects (syntactic errors and code injection vulnerabilities), and repair of each kind of defect. The evaluation covered two prompting modes (zero-shot and one-shot) and identified the best-performing temperature value and prompt for each LLM and task.
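To make the evaluation setup concrete, the sketch below shows one way the two prompting modes and a temperature sweep could be organized. It is only an illustration: the prompt wording, the temperature values, and the names `build_prompt` and `evaluation_grid` are assumptions made for this example and are not taken from the paper or its repository.

```python
from itertools import product

# Hypothetical settings for illustration only; the prompts and
# temperature values used in the study are not reproduced here.
MODES = ("zero-shot", "one-shot")
TEMPERATURES = (0.1, 0.5, 0.9)


def build_prompt(description: str, mode: str, example: str = "") -> str:
    """Return a workflow-generation prompt for the given prompting mode."""
    task = f"Generate a GitHub workflow that does the following:\n{description}\n"
    if mode == "zero-shot":
        # Zero-shot: the model sees only the task description.
        return task
    if mode == "one-shot":
        # One-shot: a single worked example precedes the task description.
        return f"Here is an example workflow:\n{example}\n\n{task}"
    raise ValueError(f"unknown prompting mode: {mode}")


def evaluation_grid():
    """Enumerate every (mode, temperature) configuration to evaluate."""
    return list(product(MODES, TEMPERATURES))
```

Each configuration returned by `evaluation_grid()` would then be run against every model and task, and the best-scoring combination kept per model and task.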
The study revealed that, unlike regular code generation, LLMs require detailed prompts to generate the desired workflows, but these detailed prompts can lead to invalid workflows with syntactic errors. Additionally, the LLMs were found to produce workflows with code injection vulnerabilities. The research also highlights the need for novel LLM-assisted techniques, as the current LLMs were found to be ineffective at repairing workflow defects.
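For readers unfamiliar with code injection in workflows: the typical pattern is an attacker-controlled value, such as an issue title, being interpolated with `${{ ... }}` directly into a `run:` shell script. The sketch below is a rough line-based heuristic for spotting that pattern. It is not the detection method evaluated in the study, the list of untrusted contexts is a deliberately partial assumption, and the name `find_injections` is hypothetical.

```python
import re

# Partial, illustrative list of attacker-controllable contexts (an assumption,
# not an exhaustive inventory).
UNTRUSTED = re.compile(
    r"\$\{\{\s*("
    r"github\.event\.(?:issue|pull_request|comment|review)\.(?:title|body)"
    r"|github\.head_ref"
    r")\s*\}\}"
)
# Matches a `run:` key, optionally preceded by a list dash; group 1 is the
# prefix up to the key, group 2 is any inline script on the same line.
RUN_KEY = re.compile(r"^(\s*(?:-\s+)?)run:\s*(.*)$")


def find_injections(workflow_text: str):
    """Return (line_number, context) pairs for untrusted expressions in run scripts."""
    findings = []
    run_indent = None  # column of the current `run:` key, if inside a script
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not end a block scalar
        match = RUN_KEY.match(line)
        if match:
            run_indent = len(match.group(1))
            script = match.group(2)
        elif run_indent is not None and len(line) - len(line.lstrip()) > run_indent:
            script = line  # continuation line of a multi-line script
        else:
            run_indent = None  # a sibling or parent key ends the script
            continue
        for expr in UNTRUSTED.finditer(script):
            findings.append((lineno, expr.group(1)))
    return findings
```

Under this heuristic, the same expression assigned to an `env:` variable and then referenced as a shell variable inside `run:` is not flagged, which matches the commonly recommended mitigation. A production check would parse the YAML properly instead of relying on indentation.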
Paper
Our paper was accepted at ARES '24.
Code
Our code is open-sourced on GitHub. Please check out the repository for more details.
Bibtex
@inproceedings{10.1145/3664476.3664497,
  author = {Zhang, Xinyu and Muralee, Siddharth and Cherupattamoolayil, Sourag and Machiry, Aravind},
  title = {On the Effectiveness of Large Language Models for GitHub Workflows},
  year = {2024},
  isbn = {9798400717185},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3664476.3664497},
  doi = {10.1145/3664476.3664497},
  booktitle = {Proceedings of the 19th International Conference on Availability, Reliability and Security},
  articleno = {32},
  numpages = {14},
  location = {Vienna, Austria},
  series = {ARES '24}
}
LLM4GW | PurS3 Lab at Purdue University | PurSec Lab at Purdue University | WSPR Lab at North Carolina State University