AICGSecEval is an open-source benchmark framework for evaluating the security of code generated by AI systems. The project addresses the concern that AI-assisted programming tools may produce insecure code containing vulnerabilities such as injection flaws or unsafe logic.

The framework constructs evaluation tasks from real-world software repositories and known vulnerability cases derived from CVE records. By simulating realistic development scenarios, the benchmark assesses how well AI code generation systems handle security-sensitive programming tasks.

AICGSecEval combines static and dynamic evaluation techniques to analyze generated code for both vulnerabilities and functional correctness. The framework includes datasets, test cases, and evaluation metrics that measure how AI programming tools perform across multiple programming languages and vulnerability categories.
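The framework's actual API is not shown here, but the hybrid static/dynamic evaluation described above can be sketched as follows. All names (`EvalResult`, `evaluate`, the verdict labels) are hypothetical illustrations, not the project's real interfaces:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """Hypothetical outcome of evaluating one AI-generated code sample."""
    static_findings: int  # vulnerabilities reported by static analysis
    tests_passed: int     # dynamic functional tests that passed
    tests_total: int      # dynamic functional tests executed

def is_secure(result: EvalResult) -> bool:
    # A sample counts as secure only if static analysis reports no findings.
    return result.static_findings == 0

def is_correct(result: EvalResult) -> bool:
    # A sample counts as functionally correct only if every dynamic test passes.
    return result.tests_total > 0 and result.tests_passed == result.tests_total

def evaluate(result: EvalResult) -> str:
    """Combine the static and dynamic signals into a single verdict."""
    if is_secure(result) and is_correct(result):
        return "pass"
    if is_correct(result):
        return "insecure"  # works, but static analysis flagged a vulnerability
    return "fail"          # does not even satisfy the functional tests
```

For example, `evaluate(EvalResult(0, 5, 5))` yields `"pass"`, while a sample that passes all tests but triggers two static findings yields `"insecure"`. Requiring both signals is what distinguishes a hybrid benchmark from one that checks functionality alone.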
Features
- Repository-level benchmark for evaluating AI-generated code security
- Datasets derived from real software projects and CVE vulnerabilities
- Support for multiple programming languages and vulnerability types
- Hybrid evaluation combining static analysis and dynamic testing
- Simulation of real-world AI-assisted development workflows
- Benchmark metrics for measuring security, correctness, and stability
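The last feature, metrics for security, correctness, and stability, could be aggregated roughly as in the sketch below. This is an illustrative interpretation, not the benchmark's actual scoring code: it assumes per-run records of `(task_id, secure, correct)` and treats stability as the fraction of tasks whose repeated runs produce identical outcomes.

```python
from collections import defaultdict
from statistics import mean

def aggregate_metrics(records: list[tuple[str, bool, bool]]) -> dict[str, float]:
    """Aggregate (task_id, secure, correct) records from repeated runs
    into hypothetical security, correctness, and stability rates."""
    security = mean(1.0 if secure else 0.0 for _, secure, _ in records)
    correctness = mean(1.0 if correct else 0.0 for _, _, correct in records)

    # Group the outcomes of repeated runs by task.
    by_task: dict[str, list[tuple[bool, bool]]] = defaultdict(list)
    for task_id, secure, correct in records:
        by_task[task_id].append((secure, correct))

    # Stability: share of tasks whose repeated runs all agree.
    stable = sum(1 for runs in by_task.values() if len(set(runs)) == 1)
    stability = stable / len(by_task)

    return {"security": security, "correctness": correctness,
            "stability": stability}
```

Measuring stability across repeated generations matters because AI code generators are stochastic: a tool that produces secure code only some of the time is weaker than its best single run suggests.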