CodeSearchNet

CodeSearchNet is a large-scale dataset and research benchmark for systems that retrieve source code from natural language queries. The project is a collaboration between GitHub and Microsoft Research and supports research on semantic code search and program understanding. The dataset contains millions of pairs of source code functions and their documentation comments, extracted from open-source repositories. Machine learning models can use these pairs to learn how natural language descriptions relate to code. The dataset covers six programming languages: Python, JavaScript, Ruby, Go, Java, and PHP. In addition to the dataset, the repository includes baseline models, evaluation tools, and instructions for building code retrieval systems that map user queries to relevant code snippets.
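To make the idea of mapping a query to a code snippet concrete, here is a minimal sketch in Python. It is not the project's baseline model: it ranks a few made-up (documentation, code) pairs by bag-of-words cosine similarity between the query and the documentation text. The `PAIRS` data and the `tokenize`, `cosine`, and `search` helpers are hypothetical names introduced only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical (documentation, code) pairs; not taken from the real dataset.
PAIRS = [
    ("Read a text file and return its lines.",
     "def read_lines(path):\n    with open(path) as f:\n        return f.read().splitlines()"),
    ("Compute the factorial of a non-negative integer.",
     "def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)"),
    ("Reverse the characters of a string.",
     "def reverse(s):\n    return s[::-1]"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, pairs=PAIRS, top_k=1):
    """Return the code of the top_k pairs whose documentation best matches the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), code) for doc, code in pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [code for _, code in scored[:top_k]]


if __name__ == "__main__":
    print(search("factorial of an integer")[0])
```

A learned model would replace the token-count vectors with embeddings trained on the paired data, but the ranking step has the same shape: score every candidate against the query and return the best matches.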

Features

Large dataset containing millions of code and documentation pairs
Support for six programming languages: Python, JavaScript, Ruby, Go, Java, and PHP
Benchmark tasks for evaluating semantic code search algorithms
Baseline machine learning models and pretrained weights
Evaluation utilities and metrics for comparing retrieval systems
Research platform for studying relationships between code and natural language
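The listing does not say which metrics the evaluation utilities compute. As an illustration only, the sketch below implements mean reciprocal rank (MRR), a standard metric for ranked retrieval, in Python. The function names and the input format (a ranked list of candidate IDs plus the single relevant ID per query) are assumptions made for this example, not the project's API.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """Return 1/position of the relevant item in the ranking, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """Average the reciprocal rank over (ranked_ids, relevant_id) pairs, one per query."""
    if not results:
        return 0.0
    total = sum(reciprocal_rank(ranked, relevant) for ranked, relevant in results)
    return total / len(results)


if __name__ == "__main__":
    # Hypothetical rankings from a retrieval system for three queries.
    results = [
        (["a", "b", "c"], "a"),  # relevant item ranked first
        (["a", "b", "c"], "b"),  # relevant item ranked second
        (["a", "b", "c"], "z"),  # relevant item not retrieved
    ]
    print(mean_reciprocal_rank(results))
```

Two retrieval systems can be compared by running both on the same queries and comparing their scores; a higher MRR means the relevant snippet tends to appear nearer the top of the ranking.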

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Machine Learning Software

Registered

2026-03-12


CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code
