clustering free download

Showing 153 open source projects for "clustering"


Filtered by language: Python

1

HDBSCAN

A high performance implementation of HDBSCAN clustering

HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) performs DBSCAN over varying epsilon values and integrates the results to find the clustering that is most stable across epsilon. This allows HDBSCAN, unlike DBSCAN, to find clusters of varying densities and to be more robust to parameter selection. In practice, HDBSCAN returns a good clustering straight away with little or no parameter tuning, and its primary parameter, the minimum cluster size, is intuitive and easy to select. ...
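The epsilon dependence that HDBSCAN is designed to remove can be seen with a plain DBSCAN. The sketch below is a minimal, stdlib-only DBSCAN (not the HDBSCAN library's API), run on made-up data with two close dense groups and one sparse group. A small epsilon labels the sparse group as noise, while an epsilon large enough to recover it merges the two dense groups.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN at a single epsilon; returns one label per point, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = -1

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep growing the cluster
    return labels


dense_a = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2)]
dense_b = [(1, 0), (1, 0.2), (1.2, 0), (1.2, 0.2)]
sparse = [(10, 10), (10, 12), (12, 10), (12, 12)]
points = dense_a + dense_b + sparse

for eps in (0.3, 2.1):
    print(eps, dbscan(points, eps, min_pts=3))
```

With `eps=0.3` the labels are `[0]*4 + [1]*4 + [-1]*4`; with `eps=2.1` they are `[0]*8 + [1]*4`. For `min_pts=3` no single epsilon separates all three groups in this toy data, which is the kind of case that integrating over epsilon is meant to handle.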

Downloads: 2 This Week

Last Update: 2026-06-01
2

Kubez-ansible

Quick deployment tools for Kubernetes clusters

Provides quick deployment tools for Kubernetes clusters and cloud-native applications. It has been tested with python3 on Rocky 8.5, Debian 11, and Ubuntu 20.04+.

Downloads: 0 This Week

Last Update: 2026-06-17
3

Scanpy

Single-cell analysis in Python

Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

Downloads: 1 This Week

Last Update: 2026-04-10
4

scikit-learn

Machine learning in Python

scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.

Downloads: 14 This Week

Last Update: 2026-06-02
5

sktime

A unified framework for machine learning with time series

sktime is a library for time series analysis in Python. It provides a unified interface for distinct but related time series learning tasks; currently these are time series classification, regression, clustering, annotation, and forecasting. It comes with time series algorithms and scikit-learn compatible tools to build, tune, and validate time series models. Our objective is to enhance the interoperability and usability of the time series analysis ecosystem in its entirety. ...

Downloads: 0 This Week

Last Update: 2026-06-11
6

Machine learning algorithms

Minimal and clean examples of machine learning algorithms

...This approach allows learners to study the mathematical and algorithmic details behind widely used models in a transparent and readable way. The repository includes implementations of both supervised and unsupervised learning techniques, along with dimensionality reduction and clustering methods. Many of the algorithms are written in a simplified style that prioritizes clarity and educational value over production-level optimization. Because the code is compact and easy to follow, it is often used as a learning resource by developers who want to understand how machine learning algorithms are constructed.

Downloads: 1 This Week

Last Update: 2026-05-07
7

model2Vec

Fast State-of-the-Art Static Embeddings

...By using a distillation-based approach, it can produce lightweight models that run efficiently on CPUs, making it suitable for edge applications and large-scale processing pipelines. The resulting models can be used for a wide range of tasks, including semantic search, clustering, classification, and retrieval-augmented generation systems. One of its key advantages is its simplicity, as it requires minimal dependencies and can generate embeddings extremely quickly compared to traditional transformer-based approaches.
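Static embeddings are cheap at inference time because embedding a text is a table lookup plus pooling, with no transformer forward pass. The sketch below illustrates only that inference step, assuming a tiny hypothetical vocabulary of fixed token vectors and plain mean pooling. It is not the model2Vec API and says nothing about how the distillation step produces the vectors.

```python
# Hypothetical static vocabulary: one fixed vector per token.
VOCAB = {
    "cpu": (1.0, 0.0),
    "gpu": (0.0, 1.0),
    "fast": (1.0, 1.0),
}


def embed(text, vocab=VOCAB):
    """Embed text by averaging the static vectors of its known tokens."""
    vectors = [vocab[token] for token in text.lower().split() if token in vocab]
    if not vectors:
        dim = len(next(iter(vocab.values())))
        return tuple(0.0 for _ in range(dim))  # no known tokens: zero vector
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


print(embed("Fast CPU"))  # (1.0, 0.5)
```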

Downloads: 0 This Week

Last Update: 2026-05-29
8

RAPTOR

The official implementation of RAPTOR

...Traditional RAG systems typically retrieve small text chunks independently, which can limit a model’s ability to understand broader document context. RAPTOR addresses this limitation by recursively embedding, clustering, and summarizing documents to create a tree-structured hierarchy of information. Each level of the tree represents summaries at different levels of abstraction, allowing retrieval to operate at both detailed and high-level conceptual layers. During inference, the system can navigate this hierarchical representation to retrieve information that best matches the user’s query while preserving broader contextual understanding. ...
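The recursive embed, cluster, and summarize loop can be sketched in a few lines. This is a toy illustration under loud assumptions, not the official RAPTOR code: bag-of-words counts stand in for an embedding model, a greedy similarity threshold stands in for a real clustering algorithm, first-sentence extraction stands in for an LLM summarizer, and `build_tree` and `retrieve` are hypothetical names. Retrieval here scores nodes from every level of the tree at once, which is one simple way to use the hierarchy.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def embed(text):
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes, threshold):
    # Hypothetical greedy clustering: a node joins the first group whose first member is similar enough.
    groups = []
    for node in nodes:
        vec = embed(node.text)
        for group in groups:
            if cosine(vec, embed(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def summarize(texts):
    # Hypothetical stand-in for an LLM summarizer: keep each member's first sentence.
    return " ".join(text.split(".")[0].strip() + "." for text in texts)


def build_tree(chunks, threshold=0.2, max_levels=5):
    """Embed, cluster, and summarize level by level until a single root remains."""
    level = [Node(chunk) for chunk in chunks]
    for _ in range(max_levels):
        if len(level) == 1:
            break
        groups = cluster(level, threshold)
        if len(groups) == len(level):
            groups = [level]  # nothing merged: summarize everything into one root
        level = [Node(summarize([n.text for n in g]), children=g) for g in groups]
    return level


def retrieve(roots, query, k=1):
    """Score nodes from every level of the tree against the query; return the top k."""
    nodes, stack = [], list(roots)
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    q = embed(query)
    return sorted(nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


chunks = ["Cats purr softly.", "Cats sleep a lot.", "Stocks rose today.", "Stocks fell sharply."]
roots = build_tree(chunks)
print(roots[0].text)
print(retrieve(roots, "stocks fell")[0].text)
```

On these four chunks, the two cat sentences and the two stock sentences form two mid-level summary nodes under a single root, and the query `"stocks fell"` returns the leaf `"Stocks fell sharply."`.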

Downloads: 0 This Week

Last Update: 2026-03-06
9

Engram

A New Axis of Sparsity for Large Language Models

...Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. In addition to raw similarity search, the project includes tools for clustering, ranking, and filtering results, enabling richer user experiences like “related content”, semantic auto-completion, and contextual filtering.

Downloads: 0 This Week

Last Update: 2026-01-28
10

Kubespray

Deploy a Production Ready Kubernetes Cluster

Can be deployed on AWS, GCE, Azure, OpenStack, vSphere, Equinix Metal (bare metal), Oracle Cloud Infrastructure (experimental), or bare metal. Highly available clusters. Composable (for example, a choice of network plugin). Supports most popular Linux distributions. Continuous integration tests. The available Docker versions are 18.09, 19.03, and 20.10; the recommended version is 20.10. The kubelet might break on Docker's non-standard version numbering (it no longer uses...

Downloads: 0 This Week

Last Update: 2026-04-24
11

kg-gen

Knowledge Graph Generation from Any Text

...Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. This allows the generated graphs to be denser, more coherent, and easier to use for downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
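The merge step can be pictured as clustering entity names and rewriting triples onto one representative per cluster. In this minimal sketch, `difflib.SequenceMatcher` string similarity is a swapped-in stand-in for the language-model-based semantic similarity the project describes, union-find groups names above a hypothetical `threshold`, and picking the shortest name as canonical is an arbitrary choice. `resolve_entities` and `merge_triples` are illustrative names, not kg-gen's API.

```python
from difflib import SequenceMatcher


def resolve_entities(names, threshold=0.85):
    """Map each entity name to a canonical representative of its cluster."""
    unique = sorted(set(names))
    parent = {name: name for name in unique}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            # Stand-in similarity: case-insensitive string overlap, not semantic similarity.
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in unique:
        clusters.setdefault(find(name), []).append(name)

    canonical = {}
    for members in clusters.values():
        representative = min(members, key=lambda m: (len(m), m))  # shortest name wins
        for member in members:
            canonical[member] = representative
    return canonical


def merge_triples(triples, threshold=0.85):
    """Rewrite (subject, relation, object) triples onto canonical entities and deduplicate."""
    names = [entity for subject, _, obj in triples for entity in (subject, obj)]
    canonical = resolve_entities(names, threshold)
    return sorted({(canonical[s], r, canonical[o]) for s, r, o in triples})


triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("marie curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel prize"),
]
print(merge_triples(triples))
```

The three input triples collapse to two, both anchored on `"Marie Curie"`. String overlap only catches surface variants such as casing; merging true synonyms would need the semantic similarity the project's description refers to.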

Downloads: 1 This Week

Last Update: 2026-03-09
12

AWS ParallelCluster Node

Python package installed on the Amazon EC2 instances

aws-parallelcluster-node is the Python package installed on the Amazon EC2 instances launched as part of AWS ParallelCluster. AWS ParallelCluster is an AWS-supported open source cluster management tool that makes it easy to deploy and manage High-Performance Computing (HPC) clusters in the AWS cloud. Built on the open source CfnCluster project, AWS ParallelCluster lets you quickly build an HPC compute environment in AWS. It automatically sets up the required compute resources...

Downloads: 0 This Week

Last Update: 2026-05-11
13

MicroK8s

Single-package Kubernetes for developers, IoT and edge

Low-ops, minimal production Kubernetes for developers, cloud, clusters, workstations, edge, and IoT. MicroK8s automatically chooses the best nodes for the Kubernetes datastore. When you lose a cluster database node, another node is promoted. No admin needed for your bulletproof edge. MicroK8s is small, with sensible defaults that ‘just work’. A quick install, easy upgrades, and great security make it perfect for micro clouds and edge computing. As the publishers of MicroK8s, we deliver the world’s...

Downloads: 0 This Week

Last Update: 2026-05-20
14

Orange Data Mining

Orange: Interactive data analysis

...Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. Interactive data exploration for rapid qualitative analysis with clean visualizations. Graphic user interface allows you to focus on exploratory data analysis instead of coding, while clever defaults make fast prototyping of a data analysis workflow extremely easy. ...
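Hierarchical clustering, one of the analyses listed above, builds a tree of merges from the bottom up. Below is a minimal stdlib sketch of agglomerative clustering with single linkage (one of several common linkage rules; which rules Orange offers is not stated here), on made-up points. `agglomerate` is a hypothetical name, not Orange's API.

```python
from itertools import combinations
from math import dist


def linkage_distance(points, cluster_a, cluster_b):
    # Single linkage: the distance between the closest pair of members.
    return min(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerate(points, n_clusters):
    """Start from singletons and merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance) in merge order, i.e. the dendrogram
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage_distance(points, clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b], linkage_distance(points, clusters[a], clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because combinations yields a < b
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerate(points, n_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4]]
```

The recorded merge distances (1.0, 1.0, then about 6.403) are what a dendrogram plots on its height axis; cutting the tree at a different height yields a different number of clusters.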

Downloads: 22 This Week

Last Update: 2025-12-20
15

Homemade Machine Learning

Python examples of popular machine learning algorithms

...Each algorithm is accompanied by mathematical explanations, visualizations (often via Jupyter notebooks), and interactive demos so you can tweak parameters, data, and observe outcomes in real time. The purpose is pedagogical: you’ll see linear regression, logistic regression, k-means clustering, neural nets, decision trees, etc., built in Python using fundamentals like NumPy and Matplotlib, not hidden behind API calls. It is well suited for learners who want to move beyond library usage to understand how algorithms operate internally—how cost functions, gradients, updates and predictions work.
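The k-means clustering these notebooks build from scratch alternates two steps until nothing changes. Here is a minimal stdlib sketch of Lloyd's algorithm in the same from-scratch spirit (not code from the repository), assuming the caller supplies the initial centroids.

```python
from math import dist


def kmeans(points, init_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: the next assignment step would produce the same labels
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, labels = kmeans(points, init_centroids=[(0, 0), (0, 2)])
print(centroids, labels)
```

Even though both starting centroids sit in the left-hand pair, the loop converges here to `[(0.0, 1.0), (10.0, 11.0)]` with labels `[0, 0, 1, 1]`. In general, k-means only reaches a local optimum that depends on the initialization.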

Downloads: 2 This Week

Last Update: 2025-11-23
16

machine learning tutorials

machine learning tutorials (mainly in Python3)

...It aims to strike a balance between theoretical explanation and practical coding by demonstrating algorithms both from scratch and using established libraries. The content is organized into multiple sections covering topics such as clustering, regression, dimensionality reduction, recommender systems, and model evaluation.

Downloads: 1 This Week

Last Update: 2026-06-05
17

Text Embeddings Inference

High-performance inference server for text embedding models

...It focuses on delivering fast and scalable embedding generation by leveraging optimized inference techniques and modern hardware acceleration. It is built to support transformer-based embedding models, making it suitable for tasks such as semantic search, clustering, and retrieval-augmented systems. It provides an API interface that allows developers to integrate embedding capabilities into applications without managing model internals directly. Text Embeddings Inference is optimized for throughput and low latency, enabling it to handle large volumes of requests reliably. It also emphasizes ease of deployment, often using containerization and configurable runtime options to adapt to different infrastructure setups.

Downloads: 0 This Week

Last Update: 2026-03-23
18

skfolio

Python library for portfolio optimization built on top of scikit-learn

...By following the familiar scikit-learn API design, the library allows quantitative researchers and developers to apply techniques such as model selection, cross-validation, and hyperparameter tuning to portfolio construction workflows. It supports a wide range of allocation methods, from classical mean-variance optimization to modern techniques that rely on clustering, factor models, and risk-based allocations. The framework also includes tools for evaluating portfolio performance under different market conditions, enabling users to test robustness and reduce the risk of overfitting.

Downloads: 0 This Week

Last Update: 2026-04-21
19

LOTUS

AI-Powered Data Processing: Use LOTUS to process all of your datasets

...The core concept of the framework is the use of semantic operators, which extend traditional relational database operations to support reasoning over text and other unstructured data. These operators allow tasks such as semantic filtering, ranking, clustering, and summarization to be expressed directly within data processing pipelines. The LOTUS engine automatically optimizes how language models are used during execution, which can significantly improve performance and reduce computational cost.

Downloads: 1 This Week

Last Update: 2026-06-13
20

Qwen3-VL-Embedding

Multimodal embedding and reranking models built on Qwen3-VL

Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
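The embed-then-rerank workflow described above is a two-stage pipeline: cheap vector similarity narrows the candidate set, then a more precise scorer orders the survivors. A minimal sketch follows. It assumes the document vectors were already produced by some embedding model (here they are made-up 3-dimensional lists) and that `rerank_score` is a hypothetical stand-in for a reranker. No Qwen3-VL API is used or implied.

```python
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, k=2):
    """Stage 1: keep the k documents closest to the query by cosine similarity.
    Stage 2: reorder only those k candidates by the (more expensive) reranker score."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return sorted(ranked[:k], key=rerank_score, reverse=True)


doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical reranker scores; a real reranker would score (query, document) pairs directly.
reranker = {0: 0.2, 1: 0.7}
print(retrieve_then_rerank([1.0, 0.0, 0.0], doc_vecs, lambda i: reranker.get(i, 0.0)))  # [1, 0]
```

The split keeps the expensive scorer off most of the corpus. Documents 2 and 3 never reach the reranker because they fail the cosine cut.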

Downloads: 0 This Week

Last Update: 2026-04-08
21

ML for Beginners

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

ML-For-Beginners is a structured, project-driven curriculum that teaches foundational machine learning concepts with approachable math and lots of code. Organized as a multi-week course, it mixes short lectures with labs in notebooks so learners practice regression, classification, clustering, and recommendation techniques on real datasets. Each lesson aims to connect the algorithm to a relatable scenario, reinforcing intuition before diving into parameters, metrics, and trade-offs. The repository includes quizzes, solutions, and instructor materials to make the content usable in classrooms or self-study. It emphasizes ethical considerations and model evaluation—accuracy is not the only metric—so students learn to validate and communicate results responsibly. ...

Downloads: 0 This Week

Last Update: 2026-05-26
22

HyperTools

A Python toolbox for gaining geometric insights

...Functions for plotting high-dimensional datasets in 2D/3D. Static and animated plots. Simple API for customizing plot styles. Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing, and more. Support for lists of NumPy arrays, pandas DataFrames, text, or (mixed) lists. Topic models and other text vectorization methods can be applied to text data. HyperTools is designed to facilitate dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot.

Downloads: 0 This Week

Last Update: 2026-01-29
23

AiLearning-Theory-Applying

Quickly get started with AI theory and practical applications

...It includes well-commented notebooks, datasets, and implementation examples that allow learners to reproduce experiments and understand the inner workings of various algorithms. The project also introduces important concepts such as probability theory, linear algebra, regression models, clustering methods, and neural network architectures. Advanced sections explore modern AI topics including transformers, BERT-based natural language processing systems, and practical competition-style machine learning workflows.

Downloads: 0 This Week

Last Update: 2026-06-08
24

MTEB

MTEB: Massive Text Embedding Benchmark

Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. ...

Downloads: 0 This Week

Last Update: 4 days ago
25

Tracking Any Point (TAP)

DeepMind model for tracking arbitrary points across videos & robotics

TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...

Downloads: 0 This Week

Last Update: 5 days ago