Showing 153 open source projects for "clustering"

  • 1
    HDBSCAN

    A high performance implementation of HDBSCAN clustering

    HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and be more robust to parameter selection. In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning -- and the primary parameter, minimum cluster size, is intuitive and easy to select. ...
    Downloads: 2 This Week
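The single-epsilon limitation that the HDBSCAN description alludes to can be seen on toy data. The sketch below is not the HDBSCAN algorithm and does not use the hdbscan library; it is a minimal pure-Python DBSCAN on made-up 1-D points (the names `dbscan_1d` and `summarize` and the data are illustrative assumptions). It suggests why one fixed `eps` struggles when clusters have different densities.

```python
def dbscan_1d(points, eps, min_pts):
    """Plain DBSCAN on 1-D points. Returns one label per point, -1 = noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, keep expanding
        cluster += 1
    return labels


def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    n_clusters = len({label for label in labels if label != -1})
    return n_clusters, labels.count(-1)


# Hypothetical data: two tight clusters close together, one sparse cluster far away.
dense_a = [0, 1, 2, 3, 4]
dense_b = [9, 10, 11, 12, 13]
sparse_c = [100, 110, 120, 130, 140]
points = dense_a + dense_b + sparse_c

small_eps = summarize(dbscan_1d(points, eps=1.5, min_pts=3))
large_eps = summarize(dbscan_1d(points, eps=10, min_pts=3))
print("eps=1.5:", small_eps)  # tight clusters found, sparse cluster lost as noise
print("eps=10: ", large_eps)  # sparse cluster found, tight clusters merged
```

With the small `eps` the sparse group is discarded as noise; with the large `eps` the two tight groups merge. On this data no integer `eps` yields all three groups, which is the situation a varying-density method is meant to handle.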
  • 2
    Kubez-ansible

    Quick deployment tools for Kubernetes clusters

    Provides quick deployment tools for Kubernetes clusters and cloud-native applications. It has been tested on Rocky 8.5, Debian 11, and Ubuntu 20.04+, all of which support python3.
    Downloads: 0 This Week
  • 3
    Scanpy

    Single-cell analysis in Python

    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
    Downloads: 1 This Week
  • 4
    scikit-learn

    Machine learning in Python

    scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.
    Downloads: 14 This Week
  • 5
    sktime

    A unified framework for machine learning with time series

    sktime is a library for time series analysis in Python. It provides a unified interface for multiple time series learning tasks. Currently, this includes time series classification, regression, clustering, annotation, and forecasting. It comes with time series algorithms and scikit-learn compatible tools to build, tune and validate time series models. Our objective is to enhance the interoperability and usability of the time series analysis ecosystem in its entirety. sktime provides a unified interface for distinct but related time series learning tasks. ...
    Downloads: 0 This Week
  • 6
    Machine learning algorithms

    Minimal and clean examples of machine learning algorithms

    ...This approach allows learners to study the mathematical and algorithmic details behind widely used models in a transparent and readable way. The repository includes implementations of both supervised and unsupervised learning techniques, along with dimensionality reduction and clustering methods. Many of the algorithms are written in a simplified style that prioritizes clarity and educational value over production-level optimization. Because the code is compact and easy to follow, it is often used as a learning resource by developers who want to understand how machine learning algorithms are constructed.
    Downloads: 1 This Week
  • 7
    model2Vec

    Fast State-of-the-Art Static Embeddings

    ...By using a distillation-based approach, it can produce lightweight models that run efficiently on CPUs, making it suitable for edge applications and large-scale processing pipelines. The resulting models can be used for a wide range of tasks, including semantic search, clustering, classification, and retrieval-augmented generation systems. One of its key advantages is its simplicity, as it requires minimal dependencies and can generate embeddings extremely quickly compared to traditional transformer-based approaches.
    Downloads: 0 This Week
  • 8
    RAPTOR

    The official implementation of RAPTOR

    ...Traditional RAG systems typically retrieve small text chunks independently, which can limit a model’s ability to understand broader document context. RAPTOR addresses this limitation by recursively embedding, clustering, and summarizing documents to create a tree-structured hierarchy of information. Each level of the tree represents summaries at different levels of abstraction, allowing retrieval to operate at both detailed and high-level conceptual layers. During inference, the system can navigate this hierarchical representation to retrieve information that best matches the user’s query while preserving broader contextual understanding. ...
    Downloads: 0 This Week
  • 9
    Engram

    A New Axis of Sparsity for Large Language Models

    ...Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. In addition to raw similarity search, the project includes tools for clustering, ranking, and filtering results, enabling richer user experiences like “related content”, semantic auto-completion, and contextual filtering.
    Downloads: 0 This Week
  • 10
    Kubespray

    Deploy a Production Ready Kubernetes Cluster

    Can be deployed on AWS, GCE, Azure, OpenStack, vSphere, Equinix Metal (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal. Highly available cluster. Composable (Choice of the network plugin for instance). Supports most popular Linux distributions. Continuous integration tests. The list of available docker versions is 18.09, 19.03, and 20.10. The recommended docker version is 20.10. The kubelet might break on docker's non-standard version numbering (it no longer uses...
    Downloads: 0 This Week
  • 11
    kg-gen

    Knowledge Graph Generation from Any Text

    ...Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. This allows the generated graphs to be denser, more coherent, and easier to use for downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
    Downloads: 1 This Week
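The entity-merging step in the kg-gen description can be pictured with a deliberately simplified sketch. This is not kg-gen's code or API: the description says it relies on language models to judge semantic similarity, whereas the stand-in below only normalizes spelling (case and punctuation). The names `canonical_key` and `merge_entities` and the sample triples are hypothetical.

```python
import re


def canonical_key(name):
    """Crude stand-in for semantic similarity: lowercase and keep only a-z, 0-9."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_entities(triples):
    """Map each entity to the first surface form seen for its key, then drop
    triples that become exact duplicates after the remapping."""
    representative = {}  # canonical key -> first-seen entity name
    seen = set()
    merged = []
    for subj, rel, obj in triples:
        s = representative.setdefault(canonical_key(subj), subj)
        o = representative.setdefault(canonical_key(obj), obj)
        triple = (s, rel, o)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


# Hypothetical extracted triples containing duplicate spellings of one entity.
triples = [
    ("New York", "located_in", "USA"),
    ("new-york", "has_borough", "Brooklyn"),
    ("NEW YORK", "located_in", "usa"),
]

merged = merge_entities(triples)
nodes_before = {e for s, _, o in triples for e in (s, o)}
nodes_after = {e for s, _, o in merged for e in (s, o)}
print(merged)
print(len(nodes_before), "nodes before,", len(nodes_after), "after")
```

After merging, the two remaining edges share a single "New York" node, so the graph has fewer nodes and is better connected, which is the effect the description attributes to entity resolution.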
  • 12
    AWS ParallelCluster Node

    Python package installed on the Amazon EC2 instances

    aws-parallelcluster-node is the python package installed on the Amazon EC2 instances launched as part of AWS ParallelCluster. AWS ParallelCluster is an AWS-supported Open Source cluster management tool that makes it easy for you to deploy and manage High-Performance Computing (HPC) clusters in the AWS cloud. Built on the Open Source CfnCluster project, AWS ParallelCluster enables you to quickly build an HPC compute environment in AWS. It automatically sets up the required compute resources...
    Downloads: 0 This Week
  • 13
    MicroK8s

    Single-package Kubernetes for developers, IoT and edge

    Low-ops, minimal production Kubernetes, for devs, cloud, clusters, workstations, Edge and IoT. MicroK8s automatically chooses the best nodes for the Kubernetes datastore. When you lose a cluster database node, another node is promoted. No admin needed for your bulletproof edge. MicroK8s is small, with sensible defaults that ‘just work’. A quick install, easy upgrades and great security make it perfect for micro clouds and edge computing. As the publishers of MicroK8s, we deliver the world’s...
    Downloads: 0 This Week
  • 14
    Orange Data Mining

    Orange: Interactive data analysis

    ...Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. Interactive data exploration for rapid qualitative analysis with clean visualizations. Graphic user interface allows you to focus on exploratory data analysis instead of coding, while clever defaults make fast prototyping of a data analysis workflow extremely easy. ...
    Downloads: 22 This Week
  • 15
    Homemade Machine Learning

    Python examples of popular machine learning algorithms

    ...Each algorithm is accompanied by mathematical explanations, visualizations (often via Jupyter notebooks), and interactive demos so you can tweak parameters, data, and observe outcomes in real time. The purpose is pedagogical: you’ll see linear regression, logistic regression, k-means clustering, neural nets, decision trees, etc., built in Python using fundamentals like NumPy and Matplotlib, not hidden behind API calls. It is well suited for learners who want to move beyond library usage to understand how algorithms operate internally—how cost functions, gradients, updates and predictions work.
    Downloads: 2 This Week
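As an illustration of the "from fundamentals" style this entry describes, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not taken from the Homemade Machine Learning repository; the function names, the toy points, and the choice to pass the initial centroids explicitly (for reproducibility) are assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, n_iter=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Two small, well-separated groups of made-up 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, init_centroids=[points[0], points[-1]])
print(labels)     # → [0, 0, 0, 1, 1, 1]
print(centroids)
```

Writing the assignment and update steps out by hand makes visible what a library call hides: the result depends on the initial centroids, and the loop stops once an update leaves them unchanged.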
  • 16
    machine learning tutorials

    machine learning tutorials (mainly in Python3)

    ...It aims to strike a balance between theoretical explanation and practical coding by demonstrating algorithms both from scratch and using established libraries. The content is organized into multiple sections covering topics such as clustering, regression, dimensionality reduction, recommender systems, and model evaluation.
    Downloads: 1 This Week
  • 17
    Text Embeddings Inference

    High-performance inference server for text embeddings models API layer

    ...It focuses on delivering fast and scalable embedding generation by leveraging optimized inference techniques and modern hardware acceleration. It is built to support transformer-based embedding models, making it suitable for tasks such as semantic search, clustering, and retrieval-augmented systems. It provides an API interface that allows developers to integrate embedding capabilities into applications without managing model internals directly. Text Embeddings Inference is optimized for throughput and low latency, enabling it to handle large volumes of requests reliably. It also emphasizes ease of deployment, often using containerization and configurable runtime options to adapt to different infrastructure setups.
    Downloads: 0 This Week
  • 18
    skfolio

    Python library for portfolio optimization built on top of scikit-learn

    ...By following the familiar scikit-learn API design, the library allows quantitative researchers and developers to apply techniques such as model selection, cross-validation, and hyperparameter tuning to portfolio construction workflows. It supports a wide range of allocation methods, from classical mean-variance optimization to modern techniques that rely on clustering, factor models, and risk-based allocations. The framework also includes tools for evaluating portfolio performance under different market conditions, enabling users to test robustness and reduce the risk of overfitting.
    Downloads: 0 This Week
  • 19
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
  • 20
    ML for Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    ML-For-Beginners is a structured, project-driven curriculum that teaches foundational machine learning concepts with approachable math and lots of code. Organized as a multi-week course, it mixes short lectures with labs in notebooks so learners practice regression, classification, clustering, and recommendation techniques on real datasets. Each lesson aims to connect the algorithm to a relatable scenario, reinforcing intuition before diving into parameters, metrics, and trade-offs. The repository includes quizzes, solutions, and instructor materials to make the content usable in classrooms or self-study. It emphasizes ethical considerations and model evaluation—accuracy is not the only metric—so students learn to validate and communicate results responsibly. ...
    Downloads: 0 This Week
  • 21
    HyperTools

    A Python toolbox for gaining geometric insights

    ...Functions for plotting high-dimensional datasets in 2/3D. Static and animated plots. Simple API for customizing plot styles. Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing and more. Support for lists of Numpy arrays, Pandas dataframes, text or (mixed) lists. Applying topic models and other text vectorization methods to text data. HyperTools is designed to facilitate dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot.
    Downloads: 0 This Week
  • 22
    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    ...The core concept of the framework is the use of semantic operators, which extend traditional relational database operations to support reasoning over text and other unstructured data. These operators allow tasks such as semantic filtering, ranking, clustering, and summarization to be expressed directly within data processing pipelines. The LOTUS engine automatically optimizes how language models are used during execution, which can significantly improve performance and reduce computational cost.
    Downloads: 1 This Week
  • 23
    AiLearning-Theory-Applying

    Quickly get started with AI theory and practical applications

    ...It includes well-commented notebooks, datasets, and implementation examples that allow learners to reproduce experiments and understand the inner workings of various algorithms. The project also introduces important concepts such as probability theory, linear algebra, regression models, clustering methods, and neural network architectures. Advanced sections explore modern AI topics including transformers, BERT-based natural language processing systems, and practical competition-style machine learning workflows.
    Downloads: 0 This Week
  • 24
    MTEB

    MTEB: Massive Text Embedding Benchmark

    Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. ...
    Downloads: 0 This Week
  • 25
    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...
    Downloads: 0 This Week