Showing 184 open source projects for "data"

  • 1
    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    The Data Engineering Handbook is a comprehensive, community-curated repository that aggregates essential learning resources for anyone interested in becoming a professional data engineer. Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills.
    Downloads: 1 This Week
  • 2
    Mimesis

    High-performance fake data generator for Python

    Mimesis is an open source high-performance fake data generator for Python, able to provide data for various purposes in various languages. It's currently the fastest fake data generator for Python, and supports many different data providers that can produce data related to people, food, transportation, internet and many more. Mimesis is really easy to use, with everything you need just an import away.
    Downloads: 4 This Week
  • 3
    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 27 This Week
  • 4
    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 11 This Week
  • 5
    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    The open-source framework for precision data testing for data scientists and ML engineers. Pandera provides a simple, flexible, and extensible data-testing framework for validating not only your data but also the functions that produce them. A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness. Access a comprehensive suite of built-in tests, or easily create your own validation rules for your specific use cases. ...
    Downloads: 0 This Week
  • 6
    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
  • 7
    hosts

    Consolidate and extend hosts files from several well-curated sources

    ...Currently, we offer the following categories: fakenews, social, gambling, and porn. Extensions are optional, and can be combined in various ways with the base hosts file. The combined products are stored in the alternates folder. Data for extensions are stored in the extensions folder. You manage extensions by curating this folder tree, where you will find the fakenews, social, gambling, and porn extension data that we maintain and provide for you. Create an optional blacklist file. The contents of this file (containing a listing of additional domains in hosts file format) are appended to the unified hosts file during the update process. ...
    Downloads: 14 This Week
  • 8
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 2 This Week
  • 9
    SENAITE LIMS

    SENAITE Meta Package

    ...Therefore, it reflects nicely the complexity of the LIMS, while providing a modern, intuitive, and friendly UI/UX. Amongst other functionalities, SENAITE comes with highly customizable workflows to drive users through the analytical process, easy-to-use UI for data registration, automatic import of results, data validation, and transition constraints. SENAITE can be easily integrated with instruments by using off-the-shelf interfaces for data import and export. Custom interfacing is supported too. Import instrument results and avoid human errors in the carrying-over process. Reduce the turnaround time on results report delivery. ...
    Downloads: 3 This Week
  • 10
    redis-py

    Redis Python client

    redis-py is the official Python client for interacting with Redis, the in-memory data structure store. It supports all Redis commands and data types, making it easy to build caching, messaging, or real-time analytics features in Python applications. With both synchronous and asyncio support, redis-py is suited for modern Python projects and integrates smoothly into web frameworks, task queues, and backend services.
    Downloads: 2 This Week
  • 11
    django-import-export

    Django application and library for importing and exporting data

    ...Also, the report_skipped option controls whether skipped records appear in the import Result object and, if using the admin, whether skipped records will show in the import preview page. Not all data can be easily extracted from an object/model attribute. To turn a complicated data model into a (generally simpler) processed data structure on export, define a dehydrate_<fieldname> method.
    Downloads: 0 This Week
  • 12
    Graphene

    GraphQL in Python Made Easy

    Graphene is a Python library for building GraphQL APIs quickly and easily, using a code-first approach. Instead of writing GraphQL Schema Definition Language (SDL), Python code is written to describe the data provided by your server. Graphene helps you use GraphQL effortlessly in Python, but what is GraphQL? GraphQL is a data query language developed internally by Facebook as an alternative to REST and ad-hoc webservice architectures. With Graphene you have all the tools you need to implement a GraphQL API in Python, with multiple integrations with different frameworks including Django, SQLAlchemy and Google App Engine.
    Downloads: 1 This Week
  • 13
    PyTorch Geometric

    Geometric deep learning extension library for PyTorch

    It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, from a variety of published papers. In addition, it consists of an easy-to-use mini-batch loader for many small and single giant graphs, a large number of common benchmark datasets (based on simple interfaces to create your own), and helpful transforms, both for learning on arbitrary graphs as well as on 3D meshes or point clouds. We have outsourced a lot of...
    Downloads: 12 This Week
  • 14
    The Reactive Extensions for Python

    Reactive extensions for Python

    ...Reactive Extensions for Python (RxPY) is a set of libraries for composing asynchronous and event-based programs using observable sequences and pipable query operators in Python. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using operators, and parameterize concurrency in data/event streams using Schedulers. RxPY is a fairly complete implementation of Rx with more than 120 operators, and over 1300 passing unit-tests. RxPY is mostly a direct port of RxJS, but also borrows a bit from RxNET and RxJava in terms of threading and blocking operators.
    Downloads: 0 This Week
  • 15
    Pydantic-Core

    Core validation logic for pydantic written in rust

    pydantic-core is the Rust-based core validation logic for Pydantic, a widely used data validation library in Python. It offers significant performance improvements over its predecessor, enabling faster and more efficient data parsing and validation.
    Downloads: 0 This Week
  • 16
    Tree

    tree is a library for working with nested data structures

    Tree (dm-tree) is a lightweight Python library developed by Google DeepMind for manipulating nested data structures (also called pytrees). It generalizes Python’s built-in map function to operate over arbitrarily nested collections — including lists, tuples, dicts, and custom container types — while preserving their structure. This makes it particularly useful in machine learning pipelines and JAX-based workflows, where complex parameter trees or hierarchical state representations are common. ...
    Downloads: 2 This Week
  • 17
    Darts

    A python library for easy manipulation and forecasting of time series

    ...The models can all be used in the same way, using fit() and predict() functions, similar to scikit-learn. The library also makes it easy to backtest models, combine the predictions of several models, and take external data into account. Darts supports both univariate and multivariate time series and models. The ML-based models can be trained on potentially large datasets containing multiple time series, and some of the models offer rich support for probabilistic forecasting. We recommend first setting up a clean Python environment for your project with at least Python 3.7 using your favorite tool (conda, venv, virtualenv with or without virtualenvwrapper).
    Downloads: 2 This Week
  • 18
    Recommenders 2023

    Best Practices on Recommendation Systems

    The objective of Recommenders is to assist researchers, developers and enthusiasts in prototyping, experimenting with and bringing to production a range of classic and state-of-the-art recommendation systems. Recommenders is a project under the Linux Foundation of AI and Data.
    Downloads: 1 This Week
  • 19
    Lightly

    A python library for self-supervised learning on images

    ...We, at Lightly, are passionate engineers who want to make deep learning more efficient. That's why - together with our community - we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step and the learned representations can be used to visualize and analyze datasets. This allows selecting the best core set of samples for model training through advanced filtering. We provide PyTorch, PyTorch Lightning and PyTorch Lightning distributed examples for each of the models to kickstart your project. ...
    Downloads: 1 This Week
  • 20
    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python, part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, possibly concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point. ...
    Downloads: 0 This Week
  • 21
    Public APIs

    A collective list of free APIs

    public-apis is a collaboratively maintained repository that provides an extensive, categorized list of publicly available APIs for developers. Curated by community contributors and the team at APILayer, it serves as a centralized resource for discovering APIs across a wide range of domains, including data, machine learning, weather, entertainment, and finance. The project aims to make API exploration and integration more accessible by offering a single, organized index of open and free-to-use APIs. Developers can leverage this list to enhance their products, prototypes, or research projects without the need to build data sources from scratch. ...
    Downloads: 4 This Week
  • 22
    Werkzeug

    The comprehensive WSGI web application library

    ...Includes an interactive debugger that allows inspecting stack traces and source code in the browser with an interactive interpreter for any frame in the stack. Includes a full-featured request object with objects to interact with headers, query args, form data, files, and cookies. Includes a response object that can wrap other WSGI applications and handle streaming data. Includes a routing system for matching URLs to endpoints and generating URLs for endpoints, with an extensible system for capturing variables from URLs. Includes HTTP utilities to handle entity tags, cache control, dates, user agents, cookies, files, and more.
    Downloads: 4 This Week
  • 23
    Stock prediction deep neural learning

    Predicting stock prices using a TensorFlow LSTM

    Predicting stock prices can be a challenging task as they often do not follow any specific pattern. However, deep neural learning can be used to identify patterns through machine learning. One of the most effective techniques for series forecasting is using LSTM (long short-term memory) networks, which are a type of recurrent neural network (RNN) capable of remembering information over a long period of time. This makes them extremely useful for predicting stock prices. Predicting stock...
    Downloads: 3 This Week
  • 24
    JC

    CLI tool and python library

    ...The JC parsers can also be used as python modules. In this case, the output will be a python dictionary, or a list of dictionaries, instead of JSON. Two representations of the data are available. The default representation uses a strict schema per parser and converts known numbers to int/float JSON values. Certain known values of None are converted to JSON null, known boolean values are converted, and, in some cases, additional semantic context fields are added.
    Downloads: 0 This Week
  • 25
    Plaso

    Super timeline all the things

    Plaso (Plaso Langar Að Safna Öllu), or "super timeline all the things," is a Python-based engine designed for automatic creation of timelines in digital forensic investigations. It processes various log files and artifacts to generate a chronological sequence of events, aiding analysts in understanding system activities.
    Downloads: 7 This Week
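
The Tree entry (item 16) describes generalizing Python's built-in map to nested collections while preserving their structure. As a rough illustration of that idea only, here is a minimal pure-Python sketch. The helper name `map_leaves` is hypothetical and is not the library's API or implementation; it handles only plain lists, tuples, and dicts.

```python
from typing import Any, Callable


def map_leaves(fn: Callable[[Any], Any], structure: Any) -> Any:
    """Apply fn to every leaf of a nested list/tuple/dict, keeping the shape.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(structure, dict):
        # Keep the keys, transform the values recursively.
        return {key: map_leaves(fn, value) for key, value in structure.items()}
    if isinstance(structure, list):
        return [map_leaves(fn, item) for item in structure]
    if isinstance(structure, tuple):
        return tuple(map_leaves(fn, item) for item in structure)
    # Anything else is treated as a leaf.
    return fn(structure)


# A nested "parameter tree" of the kind the description mentions.
params = {"weights": [1, 2, (3, 4)], "bias": 5}
doubled = map_leaves(lambda x: x * 2, params)
print(doubled)
```

The container types in the result match those in the input, which is the structure-preserving property the description refers to; custom container types are out of scope for this sketch.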