Showing 444 open source projects for "data"

View related business solutions
  • Cut Data Warehouse Costs up to 54% with BigQuery Icon
    Cut Data Warehouse Costs up to 54% with BigQuery

    Migrate from Snowflake, Databricks, or Redshift with free migration tools. Exabyte scale without the Exabyte price.

    BigQuery delivers up to 54% lower TCO than cloud alternatives. Migrate from legacy or competing warehouses using free BigQuery Migration Service with automated SQL translation. Get serverless scale with no infrastructure to manage, compressed storage, and flexible pricing—pay per query or commit for deeper discounts. New customers get $300 in free credit.
    Try BigQuery Free
  • Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud Icon
    Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud

    Get back to your application and leave the database to us. Cloud SQL automatically handles backups, replication, and scaling.

    Cloud SQL is a fully managed relational database for MySQL, PostgreSQL, and SQL Server. We handle patching, backups, replication, encryption, and failover—so you can focus on your app. Migrate from on-prem or other clouds with free Database Migration Service. IDC found customers achieved 246% ROI. New customers get $300 in credits plus a 30-day free trial.
    Try Cloud SQL Free
  • 1
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 97 This Week
    Last Update:
    See Project
  • 2
    TOML

    TOML

    Tom Preston-Werner's obvious, minimal language

    ...TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages. TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    lxml

    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 4
    Great Expectations

    Great Expectations

    Always know what to expect from your data

    Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Ship AI Apps Faster with Vertex AI Icon
    Ship AI Apps Faster with Vertex AI

    Go from idea to deployed AI app without managing infrastructure. Vertex AI offers one platform for the entire AI development lifecycle.

    Ship AI apps and features faster with Vertex AI—your end-to-end AI platform. Access Gemini 3 and 200+ foundation models, fine-tune for your needs, and deploy with enterprise-grade MLOps. Build chatbots, agents, or custom models. New customers get $300 in free credit.
    Try Vertex AI Free
  • 5
    Pandas Profiling

    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. High correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik). Most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic). ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    OSCAL

    OSCAL

    Open Security Controls Assessment Language (OSCAL)

    ...Public contributions to this project are welcome. With this effort, we are stressing the agile development of a set of minimal formats that are generic enough to capture the breadth of data in scope (controls specifications), while also capable of ad-hoc tuning and extension to support peculiarities of both (industry or sector) standards and new control types. The OSCAL website provides an overview of the OSCAL project, including an XML and JSON schema reference, examples, and other resources.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 8
    RenderCV

    RenderCV

    LaTeX CV generator from a YAML/JSON input file

    RenderCV is a LaTeX CV/resume framework. It allows you to create a high-quality CV as a PDF from a YAML file with full Markdown syntax support and complete control over the LaTeX code. RenderCV offers built-in LaTeX and Markdown templates ready to produce high-quality CVs. However, the templates are entirely arbitrary and can easily be updated to leverage RenderCV's capabilities with your custom CV themes.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 9
    TikZ

    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 9 This Week
    Last Update:
    See Project
  • Build on Google Cloud with $300 in Free Credit Icon
    Build on Google Cloud with $300 in Free Credit

    New to Google Cloud? Get $300 in free credit to explore Compute Engine, BigQuery, Cloud Run, Vertex AI, and 150+ other products.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query exabytes in BigQuery, or build AI apps with Vertex AI and Gemini. Once your credits are used, keep building with 20+ products with free monthly usage, including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. Sign up to start building right away.
    Start Free Trial
  • 10
    FreeTAKServer

    FreeTAKServer

    Situational Awareness Server compatible with TAK clients

    FTS is a Python3 implementation of a TAK Server for devices like ATAK, WinTAK, and ITAK, it is cross-platform and runs from a multi-node installation on AWS down to the Android edition. It's free and open source (released under the Eclipse Public License. FTS allows you to connect ATAK clients to share geo-information, to chat with all the connected clients, exchange files and more. It intends to support all the major use cases of the original TAK server.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 11
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 12
    wikmd

    wikmd

    A file based wiki that uses markdown

    It’s a file-based wiki that aims to simplicity. Instead of storing the data in a database I chose to have a file-based system. The advantage of this system is that every file is directly readable inside a terminal etc. Also when you have direct access to the system you can export the files to anything you like.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    pyserde

    pyserde

    Yet another serialization library on top of dataclasses

    Yet another serialization library on top of data classes, inspired by serde-rs. Declare a class with pyserde's @serde decorator.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Nano PDF Editor

    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    TexText

    TexText

    Re-editable LaTeX/ typst graphics for Inkscape

    Re-editable LaTeX and typst graphics for Inkscape. TexText is a Python extension for the vector graphics editor Inkscape providing the possibility to add and re-edit LaTeX and typst generated SVG elements to your drawing.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    Canmatrix

    Canmatrix

    Converting Can (Controller Area Network) Database Formats

    Canmatrix is a Python package to read and write several CAN (Controller Area Network) database formats. Canmatrix implements a "Python Can Matrix Object" which describes the can-communication and the needed objects (Board units, Frames, Signals, Values, ...) Canmatrix also includes two Tools (can convert and can compare) for converting and comparing CAN databases.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Awkward Array

    Awkward Array

    Manipulate JSON-like data with NumPy-like idioms

    Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. Arrays are dynamically typed, but operations on them are compiled and fast. Their behavior coincides with NumPy when array dimensions are regular and generalizes when they're not.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Mashumaro

    Mashumaro

    Fast and well tested serialization library on top of dataclasses

    When using data classes, you often need to dump and load objects based on the schema you have. Mashumaro not only lets you save and load things in different ways, but it also does it super quickly.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Texify

    Texify

    Math OCR model that outputs LaTeX and markdown

    Texify is an OCR model that converts images or pdfs containing math into markdown and LaTeX that can be rendered by MathJax ($$ and $ are delimiters). It can run on CPU, GPU, or MPS.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    granary

    granary

    The social web translator

    The social web translator. Fetches and converts data between social networks, HTML and JSON with microformats2, ActivityStreams/ActivityPub, Atom, JSON Feed, and more. Granary is a library and REST API that fetches and converts between a wide variety of social data sources and formats. Free yourself from silo API chaff and expose the sweet social data foodstuff inside in standard formats and protocols.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    jello

    jello

    CLI tool to filter JSON and JSON Lines data with Python syntax

    ...Processed data can be output as JSON, JSON Lines, bash array lines, or a grep-able schema.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    WriteTeX

    WriteTeX

    An Inkscape extension: Latex/Tex editor for Inkscape

    Due to an incompatible change of the Inkscape extension API, this extension has to split into two versions. For Inkscape versions lower than 1.0, users should use the files in the 0.9.x folder, the other users should use files in the 1.0.x folder.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB