Showing 7 open source projects for "cleaning"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    NYC Taxi Data

    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    ...It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). It also contains example analyses—spatial and temporal visualizations like maps, time-series plots, and hotspot detection—highlighting insights such as patterns of demand, peak times, and geospatial distributions. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    The Data Engineering Handbook

    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    ...Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills. It includes beginner and intermediate boot camps, interview guides, data cleaning and transformation resources, and curated lists of newsletters and industry communities, making it useful both for self-study and technical interview preparation. The repository is actively maintained and widely starred, reflecting its role as a go-to reference for newcomers and experienced practitioners alike.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    LLM Datasets

    LLM Datasets

    Curated list of datasets and tools for post-training

    LLM Datasets curates and standardizes datasets commonly used to train and fine-tune large language models, reducing the overhead of hunting down sources and normalizing formats. The repository aims to make datasets easy to inspect and transform, with scripts for downloading, deduping, cleaning, and converting to formats like JSONL that slot into training pipelines. It highlights instruction-tuning and conversation-style corpora while also pointing to code, math, or domain-specific sets for targeted capabilities. Quality is a recurring theme: examples and utilities help filter low-value samples, enforce length limits, and split train/validation consistently so results are comparable. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    Hotkeys JS

    Hotkeys JS

    A robust Javascript library for capturing keyboard input

    ...Because it has no external dependencies and a small footprint, it drops easily into existing codebases. Its focus on developer ergonomics makes defining, managing, and cleaning up shortcuts straightforward.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    Java Tablesaw

    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H20.ai, DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Git Tips

    Git Tips

    Compact knowledge base of Git command tips and workflows

    git-tips is a compact knowledge base of Git command tips and workflows designed to be quickly searchable and easy to memorize. It favors short, actionable examples that solve common problems like amending commits, cleaning branches, rewriting history, bisecting, stashing, or recovering lost work. Each tip shows the exact command and a brief explanation, reducing time spent digging through manual pages. The collection is useful for daily development, code reviews, and release management where precise Git operations matter. Because entries are concise and independent, you can dip in at any point and learn something useful in a minute or two. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DataScienceR

    DataScienceR

    a curated list of R tutorials for Data Science, NLP

    The DataScienceR repository is a curated collection of tutorials, sample code, and project templates for learning data science using the R programming language. It includes an assortment of exercises, sample datasets, and instructional code that cover the core steps of a data science project: data ingestion, cleaning, exploratory analysis, modeling, evaluation, and visualization. Many of the modules demonstrate best practices in R, such as using the tidyverse, R Markdown, modular scripting, and reproducible workflows. The repository also shows examples of linking R with external resources — APIs, databases, and file formats — and integrating into larger pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB