Showing 13 open source projects for "cleaning"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    janitor

    janitor

    Simple tools for data cleaning in R

    janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    litlyx

    litlyx

    Analytics for developers, setup Analytics in 30 seconds

    The easiest, developer-centric analytics tool. Litlyxis an open-source, self-hostable analytics solution for the modern framework. Litlyx offers a unique eyewear cleaning system that includes a special cleaning solution and reusable microfiber swabs. This system is designed to provide a more thorough and eco-friendly way to clean glasses, lenses, and screens. The brand emphasizes sustainability by reducing single-use plastics and promoting long-term use of their products. Their cleaning kit is compact, portable, and designed to be effective for everyday use, ensuring that users can maintain clear vision without the hassle of disposable wipes or sprays.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AI Data Science Team

    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    nb-clean

    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    ...Note that the Git filter and pre-commit hook work differently, with different effects on your working directory. The pre-commit hook operates on the notebook on disk, cleaning the copy in your working directory. The Git filter cleans notebooks as they are added to the index, leaving the copy in your working directory dirty.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    CSV Lint

    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting on tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather it's a quality control tool to examine, verify or polish up a dataset before further processing.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 6
    Automated Tool for Optimized Modelling

    Automated Tool for Optimized Modelling

    Automated Tool for Optimized Modelling

    During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be quite time-consuming. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Java Tablesaw

    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H20.ai, DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    text-dedup

    text-dedup

    All-in-one text de-duplication

    ...This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible deduplication strategies, making it ideal for cleaning web-scraped data, language model training datasets, or document archives.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DISTOD

    DISTOD

    Distributed discovery of bidirectional order dependencies

    ...They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10

    OGLDataScienceTool

    Opengl tool for data science visualization

    Data visualization tool written in LWJGL Compatible with libgdx and other opengl wrappers The project depends on apache poi, and apache commons, for office files support Planned features for next release: * reading json, and other nosql data structures * jdbc connection for creating dataframes * data heatmaps, and additional plots for questions, contact me kumar.santhi1982@hotmail.com more details: http://www.java-gaming.org/topics/ds/41920/view.html http://datascienceforindia.com/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    fooltrader

    fooltrader

    Quant framework for stock

    Build a standard data schema, and then implement various connectors to import systems you are familiar with for analysis. fooltrader is a quantitative analysis trading system designed using big data technology, including data capture, cleaning, structuring, calculation, display, backtesting and trading. Its goal is to provide a unified framework for the whole market (stock, futures, bonds, foreign exchange, digital currency, macroeconomics, etc.) for research, backtesting, forecasting, and trading. Its applicable objects include quantitative traders, teachers, and students majoring in finance, people interested in economic data, programmers, and people who like freedom and the spirit of exploration. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Data Science Specialization

    Data Science Specialization

    Course materials for the Data Science Specialization on Coursera

    ...The repository is designed as a shared space for code examples, datasets, and instructional materials, helping learners follow along with lectures and assignments. It spans essential topics such as R programming, data cleaning, exploratory data analysis, statistical inference, regression models, machine learning, and practical data science projects. By providing centralized resources, the repo makes it easier for students to practice concepts and replicate examples from the curriculum. It also offers a structured view of how multiple disciplines—programming, statistics, and applied data analysis—come together in a professional workflow.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Datacleaning Open Source
    A group a subprojects for Data Cleaning projects, mainly as a step of a Data Mining Project. Visit www.datacleaningopensource.com to review our current applications or if you want to add yours. NOTE: PROGRAMMING SKILLS ARE REQUIRED.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB