Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Business
Data Management
Data Quality Tools
Search Results

Search Results for "compiler python linux"

x

Sort By:

Relevance

Clear All Filters

OS

Windows 19
Linux 18
Mac 18
More...
BSD 2
ChromeOS 2

Category

Business 19
- Data Management 19
- Enterprise 1
Artificial Intelligence 6
Software Development 4
System 4
Formats and Protocols 2
Database 1

License

OSI-Approved Open Source 17
Other License 1

Programming Language

Python 18
Java 2
COBOL 1

Status

Production/Stable 1
Mature 1
Inactive 1

Showing 19 open source projects for "compiler python linux"

View related business solutions

Data Quality Windows Clear Filters & Widen Search

Build Agents and Models on One Platform
Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.

Try It Free
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
1

Apache Airflow Provider

Great Expectations Airflow operator

Due to apply_default decorator removal, this version of the provider requires Airflow 2.1.0+. If your Airflow version is 2.1.0, and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically, and you will have to manually run airflow upgrade db to complete the migration. This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the...

Downloads: 2 This Week

Last Update: 2026-01-28
See Project
2

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

ydata-profiling primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like pandas df.describe() function, that is so handy, ydata-profiling delivers an extended analysis of a DataFrame while allowing the data analysis to be exported in different formats such as html and json.

Downloads: 4 This Week

Last Update: 2026-04-22
See Project
3

CleanVision

Automatically find issues in image datasets

CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used...

Downloads: 2 This Week

Last Update: 2026-01-05
See Project
4

Arize Phoenix

Uncover insights, surface problems, monitor, and fine tune your LLM

Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative)...

Downloads: 3 This Week

Last Update: 19 hours ago
See Project
Cut Data Warehouse Costs by 54%
Easily migrate from Snowflake, Redshift, or Databricks with free tools.

BigQuery delivers 54% lower TCO with exabyte scale and flexible pricing. Free migration tools handle the SQL translation automatically.

Try Free
5

FiftyOne

The open-source tool for building high-quality datasets

The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. Improving data quality and understanding your model’s failure modes are the most impactful ways to...

Downloads: 4 This Week

Last Update: 2026-07-01
See Project
6

Dagster

An orchestration platform for the development, production

Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust...

Downloads: 5 This Week

Last Update: 2026-07-02
See Project
7

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...

Downloads: 2 This Week

Last Update: 2026-01-13
See Project
8

SDGym

Benchmarking synthetic data generation methods

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking. You also customize the process to include your own work. Select any of the publicly available datasets from the...

Downloads: 3 This Week

Last Update: 2026-06-02
See Project
9

Diffgram

Training data (data labeling, annotation, workflow) for all data types

From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This...

Downloads: 1 This Week

Last Update: 2024-10-14
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
10

Gretel Synthetics

Synthetic data generators for structured and unstructured text

Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that are as good or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....

Downloads: 0 This Week

Last Update: 2025-03-17
See Project
11

Pandas Profiling

Create HTML profiling reports from pandas DataFrame objects

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. High correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik). Most common categories (uppercase, lowercase,...

Downloads: 0 This Week

Last Update: 2026-04-22
See Project
12

gravitino

Unified metadata lake for data & AI assets.

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.

Downloads: 7 This Week

Last Update: 2025-06-08
See Project
13

Muse: Middleware Universal Scripting idE

Automate: WebSphere; WebLogic; JBoss; Glassfish; Tomcat; Linux, WinRM

Simplify... Aggregate... Automate... Simplify... *** OPEN SOURCE - GPL3/EPL. Use Python / Jython to automate WebSphere, WebLogic, JBoss, Glassfish and Tomcat Middleware Estates over JMX, both SSL and non-SSL + Linux SSH (agent-less) + WinRM Target all 5 servers, Linux and WinRM from the same workspace. Familiar Eclipse based Jython and Python Development IDE, pre-configured and ready to go. 4-Click Installer.

Downloads: 0 This Week

Last Update: 2026-04-30
See Project
14

Encord Active

The toolkit to test, validate, and evaluate your models and surface

Encord Active is an open-source toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance. Encord Active has been designed as a all-in-one open source toolkit for improving your data quality and model performance. Use the intuitive UI to explore your data or access all the functionalities programmatically. Discover errors, outliers, and edge-cases within your data - all in one open source...

Downloads: 1 This Week

Last Update: 2024-04-19
See Project
15

data-diff

Efficiently diff rows across two different databases

We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. Replicating data at scale, across...

Downloads: 3 This Week

Last Update: 2024-02-20
See Project
16

dbt-re-data

re_data - fix data issues before your users & CEO would discover them

re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlaying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Data transformations in re_data are implemented and exposed as models & macros in this dbt package. Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up...

Downloads: 1 This Week

Last Update: 2023-12-21
See Project
17

SQLBucket

Lightweight library to write, orchestrate and test your SQL ETL

SQLBucket is a lightweight framework to help write, orchestrate and validate SQL data pipelines. It gives the possibility to set variables and introduces some control flow using the fantastic Jinja2 library. It also implements a very simplistic unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines. To start working, you need to instantiate your SQLBucket core...

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
18

CloverDX

Design, automate, operate and publish data pipelines at scale

Please, visit www.cloverdx.com for latest product versions. Data integration platform; can be used to transform/map/manipulate data in batch and near-realtime modes. Suppors various input/output formats (CSV,FIXLEN,Excel,XML,JSON,Parquet, Avro,EDI/X12,HL7,COBOL,LOTUS, etc.). Connects to RDBMS/JMS/Kafka/SOAP/Rest/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd...

4 Reviews

Downloads: 0 This Week

Last Update: 2023-05-04
See Project
19

COBOL Data Definitions

Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project

Previous
You're on page 1
Next

Related Searches

phoenix

sha256sum

gym software

roof

oracle apex sample

data replication

data warehouse

etl

cobol copybook to json convert

windows 8.1 service pack 1

Related Categories

Business

Artificial Intelligence

Software Development

System

Formats and Protocols

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise