Showing 20 open source projects for "open any file"

  • 1
    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking. You also customize the process to include your own work. Select any of the publicly available datasets from...
    Downloads: 1 This Week
  • 2
    NBi

    NBi is a testing framework (add-on to NUnit)

    ... to compile your test suite. Just create an XML file and let the framework interpret it and play your tests. The framework is designed as an add-on to NUnit, but with the possibility of porting it easily to other testing frameworks.
    Downloads: 0 This Week
  • 3
    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used...
    Downloads: 0 This Week
  • 4
    Cleanlab

    The standard data-centric AI package for data quality and ML

    ... label issues and other data issues, so you can train reliable ML models. All features of cleanlab work with any dataset and any model. Yes, any model: PyTorch, TensorFlow, Keras, JAX, HuggingFace, OpenAI, XGBoost, scikit-learn, etc. If you use a sklearn-compatible classifier, all cleanlab methods work out-of-the-box.
    Downloads: 0 This Week
  • 5
    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge. Explore tags, ownership details, links to other sources and other information to shorten and simplify the data discovery phase....
    Downloads: 0 This Week
  • 6
    data-diff

    Efficiently diff rows across two different databases

    We're excited to announce the launch of a new open-source product, data-diff, that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at modern data volumes and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that the data is correctly copied. Replicating data at scale, across...
    Downloads: 0 This Week
  • 7
    Encord Active

    The toolkit to test, validate, and evaluate your models and surface

    Encord Active is an open-source toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance. Encord Active has been designed as an all-in-one open source toolkit for improving your data quality and model performance. Use the intuitive UI to explore your data or access all the functionalities programmatically. Discover errors, outliers, and edge cases within your data - all in one open source toolkit...
    Downloads: 0 This Week
  • 8
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ..., separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic). File sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
    Downloads: 0 This Week
  • 9
    Dagster

    An orchestration platform for the development, production

    ..., multi-tool engine that scales technically and organizationally. Dagster as a unified control plane: The ‘single plane of glass’ data teams love to use. Rein in the chaos and maintain control over your data as the complexity scales. Centralize your metadata in one tool with built-in observability, diagnostics, cataloging, and lineage. Spot any issues and identify performance improvement opportunities.
    Downloads: 0 This Week
  • 10
    SQLBucket

    Lightweight library to write, orchestrate and test your SQL ETL

    ... with the project_folder parameter. That folder will contain all your SQL ETL. The Python file where you create your SQLBucket object is also a good place to instantiate your command line interface.
    Downloads: 0 This Week
  • 11
    WhyLogs Java Library

    Profile and monitor your ML data pipeline end-to-end

    This is a Java implementation of WhyLogs, with support for Apache Spark integration for large scale datasets. Understanding the properties of data as it moves through applications is essential to keeping your ML/AI pipeline stable and improving your user experience, whether your pipeline is built for production or experimentation. WhyLogs is an open source statistical logging library that allows data science and ML teams to effortlessly profile ML/AI pipelines and applications, producing log...
    Downloads: 0 This Week
  • 12
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity check, data enrichment alteration, real-time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing a high-performance integrated data management platform which will seamlessly do Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation...
    Downloads: 6 This Week
  • 13
    CloverDX

    Design, automate, operate and publish data pipelines at scale

    Please visit www.cloverdx.com for the latest product versions. Data integration platform; can be used to transform/map/manipulate data in batch and near-realtime modes. Supports various input/output formats (CSV, FIXLEN, Excel, XML, JSON, Parquet, Avro, EDI/X12, HL7, COBOL, LOTUS, etc.). Connects to RDBMS/JMS/Kafka/SOAP/REST/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd...
    Downloads: 6 This Week
  • 14
    apache spark data pipeline osDQ

    osDQ dedicated to create apache spark based data pipeline using JSON

    This is an offshoot of the open source data quality (osDQ) project (https://sourceforge.net/projects/dataquality/). This sub-project will create an Apache Spark based data pipeline where JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode. Get a JSON example at https://github.com/arrahtech/osdq-spark. How to run: Unzip...
    Downloads: 0 This Week
  • 15
    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a solution platform for DQ solutions. Its core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
    Downloads: 102 This Week
  • 16
    EPRI Open PQ Dashboard

    Demos new techniques for extracting information from PQ data files

    Open PQ Dashboard version 1.0 provides visual displays to quickly convey the status and location of power quality (PQ) anomalies throughout the electrical power system. Summary displays start with the choice of a geospatial map view or annunciator panel, both with unique across-the-room visualizations fit for a PQ operations center. Drill-downs are in place for various statistics and guide users all the way down to the waveform level. This version consists of a few proof...
    Downloads: 0 This Week
  • 17
    TTA Lossless Audio Codec
    Lossless compressor for multichannel 8-, 16- and 24-bit audio data, with optional password protection of the data. Being 'lossless' means that no data or quality is lost in the compression: when uncompressed, the data will be identical to the original.
    Downloads: 99 This Week
  • 18
    Toolsverse ETL Framework

    Open source Extract Transform Load engine written in Java

    ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications. Key Features: * embeddable, open source and free * fast and scalable * uses target database features to do transformations and loads * manual and automatic data mapping * data streaming * bulk data loads * data quality features using SQL, JavaScript and regex * data transformations Requirements * Java...
    Downloads: 0 This Week
  • 19
    AMB Data Profiling Data Quality
    AMB New Generation Data Empowerment offers a comprehensive approach to data governance needs, with groundbreaking features to locate, identify, discover, manage and protect your overall data infrastructure. Repeatable Process/Exposed Repository.
    Downloads: 1 This Week
  • 20
    COBOL Data Definitions
    Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.
    Downloads: 0 This Week