DATPROF
Test Data Management solutions such as data masking, synthetic data generation, data subsetting, data discovery, database virtualization and data automation are our core business.
We see and understand the struggles software development teams have with test data. Personally Identifiable Information? Environments that are too large? Long waiting times for a test data refresh? We aim to solve these issues by:
- Obfuscating, generating or masking databases and flat files;
- Extracting or filtering specific data content with data subsetting;
- Discovering, profiling and analyzing your test data so you can understand it;
- Automating, integrating and orchestrating test data provisioning in your CI/CD pipelines; and
- Cloning, snapshotting and time-traveling through your test data with database virtualization.
We improve and innovate our test data software with the latest technologies every day to support medium-sized and large organizations in their Test Data Management.
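This description does not cover DATPROF's own masking and subsetting algorithms. As a generic illustration only, the Python sketch below shows two of the listed techniques. The first is deterministic masking with a keyed hash (HMAC-SHA256), which keeps masked values consistent across tables so joins still work. The second is subsetting that keeps child rows only when their parent row survives the filter. The key, function names and sample tables are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; a real setup would load it from a secrets store.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically pseudonymize an e-mail address (hypothetical helper).

    The same input (case-insensitive) always maps to the same output, so
    masked columns still join across tables, but the original is not exposed.
    """
    digest = hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


def subset(customers, orders, predicate):
    """Keep customers matching `predicate` and only the orders that reference them."""
    kept = [c for c in customers if predicate(c)]
    kept_ids = {c["id"] for c in kept}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept, kept_orders


# Hypothetical sample tables.
customers = [
    {"id": 1, "email": "alice@corp.com", "country": "NL"},
    {"id": 2, "email": "bob@corp.com", "country": "US"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 42.0},
    {"id": 11, "customer_id": 2, "total": 13.5},
]

# Subset first (fewer rows to mask), then mask the personal data that remains.
small_customers, small_orders = subset(customers, orders, lambda c: c["country"] == "NL")
masked = [{**c, "email": mask_email(c["email"])} for c in small_customers]
```

Subsetting before masking keeps the work proportional to the smaller dataset. Using a keyed hash rather than a plain hash makes it harder to reverse the mapping by hashing guessed e-mail addresses.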
Parallel Domain Replica Sim
Parallel Domain Replica Sim enables the creation of high-fidelity, fully annotated, simulation-ready environments from users' own captured data (photos, videos, scans). With PD Replica, you can generate near-pixel-perfect reconstructions of real-world scenes and turn them into virtual environments that preserve visual detail and realism. PD Sim provides a Python API through which perception, machine learning, and autonomy teams can configure and run large-scale test scenarios and simulate sensor inputs (camera, lidar, radar, etc.) in either open-loop or closed-loop mode. These simulated sensor feeds come with full annotations, so developers can test their perception systems under a wide variety of conditions (lighting, weather, object configurations, and edge cases) without collecting real-world data for every scenario.
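The description above does not show the PD Sim Python API, so this example does not use it. The hypothetical sketch below illustrates only the idea of a large-scale scenario sweep. It crosses a few condition axes (lighting, weather, object counts) to enumerate every test configuration a simulator would then run. The `Scenario` class and all axis values are assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario description; NOT the PD Sim API."""
    lighting: str
    weather: str
    pedestrians: int
    sensors: tuple


# Hypothetical condition axes to sweep over.
LIGHTING = ["noon", "dusk", "night"]
WEATHER = ["clear", "rain", "fog"]
PEDESTRIAN_COUNTS = [0, 5, 20]
SENSORS = ("camera", "lidar", "radar")


def scenario_grid():
    """Enumerate every combination of the condition axes (full factorial sweep)."""
    return [
        Scenario(lighting, weather, peds, SENSORS)
        for lighting, weather, peds in product(LIGHTING, WEATHER, PEDESTRIAN_COUNTS)
    ]


scenarios = scenario_grid()  # 3 * 3 * 3 = 27 configurations
```

A full factorial grid grows multiplicatively with each new axis. Large sweeps therefore often sample from the grid or focus on known edge cases instead of running every combination.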
YData
Adopting data-centric AI has never been easier with automated data quality profiling and synthetic data generation. We help data scientists unlock their data's full potential. YData Fabric lets users easily understand and manage data assets, generate synthetic data for fast data access, and build pipelines for iterative, scalable workflows. Better data and more reliable models, delivered at scale. Automate data profiling for simple and fast exploratory data analysis. Upload and connect to your datasets through an easily configurable interface. Generate synthetic data that mimics the statistical properties and behavior of the real data. Protect your sensitive data, augment your datasets, and improve the efficiency of your models by replacing real data with synthetic data or enriching it. Refine and improve processes with pipelines: consume the data, clean it, transform it, and improve its quality to boost machine learning models' performance.
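At its core, automated data profiling computes summary statistics for each column. The standard-library sketch below is not YData's API. It is a hypothetical minimal profiler that reports count, missing values and distinct values for every column, plus min, max and mean for numeric columns. The function names and sample rows are assumptions.

```python
import statistics


def profile_column(values):
    """Summarize one column; None counts as a missing value (hypothetical helper)."""
    present = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Only numeric columns get range and mean statistics.
    if present and all(isinstance(v, (int, float)) for v in present):
        profile.update(min=min(present), max=max(present), mean=statistics.fmean(present))
    return profile


def profile_table(rows):
    """Profile every column seen in a list of dict rows, in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data.
rows = [
    {"age": 34, "city": "Utrecht"},
    {"age": None, "city": "Delft"},
    {"age": 51, "city": "Utrecht"},
]
report = profile_table(rows)
```

Production profilers add histograms, correlations and data-quality alerts on top of these basic counts. The per-column structure stays the same.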
DataCebo Synthetic Data Vault (SDV)
The Synthetic Data Vault (SDV) is a Python library designed to be your one-stop shop for creating tabular synthetic data. The SDV uses a variety of machine learning algorithms to learn patterns from your real data and emulate them in synthetic data. The SDV offers multiple models, ranging from classical statistical methods (GaussianCopula) to deep learning methods (CTGAN). Generate data for single tables, multiple connected tables, or sequential tables. Compare the synthetic data to the real data against a variety of measures. Diagnose problems and generate a quality report to get more insights. Control data processing to improve the quality of synthetic data, choose from different types of anonymization, and define business rules in the form of logical constraints. Use synthetic data in place of real data for added protection, or use it in addition to your real data as an enhancement. The SDV is an overall ecosystem for synthetic data models, benchmarks, and metrics.
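A Gaussian copula model, the classical statistical method named above, learns each column's distribution separately and links the columns through a multivariate normal dependence structure. The following dependency-free sketch applies that idea to two numeric columns. It uses empirical marginals and a single correlation coefficient, which is a simplification and not the SDV's implementation. SDV 1.x exposes its own model through classes such as `GaussianCopulaSynthesizer`, and its documentation gives the exact API. All function names here are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def to_normal_scores(column):
    """Map each value to a standard-normal score through its empirical rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # rank/(n+1) stays in (0, 1)
    return scores


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: return the stored value at quantile u in [0, 1]."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def fit_and_sample(x, y, num_rows, seed=0):
    """Fit a two-column Gaussian copula with empirical marginals, then sample rows."""
    # Dependence structure: correlation of the columns in normal-score space.
    rho = pearson(to_normal_scores(x), to_normal_scores(y))
    rng = random.Random(seed)
    xs, ys = sorted(x), sorted(y)
    rows = []
    for _ in range(num_rows):
        # Draw a correlated standard-normal pair.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(max(0.0, 1 - rho * rho)) * rng.gauss(0, 1)
        # Map back to each column's own distribution via normal CDF + quantiles.
        rows.append((empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                     empirical_quantile(ys, STD_NORMAL.cdf(z2))))
    return rows


# Hypothetical "real" data: two strongly correlated numeric columns.
real_x = list(range(1, 101))
real_y = [2 * v + (v % 7) for v in real_x]
synthetic = fit_and_sample(real_x, real_y, num_rows=500, seed=42)
```

Because the marginals are empirical, every synthetic value is copied from the real column. This preserves each column's shape exactly, but it also means rare real values can reappear verbatim, a privacy trade-off that parametric marginals or the anonymization options mentioned above are meant to address.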