Tofu is a Python library for generating synthetic UK Biobank data. The UK Biobank is a large, open-access prospective cohort study of 500,000 middle-aged participants recruited in England, Scotland and Wales. The study has collected, and continues to collect, extensive phenotypic and genotypic detail about its participants, including data from questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes.

Tofu generates random values that conform to the structure of the baseline data UK Biobank sends to researchers. For categorical variables (single or multiple choice), a random value is picked from the UK Biobank data dictionary for that field. For continuous variables, a random value is generated based on the distribution of values reported for that field on the UK Biobank showcase.

Features

  • For categorical variables (single or multiple choices), a random value will be picked from the UK Biobank data dictionary for that field
  • For continuous variables, a random value will be generated based on the distribution of values reported for that field on the UK Biobank showcase
  • For date and date/time fields, a random date will be generated
  • For all other fields, such as polymorphic fields, no data will be generated
  • The lookups directory contains lookups downloaded from the UK Biobank showcase
  • Data conform to the structure and schema of the baseline file but are otherwise nonsensical: no consistency checks have been implemented across fields, so values in related fields may contradict each other
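
The per-field rules listed above can be illustrated with a minimal sketch. This is not Tofu's actual code or API: the field names, codes, means and standard deviations are invented for illustration, and drawing continuous values from a normal distribution is an assumption, since the page does not say how the showcase distribution is used.

```python
import random
from datetime import date, timedelta

# Hypothetical field specifications. In practice the codes and summary
# statistics would come from the UK Biobank data dictionary and showcase;
# every name and number here is made up for illustration.
FIELDS = {
    "example_single": {"type": "categorical_single", "codes": [0, 1]},
    "example_multiple": {"type": "categorical_multiple", "codes": [1, 2, 3, 4]},
    "example_continuous": {"type": "continuous", "mean": 170.0, "std": 10.0},
    "example_date": {"type": "date", "start": date(2006, 1, 1), "end": date(2010, 12, 31)},
    "example_other": {"type": "polymorphic"},
}


def synthetic_value(spec, rng):
    """Return one random value for a field, or None if the type is unsupported."""
    kind = spec["type"]
    if kind == "categorical_single":
        # Pick one code from the field's list of allowed codes.
        return rng.choice(spec["codes"])
    if kind == "categorical_multiple":
        # Pick a random non-empty subset of the allowed codes, without repeats.
        k = rng.randint(1, len(spec["codes"]))
        return rng.sample(spec["codes"], k)
    if kind == "continuous":
        # Assumption: approximate the reported distribution with a normal one.
        return rng.gauss(spec["mean"], spec["std"])
    if kind == "date":
        # Pick a uniformly random day between start and end, inclusive.
        span_days = (spec["end"] - spec["start"]).days
        return spec["start"] + timedelta(days=rng.randint(0, span_days))
    # All other field types: no data is generated.
    return None


def synthetic_row(fields, rng):
    """Generate each field independently; nothing is checked across fields."""
    return {name: synthetic_value(spec, rng) for name, spec in fields.items()}


rng = random.Random(42)  # seeded so repeated runs give the same rows
rows = [synthetic_row(FIELDS, rng) for _ in range(5)]
```

Because each field is drawn independently, a row produced this way has the right shape but no meaningful relationships between its values, which matches the last point in the list.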

Additional Project Details

Programming Language

Python

Related Categories

Python Synthetic Data Generation Software

Registered

2023-05-22