Data loading

Keith Ching

Loading data into CELLX

TCGA
1. fetch files
2. process files
3. parse files
4. load files

[RNA-Seq]

CNV
Too many segments indicates that the SNP6 chip failed, so samples with 3500 or more segments are removed from the database.
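The segment-count filter can be sketched as follows. This is a minimal illustration, not the CELLX loader: it assumes segments are already parsed into `(sample_id, chromosome, start, end, segment_mean)` tuples, and the function names `failed_samples` and `filter_segments` are hypothetical. Only the 3500 threshold comes from this page.

```python
from collections import Counter

# Threshold from this page: samples with >= 3500 segments are treated as failed.
MAX_SEGMENTS = 3500


def failed_samples(segments, max_segments=MAX_SEGMENTS):
    """Return the set of sample IDs whose segment count reaches max_segments.

    `segments` is an iterable of (sample_id, chromosome, start, end, segment_mean)
    tuples, a hypothetical in-memory form of a segmentation file.
    """
    counts = Counter(sample_id for sample_id, *_ in segments)
    return {sample for sample, n in counts.items() if n >= max_segments}


def filter_segments(segments, max_segments=MAX_SEGMENTS):
    """Drop every segment that belongs to a sample over the threshold."""
    segments = list(segments)  # materialize, because we iterate twice
    bad = failed_samples(segments, max_segments)
    return [seg for seg in segments if seg[0] not in bad]
```

Counting per sample first and filtering second keeps the decision at the sample level: a failed chip removes all of that sample's segments, not just the excess ones.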

MYSQL
Data are cleaned with cleanDatabase.pl and cleanDatabaseMETA.pl.
After cleaning, tables are optimized to shrink the database size:
mysqlcheck -o <db_name> -u <username> -p

MUTATIONS
Mutation formats varied between TCGA releases, so the same mutation may be entered more than once under different notations, e.g. KRAS G12D and KRAS p.G12D. fixMutations.pl tries to catch and remove these duplicates.
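One way to catch such duplicates is to normalize the protein-change string before comparing records. The sketch below is an assumption about the approach, not a port of fixMutations.pl: it only strips an optional `p.` prefix, and it assumes mutations arrive as `(sample, gene, protein_change)` tuples. `normalize_change` and `dedupe_mutations` are hypothetical names.

```python
import re

# Optional HGVS-style "p." prefix, so "p.G12D" and "G12D" compare equal.
_PREFIX = re.compile(r"^p\.", re.IGNORECASE)


def normalize_change(protein_change):
    """Strip whitespace and a leading 'p.' from a protein-change string."""
    return _PREFIX.sub("", protein_change.strip())


def dedupe_mutations(mutations):
    """Keep the first record of each (sample, gene, normalized change).

    `mutations` is an iterable of (sample, gene, protein_change) tuples;
    original spellings are preserved in the output.
    """
    seen = set()
    unique = []
    for sample, gene, change in mutations:
        key = (sample, gene.upper(), normalize_change(change))
        if key not in seen:
            seen.add(key)
            unique.append((sample, gene, change))
    return unique
```

The key includes the sample, so the same mutation in two different samples is kept. Other notation differences, such as three-letter amino-acid codes (p.Gly12Asp), would need extra normalization rules.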

EXPRESSION
Normalized RSEM values from TCGA are converted to log2. Any zero or negative values are set to 0.01.
Values are stored in the database as integers (expression × 1000), so they must be divided by 1000 after retrieval.
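The encode/decode round trip can be sketched like this. Two details are assumptions, since the page does not state them: the 0.01 floor is applied to the raw value before the log2 transform (which keeps log2 defined), and the scaled value is rounded to the nearest integer rather than truncated. `to_stored` and `from_stored` are hypothetical names.

```python
import math

FLOOR = 0.01   # replacement for zero or negative values, per this page
SCALE = 1000   # stored value = log2 expression x 1000, as an integer


def to_stored(rsem_value):
    """Encode one normalized RSEM value as the integer kept in the database.

    Assumption: the floor is applied before log2, and the result is rounded.
    """
    value = rsem_value if rsem_value > 0 else FLOOR
    return round(math.log2(value) * SCALE)


def from_stored(stored):
    """Decode a stored integer back to a log2 expression value."""
    return stored / SCALE
```

For example, an RSEM value of 8.0 is stored as 3000 and read back as 3.0. Storing integers keeps three decimal places of the log2 value, so decoded values can differ from the exact log2 by up to 0.0005.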
