Some inconsistencies were recently discovered in the mutation data:
three data sets (DLBC, LGG, and LUSC) had some samples loaded under the wrong tissue type.
141 samples LGG -> GBM
55 samples LUSC -> LUAD
79 samples DLBC -> LAML
One sample was loaded twice, once under LUSC and again under LUAD.
The corresponding entries in the cells table also had the wrong tissue type and code (alias).
The errors seem to be due to TCGA files initially named with the wrong tissue code.
Two scripts under data/load correct the inconsistencies; both have already been applied to the demo site:
verifyCellMETA.pl
verifyCellDatasources.pl
The database files in svn will be updated in the next release of the database.
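For anyone who wants to check a local copy for samples that appear under more than one tissue code (like the sample loaded under both LUSC and LUAD), here is a minimal Python sketch. It is not one of the scripts named above and does not touch the database: the function name, the (sample ID, tissue code) row layout, and the sample IDs are all assumptions made up for illustration, so the rows would need to be exported from your own tables first.

```python
from collections import defaultdict


def find_multi_tissue_samples(rows):
    """Return {sample_id: sorted tissue codes} for samples seen under more than one code.

    `rows` is any iterable of (sample_id, tissue_code) pairs. This layout is an
    assumption for illustration, not the actual schema of the cells table.
    """
    seen = defaultdict(set)
    for sample_id, tissue_code in rows:
        seen[sample_id].add(tissue_code)
    # Keep only the samples recorded under two or more distinct tissue codes.
    return {s: sorted(codes) for s, codes in seen.items() if len(codes) > 1}


# Hypothetical rows; these sample IDs are invented for the example.
rows = [
    ("SAMPLE-0001", "LUSC"),
    ("SAMPLE-0001", "LUAD"),
    ("SAMPLE-0002", "LGG"),
    ("SAMPLE-0003", "DLBC"),
]

print(find_multi_tissue_samples(rows))
```

With the made-up rows above, only SAMPLE-0001 is reported, since it is the only ID listed under two codes.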