This is an implementation of the 14 chromosomes of P. falciparum in the
chado schema (see chado.ddl in this directory for the DDL used).

Please send any comments or problems to:

  David Emmert (FlyBase)
  emmert@morgan.harvard.edu

The data was taken from the following GenBank RefSeq records:

  NC_000521
  NC_000910
  NC_004314
  NC_004315
  NC_004316
  NC_004317
  NC_004318
  NC_004325
  NC_004326
  NC_004327
  NC_004328
  NC_004329
  NC_004330
  NC_004331

A few comments about this implementation:

1) Interbase locations were used, i.e. coordinates count the boundaries
   between bases rather than the bases themselves.

2) To save time/pain in parsing, the "gene" and "mRNA" features in the
   GB records were ignored. The "protein-coding gene" and "messenger RNA"
   locations in chado_pf were extrapolated from the "CDS" locations. As
   far as I could tell, the GB "gene" and "mRNA" locations always matched
   up with the "CDS" locations, but a conscientious load of the data
   would probably take a close look at the GB "mRNA" features to see
   whether there are any UTRs before discounting them as I have done.

3) The same "/gene" value was used in several different CDS features in
   the GB annotations. To keep gene symbols unique, when I encountered
   these I appended a number, "_n", to the gene symbol. When no "/gene"
   value was given, I named the gene using the GenBank accession number,
   e.g. "NC_004328_unknown_gene".

4) I didn't get the rRNAs or misc_RNAs in!
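
The location and naming conventions described in the comments above can be
sketched roughly as follows. This is only an illustration, not the loader
that was actually used: the function names, the input shapes, and the
"geneA" symbol are all hypothetical. It assumes that GenBank coordinates
are 1-based and inclusive, that the first occurrence of a gene symbol keeps
the bare symbol, and that repeats are numbered from 2; the original notes
do not say where the "_n" numbering starts.

```python
def to_interbase(start, end):
    """Convert a 1-based inclusive range to interbase (fmin, fmax)."""
    return start - 1, end


def span_from_cds(segments):
    """Extrapolate a gene/mRNA span from CDS segments.

    segments: list of (start, end) pairs, 1-based inclusive.
    Returns the enclosing interbase (fmin, fmax).
    """
    fmin = min(start for start, _ in segments) - 1
    fmax = max(end for _, end in segments)
    return fmin, fmax


def assign_gene_names(cds_features):
    """Give every CDS a unique gene name.

    cds_features: list of (accession, gene_symbol_or_None) pairs.
    A missing symbol falls back to "<accession>_unknown_gene"; a name
    that is already taken gets "_2", "_3", ... appended (assumed
    numbering, see the note above).
    """
    used = set()
    names = []
    for accession, symbol in cds_features:
        base = symbol or f"{accession}_unknown_gene"
        name, n = base, 1
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        names.append(name)
    return names


if __name__ == "__main__":
    print(to_interbase(1, 10))
    print(span_from_cds([(100, 200), (300, 450)]))
    print(assign_gene_names([
        ("NC_004328", "geneA"),
        ("NC_004328", "geneA"),
        ("NC_004328", None),
    ]))
```

With interbase coordinates the length of a feature is simply fmax - fmin,
which is one reason the convention is convenient for spans derived from
several CDS segments.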