From: Margarita R. <rui...@ya...> - 2007-10-03 20:36:26
Hello GusDev members,

I need to load a CD-HIT results file. If somebody has already done something similar, it would help me a lot. My idea is as follows.

The input file (CD-HIT results) looks like this:

    >Cluster 0
    0 17392aa, >AAZ14281.1... *
    1 17392aa, >XP_843163.1... at 100%
    2 17392aa, >XP_843163.1... at 100%
    >Cluster 1
    0 10589aa, >AAN35571.1... *
    1 10589aa, >XP_001347658.1... at 100%
    2 10589aa, >XP_001347658.1... at 100%
    >Cluster 2
    0 10287aa, >XP_966264.1... *
    1 10287aa, >XP_966264.1... at 100%
    >Cluster 3
    0 10061aa, >CAD51479.1... *
    1 10061aa, >XP_001351672.1... at 100%
    2 10061aa, >XP_001351672.1... at 100%

I plan to use three tables for this:

    Dots.sequencesequencegroup
    Dots.sequencegroup
    Dots.seqgroupexperiment

In the Dots.seqgroupexperiment table, I'll store a description of the CD-HIT run, e.g.:

    Dots.SeqGroupExperiment.description = "tcruzi vs tcruzi 100%"
    Dots.SeqGroupExperiment.sequence_source = "tcruzi"
    Dots.SeqGroupExperiment.percent_identity = "1"

For the groups (clusters), I'll use Dots.sequencegroup, e.g.:

    Dots.SequenceGroup.number_of_members = 3
    Dots.SequenceGroup.number_of_taxa = 1
    Dots.SequenceGroup.min_percent_match = 1
    Dots.SequenceGroup.max_percent_match = 1

And in Dots.sequencesequencegroup, I'll store the member sequences of each group, e.g.:

    sequence_id = aa_sequence_id of the sequence
    sequence_group_id = the identifier of the group
    source_table_id = in this case, the identifier of the Dots.TranslatedAASequence table

Thanks for your help,
Margarita Ruiz
Oswaldo Cruz Institute
Rio de Janeiro, Brazil
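A minimal Python sketch of the parsing step, assuming the .clstr layout in the excerpt (a ">Cluster N" header followed by member lines, where "*" marks the representative and "at N%" gives identity to it). All function names, the regular expression, and the aa_ids lookup from accession to aa_sequence_id are hypothetical; only the column names come from the post, and writing the rows to the database is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed member-line layout, taken from the excerpt in this post:
#   "0\t17392aa, >AAZ14281.1... *"          (cluster representative)
#   "1\t17392aa, >XP_843163.1... at 100%"   (member, % identity to rep)
# Other CD-HIT versions or modes may format these lines differently.
MEMBER_RE = re.compile(
    r"^(\d+)\s+(\d+)aa,\s+>(\S+?)\.\.\.\s+(?:(\*)|at\s+(\d+(?:\.\d+)?)%)\s*$"
)


@dataclass
class Member:
    index: int
    length: int
    accession: str
    representative: bool
    identity: Optional[float]  # fraction (1.0 == 100%); None for the representative


@dataclass
class Cluster:
    number: int
    members: List[Member] = field(default_factory=list)


def parse_clstr(text: str) -> List[Cluster]:
    """Parse CD-HIT .clstr text into clusters (hypothetical helper)."""
    clusters: List[Cluster] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append(Cluster(number=int(line.split()[1])))
            continue
        m = MEMBER_RE.match(line)
        if m is None or not clusters:
            raise ValueError(f"unrecognized line: {raw!r}")
        idx, length, acc, star, pct = m.groups()
        clusters[-1].members.append(Member(
            index=int(idx),
            length=int(length),
            accession=acc,
            representative=star is not None,
            identity=None if star else float(pct) / 100,
        ))
    return clusters


def sequence_group_row(cluster: Cluster) -> Dict[str, object]:
    """Values for one Dots.SequenceGroup row (columns named in the post).

    Min/max are taken over the non-representative members' identity to the
    representative. number_of_taxa is not in the .clstr file, so it is
    left to the caller.
    """
    matches = [m.identity for m in cluster.members if not m.representative]
    if not matches:  # singleton cluster: only the representative
        matches = [1.0]
    return {
        "number_of_members": len(cluster.members),
        "min_percent_match": min(matches),
        "max_percent_match": max(matches),
    }


def sequence_sequence_group_rows(cluster: Cluster, group_id: int,
                                 aa_ids: Dict[str, int],
                                 source_table_id: int) -> List[Dict[str, int]]:
    """One Dots.SequenceSequenceGroup row per cluster member.

    aa_ids is a hypothetical accession -> aa_sequence_id lookup; it would
    have to be built from wherever the sequences were loaded (for example,
    Dots.TranslatedAASequence, as in this post).
    """
    return [
        {"sequence_id": aa_ids[m.accession],
         "sequence_group_id": group_id,
         "source_table_id": source_table_id}
        for m in cluster.members
    ]
```

In the excerpt, members 1 and 2 of Cluster 0 carry the same accession (XP_843163.1), so an accession-keyed lookup like aa_ids would produce duplicate membership rows for that cluster; some deduplication or a different key may be needed.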