From: Pablo N. M. <pa...@pa...> - 2004-08-05 15:43:57
Hi Arnaud,

I don't know if I'll be able to answer all your questions, but I'll try to answer some, to the extent of my knowledge of GUS so far.

> I have the sequences in FASTA, what regexp shall I need?
> Is only --regex_name parameter required?
> Does it matter if I don't give taxon_id parameter ?

I've run the plugin with these parameters:

    [pablo@mkiwi mcl]$ ga GUS::Common::Plugin::InsertNewExternalSequences --external_database_release_id=38 --regex_source_id='(.*)' --table_name=DoTS::ExternalAASequence --sequencefile=tbrucei --commit

The regex_* parameters, as you probably noted, are the regular expressions used to extract the corresponding info from the FASTA header (name, source_id, secondary_id, etc.). I've only used regex_source_id.

Also, it does not seem to matter if you don't give the taxon_id parameter, but then you obviously won't create the associations between sequences and taxa in DoTS::AASequenceTaxon.

> Does LoadBlastSimFast module require generateBlastSimilarity.pl script? Where can I get this script ?

This module reads similarity results in a specific format, like:

    >479679 (3 subjects)
      Sum: 479680:1871:1.2e-194:1:353:1:353:1:353:353:353:0:
       HSP1: 479680:353:353:353:1871:1.2e-194:1:353:1:353:0:
      Sum: 488460:1826:7.0e-190:1:353:1:353:1:353:342:348:0:
       HSP1: 488460:342:348:353:1826:7.0e-190:1:353:1:353:0:
    >479680 (3 subjects)
      Sum: 479679:1871:1.2e-194:1:353:1:353:1:353:353:353:0:
       HSP1: 479679:353:353:353:1871:1.2e-194:1:353:1:353:0:

The script parseBlastFilesForSimilarity.pl (attached) will do the trick. I don't know if there are multiple versions of this script traveling around the list.

    [jdai@headnode mclorth]$ ls /scratch/jdai/Cpgus_vs_Pfgus/ | perl parseBlastFilesForSimilarity.pl --regex='(\S+)' --outputFile=Lm_vs_Lm_parsed --dir=/scratch/jdai/Cpgus_vs_Pfgus/

Hope this is useful,
Pablo

On Thu, 2004-08-05 at 08:49, Arnaud Kerhornou wrote:
> Hi
>
> I want to load BLAST results in GUS.
> Before running LoadBlastSimFast module, I want to load Uniprot and EMBL
> databases in DoTS::ExternalAASequence and DoTS::ExternalNASequence,
> just the Ids not the sequences themselves.
>
> I need some help to use GUS::Common::Plugin::InsertNewExternalSequences
> plugin.
>
> I have the sequences in FASTA, what regexp shall I need ? Is only
> --regex_name parameter required ? I don't know the taxon attached to the
> sequence entries. Does it matter if I don't give taxon_id parameter ?
>
> Re. LoadBlastSimFast module, It seems to parse the output in a specific
> format. Does it require generateBlastSimilarity.pl script ? Where can I
> get this script ?
>
> cheers
> Arnaud

-- 
-----------------------------
Pablo Nascimento Mendes
CTEGD EMF TIPS Fellow
Kissinger Lab
Department of Genetics
University of Georgia
C210 Life Sciences Bldg.
Athens, Georgia 30602
Phone:706 542-1447
E-mail: pa...@ug...
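
For anyone who wants to sanity-check a parsed similarity file before loading it, the format quoted in the reply (a `>query (N subjects)` header followed by `Sum:` and `HSPn:` lines of colon-separated fields) can be read with a few lines of code. The sketch below is in Python and is not part of GUS. The names `parse_similarities`, `split_fields` and `SAMPLE` are made up for illustration. The message does not say what each column means, so the fields are kept as raw strings rather than given names.

```python
import re

# Sample copied from the message. Only part of each record is shown there,
# so fewer subjects appear than the headers declare.
SAMPLE = """\
>479679 (3 subjects)
  Sum: 479680:1871:1.2e-194:1:353:1:353:1:353:353:353:0:
   HSP1: 479680:353:353:353:1871:1.2e-194:1:353:1:353:0:
  Sum: 488460:1826:7.0e-190:1:353:1:353:1:353:342:348:0:
   HSP1: 488460:342:348:353:1826:7.0e-190:1:353:1:353:0:
>479680 (3 subjects)
  Sum: 479679:1871:1.2e-194:1:353:1:353:1:353:353:353:0:
   HSP1: 479679:353:353:353:1871:1.2e-194:1:353:1:353:0:
"""

# Header line such as ">479679 (3 subjects)"
HEADER_RE = re.compile(r"^>(\S+)\s+\((\d+) subjects?\)$")


def split_fields(rest):
    """Split 'a:b:c:' into ['a', 'b', 'c'], dropping one trailing colon."""
    rest = rest.strip()
    if rest.endswith(":"):
        rest = rest[:-1]
    return rest.split(":")


def parse_similarities(text):
    """Group Sum/HSP lines under their query header.

    Returns a list of dicts, one per query, each holding the query id,
    the subject count declared in the header, and the subjects found.
    """
    queries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER_RE.match(line)
        if header:
            queries.append({
                "query": header.group(1),
                "declared_subjects": int(header.group(2)),
                "subjects": [],
            })
            continue
        label, sep, rest = line.partition(":")
        if not sep or not queries:
            raise ValueError(f"unexpected line: {raw!r}")
        subjects = queries[-1]["subjects"]
        if label == "Sum":
            # A Sum line opens a new subject under the current query.
            subjects.append({"sum": split_fields(rest), "hsps": []})
        elif label.startswith("HSP") and subjects:
            # HSP lines belong to the most recent Sum line.
            subjects[-1]["hsps"].append(split_fields(rest))
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return queries


if __name__ == "__main__":
    for q in parse_similarities(SAMPLE):
        print(q["query"], q["declared_subjects"], len(q["subjects"]))
```

Comparing `declared_subjects` with `len(q["subjects"])` gives a quick check that a file was not truncated. In the excerpt above the two differ only because the sample itself is abbreviated.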