Anonymous - 2015-06-12
Hi!
I have a question regarding the sequence datafile. I constantly get the following error message: "Error in sequence data file: Z at 1 seq 46.
Make sure to separate the sequence from its name by 2 or more spaces."
I have tried everything. Among other things, I have added multiple spaces between the sequence name and the tag (^nr), multiple spaces after each sequence, and extra blank spaces after each sequence and after each sequence name.
My names look like Z01^1, G62^5, etc., with Z01 indicating the specimen and ^1 referring to the species list in the Imap file.
I simply have no other solutions, it keeps asking for extra spaces. Can someone help me with this, any advice?
Regards
hi thijmen,
there are mistakes in your data files. here are a few things i found.
the first locus has 45 sequences, but you specified 46.
there are only 4 loci, but you specify 5 in the bpp control file.
Apart from those, the file format is ok, but the file may still not be readable, as it seems to contain strange invisible characters. You can try a few different things just to see whether any of them makes any difference. I am not sure what editor you are using. Some editors are poorly written and can cause problems. Try some other editors and save the file in slightly different ways, and see whether it makes a difference. Hidden characters like LF (line feed), CR (carriage return), etc. are a common source of problems, especially if you use different computers running Windows, Mac, and Linux. You can delete the blank spaces and retype them. You can use Word to save the file in plain text, with and without line breaks, for example. Try those ideas that should not make a difference and hope they will make a difference.
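One way to look for the invisible characters mentioned above is to scan the raw bytes of the file. The following is only a rough sketch, not part of bpp: the function name `scan_hidden_characters` is made up for illustration, and it assumes that anything other than printable ASCII, spaces, and LF is worth a second look.

```python
import sys


def scan_hidden_characters(data: bytes):
    """Return (line, column, byte) for every byte that is not printable ASCII or LF.

    Assumption: a clean sequence file holds only printable ASCII, spaces,
    and LF line endings, so CR, TAB, and non-ASCII bytes are all reported.
    """
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if b == 0x0A:  # LF ends the current line
            line, col = line + 1, 0
            continue
        if not (0x20 <= b <= 0x7E):
            hits.append((line, col, b))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read in binary mode so that Python does not translate line endings.
    with open(sys.argv[1], "rb") as fh:
        for line, col, b in scan_hidden_characters(fh.read()):
            print(f"line {line}, column {col}: byte 0x{b:02X}")
```

A byte reported as 0x0D is a CR (typical of files saved on Windows) and 0x09 is a TAB. Values of 0x80 and above usually point to non-ASCII characters such as a non-breaking space or a byte-order mark.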
the attached file reads o.k.
http://abacus.gene.ucl.ac.uk/software/bpp3.1a.tgz
here is a version of the program that can print out more rubbish on the screen if you use
noisy = 9
In the control file, set nloci to 1, 2, 3, 4, 5 in turn, in which case the program will just read the first few loci and ignore the rest, and check the screen output to make sure that the sequence names for each locus are read correctly. For example, if you say 46 sequences, you will see a line like the following when the program is reading the first locus:
Reading sequences, sequential format..
Reading seq #46: ZV26^1
this is wrong as ZV26^1 is the name of the first sequence for locus 2. if you change 46 into 45, you get the correct output as follows:
Reading seq #45: T30^39
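The same count check can be roughed out outside the program. The sketch below is not bpp code and rests on two assumptions: each locus starts with a header line holding two integers (number of sequences, then sequence length), and every sequence name contains a ^ tag as in this thread. The function name `count_sequences_per_locus` is made up for illustration.

```python
import re

# Assumed locus header: two integers, e.g. "46 500" (sequence count, length).
HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")
# Assumed name line: first token contains a caret tag, e.g. "ZV26^1".
NAME_LINE = re.compile(r"^\s*(\S+\^\S+)")


def count_sequences_per_locus(text: str):
    """Return a list of (declared, found) sequence counts, one pair per locus."""
    loci = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            loci.append([int(header.group(1)), 0])  # start a new locus
        elif loci and NAME_LINE.match(line):
            loci[-1][1] += 1  # one more named sequence in the current locus
    return [tuple(pair) for pair in loci]


if __name__ == "__main__":
    example = "3 4\nA^1  ACGT\nB^1  ACGT\n\n2 4\nC^2  ACGT\nD^2  ACGT\n"
    for i, (declared, found) in enumerate(count_sequences_per_locus(example), 1):
        status = "ok" if declared == found else "MISMATCH"
        print(f"locus {i}: declared {declared}, found {found} ({status})")
```

A mismatch such as "declared 46, found 45" for the first locus is the situation described above. The length of the returned list can also be compared with nloci in the control file.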
also use m d t f c for the five populations (instead of the long names) to make the output easy to read.
best,
ziheng
Maybe it's good to give a short example of my sequence file:
GATACCACGAAAGGCGTTGGTAACTTAAGACAGCAGGACGGTGGCCATGGAAGTCGGA
ZV16^2
GGACCAAGGAGTCTAGCATGTGCGCGAGTCATTGGGACTCTGATAAACCTAAAGGCGCAATGAAAGTGAAGGTCCGCCTTGCGCGGACCGAGGGAGGATGGGCC
So there is a blank line between the previous sequence and the name, and there are multiple spaces between the name and the sequence.
ZV16 is the specimen name (identical for multiple sequences) and ^2 is the code for the species file.
Thanks!
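For the spacing rule quoted in the error message (name and sequence separated by 2 or more spaces), a small check along the following lines may help rule that cause in or out. It is only a sketch: the function name `find_badly_separated_entries` is made up, and treating a tab as a bad separator is an assumption rather than something the program documents in this thread.

```python
import re

# A name with a caret tag, optional whitespace, then optional sequence text.
ENTRY = re.compile(r"(\S+\^\S+)(\s*)(\S.*)?$")


def find_badly_separated_entries(text: str):
    """Return (line_number, name) where name and sequence share a line
    but are not separated by at least two plain spaces."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        m = ENTRY.match(line)
        if not m or m.group(3) is None:
            continue  # not a name line, or the name stands alone on its line
        separator = m.group(2)
        # Assumption: only plain spaces count, so a tab is flagged too.
        if len(separator) < 2 or set(separator) != {" "}:
            problems.append((number, m.group(1)))
    return problems


if __name__ == "__main__":
    sample = "Z01^1 ACGT\nG62^5  ACGT\nZV16^2\nACGT\n"
    print(find_badly_separated_entries(sample))
```

If this reports nothing, the spacing itself is probably fine and the cause is more likely a wrong sequence count or hidden characters.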