[Pypedal-devel] Gianluca to John
From: John C. <joh...@gm...> - 2004-09-02 00:37:25
At 21:36 on Wednesday, 1 September 2004, John Cole wrote:
> I have attached a file, pyp_newclasses_gianluca.py, with responses to
> your questions. They are denoted "JB:". Let me know if I failed to
> answer your questions clearly. Please remember that I have only
> worked on pyp_newclasses.py a little bit -- the code does NOT work yet
> and there are lots of missing pieces.

I'm sorry if I seem a little tedious, but this way I can understand your
programming point of view...

> Also please bear in mind that
> the code evolved to meet my personal needs, so my design decisions may
> not always make sense. There were a lot of deadlines that I had to
> meet to get my dissertation done in time to graduate. :-) I am going
> to create a developer forum on the Sourceforge site -- I think that it
> would be better to have these design discussions there so that I do
> not lose any ideas.

I'd prefer to use the mailing list. I am still subscribed to it.

> As far as spaces versus tabs goes, I do not really care. If you
> prefer spaces then we will use spaces. You will see that there is a
> mix of the two styles in the CVS tree; I keep trying different editors
> and cannot find one that I really like. Maybe I should try WingIDE.

Did you try eric3?
http://www.die-offenbachs.de/detlev/eric3.html
It's a little tedious to install, because it runs on Qt and also needs
QScintilla, SIP and PyQt, but when I tried it, I really SAW THE LIGHT!

# JB: At the moment source is not checked until the file is opened (if it exists), so for now it assumes that the
#     user will know what to do when Python throws a file-not-found exception. We can check it here to save a little
#     time, I guess.
# JB: As far as source being multiple files...we would have to do a little more work to make sure that there are no
#     animalID collisions between the two files but I do not see any reason why pedigree data could not be loaded
#     from more than one source.
#     This would require multiple calls to load() but that is doable.

When I talk about multiple sources I mean databases, plain text files, GEDCOM
files, XML files, pickled files and so on. Each one stores pedigrees in a
different way. For this reason it is better to write a Pedigree class that
doesn't care about the pedigree format. If source is a list, you can define
factory functions like these:

def pedigree_from_file(inputfile, format, *other_arguments):
    # some_operation_on_input_file() stands for the format-specific parsing
    records = some_operation_on_input_file(inputfile, format)
    return Pedigree(records, *other_arguments)

def pedigree_from_db(driver, user, passwd, db, sql_statement, *other_arguments):
    # some_operation_on_db() stands for running the query and fetching the rows
    records = some_operation_on_db(driver, user, passwd, db, sql_statement)
    return Pedigree(records, *other_arguments)

# GS: Here all arguments specified in kw become class attributes: (python 2.2 and later)
# JB: I have not used the **kw construct before, but I want to. If you look at some of the
#     procedures in pyp_utils.py you will see that a ridiculous number of parameters are
#     passed explicitly. Using **kw is a little bit better way to handle that.

You only save some parameter declarations, but in that way you make your code
harder to understand.

# So let's learn how to read things in from a configuration file.
#GS: For this kind of operation I prefer to use an XML file. XML files are harder to modify by hand, but easier to
#GS: modify by a routine. Please remind me to send you a little class to manage configuration XML.
# JB: This is probably a matter of personal preference. I feel that an XML configuration file is needlessly
#     complex. In fact, I feel strongly enough about that to insist on using ConfigParser OR to support both
#     styles of configuration file. You can try and change my mind if you like, but you probably will not
#     succeed. :-)

I attach two files, an XML file and a little class to parse it. Supporting both
options is not a good idea. Decide which is better for you; however, this is a
secondary aspect.

#GS: Where does renumber come from? Have you forgotten the module reference?
# JB: This is NOT working code.
#     The idea is that renumber will be moved from
#     pyp_utils.py to a method of this class.

Ok, sorry. Regarding this point, I think there are two different strategies
for structuring the classes of pypedal. One is to build a pedigree class that
contains all the calculation methods. The other is to build a pedigree class
that only stores a pedigree, plus different classes that take a pedigree
instance as an __init__ argument and perform specific calculations. I made a
little schema to explain myself better :-))

class Pedigree:
    def __init__(self, *arguments):
        pass  # methods and attributes that only store the pedigree

class demog:
    def __init__(self, pedinstance):
        self.pedinstance = pedinstance
    def age_distribution(self):
        pass

class metrics:
    def __init__(self, pedinstance):
        self.pedinstance = pedinstance

def __init__(self,animalID,sireID,damID,gen='0',by=1900,sex='u',fa=0.,name='u',alleles=['',''],species='u',breed='u',age=-999,alive=-999):
#GS: Why set by=1900? In human pedigrees this can cause some problems. I'd prefer None.
# JB: This is an arbitrary number. The code was originally written to work with dog pedigrees that had complete
#     birthdate information. If we set it to None then we will need to trap that in some other procedures/methods.

Then could -1 be a reasonable value? That way animals without "by" always come
before all the others when the pedigree is sorted.

#     For example, the Boichard algorithms for effective founder and ancestor numbers require generation information.
#     When generations are not provided in the pedigree file they are inferred using, in part, birth year. I am open to
#     changing this.

Do you have an article or something else that can be attached to an e-mail and
that can explain the Boichard algorithms to me? I googled for that but I didn't
find anything.

# JB: Oh! The pad_id() method uses the birthdate to form the padded ID.

A question: what is the pad_id() function intended for?

#GS: As far as I can understand only two alleles can be specified.
# JB: Because I only added alleles to support gene dropping.
#     I am open to changing this as well --
#     human geneticists probably need to handle more information than that.

At the moment there is no need to manage information on multiple markers, but
in the (far?) future the Pedigree class can be extended to perform some
calculations on genotypes, for example haplotype estimation, linkage
disequilibrium etc... so a structure like a dictionary could be useful:

genotype={'markerName':["allele1", "allele2"],'anothermarkerName':["allele1", "allele2"],...}

if alleles == ['',''] and self.founder == 'y':
    _allele_1 = '%s%s' % (self.paddedID,'__1')
    _allele_2 = '%s%s' % (self.paddedID,'__2')

Here is an answer to the question above.

#gs: This concerns the feature I asked for. Do you think it is possible to use STRING animal (person) IDs?
#gs: Perhaps with a mandatory renumber.
# JB: I think that it would not be too hard to do that. We just need to hash the string to an integer somehow.
#     Once that is done the reorder and renumber routines should handle it from there.
# JB: The DEBUG statements should (perhaps) have a different name. I use them to track what the program
#     is doing even when it is working correctly. Maybe a verbose switch would be a good replacement here.
#     The debug statements are called even when there are no exceptions thrown (and therefore no tracebacks).

Ok

-- 
Dr. John B. Cole, Research Geneticist
Animal Improvement Programs Laboratory
10300 Baltimore Avenue
BARC-West, Building 005, Room 306
Beltsville, Maryland 20705-2350
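
On the string-ID question discussed above ("hash the string to an integer
somehow"), here is a minimal sketch of one way it might look. It swaps the
hash for a plain dictionary lookup table, because Python's built-in hash() can
collide and, in current Python versions, is not stable between runs for
strings. The function names build_id_map and renumber_strings, the
(animal, sire, dam) record layout and the use of "0" for an unknown parent are
all assumptions made for illustration; none of them come from pyp_utils.py or
pyp_newclasses.py. It is written for current Python rather than the 2.2 of the
thread.

```python
def build_id_map(records):
    """Assign consecutive integers, starting at 1, to string animal IDs.

    records is a list of (animal, sire, dam) string triples. The string "0"
    marks an unknown parent and always maps to the integer 0. Both the record
    layout and the "0" convention are assumptions of this sketch.
    """
    id_map = {"0": 0}
    for triple in records:
        for name in triple:
            if name not in id_map:
                # len(id_map) is the next unused integer because IDs are dense
                id_map[name] = len(id_map)
    return id_map


def renumber_strings(records):
    """Return the records with integer IDs, plus the string-to-integer map."""
    records = list(records)  # allow any iterable, since we traverse it twice
    id_map = build_id_map(records)
    numbered = [tuple(id_map[name] for name in triple) for triple in records]
    return numbered, id_map


if __name__ == "__main__":
    pedigree = [("rex", "0", "0"), ("bella", "0", "0"), ("pup", "rex", "bella")]
    numbered, id_map = renumber_strings(pedigree)
    print(numbered)
    print(id_map)
```

The integers are handed out in order of first appearance, so this does not by
itself guarantee that parents get smaller numbers than their offspring; the
existing reorder and renumber routines would still be needed for that.
Keeping id_map around allows the original string IDs to be restored in any
output.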