Fwd: [Pypedal-devel] Gianluca to John
From: John C. <joh...@gm...> - 2004-09-10 20:05:15
I finally realized that I sent this to myself rather than to the list. D'oh!

---------- Forwarded message ----------
From: John B. Cole <jc...@ai...>
Date: Thu, 02 Sep 2004 13:59:26 -0400
Subject: Re: [Pypedal-devel] Gianluca to John
To: John Cole <joh...@gm...>

Gianluca-

> I'm sorry if I appear a little boring, but in this way I can understand
> your programming point of view...

Your questions do not seem boring. I am sometimes embarrassed because a lot of the stuff in PyPedal happened on the fly rather than as the result of a planning process.

> I'd prefer to use the mailing list. I've already subscribed to it.

No problem.

> Did you try eric3?

I installed it last night but have not yet used it much. Its CVS support does not fit well with SourceForge.

> When I talk about multiple sources I mean databases, plain text files,
> GEDCOM files, XML files, pickled files and so on. Each one stores pedigrees
> in a different way. For this reason it is better to build a Pedigree class
> that doesn't care about the pedigree format. If source is a list, you can
> define a factory function like those:

Okay, I see what you mean now. If this is important to you we can look at that. My suggestion would be to put stubs in place while we focus on getting the basics working well. Once we feel like things work well, we can add the plugins (if you will) to handle things like GEDCOM. I did look at GEDCOM a while back, but it is much more complicated than I need at the moment. If you are working with human pedigrees, though, we will probably need it in there.

> You only save some parameter declarations, but in that way you make your
> code harder to understand.

Ah, yes, the joys of writing good documentation!

> I attach two files, an XML file and a little class to parse it. Supporting
> both options is not a good idea; decide which is better for you. However,
> this is a secondary aspect.

I will look at them when I have some more time. Given that I have not really done anything with the config file yet, maybe we should just not worry about it for the moment. I am still not sure that it will help much to have a configuration file, at least for the CLI.

> Ok, sorry. Regarding this point, I thought that there are two different
> strategies for structuring the classes of PyPedal. One is to build a
> Pedigree class that contains all the calculation methods. The other is to
> build a Pedigree class that only stores a pedigree, plus different classes
> that take a Pedigree instance as an __init__ argument and perform specific
> calculations.

What you are proposing is more like, say, Numarray or PyGSL. Well, even R for that matter. Let's think about this some more. From the development point of view it would not be TOO hard to start off by wrapping the individual "libraries" that I have (pyp_io, pyp_nrm, etc.) into classes. Then we would get:

    myped = Pedigree(<stuff>)
    mymetrics = Metrics(myped)
    f_a = mymetrics.effective_founders_lacy()

I kind of like that!

> Then -1 can be a reasonable value? So animals without a "by" always come
> before all the others when the pedigree is sorted.

See the note below on pad_id().

> Do you have an article or something else that can be attached to an e-mail
> and that can explain the Boichard algorithms to me? I googled for it but
> didn't find anything.

Boichard, D., Maignel, L., and Verrier, E. 1997. The value of using probabilities of gene origin to measure genetic variability in a population. Genetics, Selection, Evolution 29:5-23.

> A question: what is the pad_id() function intended for?

The pad_id() function is used by the fast_reorder() procedure in pyp_utils. It is basically a hash function that uses an individual's birthdate to ensure that parents will precede their offspring when a sort is performed on the padded ID. This makes reordering the pedigree MUCH faster (O(n)) than reordering by inspection, which can require multiple passes through the pedigree. That is one of the reasons that there is a default birthyear of 1900. Given that, using a default of "-1" for by will not break this routine, because any offspring with known bys will sort after the parents.

> At the moment there is no need to manage information on multiple markers,
> but in the (far?) future the Pedigree class could be extended to perform
> some calculations on genotypes, for example haplotype estimation, linkage
> disequilibrium, etc...

Right, which would be used by the (so-far nonexistent) peeling module.

> #gs: This concerns the feature I asked for. Do you think it is possible to
> use STRING animal (person) IDs?

I put a note about your feature request on the website. In short, I think that we can use str(hash(<name>)) to assign an animalID from a name. The name could be retained in the name field. This would only break when using names with spaces and reading from a space-delimited text file.

John.

-- 
Dr. John B. Cole, Research Geneticist
Animal Improvement Programs Laboratory
10300 Baltimore Avenue
BARC-West, Building 005, Room 306
Beltsville, Maryland 20705-2350
Telephone: (301) 504-8666
FAX: (301) 504-8092
E-mail: jc...@ai...