From: Matt C. <mat...@va...> - 2008-04-01 02:59:03
Hi Darren.

When we settle on some of these top-level design issues I'll feel more comfortable jumping in and contributing code, so please excuse this long discussion over rather trivial details. It should sync us up (and perhaps future readers) as to the intended purpose and scope of the library. :) Responses inline.

Kessner, Darren E. wrote:
> data: data file abstraction (file I/O -> basic data structures)
>
> analysis: anything on top of the basic data structures
>
> I'd still rather have vendor readers outside msdata, since the data
> model is independent of any plug-ins:
>
> - data
>   - msdata
>   - msdata_vendor_readers
>
> Yes, I agree we should use vendor name suffixes to avoid collisions, so
> 'rawfile' -> 'util/vendor_access/Thermo_RAW' or something along those
> lines.
>
> For analysis module naming, we can start with prefixes for reusable libs
> (e.g. peaks -> FT_frequency or something like that) and then deepen the
> hierarchy as needed. If there are only one or two FT-specific libs, I
> don't think there's a need to create a subdir for them -- on the other
> hand, I'm not absolutely opposed to an FT subdir.

Msdata is independent of the readers, but the readers are not independent of msdata. I think that's a good reason to make the vendor_readers a subdirectory of msdata in the actual file layout. This is somewhat hypocritical because of my previous argument about not having the file layout reflect the dependency hierarchy if the dependencies are documented elsewhere, so I'll amend it. Dependent code that is on the same root branch (data/analysis/utility/build/etc.) as its dependency should be in the dependency's directory or in a subdirectory of it.

With regard to prefix designations to indicate technology-specificity (FT_blah), I prefer staying away from that when it's feasible. Why do you incline toward a flat hierarchy with prefixes instead of a conventional hierarchy (/FT/blah)? Does it cause some maintenance issues I haven't considered?
It would mean a few extra Jamfiles, but it would also allow those Jamfiles to wrap up the various technology-specific functionality into a single library, although I'm not sure that's actually of any consequence.

Also, are you saying that technology-specific code can include both data ("basic data structures") and analysis (anything else on top of that)? Can you give a few concrete examples of that? I'm thinking that having tech-specific data structures shared in the core data directory could possibly mean easier reusability between the various tech-specific analysis libraries, but if we take the conventional hierarchy route we can put that shared code in the tech-specific root directory instead of the core data directory. For instance:

- data
  - msdata
- analysis
  - FT (shared FT-related data structures and functions go in here)
    - "domain" as combination of freq/transient or separate "frequency" and "transient" directories (analysis code dependent on FT data structures and functions)

versus

- data
  - msdata
  - FT (data structures and functions used by /analysis/FT, and possibly as independently useful code)
- analysis
  - FT (analysis code dependent on /data/FT)

If we (can) eliminate the prospect of alternative core "data" code by assuming that such alternatives would be technology-specific code in /analysis/SomeTech, we can flatten /data/msdata to just /msdata. Is there an example of some generic "data" code that would break that assumption and make me look like an ass? ;)

> Just a second response regarding the library name... I agree that a
> non-proteo name would appeal to more users.
>
> My ideal course of action would be to get Josh on board fully, get
> msconvert to the point where it can effectively replace ReAdW, and then
> reincarnate the project as "the larger collaborative effort" with a new
> name. Or maybe just a funky symbol and it can be TPFKAPW (the project
> formerly known as...).
> How about we continue to overload Microsoft
> names and use MSIL (mass spec interface layer) ;)

Great! I'm glad we can come up with a name collaboratively. It's easier and more natural to promote the effort that way (just like mzML tries to be as inclusive as possible and, of course, went through several name changes).

> Actually, your proposal is pretty close to the overall dependency level
> layout (in docs and poster), so let's make it explicit:
>
> pwiz
>   - utility
>     - as you proposed
>   - data
>     - msdata (main data structures, but nothing vendor-specific)
>     - msdata_readers (vendor-specific Reader plugins for msdata)
>     - other data file stuff (e.g. transients, peak data, etc.)
>   - analysis
>     - as you proposed
>   - tools
>

Do we want: proteome vs. proteomics vs. proteomic? I can't decide between the first two. Also:

- analysis
  - proteomics (analysis algorithms specific to proteomics)
  - genomics (mystery category)
- tools
  - proteomics (tools specific to proteomics)
  - genomics (guess what goes here!)

or

- analysis (analysis algorithms that are theoretically generic to all mass spec, e.g. peak picking and deisotoping)
- tools (tools that are theoretically generic to all mass spec)
- proteomics
  - analysis (analysis algorithms specific to proteomics)
  - tools (tools specific to proteomics)
- genomics
  - analysis
  - tools

The first approach above would probably keep our project tidier in the long run (especially to people who don't want the analysis bits, only the basic I/O).

-Matt