Re: [Psidev-pi-dev] [HUPO-PSI/mzTab] Protein group (#20)


Certainly for reporting quant data, it is essential to keep one row per protein group in mzTab; otherwise downstream statistical processing breaks. If same-set or subset proteins are reported on separate rows, the quant values are repeated, which leads to incorrect downstream results.
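To illustrate the repetition problem, here is a minimal sketch in Python. The tuple layout, the accessions, the group ids, and the abundance values are all made up for illustration; none of this is mzTab syntax.

```python
# Hypothetical rows: (accession, group_id, abundance). All values are made up.
rows_per_accession = [
    ("P1", "G1", 100.0),  # group representative
    ("P2", "G1", 100.0),  # same-set member: the group's quant value is repeated
    ("P3", "G2", 50.0),
]


def collapse_to_groups(rows):
    """Keep the first row seen for each group, so each group appears once."""
    seen = {}
    for accession, group_id, abundance in rows:
        seen.setdefault(group_id, (accession, group_id, abundance))
    return list(seen.values())


def total_abundance(rows):
    """Sum the abundance column, as a naive downstream tool might."""
    return sum(abundance for _, _, abundance in rows)


rows_per_group = collapse_to_groups(rows_per_accession)

print(len(rows_per_accession), total_abundance(rows_per_accession))  # 3 250.0
print(len(rows_per_group), total_abundance(rows_per_group))          # 2 150.0
```

With one row per accession, group G1 is counted twice, so both the row count and the summed abundance are inflated. With one row per group, counting rows gives the number of groups directly.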

Even for ident data, I think it is better to keep one row per protein group. It is then completely obvious how many proteins have been identified: count the rows. In mzid 1.1 we made the mistake of not making the distinction between protein accessions and protein groups sufficiently clear. This is an opportunity to get it right for mzTab, so we shouldn't bend the encoding to fit one particular software package's preferred way of exporting its data.

If you really want to report extra detail about group members, I would recommend keeping a single row (for ident and quant), but then adding one more complex cell at the end containing key-value pairs for all the extra data.
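As a rough sketch of what such a cell could look like, the Python below packs per-member details into one string and unpacks it again. The `accession[key=value;key=value]|accession[...]` layout, the function names, and the keys `relation` and `coverage` are assumptions made up for this example, not anything defined by mzTab. It also assumes that accessions and values never contain `|`, `[`, `]`, `;` or `=`.

```python
def encode_members(members):
    """Pack {accession: {key: value}} into one cell string (hypothetical layout)."""
    parts = []
    for accession, details in members.items():
        pairs = ";".join(f"{key}={value}" for key, value in details.items())
        parts.append(f"{accession}[{pairs}]")
    return "|".join(parts)


def decode_members(cell):
    """Inverse of encode_members; all values come back as strings."""
    members = {}
    if not cell:
        return members
    for part in cell.split("|"):
        accession, _, rest = part.partition("[")
        pairs = rest.rstrip("]")
        members[accession] = dict(p.split("=", 1) for p in pairs.split(";") if p)
    return members


# Made-up group members attached to a single protein group row.
extra = {
    "P2": {"relation": "sameset", "coverage": "0.42"},
    "P4": {"relation": "subset"},
}
cell = encode_members(extra)
print(cell)  # P2[relation=sameset;coverage=0.42]|P4[relation=subset]
```

The group still occupies exactly one row, so row counts and quant values stay correct, and tools that do not care about member detail can ignore the extra cell.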

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/HUPO-PSI/mzTab/issues/20#issuecomment-214670405