From: Juan A. V. <not...@gi...> - 2016-04-27 08:00:49
As @jgriss said, when we developed the format we wanted to simplify the reporting of protein inference. The current encoding was designed from the very beginning and did not actually change during the process. I am in favour of keeping the concept behind mzTab as it is now. The idea was never to replace mzIdentML (apart from protein inference, mzTab looks more and more like a flattened version of mzIdentML) or mzQuantML (here mzTab is not nearly as comprehensive). However, as usually happens, life is more complex for readers than for writers. I agree that some guidelines need to be provided, although this will not prevent people from producing "wrong" files. In PRIDE, we need to be able to interpret the information correctly.

So, in the context of protein groups, we basically have two options:

- Keep things as they are now. There is one mechanism to work around the requirement that the protein accession is unique: appending [1], [2], ... after the accession number, if needed. This was added mainly for quantification purposes (the case explained by @mvaudel, but also when different proteoforms are reported), but it can only be applied to identification. Make clear in the guidelines that only one anchor protein and its corresponding ambiguity members should be reported per row, and avoid the rest of the complexity. The format is lossy in that respect. There is no need to change the specification, but we could create a version 1.0.1 that amends the paragraph highlighted by Yasset and adds a new section clarifying this in detail. Of course, there is no way to enforce this in practice, but since not many people are writing the files yet, I think we could probably get most people to write them the right way.

- If after some time we see that this is not enough and there is a need to support protein groups, then, as Andy mentioned before, a new section just for protein groups could be added. That extra section would properly solve the problems related to modelling protein inference, but the changes would need to be agreed on, it would take some time, etc.

View it on GitHub: https://github.com/HUPO-PSI/mzTab/issues/20#issuecomment-215001536
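To make the first option a bit more concrete, here is a minimal Python sketch of how a reader might interpret protein rows under the proposed "one anchor protein plus its ambiguity members per row" guideline, including the [1], [2], ... accession suffix. It assumes the suffix is written literally as "[n]" at the end of the accession, that ambiguity members come from a comma-separated ambiguity_members cell with "null" meaning empty, and that rows are already parsed out of the PRT section. The regular expression, the ProteinRow type and the helper names are hypothetical and for illustration only; they are not part of the mzTab specification or any PRIDE tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical pattern for the "[1]", "[2]", ... disambiguation suffix;
# the exact syntax is an assumption based on the discussion.
SUFFIX_RE = re.compile(r"^(?P<base>.+?)\[(?P<index>\d+)\]$")


@dataclass
class ProteinRow:
    """One PRT row: a single anchor accession plus its ambiguity members."""
    accession: str  # anchor protein, possibly carrying a "[n]" suffix
    ambiguity_members: list[str] = field(default_factory=list)


def base_accession(accession: str) -> str:
    """Strip a trailing "[n]" suffix, if present, to recover the real accession."""
    match = SUFFIX_RE.match(accession)
    return match.group("base") if match else accession


def parse_ambiguity_members(cell: str) -> list[str]:
    """Split a comma-separated ambiguity_members cell; "null" or blank means none."""
    if cell.strip() in ("", "null"):
        return []
    return [member.strip() for member in cell.split(",")]


def check_rows(rows: list[ProteinRow]) -> list[str]:
    """Report rows that break the assumed one-anchor-per-row guideline."""
    problems = []
    seen = set()
    for row in rows:
        # Accessions (including any "[n]" suffix) must stay unique per row.
        if row.accession in seen:
            problems.append(f"duplicate accession: {row.accession}")
        seen.add(row.accession)
        # A comma in the accession suggests several anchors packed into one row.
        if "," in row.accession:
            problems.append(f"more than one anchor in row: {row.accession}")
        # The anchor should not repeat itself among its ambiguity members.
        if base_accession(row.accession) in row.ambiguity_members:
            problems.append(f"anchor listed as its own ambiguity member: {row.accession}")
    return problems
```

For example, rows with accessions "P12345", "P12345[1]" and "P12345[2]" pass the check, because the suffix keeps each accession unique while base_accession still maps all three back to P12345. This is exactly the lossy but readable compromise described in the first option.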