Dear SAINT team,
I would like your help resolving an issue. I am struggling with an input dataset from mouse tissue in which the input counts are extremely high (maximum spectral count 17,032,556,773).
# Input interaction file
$ grep Q9QYR6 inter.txt
BirA-3 BirA Q9QYR6 411824140
BirA-2 BirA Q9QYR6 665427471
BirA-1 BirA Q9QYR6 487128295
Bait-3 Bait Q9QYR6 318940967
Bait-2 Bait Q9QYR6 338349685
Bait-1 Bait Q9QYR6 323758541
# Output `list.txt`
$ column -ts $'\t' list.txt | head -n 3
Bait Prey PreyGene Spec SpecSum AvgSpec NumReplicates ctrlCounts AvgP MaxP TopoAvgP TopoMaxP SaintScore logOddsScore FoldChange BFDR boosted_by
Bait Q5SWU9 Acaca 0|0|0 0 0.00 3 7924|60744|33296 0.00 0.00 0.00 0.00 0.00 -inf 0.00 0.47
Bait Q9QYR6 Map1a 42791|52853|10701 40809 13603.00 3 61452|40463|64743 0.00 0.00 0.00 0.00 0.00 -inf 1.15 0.43
In the example above, the output table shows completely wrong values in the columns “Spec”, “SpecSum”, “ctrlCounts”, etc., regardless of the prey (my example prey is UniProt ID "Q9QYR6", but the spectral counts of prey “Q5SWU9” were also not zero in the interaction file). The issue occurs in both SAINTexpress v3.6.1 and v3.6.3, and it persisted when I re-ran with an input interaction file that used gene symbols instead of UniProt IDs.
After some discussion and testing, I found that SAINT calculates everything correctly when the input counts are downscaled a million-fold (counts * 1/10^6), as shown below:
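As a quick sanity check (a sketch, not a confirmed diagnosis): the values reported for Q9QYR6 in list.txt match the input counts taken modulo 65,536 (2^16). That would be consistent with counts being stored in a 16-bit unsigned integer somewhere, but that storage detail is only my assumption; I have not verified it in the SAINTexpress source code. The helper name `wrap` is made up for this illustration.

```python
# Input counts for Q9QYR6, copied from inter.txt above (in file order).
ctrl_in = [411824140, 665427471, 487128295]  # BirA-3, BirA-2, BirA-1
bait_in = [318940967, 338349685, 323758541]  # Bait-3, Bait-2, Bait-1

# Values reported for the same prey in list.txt.
ctrl_out = [61452, 40463, 64743]  # ctrlCounts column
bait_out = [42791, 52853, 10701]  # Spec column
spec_sum_out = 40809              # SpecSum column

MOD = 2 ** 16  # assumption: counts wrap around at 16 bits


def wrap(values, mod=MOD):
    """Reduce each count modulo `mod`, mimicking an integer wrap-around."""
    return [v % mod for v in values]


print(wrap(ctrl_in) == ctrl_out)                 # True
print(wrap(bait_in) == bait_out)                 # True
print(sum(wrap(bait_in)) % MOD == spec_sum_out)  # True
```

The reported AvgSpec (13603.00) is also exactly SpecSum / 3 (40809 / 3), so the average seems to be derived from the already wrapped sum.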
# INPUT interaction file - 1/10^6 downscaled
$ grep Q05920 inter_1e+06_downscaled.txt
BirA-1 BirA Q05920 1677
BirA-2 BirA Q05920 2163
BirA-3 BirA Q05920 1524
DPP6-1 DPP6 Q05920 17033
DPP6-2 DPP6 Q05920 16874
DPP6-3 DPP6 Q05920 13736
# SAINT OUTPUT
$ grep Q05920 uniprot_v3.6.3_inter_1e+06_downscaled_list.txt
DPP6 Q05920 Pc 17033|16874|13736 47643 15881.00 3 1677|2163|1524 1.00 1.00 1.00 1.00 1.00 93.22 8.88 0.00
FYI, “Q05920” was validated as the most counted prey. However, this workaround is not optimal because a considerable number (> 1,400) of interactions have a count of zero after downscaling.
My alternatives were to log-transform each count or to keep one or two decimal digits after the million-fold downscaling to avoid zero counts. In both cases the input values become floats, so I ran them with SAINTexpress-int instead of SAINTexpress-spc, although I am not sure whether that is acceptable. Unfortunately, with both alternatives the counts reported in the SAINT output table did not match those in the input interaction file (they appear to have been transformed).
# Log-transformed INPUT with 1 decimal digit for UniProt ID Q91YT8
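To compare how many counts would be lost with other divisors, a small sketch like the one below could be used. The function name `count_zeroed` is hypothetical, the last three sample values are made up for illustration, and rounding to the nearest integer is assumed as the rounding rule.

```python
def count_zeroed(counts, divisor):
    """Return how many non-zero counts round to 0 after dividing by `divisor`."""
    return sum(1 for c in counts if c > 0 and round(c / divisor) == 0)


# First two values appear in this ticket; the last three are invented examples.
sample = [17_032_556_773, 411_824_140, 750_000, 420_000, 90_000]

for divisor in (10**6, 10**5, 10**4):
    # Print the divisor and the number of counts that vanish with it.
    print(divisor, count_zeroed(sample, divisor))
```

A smaller divisor keeps more low-abundance preys but leaves the largest counts bigger, so the two effects have to be balanced.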
$ grep Q91YT8 inter_1_downscaled.txt
BirA-1 BirA Q91YT8 9.9
BirA-2 BirA Q91YT8 9.8
BirA-3 BirA Q91YT8 9.8
Bait-1 Bait Q91YT8 11.5
Bait-2 Bait Q91YT8 11.4
Bait-3 Bait Q91YT8 11.9
$ grep Q91YT8 uniprot_v3.6.3_inter_1_downscaled_list.txt
Bait Q91YT8 Tmem63a 0.296|0.279|0.376 0.951 0.317 3 0.104|0.097|0.097 0.927 1.000 0.927 1.000 0.927 1.403 3.184 0.010
The example above shows one of the preys.
I tried the default settings (SAINTexpress-spc for integers or SAINTexpress-int for floats); adding -L3 and -R3 gave results identical to the default command.
I suspect the root cause is the high spectral counts, since SAINTexpress had no problems with my previous dataset from a cell line (maximum spectral count < 5,000).
Looking forward to hearing your suggestions.
Best,
Mira
By any chance, is it possible to edit my ticket? I'd like to hide the bait name that appears in the middle. If that is not possible, please delete my ticket so I can resubmit it after editing.