saint-apms / Bugs / #17 SAINTexpress with extremely high spectral count

#17 SAINTexpress with extremely high spectral count

Milestone: v1.0 (example)

Status: open

Owner: nobody

Labels: None

Priority: 1

Updated: 2023-04-10

Created: 2023-04-10

Creator: Mira Sohn

Private: No

Dear SAINT team,

I would like your help resolving an issue. I’m struggling with an input dataset from mouse tissue in which the input counts are extremely high (maximum spectral count: 17,032,556,773).

# Input interaction file
$ grep Q9QYR6 inter.txt
BirA-3  BirA    Q9QYR6  411824140
BirA-2  BirA    Q9QYR6  665427471
BirA-1  BirA    Q9QYR6  487128295
Bait-3  Bait    Q9QYR6  318940967
Bait-2  Bait    Q9QYR6  338349685
Bait-1  Bait    Q9QYR6  323758541

# Output `list.txt`
$ column -ts $'\t' list.txt | head -n 3
Bait  Prey        PreyGene       Spec               SpecSum  AvgSpec   NumReplicates  ctrlCounts         AvgP  MaxP  TopoAvgP  TopoMaxP  SaintScore  logOddsScore  FoldChange  BFDR  boosted_by
Bait  Q5SWU9      Acaca          0|0|0              0        0.00      3              7924|60744|33296   0.00  0.00  0.00      0.00      0.00        -inf          0.00        0.47
Bait  Q9QYR6      Map1a          42791|52853|10701  40809    13603.00  3              61452|40463|64743  0.00  0.00  0.00      0.00      0.00        -inf          1.15        0.43

In the example above, the output table shows completely wrong values in the columns “Spec”, “SpecSum”, “ctrlCounts”, etc., regardless of the prey. My example prey is UniProt ID "Q9QYR6", but the spectral counts of prey “Q5SWU9” were also nonzero in the interaction file, even though the output reports 0|0|0 for it. The issue occurs in both SAINTexpress v3.6.1 and v3.6.3, and it persisted when I re-ran with an input interaction file that used gene symbols instead of UniProt IDs.
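One arithmetic check on the rows above may help narrow this down: every reported value equals the corresponding input count modulo 65536 (2^16). For example, 318940967 mod 65536 = 42791, and the three bait values sum to 106345, which is 40809 modulo 65536, matching SpecSum. The sketch below reproduces that reduction with awk. The function name `wrap16` is hypothetical, and the idea that SAINTexpress stores counts in a 16-bit unsigned integer is only an inference from these six rows, not something confirmed from its source code.

```shell
# Hypothetical helper: reduce the count column modulo 2^16 to mimic what a
# 16-bit unsigned wraparound would produce. This is an assumption about the
# cause, used only to compare against the values SAINTexpress reported.
# Input: interaction rows (IP name, bait, prey, count), whitespace- or
# tab-separated, from files given as arguments or from stdin.
wrap16() {
    awk -v OFS='\t' '{ $4 = $4 % 65536; print }' "$@"
}
```

Running `grep Q9QYR6 inter.txt | wrap16` on the rows shown above gives 61452, 40463 and 64743 for BirA-3, BirA-2 and BirA-1, and 42791, 52853 and 10701 for Bait-3, Bait-2 and Bait-1, which are the ctrlCounts and Spec values in list.txt.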

After some discussion and testing, I found that SAINT calculates everything correctly when the input counts are downscaled a million-fold (counts * 1/10^6), as shown below:

# INPUT interaction file - 1/10^6 downscaled
$ grep Q05920 inter_1e+06_downscaled.txt
BirA-1  BirA    Q05920  1677
BirA-2  BirA    Q05920  2163
BirA-3  BirA    Q05920  1524
DPP6-1  DPP6    Q05920  17033
DPP6-2  DPP6    Q05920  16874
DPP6-3  DPP6    Q05920  13736

# SAINT OUTPUT
$ grep Q05920 uniprot_v3.6.3_inter_1e+06_downscaled_list.txt
DPP6    Q05920  Pc      17033|16874|13736       47643   15881.00        3       1677|2163|1524  1.00    1.00    1.00    1.00    1.00    93.22   8.88    0.00

FYI, “Q05920” was confirmed to be the prey with the highest count. However, this workaround is not optimal because a considerable number (> 1,400) of interactions end up with a zero count after downscaling.
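For reference, the downscaling and the zero-count tally can be sketched as below. The function names `downscale` and `count_zeros` are hypothetical. Rounding to the nearest integer is an assumption, chosen because it is consistent with the maximum count 17,032,556,773 appearing as 17033 in the downscaled file. Truncating instead would produce even more zeros.

```shell
# Hypothetical helper: divide the count column by a divisor and round to
# the nearest integer (rounding mode is an assumption, see above).
# $1 = divisor; interaction rows (IP name, bait, prey, count) on stdin.
downscale() {
    awk -v d="$1" -v OFS='\t' '{ $4 = int($4 / d + 0.5); print }'
}

# Hypothetical helper: count rows whose count column is zero.
count_zeros() {
    awk '$4 == 0 { n++ } END { print n + 0 }'
}
```

For example, `downscale 1000000 < inter.txt > inter_1e+06_downscaled.txt` followed by `count_zeros < inter_1e+06_downscaled.txt` reports how many interactions would be lost at that scale; trying a smaller divisor shows the trade-off between keeping the counts small and losing low-count preys.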

As alternatives, I tried either log-transforming each count or keeping one or two decimal digits after the million-fold downscaling, so that no counts become zero. In both cases the input values become floats, so I ran them with SAINTexpress-int instead of SAINTexpress-spc, although I'm not sure whether that is acceptable. Unfortunately, with both alternatives the (transformed) counts in the input interaction file still disagreed with those in the SAINT output table.

# Log-transformed INPUT with 1 decimal digit for UniProt ID Q91YT8
$ grep Q91YT8 inter_1_downscaled.txt
BirA-1  BirA    Q91YT8  9.9
BirA-2  BirA    Q91YT8  9.8
BirA-3  BirA    Q91YT8  9.8
Bait-1  Bait    Q91YT8  11.5
Bait-2  Bait    Q91YT8  11.4
Bait-3  Bait    Q91YT8  11.9

$ grep Q91YT8 uniprot_v3.6.3_inter_1_downscaled_list.txt
Bait    Q91YT8  Tmem63a 0.296|0.279|0.376       0.951   0.317   3       0.104|0.097|0.097       0.927   1.000   0.927   1.000   0.927   1.403   3.184   0.010

The example above shows just one of the preys.

I ran with the default settings (SAINTexpress-spc for integers, SAINTexpress-int for floats); adding -L3 and -R3 gave results identical to those of the default command.

I suspect the root cause is the high spectral counts, since SAINTexpress had no problem at all with my previous dataset from a cell line (max spectral count < 5,000).

Looking forward to hearing your suggestions.

Best,

Mira

Discussion

Mira Sohn - 2023-04-10

By any chance, is it possible to edit my ticket? I'd like to hide my bait name in the middle. Please delete my ticket if it's impossible so I can resubmit after editing.
