Compare ProteinProphet

2024-04-25

Premium Feature: Available with all Premium Editions of LabKey Server.

This topic describes how to interpret run comparison results based on the protein assignments made by ProteinProphet.

The comparison offers the following options:

Protein Group Filters
Peptide Filters
Inclusion Criteria
Protein Group Normalization

Protein Group Filters

These filters let you filter data based on protein group criteria, such as ProteinProphet probability. You can also create a custom grid view to filter groups based on other data, such as quantitation ratios or other protein group properties.
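As a rough illustration of what this filter does, the sketch below keeps protein groups whose probability meets a threshold and that also pass an optional custom predicate (standing in for a custom grid view filter). The `ProteinGroup` record, its fields, and `filter_groups` are hypothetical names, the inclusive threshold (`>=`) is an assumption, and the quantitation ratios are made-up values; this is not LabKey Server's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProteinGroup:
    # Hypothetical record type; field names are illustrative, not LabKey's schema.
    run: str
    group_number: int
    proteins: List[str]
    probability: float
    ratio: Optional[float] = None  # e.g. a quantitation ratio, when one exists


def filter_groups(
    groups: List[ProteinGroup],
    min_probability: Optional[float] = None,
    predicate: Optional[Callable[[ProteinGroup], bool]] = None,
) -> List[ProteinGroup]:
    """Keep groups at or above min_probability (inclusive, an assumption)
    that also pass an optional custom predicate."""
    kept = []
    for g in groups:
        if min_probability is not None and g.probability < min_probability:
            continue
        if predicate is not None and not predicate(g):
            continue
        kept.append(g)
    return kept


# Probabilities from run A in the normalization example; ratios are hypothetical.
groups = [
    ProteinGroup("A", 1, ["a"], 1.0, ratio=2.5),
    ProteinGroup("A", 3, ["d"], 0.95, ratio=0.8),
    ProteinGroup("A", 4, ["e", "f", "g"], 0.90),
]
by_prob = filter_groups(groups, min_probability=0.92)
by_prob_and_ratio = filter_groups(
    groups,
    min_probability=0.92,
    predicate=lambda g: g.ratio is not None and g.ratio > 1.0,
)
```

Here `by_prob` keeps groups 1 and 3, and adding the ratio predicate narrows the result to group 1.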

Peptide Filters

These filters let you exclude protein groups based on the peptides assigned to each group. A protein group qualifies for the comparison only if at least one of its peptides meets the criteria. You can choose not to filter, to filter on PeptideProphet probability, or to create a custom grid view that filters on other peptide properties, such as charge state, scoring-engine-specific scores, quantitation ratios, and more.
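The "at least one peptide" rule amounts to an `any` check over each group's peptides. The following is a minimal sketch under that reading; `Peptide`, `group_qualifies`, `filter_by_peptides`, and the peptide values are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Peptide:
    # Hypothetical peptide record; fields are illustrative only.
    sequence: str
    charge: int
    peptide_prophet: float


def group_qualifies(peptides: List[Peptide], predicate: Callable[[Peptide], bool]) -> bool:
    """A group qualifies if at least one assigned peptide passes the predicate."""
    return any(predicate(p) for p in peptides)


def filter_by_peptides(
    groups: Dict[int, List[Peptide]],
    predicate: Optional[Callable[[Peptide], bool]] = None,
) -> Dict[int, List[Peptide]]:
    """groups maps group number -> peptides; a predicate of None means no peptide filter."""
    if predicate is None:
        return dict(groups)
    return {num: peps for num, peps in groups.items() if group_qualifies(peps, predicate)}


# Hypothetical peptides assigned to two protein groups.
groups = {
    1: [Peptide("PEPTIDEA", 2, 0.98), Peptide("PEPTIDEB", 3, 0.40)],
    2: [Peptide("PEPTIDEC", 2, 0.60)],
}
high_prob = filter_by_peptides(groups, lambda p: p.peptide_prophet >= 0.95)
charge_two = filter_by_peptides(groups, lambda p: p.charge == 2)
```

Group 1 survives the PeptideProphet filter because one of its two peptides passes, while group 2 is dropped; filtering on charge state 2 keeps both groups.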

Inclusion Criteria

This setting lets you choose whether to show a protein's results for a run in which they don't meet your filter criteria, when the same protein does meet the criteria in another run. Consider a scenario in which run A has protein P1 with ProteinProphet probability 0.97 and protein P2 with probability 0.71, while run B has P1 with probability 0.86 and P2 with probability 0.25. Suppose you set a protein group probability filter of 0.9. P2 will not be shown in the comparison because it fails the filter in both runs. P1 will be included because it meets the threshold in run A. This option determines whether P1 is also shown for run B, where it fell below the threshold. Depending on your analysis, you may want to see it or to exclude it.
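The scenario above can be reproduced with a small sketch. A protein is listed if it passes the filter in at least one run, and the hypothetical `show_unfiltered` flag (standing in for this setting; the name is not LabKey's) decides whether its below-threshold values from other runs are shown. The inclusive threshold is an assumption.

```python
from typing import Dict, Optional


def compare(
    runs: Dict[str, Dict[str, float]],
    min_probability: float,
    show_unfiltered: bool = False,
) -> Dict[str, Dict[str, Optional[float]]]:
    """runs maps run name -> {protein: ProteinProphet probability}.

    A protein is listed if it passes the filter in at least one run. For a run
    where it fails (or is absent), its value is None unless show_unfiltered is
    True and that run has a result for it.
    """
    passing = {
        prot
        for probs in runs.values()
        for prot, p in probs.items()
        if p >= min_probability
    }
    result = {}
    for prot in sorted(passing):
        row = {}
        for run, probs in runs.items():
            p = probs.get(prot)
            keep = p is not None and (p >= min_probability or show_unfiltered)
            row[run] = p if keep else None
        result[prot] = row
    return result


runs = {"A": {"P1": 0.97, "P2": 0.71}, "B": {"P1": 0.86, "P2": 0.25}}
hidden = compare(runs, 0.9)                       # P1 shown for run A only
shown = compare(runs, 0.9, show_unfiltered=True)  # P1 shown for both runs
```

In both cases P2 never appears, because it fails the 0.9 threshold in every run.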

Protein Group Normalization

This option normalizes protein groups across runs that do not share identical ProteinProphet protein and protein group assignments. Consider the following scenario:

Run name	Protein group	Proteins	Probability
A	1	a	1.0
A	2	b, c	1.0
A	3	d	0.95
A	4	e, f, g	0.90
B	1	a, b	1.0
B	2	d	1.0
B	3	e	0.94
B	4	h	0.91

If you do not normalize protein groups, the comparison shows one row per protein, even when multiple proteins are assigned to a single protein group. This has the advantage of unambiguously aligning results from different runs, but the disadvantage of presenting what is likely an inflated set of protein identifications. The results would look like this:

Protein	Run A Group	Run A Prob	Run B Group	Run B Prob
a	1	1.0	1	1.0
b	2	1.0	1	1.0
c	2	1.0
d	3	0.95	2	1.0
e	4	0.90	3	0.94
f	4	0.90
g	4	0.90
h			4	0.91

Note that this result presents proteins e, f, and g as three separate rows, even though, based on the ProteinProphet assignments, it is likely that only one of them was identified in run A and only e was identified in run B.
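The unnormalized layout is essentially a pivot: every protein in a group inherits that group's number and probability, keyed by run. The sketch below rebuilds the table above from the example data. `per_protein_rows` is a hypothetical helper, and it assumes a protein belongs to at most one group per run, as in the example.

```python
from typing import Dict, List, Tuple

# Each run maps to (group number, proteins, probability) tuples.
Runs = Dict[str, List[Tuple[int, List[str], float]]]


def per_protein_rows(runs: Runs) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """One row per protein: {protein: {run: (group number, group probability)}}.

    Assumes a protein belongs to at most one group per run; if it appeared in
    several, the last group read would win.
    """
    rows: Dict[str, Dict[str, Tuple[int, float]]] = {}
    for run, groups in runs.items():
        for number, proteins, prob in groups:
            for prot in proteins:
                # Every protein in a group inherits that group's number and probability.
                rows.setdefault(prot, {})[run] = (number, prob)
    return dict(sorted(rows.items()))


# Data from the example scenario table.
runs: Runs = {
    "A": [(1, ["a"], 1.0), (2, ["b", "c"], 1.0), (3, ["d"], 0.95), (4, ["e", "f", "g"], 0.90)],
    "B": [(1, ["a", "b"], 1.0), (2, ["d"], 1.0), (3, ["e"], 0.94), (4, ["h"], 0.91)],
}
rows = per_protein_rows(runs)
for prot, cells in rows.items():
    print(prot, cells.get("A", ""), cells.get("B", ""))
```

The eight rows show the inflation described above: e, f, and g each get their own row for run A even though they come from a single group.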

If you choose to normalize protein groups, LabKey Server aligns protein groups across runs A and B based on any shared protein assignments. That is, if a group in run A contains any of the same proteins as a group in run B, they are shown as a single, unified row in the comparison. This has the advantage of aligning what were likely the same identifications in different runs, but the disadvantage of potentially misaligning identifications in some cases. The results would look like this:

Proteins	Run A Group Count	Run A First Group	Run A Prob	Run B Group Count	Run B First Group	Run B Prob
a, b, c	2	1	1.0	1	1	1.0
d	1	3	0.95	1	2	1.0
e, f, g	1	4	0.90	1	3	0.94
h				1	4	0.91

The group count columns show how many protein groups from each run were combined to form the normalized group. For example, run A's groups 1 and 2 shared proteins a and b, respectively, with group 1 from run B, so all three groups were normalized into a single row. Normalization continues to combine groups until no overlapping protein assignments remain within the set of runs being compared.
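Because merging repeats until nothing overlaps, the normalized rows correspond to connected components of groups linked by shared proteins. The sketch below computes them with a union-find structure; that technique is our choice for illustration, not necessarily how LabKey Server implements normalization. Two further assumptions: "First Group" is taken as the lowest group number from each run, and "Prob" as that first group's probability. The function name `normalize` and the row layout are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (run name, group number)
Runs = Dict[str, List[Tuple[int, List[str], float]]]


def normalize(runs: Runs) -> List[dict]:
    """Merge every group transitively linked to another by a shared protein."""
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x: Node, y: Node) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    owner: Dict[str, Node] = {}  # first group seen containing each protein
    info: Dict[Node, Tuple[List[str], float]] = {}
    for run, groups in runs.items():
        for number, proteins, prob in groups:
            node = (run, number)
            parent[node] = node
            info[node] = (proteins, prob)
            for prot in proteins:
                if prot in owner:
                    union(owner[prot], node)  # shared protein: same component
                else:
                    owner[prot] = node

    clusters: Dict[Node, List[Node]] = {}
    for node in info:
        clusters.setdefault(find(node), []).append(node)

    rows = []
    for members in clusters.values():
        row: dict = {"proteins": sorted({p for n in members for p in info[n][0]})}
        for run in runs:
            numbers = sorted(num for r, num in members if r == run)
            if numbers:  # runs with no group in this component get no cells
                first = numbers[0]
                row[run] = {
                    "group_count": len(numbers),
                    "first_group": first,
                    "prob": info[(run, first)][1],
                }
        rows.append(row)
    rows.sort(key=lambda r: r["proteins"])
    return rows


runs: Runs = {
    "A": [(1, ["a"], 1.0), (2, ["b", "c"], 1.0), (3, ["d"], 0.95), (4, ["e", "f", "g"], 0.90)],
    "B": [(1, ["a", "b"], 1.0), (2, ["d"], 1.0), (3, ["e"], 0.94), (4, ["h"], 0.91)],
}
rows = normalize(runs)
```

Run on the example data, this yields the four rows of the normalized table: a, b, c (two run A groups and one run B group), d, e, f, g, and h (run B only).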