Christie Hunter


Computing protein confidence with improved accuracy by reassessing peptide confidence during protein grouping

ProteinPilot software 4.0 and higher releases include a new method for calculating protein confidences that improves reliability at the end of protein lists. Figure 1 shows a simulated example that demonstrates how more accurate protein confidence is computed.

Figure 1. This example simulation describes how more accurate protein confidence is computed.

Protein confidence calculations in other search engines (and in versions of the Pro Group Algorithm prior to ProteinPilot software 4.0) deliver only the results shown on the left in Figure 1. That is, they determine peptide confidences once, after peptide identification, and treat them as fixed during protein grouping.

With ProteinPilot software 4.0 and higher, however, peptide confidences are reassessed as protein grouping proceeds. As the simulation shows, each correct peptide assigned to a protein shrinks the distribution of correct peptides that are not yet assigned, while the distribution of incorrect peptides stays relatively fixed (assuming assignment begins with multi-hit proteins). The value of a peptide with a given score or confidence is therefore not the same at all points in the process, as illustrated in the progression at the right of Figure 1. For example, if one of the remaining peptides from the score = 19 bin were used to make a new protein after 90% of the peptides had already been assigned, we would have only 68% confidence in that identification, rather than the 95% confidence it was given after peptide identification.

Effectively, assigning good peptides to good proteins depletes the correct answers remaining in any given score bin. Taking this effect into account can reduce single-hit protein errors at the end of the reported protein list.

Consider the example in Figure 1. After the peptide identification stage, there are 100 peptides with 95% confidence (so 95 correct peptides and 5 incorrect). By the time the 50th protein group is formed, 80 of these peptides have been assigned across the first 50 proteins. Because these are all multi-hit proteins, it is likely that only correct peptides have been assigned. Of the original 100 peptides, only 20 therefore remain unassigned at the 50th protein group (15 correct and 5 incorrect). At this stage of grouping, we can have only 75% confidence in any of the remaining 20 peptides, even though each was given 95% confidence during peptide identification. A new protein in the 51st round of protein inference that cites one of these peptides as evidence should count it as having only 75% peptide confidence in the computation of the ProtScore.
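The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the depletion effect under the simplifying assumption stated above (every peptide assigned so far is correct); it is not the Pro Group Algorithm itself, and the function name `residual_confidence` and its parameters are hypothetical.

```python
def residual_confidence(n_correct, n_incorrect, n_assigned):
    """Confidence of the peptides still unassigned in one score bin.

    Hypothetical helper for illustration only. It assumes that every
    peptide assigned to a protein so far was a correct one, so the
    incorrect peptides all remain in the unassigned pool.
    """
    if n_assigned > n_correct:
        raise ValueError("cannot assign more peptides than are correct")
    remaining_correct = n_correct - n_assigned
    remaining_total = remaining_correct + n_incorrect
    if remaining_total == 0:
        raise ValueError("no unassigned peptides remain in this bin")
    return remaining_correct / remaining_total


# Start of grouping: 95 correct and 5 incorrect peptides, none assigned.
print(residual_confidence(95, 5, 0))   # 0.95

# After 80 correct peptides have been assigned: 15 correct, 5 incorrect remain.
print(residual_confidence(95, 5, 80))  # 0.75
```

The incorrect count stays fixed while the correct count shrinks, which is why the confidence of the remaining peptides falls as grouping proceeds.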

