ProteinPilot™ Software computes two key protein scores: Total ProtScore and Unused ProtScore. Total ProtScore is the sum of all the peptide evidence related to a given protein, while Unused ProtScore counts only the peptide evidence for that protein that has not already been claimed by a higher-ranking protein. Because proteins can share homology (similarity at the sequence level), a database search often identifies peptides that point to multiple proteins. The Pro Group™ Algorithm addresses this protein inference problem by aiming to report only the proteins that are supported by evidence of their own, rather than proteins that appear only because they share peptides with others.
After peptide identification is complete and the Pro Group Algorithm has been run to assemble the list of proteins, the Total ProtScore of every protein is computed and the proteins are ranked by it. Starting from the top of the list, all the peptide evidence is assigned to the first protein, so its Unused ProtScore is the same as its Total ProtScore. Each remaining protein in the list is then analyzed in turn. When a protein relies on peptide evidence that has already been used by a protein higher up the list, the score for that peptide is removed from the lower-ranked protein and its Unused ProtScore is recalculated. For such a protein, the Unused ProtScore will therefore be less than the Total ProtScore and will reflect only the unique evidence that supports the presence of that protein. After every iteration of this recalculation, the protein list is re-sorted according to the Unused ProtScore values.
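The ranking procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pro Group Algorithm itself: the function name `rank_by_unused_score`, the protein and peptide labels, and the per-peptide score values are all hypothetical, and each peptide's score contribution is taken as a given input rather than derived from its confidence.

```python
def rank_by_unused_score(protein_peptides, peptide_score):
    """Greedy sketch: repeatedly pick the protein with the most unclaimed evidence.

    protein_peptides: dict mapping protein name -> set of peptide labels
    peptide_score:    dict mapping peptide label -> score contribution (assumed given)
    Returns a list of (protein, total_score, unused_score) in ranked order.
    """
    def score(peptides):
        return sum(peptide_score[p] for p in peptides)

    total = {prot: score(peps) for prot, peps in protein_peptides.items()}
    claimed = set()                      # peptides already used by a higher-ranked protein
    remaining = set(protein_peptides)    # proteins not yet placed in the ranking
    ranked = []

    while remaining:
        # Re-sort on every iteration: the unused score depends on what is claimed so far.
        # Ties are broken by total score, then by name (assumption, for determinism).
        best = max(
            sorted(remaining),
            key=lambda prot: (score(protein_peptides[prot] - claimed), total[prot]),
        )
        unused = score(protein_peptides[best] - claimed)
        ranked.append((best, total[best], unused))
        claimed |= protein_peptides[best]
        remaining.remove(best)

    return ranked


# Hypothetical input data for illustration only.
protein_peptides = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
}
peptide_score = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 0.5}

for protein, total_score, unused_score in rank_by_unused_score(protein_peptides, peptide_score):
    print(protein, total_score, unused_score)
```

In this toy input, P2 has a higher total score than P3, but all of its peptides are claimed by P1, so it ends up with an unused score of 0 and is ranked below P3, which keeps one peptide of its own.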
Why is this important? Imagine a scenario where you have a set of protein isoforms that share some amount of sequence homology, and the search has found 9 peptide identifications that associate with these proteins (Figure below). In this example, 8 peptides point to protein A, and because it has the most evidence, it is ranked highest of the three. Protein B has 4 peptides associated with it, but all of them were already claimed by protein A, so protein B has no unique peptides and an Unused ProtScore of 0. Protein C has 5 peptides associated with its protein sequence; however, as before, 4 of these peptides were previously claimed by protein A. The fifth peptide is unique to protein C, so our confidence that protein C has been detected depends on the confidence of that single peptide hit, and the Unused ProtScore is based only on that peptide. Protein C will therefore be ranked much lower on the protein list than protein A.
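The peptide counts in this example can be checked with simple set arithmetic. The peptide labels below are hypothetical, and the choice of which of protein A's peptides are shared with B and which with C is an assumption, since the text gives only the counts.

```python
# Hypothetical peptide labels; only the counts come from the example above.
protein_a = {f"pep{i}" for i in range(1, 9)}             # pep1 .. pep8 (8 peptides)
protein_b = {"pep1", "pep2", "pep3", "pep4"}             # 4 peptides, all shared with A
protein_c = {"pep5", "pep6", "pep7", "pep8", "pep9"}     # 5 peptides, 4 shared with A

all_peptides = protein_a | protein_b | protein_c

# Evidence left for B and C once protein A has claimed its peptides.
unique_to_b = protein_b - protein_a
unique_to_c = protein_c - protein_a

print(len(all_peptides))      # number of distinct peptide identifications
print(sorted(unique_to_b))    # peptides supporting B alone
print(sorted(unique_to_c))    # peptides supporting C alone
```

With these sets there are 9 distinct peptides, protein B keeps no peptides of its own, and protein C keeps exactly one, which is the only evidence its Unused ProtScore would be based on.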
For more detailed information, refer to the help documentation installed with ProteinPilot Software.
Another key element in this process is that peptide confidences are recalculated during the grouping process. As correct peptides are assigned to proteins, the distribution of correct peptides is depleted relative to the distribution of incorrect peptides, which requires a recalculation of confidence to maintain accuracy.