Table of Contents

2024-05-02
Examples of Label-Free Quantitation Approaches
   Preparing for MS2 Searches
   Sample Data

Examples of Label-Free Quantitation Approaches


This project and its two subfolders (demoset and ms1ms2matching) contain sample data loaded into LabKey/CPAS to illustrate label-free quantitation approaches. The data was published with the PEPPeR study from the Broad Institute; see the details page for a description of the data sets and acknowledgement of the sources.

"MS2" (MS/MS) analysis searches were done on all 50 mzXML files using the same X!Tandem search parameters, followed by running the TPP tools PeptideProphet, XPRESS, and ProteinProphet.

The demoset folder demonstrates a label-free approach using spectra counts. That folder contains 50 MS2 runs, evenly divided into an Alpha group and a Beta group based on the input sample. (For a complete description of this example, see the documentation topic on label-free quantitation.) This folder contains a 6-run subset from the same two samples, shown in the list below.

To demonstrate spectral counting using either the 6-run set or the 50-run set, do the following:

  1. Select the checkbox for "all runs in list" in the upper left of the MS2 Runs grid.
  2. Press the Compare button at the bottom of the grid.  Choose the "Spectra Count" option.
  3. Choose the third Grouping Option, "Peptide sequence, ProteinProphet protein assignment".
  4. Choose the middle filtering option, "All peptides with PeptideProphet probability >=", and fill in 0.75 as the value.
  5. Press Go.
  6. When the result data set is visible, press the Views button and select DoExampleRunScoring.
  7. In this folder, the example spectra counting calculation will begin immediately.  In the demoset folder, the calculation runs as a pipeline job.  Press Start Job to start this, and wait for the job status messages.
  8. When the results are visible, scroll down to see the graph.
  9. To demonstrate the implementation of a published counting algorithm, select the DoSASPECT view instead in step 6. (There is no graph in the result.)
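
The counting behind steps 3 and 4 can be illustrated outside the user interface. The following is a minimal Python sketch, not the code that LabKey/CPAS or the DoExampleRunScoring view actually runs. The record layout, the protein and peptide names, and the function name spectra_counts are assumptions made up for this illustration; only the 0.75 probability threshold and the grouping by peptide sequence and protein assignment come from the steps above.

```python
from collections import defaultdict

# Hypothetical identifications: (run, protein, peptide, PeptideProphet probability).
# These records are invented for illustration and are not PEPPeR results.
IDENTIFICATIONS = [
    ("run01", "PROT_A", "LSEPAELTDAVK", 0.99),
    ("run01", "PROT_A", "LSEPAELTDAVK", 0.91),
    ("run01", "PROT_B", "VATVSLPR", 0.60),
    ("run02", "PROT_A", "LSEPAELTDAVK", 0.88),
    ("run02", "PROT_B", "VATVSLPR", 0.95),
    ("run02", "PROT_B", "VATVSLPR", 0.80),
]


def spectra_counts(identifications, min_probability=0.75):
    """Count spectra per (protein, peptide) in each run.

    Only identifications with probability >= min_probability are counted,
    mirroring the filter chosen in step 4.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for run, protein, peptide, probability in identifications:
        if probability >= min_probability:
            counts[(protein, peptide)][run] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(per_run) for key, per_run in counts.items()}


if __name__ == "__main__":
    for (protein, peptide), per_run in sorted(spectra_counts(IDENTIFICATIONS).items()):
        print(protein, peptide, per_run)
```

Each row of the result corresponds to one peptide and protein pair, with one count per run, which is roughly the shape of the comparison grid produced in step 5.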

The documentation topic on the ms1/ms2 matching approach to this example data has not yet been written, but the underlying MS1 facilities are documented under the topic MS1 Pipelines.




Preparing for MS2 Searches


The mzXML files provided with the PEPPeR paper included both MS1 and MS2 scan data. The first challenge was to find an MS2 search protocol that correctly identified the 12 proteins spiked into the samples. The published data did not include the FASTA file used as the basis of the search, so it had to be created from the descriptions in the paper. The paper did provide the search parameters used by the authors, but these were given for the SpectrumMill search engine, which is neither freely available nor accessible from CPAS. The SpectrumMill parameters were therefore translated into their approximate equivalents for the X!Tandem search engine that is included with CPAS.
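
As a quick check that a file carries both MS1 and MS2 scans, the scans can be tallied by level. The sketch below is an illustration only and is not part of CPAS. It assumes the usual mzXML layout in which each scan element has an msLevel attribute; the embedded sample document and its namespace URI are hypothetical stand-ins for a real file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed stand-in for an mzXML file (no peak data).
SAMPLE = b"""<?xml version="1.0"?>
<mzXML xmlns="http://example.org/hypothetical-mzxml-namespace">
  <msRun scanCount="3">
    <scan num="1" msLevel="1" peaksCount="0">
      <scan num="2" msLevel="2" peaksCount="0"/>
      <scan num="3" msLevel="2" peaksCount="0"/>
    </scan>
  </msRun>
</mzXML>"""


def count_scans_by_ms_level(source):
    """Stream through an mzXML file (path or binary file object) and count
    scan elements by their msLevel attribute."""
    counts = Counter()
    for _event, element in ET.iterparse(source, events=("end",)):
        # Drop any "{namespace}" prefix so the check works for any schema revision.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "scan":
            counts[int(element.get("msLevel", "0"))] += 1
            element.clear()  # release peak data already processed
    return counts


if __name__ == "__main__":
    for level, total in sorted(count_scans_by_ms_level(io.BytesIO(SAMPLE)).items()):
        print(f"MS{level}: {total} scans")
```

For a real file, pass its path instead of the in-memory sample.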

Creating the right FASTA file

The PEPPeR paper gives the following information about the protein database against which the authors conducted their search:

Data from the Scale Mixes and Variability Mixes were searched against a small protein database consisting of only those proteins that composed the mixtures and common contaminants... Data from the mitochondrial preparations were searched against the International Protein Index (IPI) mouse database version 3.01 and the small database mentioned above.

It proved difficult to replicate this search database combination. The spiked proteins are identified in the paper by names that do not generally resolve to a single protein sequence when using common search engines such as Expasy and Entrez. Below is a link to the list of proteins in the mixture as described in the paper, along with the SwissProt identifiers used in the target search FASTA file:

Pepper SpikedProteins.tsv

As in the PEPPeR study, the total search database consisted of

  1. the spiked proteins as listed in the table, using SwissProt identifiers
  2. the Mouse IPI fasta database, using IPI identifiers
  3. the cRAP list of common contaminants from thegpm.org, minus the proteins that overlapped with the spiked proteins (including other species versions of those spiked proteins). This list used a different format of SwissProt identifiers.
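
Assembling the three parts listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to build the search database. The function names and the excluded_ids parameter are hypothetical. Contaminants are dropped either when their sequence is identical to a spiked protein or when their identifier appears in a hand-curated exclusion list, since other-species versions of a spiked protein cannot be found by exact sequence comparison.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_search_database(spiked, background, contaminants, excluded_ids=()):
    """Concatenate spiked, background, and contaminant records.

    A contaminant is dropped if its sequence matches a spiked protein exactly,
    or if the first word of its header is in excluded_ids (a hypothetical,
    manually curated list covering other-species versions of spiked proteins).
    """
    spiked_sequences = {sequence for _header, sequence in spiked}
    excluded = set(excluded_ids)
    kept = [
        (header, sequence)
        for header, sequence in contaminants
        if sequence not in spiked_sequences and header.split()[0] not in excluded
    ]
    return spiked + background + kept


def format_fasta(records, width=60):
    """Render records as FASTA text with fixed-width sequence lines."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

In use, each of the three inputs would be read from its own FASTA file with read_fasta, and the output of format_fasta would be written to the file given to the search engine.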

Using different identifier formats for the three sets of sequences in the search database had the side effect of making it very easy to distinguish expected from unexpected proteins.
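
Because each set uses its own identifier format, a result list can be sorted into expected and unexpected proteins with simple pattern matching. The sketch below is a hypothetical illustration: the page does not state the exact formats, so the three patterns (a bare UniProt-style accession for the spiked proteins, "IPI" followed by eight digits for the mouse database, and an "sp|NAME_SPECIES|" form for the contaminants) are assumptions, as are the example identifiers.

```python
import re

# Assumed identifier formats; adjust to match the actual FASTA headers.
PATTERNS = [
    # Spiked proteins: bare UniProt/SwissProt accession (assumed format).
    ("spiked", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    # Mouse IPI background: "IPI" plus eight digits, optional version suffix.
    ("mouse_ipi", re.compile(r"IPI[0-9]{8}(?:\.[0-9]+)?")),
    # Contaminants: "sp|NAME_SPECIES|" style identifier (assumed format).
    ("contaminant", re.compile(r"sp\|[A-Z0-9]+_[A-Z0-9]+\|?")),
]


def classify_identifier(identifier):
    """Return the name of the sequence set whose format the identifier matches,
    or "unknown" if it matches none of the assumed patterns."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


if __name__ == "__main__":
    # Made-up identifiers chosen only to exercise each pattern.
    for example in ["P02768", "IPI00123456.1", "sp|ALBU_BOVIN|", "mystery_protein"]:
        print(example, "->", classify_identifier(example))
```

With such a classifier, any protein group labeled "mouse_ipi" or "contaminant" in a search of the 12-protein mixes stands out as an unexpected identification.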

 




Sample Data


The sample data in this project was published with the following paper:

Jacob D. Jaffe, D. R. Mani, Kyriacos C. Leptos, George M. Church, Michael A. Gillette, and Steven A. Carr, "PEPPeR, a Platform for Experimental Proteomic Pattern Recognition", Molecular and Cellular Proteomics; 5: 1927 - 1941, October 2006

The data itself consists of 50 mzXML files described in the paper as the "Variability Mix" and downloaded from the Tranche service of the Proteome Commons at the following address:

http://www.proteomecommons.org/data/show.jsp?id=716

The data sets are derived from two sample protein mixes, alpha and beta, with varied concentrations of a specific list of 12 proteins. The samples were run on a Thermo Fisher Scientific LTQ FT Ultra Hybrid mass spectrometer. The resulting data files were converted to mzXML format, which is the form that was downloaded from Tranche.

We are indebted to the PEPPeR team at the Broad Institute for this sample data, as it is very useful for demonstrating and comparing multiple approaches to label-free quantitation.

NOTE: An earlier version of this page described the mix as "spiked into a background of mitochondrial mouse proteins". The Broad Institute informed me that the mzXML files posted on Tranche were mixes of the 12 proteins only.