guest 2024-05-02 |
This project and its two subfolders (demoset and ms1ms2matching) contain sample data loaded into LabKey/CPAS to illustrate label-free quantitation approaches. This data was published with the PEPPeR study from the Broad Institute; see the details page for a description of the data sets and acknowledgement of the sources.
"MS2" (MS/MS) analysis searches were done on all 50 mzXML files using the same X!Tandem search parameters, followed by running the TPP tools PeptideProphet, XPRESS, and ProteinProphet.
The demoset folder demonstrates a label-free approach using spectral counts. In that folder are 50 MS2 runs, evenly divided into an Alpha group and a Beta group based on the input sample. (For a complete description of this example, see the documentation topic on label-free quantitation.) In this folder is a 6-run subset from the same two samples, shown in the list below.
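As a rough illustration of what spectral counting computes, the sketch below tallies identified MS2 spectra per protein per run and then averages those counts within the Alpha and Beta groups. It is not the LabKey/CPAS implementation; the function names, run names, and protein names are hypothetical, and the toy input stands in for the peptide identifications that a real search would produce.

```python
from collections import defaultdict


def spectral_counts(identifications):
    """Count identified MS2 spectra per (protein, run).

    `identifications` is an iterable of (run, protein) pairs, one pair
    per identified spectrum. This input shape is an assumption made for
    the sketch, not a CPAS export format.
    """
    counts = defaultdict(int)
    for run, protein in identifications:
        counts[(protein, run)] += 1
    return dict(counts)


def mean_count_by_group(counts, run_groups):
    """Average each protein's spectral count over the runs of each group.

    A run with no identified spectra for a protein contributes zero.
    """
    runs_in_group = defaultdict(list)
    for run, group in run_groups.items():
        runs_in_group[group].append(run)

    proteins = {protein for protein, _ in counts}
    return {
        protein: {
            group: sum(counts.get((protein, run), 0) for run in runs) / len(runs)
            for group, runs in runs_in_group.items()
        }
        for protein in proteins
    }


# Hypothetical toy data: two runs per group, two made-up proteins.
run_groups = {
    "alpha_1": "Alpha", "alpha_2": "Alpha",
    "beta_1": "Beta", "beta_2": "Beta",
}
identifications = [
    ("alpha_1", "PROT_A"), ("alpha_1", "PROT_A"), ("alpha_1", "PROT_B"),
    ("alpha_2", "PROT_A"), ("alpha_2", "PROT_B"),
    ("beta_1", "PROT_A"), ("beta_1", "PROT_B"), ("beta_1", "PROT_B"),
    ("beta_1", "PROT_B"), ("beta_2", "PROT_B"), ("beta_2", "PROT_B"),
]

counts = spectral_counts(identifications)
means = mean_count_by_group(counts, run_groups)
for protein in sorted(means):
    print(protein, means[protein])
```

A protein whose mean count differs markedly between the two groups is a candidate for a concentration difference between the samples; a real analysis would also normalize for run-to-run variation and apply a statistical test.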
To demonstrate spectral counting using either the 6-run set or the 50-run set, do the following:
The documentation topic on the ms1/ms2 matching approach to this example data has not yet been written, but the underlying MS1 facilities are documented under the topic MS1 Pipelines.
The mzXML files provided with the PEPPeR paper included both MS1 and MS2 scan data. The first challenging task was to get an MS2 search protocol that correctly identified the 12 proteins spiked into the samples. The published data did not include the fasta file to use as the basis of the search, so that had to be created from the descriptions in the paper. The paper did provide the search parameters used by the authors, but these were given for the SpectrumMill search engine, which is neither freely available nor accessible from CPAS. So the SpectrumMill parameters were translated into their approximate equivalents on the X!Tandem search engine that is included with CPAS.
The PEPPeR paper gives the following information about the protein database against which the authors conducted their search:
Data from the Scale Mixes and Variability Mixes were searched against a small protein database consisting of only those proteins that composed the mixtures and common contaminants... Data from the mitochondrial preparations were searched against the International Protein Index (IPI) mouse database version 3.01 and the small database mentioned above.
It proved difficult to replicate this search database combination. The spiked proteins are identified in the paper by names that do not generally resolve to a single protein sequence when using common search tools such as Expasy and Entrez. Below is a link to the list of the protein mixture as described in the paper, along with the SwissProt identifiers used in the target search fasta file:
As in the PEPPeR study, the total search database consisted of
Using different identifier formats for the three sets of sequences in the search database had the side effect of making it very easy to distinguish expected from unexpected proteins.
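The idea of separating expected from unexpected proteins by identifier format can be sketched as follows. The actual identifier formats used in this project's fasta file are not stated here, so every pattern and example identifier below is a hypothetical placeholder chosen only to show the technique.

```python
import re

# HYPOTHETICAL identifier patterns: the real formats used in the search
# database are not given on this page. Each entry maps a label to a
# pattern that is assumed to match only one set of sequences.
PATTERNS = [
    ("spiked", re.compile(r"^sp\|[A-Z0-9]+\|")),   # assumed SwissProt-style header
    ("mouse_ipi", re.compile(r"^IPI\d+")),         # assumed IPI-style accession
    ("contaminant", re.compile(r"^CONTAM_")),      # assumed contaminant prefix
]


def classify(identifier):
    """Return the label of the first pattern matching the identifier."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


def is_expected(identifier):
    """Only the spiked proteins are expected hits in these samples."""
    return classify(identifier) == "spiked"


# Made-up identifiers, not real accessions from the data set.
for ident in ["sp|P00000|EXAMPLE_PROT", "IPI00000000.1", "CONTAM_EXAMPLE"]:
    print(ident, classify(ident), is_expected(ident))
```

Because each set of sequences is assumed to use its own format, a single pass over the protein identifiers in the search results is enough to flag any hit that falls outside the spiked set.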
The sample data in this project was published with the following paper:
Jacob D. Jaffe, D. R. Mani, Kyriacos C. Leptos, George M. Church, Michael A. Gillette, and Steven A. Carr, "PEPPeR, a Platform for Experimental Proteomic Pattern Recognition", Molecular and Cellular Proteomics; 5: 1927 - 1941, October 2006
The data itself consists of 50 mzXML files described in the paper as the "Variability Mix" and downloaded from the Tranche service of the Proteome Commons at the following address:
http://www.proteomecommons.org/data/show.jsp?id=716
The data sets are derived from two sample protein mixes, alpha and beta, with varied concentrations of a specific list of 12 proteins. The samples were run on a Thermo Fisher Scientific LTQ FT Ultra Hybrid mass spectrometer. The resulting data files were converted to mzXML format, and it is these mzXML files that were downloaded from Tranche.
We are indebted to the PEPPeR team at the Broad Institute for this sample data, as it is very useful for demonstrating and comparing multiple approaches to label-free quantitation.
NOTE: An earlier version of this page described the mix as "spiked into a background of mitochondrial mouse proteins". The Broad Institute informed me that the mzXML files posted on Tranche were mixes of the 12 proteins only.