Libra

Libra is a module within the Trans-Proteomic Pipeline that performs quantitation on MS/MS spectra of peptides labeled with multiple reagents. More specifically, at ISB we use Libra on MS/MS spectra of iTRAQ-labeled samples.

 

Patrick Pedrioli -- original code author of Quantitation
Andrew Keller -- peptide assignment to proteins within pipeline
Nichole King -- code statistics/maintenance/additions/corrections and satellite applications

 
  • Command line syntax for using Libra
  • The condition file
  • Details about what Libra does
  • How accurate is the quantitation?

    Command line syntax to use Libra in the pipeline

    To run the pipeline using PeptideProphet, ProteinProphet, and Libra, specify your input files (PepXML files or summary html files), the 'Prophet flags, and the Libra flag. The Libra flag has the form: -L[conditionFile]

     

    For example, to run PeptideProphet while retaining identifications with P less than 0.05 (-p0), run ProteinProphet (-Op), and run Libra to return quantitation, use:
        xinteract -p0 -Op *.html -Lcondition.xml

     
    You'll find the integrated intensities of the reagent m/z fragment ion lines in your interact.xml file, and you'll find the protein quantitation in your interact-prot.xml file.
    Additionally, there's a tab-separated file called quantitation.tsv, which you can load into Excel or your favorite spreadsheet tool to do your own math if you like. The web interface to your interact files provides export options too. (If you want to tailor the peptide-to-protein assignment yourself, use the export options on your interact (PepXML) file.)
     

    The condition file

    The condition file specifies the reagent m/z values, mass tolerance, isotopic correction coefficients, method of centroiding, method of normalization, minimum threshold intensity (not required), target MS level, and output type.

     
    The condition.xml elements and attributes (in sequential order)
    have the following functions:

        1. <fragmentMasses>
               <reagent>
           Specifies the m/z values to be used in the analysis.

        2. <isotopicContributions>
           Specifies the isotopic contributions of one line to its adjacent
           lines (specified in fractions).
           The order is:
               contribution of channel 1 to channel 2
               contribution of channel 2 to channel 1 and then to channel 3
               contribution of channel 3 to channel 2 and then to channel 4
               contribution of channel 4 to channel 3

           If you do not want to apply isotopic corrections, input -2.

        3. <massTolerance>
           Specifies the mass tolerance. If in field one you have chosen
           m/z = 114 and you set a mass tolerance of 1, Libra would look
           for the most intense m/z value in the interval 113 to 115.

        4. <centroiding>
           Specifies centroiding preferences.
           0: none
           1: mathematical average
           2: intensity weighted average
           For 1 and 2 a number of iterations must be specified as well.

        5. <normalization>
           Specifies normalization preferences.
           1 ... n: corresponding m/z value in the sorted (ascending order)
                    list of m/z values specified in field one.

           The default is channel 1 if not specified, but you'll want to specify.

        6. <targetMs>
           Specifies the level of the MS scans to use in the analysis.
           1: MS
           2: MS2
           n: MSn

        7. <output>
           Switches between printing the scan number or the retention time in
           the output file.
           Retention time can be useful when a link to the native data is
           required, but the scan numbers in the mzXML format and the native
           output from the MS instrument do not correspond.
           [NOTE: this feature has not been checked; it might not be working.]

        8. <quantitationFile>
           Name of the output file with tab-separated information (not active
           yet; the default filename is quantitation.tsv).

        9. <minimumThreshhold>
           Minimum threshold intensity (not required). If you have low S/N
           spectra you'll want to set this to ignore noise below a certain
           integrated count for a line.
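
    The mass tolerance and intensity-weighted centroiding fields above can be illustrated with a short Python sketch. This is not Libra's own code: the function names and the (m/z, intensity) peak-list layout are assumptions made for illustration, and the sketch does a single centroiding pass because the text does not describe how Libra iterates.

```python
def pick_reporter_peak(peaks, target_mz, tolerance):
    """Return the most intense (mz, intensity) pair within
    target_mz +/- tolerance, or None if the window holds no peak."""
    window = [(mz, inten) for mz, inten in peaks
              if abs(mz - target_mz) <= tolerance]
    if not window:
        return None
    return max(window, key=lambda peak: peak[1])


def weighted_centroid(peaks, center_mz, tolerance):
    """Intensity-weighted average m/z of the peaks inside the window
    (one pass only; the iteration scheme is an assumption left out)."""
    window = [(mz, inten) for mz, inten in peaks
              if abs(mz - center_mz) <= tolerance and inten > 0]
    total = sum(inten for _, inten in window)
    if total == 0:
        return None
    return sum(mz * inten for mz, inten in window) / total


# Hypothetical peak list: (m/z, intensity) pairs from one MS/MS scan.
peaks = [(114.05, 20.0), (114.10, 120.0), (114.15, 60.0), (115.10, 80.0)]
print(pick_reporter_peak(peaks, 114.1, 0.2))
print(weighted_centroid(peaks, 114.1, 0.2))
```

    A reagent line that falls outside every window comes back as None here, which lets the caller decide how to treat a missing channel.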
    
            
    If you would like to generate a condition.xml file, please use http://db.systemsbiology.net/webapps/conditionFileApp/
     
    Here's an example condition file: [NOTE: the format changed on June 24, 2006, so please update your condition file using the web application above or the information here.]
     
    <SUMmOnCondition description="iTraq">
      <fragmentMasses>
        <reagent mz="114.1"/>
        <reagent mz="115.1"/>
        <reagent mz="116.1"/>
        <reagent mz="117.1"/>
      </fragmentMasses>
      <isotopicContributions>
        <contributingMz value="1">
            <affected mz="2" correction="0.063"/>
        </contributingMz>
        <contributingMz value="2">
            <affected mz="1" correction="0.02"/>
            <affected mz="3" correction="0.06"/>
        </contributingMz>
        <contributingMz value="3">
            <affected mz="2" correction="0.03"/>
            <affected mz="4" correction="0.049"/>
        </contributingMz>
        <contributingMz value="4">
            <affected mz="3" correction="0.04"/>
        </contributingMz>
      </isotopicContributions>
      <massTolerance value="0.2"/>
      <centroiding type="2" iterations="1"/>
      <normalization type="4"/>
      <targetMs level="2"/>
      <output type="1"/>
      <quantitationFile name="quantitation.tsv"/>
      <minimumThreshhold value="20"/>
    </SUMmOnCondition>
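
    If you want to check a condition file before running the pipeline, a few lines of Python can read it back. This is a minimal sketch using only the standard library; the function name read_condition and the keys of the returned dictionary are hypothetical, and the embedded file is a shortened copy of the example with self-closed reagent elements.

```python
import xml.etree.ElementTree as ET

CONDITION_XML = """
<SUMmOnCondition description="iTraq">
  <fragmentMasses>
    <reagent mz="114.1"/>
    <reagent mz="115.1"/>
    <reagent mz="116.1"/>
    <reagent mz="117.1"/>
  </fragmentMasses>
  <isotopicContributions>
    <contributingMz value="1">
      <affected mz="2" correction="0.063"/>
    </contributingMz>
    <contributingMz value="2">
      <affected mz="1" correction="0.02"/>
      <affected mz="3" correction="0.06"/>
    </contributingMz>
  </isotopicContributions>
  <massTolerance value="0.2"/>
  <normalization type="4"/>
  <minimumThreshhold value="20"/>
</SUMmOnCondition>
"""


def read_condition(xml_text):
    """Collect the main settings of a condition file into a dict."""
    root = ET.fromstring(xml_text)
    reagents = [float(r.get("mz"))
                for r in root.find("fragmentMasses").findall("reagent")]
    # Map (contributing channel, affected channel) -> correction fraction.
    corrections = {}
    for contrib in root.find("isotopicContributions").findall("contributingMz"):
        source = int(contrib.get("value"))
        for affected in contrib.findall("affected"):
            key = (source, int(affected.get("mz")))
            corrections[key] = float(affected.get("correction"))
    return {
        "reagents": reagents,
        "corrections": corrections,
        "mass_tolerance": float(root.find("massTolerance").get("value")),
        "normalization_channel": int(root.find("normalization").get("type")),
    }


settings = read_condition(CONDITION_XML)
print(settings["reagents"])
print(settings["corrections"])
```

    ET.fromstring raises a ParseError on a file that is not well-formed, for example when a reagent element is left unclosed, so this also works as a quick syntax check.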
              
     

    Details about what Libra does

     

    Given the conditions in the condition file, Libra integrates the intensities of the reagent m/z lines (called channels throughout this documentation) in an MS/MS spectrum and stores the values at the peptide level in the interact.xml file. ProteinProphet, within the Trans-Proteomic Pipeline, infers the simplest list of proteins consistent with the identified peptides. (Note that peptides with PeptideProphet probabilities less than 0.5 are excluded in Libra.)

    Protein quantitation is derived from the group of peptides associated with the protein. Each peptide's integrated intensities are normalized by the sum of its channel intensities. The normalized channels are averaged over all peptides of a protein, and the standard deviation about the mean is determined for each normalized channel. Normalized channels more than 2 sigma from the mean are removed. The average channels of the protein are then recalculated from the values surviving outlier removal, and the 1-sigma standard errors are calculated using the standard deviation. If the user has specified a reference normalization channel, the protein quantitation is normalized with respect to that channel, and the errors become the channel error and the reference channel error added in quadrature. The value 99.99 indicates that a protein's quantitation was calculated using only one peptide, and so the standard error is infinite. The value -9.0 indicates that no peptides of the protein survived the threshold filter and outlier removal, so the protein quantitation is undefined. (One day, we would like to use intensity-weighted means and errors in the calculations.)

    When a reagent m/z (channel) isn't found in the peptide spectrum, that reagent m/z is replaced with the default value. When the intensity of a reagent line is less than or equal to zero, its value is replaced with zero. Note that there are still a few loose ends to tie up.

    * Be wary of quantitation from very poor S/N spectra. Is your integrated intensity for a peptide channel less than 20 counts, for example?

     
    A detailed example of the steps going into the quantitation follows. ProteinA has 8 peptides.
     
    The interact.xml file shows the peptide integrated intensities for each channel:
          libra 114   libra 115  libra 116   libra 117
    pep1    67.100      39.153    49.651      47.567
    pep2  2311.460  167071.800  1847.637    1762.466
    pep3  2311.460    1670.718  1847.637    1762.466
    pep4  2311.460    1670.718  1847.637    1762.466
    pep5  2311.460    1670.718  1847.637    1762.466
    pep6  2311.460    1670.718  1847.637    1762.466
    pep7   224.920     231.700   246.938     241.900
    pep8   287.600     293.121   263.173     268.105 
    
     
    Libra normalizes each peptide channel by the sum of that peptide's channels:
          libra 114   libra 115  libra 116   libra 117
    pep1  0.330       0.192      0.244       0.234
    pep2  0.013       0.966      0.011       0.010
    pep3  0.304       0.220      0.243       0.232
    pep4  0.304       0.220      0.243       0.232
    pep5  0.304       0.220      0.243       0.232
    pep6  0.304       0.220      0.243       0.232
    pep7  0.238       0.245      0.261       0.256
    pep8  0.259       0.264      0.237       0.241
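
    The per-peptide normalization can be checked by hand or with a few lines of Python. This is a minimal sketch, not Libra's code; the function name is made up for illustration and the input is the pep1 row from the table above.

```python
def normalize_channels(intensities):
    """Divide each channel intensity by the sum over all channels."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("peptide has no positive integrated intensity")
    return [value / total for value in intensities]


# pep1 integrated intensities for channels 114, 115, 116, 117.
pep1 = [67.100, 39.153, 49.651, 47.567]
normalized = [round(v, 3) for v in normalize_channels(pep1)]
print(normalized)  # [0.33, 0.192, 0.244, 0.234]
```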
    
     
    Determines the mean and standard deviation of the mean:
            libra 114   libra 115  libra 116   libra 117
    mean    0.257       0.318      0.216       0.209
    st dev  0.103       0.262      0.083       0.081
    
    
     
    Removes those that deviate from the mean by more than 2 sigma, which are those outside of the range below in this example:
            libra 114   libra 115   libra 116   libra 117
            0.05-0.46   0.00-0.81  0.06-0.37   0.06-0.36
    
     
    Recalculates the mean and standard deviation (outliers have been removed):
            libra 114   libra 115  libra 116   libra 117
    mean    0.292       0.226      0.245       0.237
    st dev  0.032       0.023      0.008       0.009
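
    The outlier step for a single channel can be sketched in Python as follows. It assumes the "st dev" rows are sample standard deviations (n - 1 in the denominator), which is what reproduces the libra 114 numbers above; the function name is hypothetical and this is not Libra's own code.

```python
from statistics import mean, stdev


def remove_outliers(values, n_sigma=2.0):
    """Keep the values within n_sigma sample standard deviations of the mean."""
    if len(values) < 2:
        return list(values)  # no spread can be estimated from one value
    center = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - center) <= n_sigma * spread]


# Normalized libra 114 values for pep1..pep8 from the table above.
channel_114 = [0.330, 0.013, 0.304, 0.304, 0.304, 0.304, 0.238, 0.259]
print(round(mean(channel_114), 3), round(stdev(channel_114), 3))  # 0.257 0.103

kept = remove_outliers(channel_114)
print(round(mean(kept), 3), round(stdev(kept), 3))  # 0.292 0.032
```

    Only the pep2 value (0.013) falls outside the 2 sigma window for this channel, so seven values remain.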
    
     
    Re-normalizes with respect to the user selected channel, channel 4 in this example:
                    libra 114   libra 115  libra 116   libra 117
              mean    1.232       0.953      1.034       1.000
    standard error    0.013       0.009      0.004       0.005
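
    The final step can be sketched in Python as well. Assumptions, not confirmed Libra behaviour: the standard error of each channel is the standard deviation divided by the square root of the number of surviving peptides (7 here, since pep2 was removed), and the channel and reference errors are added in quadrature as absolute values without rescaling. The function name is hypothetical. Because the inputs below are already rounded to three decimals, the output can differ from the table in the last digit.

```python
import math


def renormalize(means, stdevs, n_peptides, ref_index):
    """Express channel means relative to a reference channel and combine
    each channel's standard error with the reference error in quadrature."""
    std_errors = [s / math.sqrt(n_peptides) for s in stdevs]
    ref_mean = means[ref_index]
    ref_error = std_errors[ref_index]
    ratios = [m / ref_mean for m in means]
    errors = [math.sqrt(e * e + ref_error * ref_error) for e in std_errors]
    return ratios, errors


# Recalculated means and standard deviations from the table above.
means = [0.292, 0.226, 0.245, 0.237]
stdevs = [0.032, 0.023, 0.008, 0.009]
ratios, errors = renormalize(means, stdevs, n_peptides=7, ref_index=3)
print([round(r, 3) for r in ratios])
print([round(e, 3) for e in errors])
```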
    
     

    How accurate is the quantitation?

    Test datasets were created by Anne-Claude Gingras, Patrick Pedrioli, (and Hookeun Lee?) from a 9 protein mix labeled with iTRAQ. The mix treatment was varied slightly, and several measurements were obtained on the Q-TOF, the Qstar, and the TOF-TOF. The preliminary results presented below are from one sample run on the Q-TOF. The expected numbers in the table are the concentrations normalized to channel 4.

     
    Peptides deviating from the mean by more than 2 standard deviations were removed.
    A minimum intensity threshold of 20 counts was used.  The software additionally
    removes peptides in which a channel mass is not found.  IC is an abbreviation for
    isotopic correction provided in the condition.xml file.
    
                                                              P0.9                  P0.9
                          |     EXPECTED            | LIBRA (w/IC, thresh=20) | Libra (w/IC, thresh=0.01)
                          |                         |                         |
    Name          Species |  114    115   116   117 |  114    115   116   117 |  114    115   116   117
    --------------------- |-------------------------|-------------------------|-------------------------
    cytochrome c          |  .25    1     .25    1  |  0.21   0.88  0.26  1.0 | 0.28   0.91  0.26  1.00
    ovalbumin     Chicken |  1      1      1     1  |  0.87   0.93  1.07  1.0 | 0.90   1.04  1.20  1.00
    transferrin   Bovine  |  8      8      1     1  |  6.16   5.64  1.20  1.0 | 6.01   5.63  1.06  1.00
    beta lactoglobulin    | .125  .125     1     1  |  0.10   0.11  1.08  1.0 | 0.10   0.08  1.10  1.00
    serum albumin Bovine  |  4      1      4     1  |  3.12   0.93  3.91  1.0 | 3.13   0.88  4.19  1.00
    catalase      Bovine  |  0     100    10     1  |  0.98  82.57 11.42  1.0 | 1.15  20.37  4.79  1.00
    
    
     

    Please add to/edit this section. From the table above, we can see a handful of experimental errors. There may be errors in the purity of the protein purchased, errors in the protein concentration placed in the sample, errors in peptide concentration due to incomplete digestion or the method of digestion, and errors introduced in peptide acquisition and measurement in the mass spectrometer. [The latter measurement errors are seen as the standard deviations in the worked example earlier in this document. These are the smallest uncertainties.] Using the expected numbers and the measurements above, we guesstimate a rough accuracy in the 10 - 25% range. This is preliminary, as the many other test datasets have not been analyzed yet. ** Note: as can be seen above, zero intensities cause difficulties. For these cases, please check your quantitation.tsv file until the code has been modified to handle them.
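
    To see where a rough accuracy figure comes from, the percent deviation of each measured ratio from its expected value can be computed directly from the table. This is only an illustrative Python sketch using the thresh=20 column; the function name is made up, and channels with an expected value of zero are skipped because a relative deviation is undefined there.

```python
def percent_deviation(expected, measured):
    """Absolute deviation from the expected value, in percent.
    Returns None when the expected value is zero."""
    if expected == 0:
        return None
    return abs(measured - expected) / expected * 100.0


# (expected, measured) ratios for channels 114, 115, 116, 117,
# copied from the EXPECTED and "w/IC, thresh=20" columns above.
rows = {
    "cytochrome c": ([0.25, 1, 0.25, 1], [0.21, 0.88, 0.26, 1.0]),
    "ovalbumin": ([1, 1, 1, 1], [0.87, 0.93, 1.07, 1.0]),
    "transferrin": ([8, 8, 1, 1], [6.16, 5.64, 1.20, 1.0]),
    "beta lactoglobulin": ([0.125, 0.125, 1, 1], [0.10, 0.11, 1.08, 1.0]),
    "serum albumin": ([4, 1, 4, 1], [3.12, 0.93, 3.91, 1.0]),
    "catalase": ([0, 100, 10, 1], [0.98, 82.57, 11.42, 1.0]),
}

for name, (expected, measured) in rows.items():
    deviations = [percent_deviation(e, m) for e, m in zip(expected, measured)]
    print(name, ["n/a" if d is None else f"{d:.0f}%" for d in deviations])
```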

     

    last updated Nov 19, 2006, Nichole King