Princeton LC-MS/MS Data Viewer
v.0.0.1 (alpha)


NOTE: We are currently working on PVIEW2, which has an integrated MS/MS ID algorithm and is a lot easier to use. Please email Zia Khan if you want to give it a try.

This software implements the algorithms described in the following paper:

Please note this implementation is an alpha version. In its current form, it is for developers only. Significant usability improvements are necessary and will be integrated within the year. We are looking for collaborators to help improve this software and make it easier to use.

Currently, only Thermo LTQ-Orbitrap and LTQ-FT-ICR instruments are supported. X! Tandem is used as the primary MS/MS search algorithm and OMSSA is used as the secondary MS/MS search algorithm.

License

The Princeton LC-MS/MS Data Viewer is freely available to academic researchers. For commercial licensing, please contact Laurie Tzodikov, Technology Licensing Associate, Princeton University. Please read the copyright notice below.

Copyright 2009, Princeton University. All rights reserved. By using this software the USER indicates that he or she has read, understood and will comply with the following:

Princeton University hereby grants USER nonexclusive permission to use, copy and/or modify this software for internal, noncommercial, research purposes only. Any distribution, including commercial sale or license, of this software, copies of the software, its associated documentation and/or modifications of either is strictly prohibited without the prior consent of Princeton University. Title to copyright to this software and its associated documentation shall at all times remain with Princeton University. Appropriate copyright notice shall be placed on all software copies, and a complete copy of this notice shall be included in all copies of the associated documentation. No right is granted to use in advertising, publicity or otherwise any trademark, service mark, or the name of Princeton University.

This software and any associated documentation is provided "as is".

PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING THOSE OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, OR THAT USE OF THE SOFTWARE, MODIFICATIONS, OR ASSOCIATED DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER INTELLECTUAL PROPERTY RIGHTS OF A THIRD PARTY.

Princeton University shall not be liable under any circumstances for any direct, indirect, special, incidental, or consequential damages with respect to any claim by USER or any third party on account of or arising from the use, or inability to use, this software or its associated documentation, even if Princeton University has been advised of the possibility of those damages.

Source Code

We request you to read the above copyright notice before downloading the software. For commercial licensing, please contact Laurie Tzodikov, Technology Licensing Associate, Princeton University.

Click here to download. Unzipping the archive should create a subdirectory ./pview.

Viewer requires Qt version 4.3.x or greater and the Expat XML Parser version 2.0.1 or greater. The R Statistical Programming Language is highly recommended for data analysis, but optional.

Compile by running qmake to generate a Makefile. You might want to edit the Makefile to use the -O3 compiler option. Then, run make.
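
If you would rather not edit the Makefile by hand, the small Python sketch below rewrites the optimization flag. It assumes the generated Makefile sets compiler flags on lines beginning with CFLAGS or CXXFLAGS, which is typical of qmake output but worth checking in your own Makefile. The function name use_o3 is hypothetical and not part of Viewer.

```python
import re


def use_o3(makefile_text: str) -> str:
    """Return Makefile text with -O0/-O1/-O2/-Os replaced by -O3.

    Only lines that assign CFLAGS or CXXFLAGS are touched (an assumption
    about the Makefile layout); linker flags and rules are left alone.
    """
    out = []
    for line in makefile_text.splitlines(keepends=True):
        if re.match(r"\s*(CFLAGS|CXXFLAGS)\s*[:+]?=", line):
            # Replace a stand-alone optimization flag, not substrings of other options.
            line = re.sub(r"(?<!\S)-O[0-2s](?!\S)", "-O3", line)
        out.append(line)
    return "".join(out)
```

To apply it, read ./pview/Makefile, pass the text through use_o3, write the result back, and then run make. Note that re-running qmake regenerates the Makefile and discards the change.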

Our implementation has been tested on Ubuntu Linux 7.04. It should work on other versions of Linux. We have succeeded in compiling the application for Windows using the MinGW port of Qt.

Keyboard Interface

All interaction with the GUI occurs using the following shortcut keys. Planned versions of the Princeton LC-MS/MS Viewer will use a mouse driven interface. Shortcut keys are shown in bold below:

Algorithm Options and Parameters

Algorithm options and parameters are described below, organized by tab in the "Data Load Configuration" dialog box.

Data

Filtering

XICs

Grouping

Memory

Tutorial

Viewer only supports the open mzXML data format. Please convert your Thermo RAW files using ReAdW (requires a working Xcalibur installation). You can download a binary version here (build 4.0.1 Jun 13 2008).

Below is a walk-through tutorial that describes how to use the software. Stable isotope quantification users, please step through the label-free tutorial first.

Label-Free Data

Note that this processing is somewhat involved. This is why we emphasize this alpha version is for developers only. Future versions of this software will have many fewer processing steps.

  1. Click here (305MB) to download a sample label-free data set. Unzip into the ./pview directory. The file should create a ./pview/LabelFree subdirectory.
  2. Look at the contents of the ./pview/LabelFree/mzXML directory. You will find two subdirectories 6_3_c and 6_5_d. These are two strains of S. cerevisiae. Each subdirectory contains two mzXML files. These are 2 replicate runs on an LTQ-FT-ICR for each strain. Note the directory names do not contain any spaces. The names are used by the program to label the replicate set. The names of the files (minus the mzXML) are used to label each replicate. You will also find a file pview.xml. This file saves the last parameter set you used when you loaded the data. We have supplied some parameters that work well for the sample label-free data set.
  3. Run the pview program. Select the menu option "File > Open..." and open the directory LabelFree/mzXML. If you did this correctly, you will get the "Data Load Configuration" dialog box. Click "Load data..."
  4. Once the data is loaded, the interface should look like screen shot 1. Press Z and draw a box in the data area to zoom into the data. Try to get something that looks like screen shot 2. Press the F key to see the effect of filtering. Press the S key to see XICs. You should see something like screen shot 3. The blue dots correspond to the most intense peak in each XIC. Press the O key to see a symbol corresponding to the XICs that have been grouped across replicates and the two strains. It should look like screen shot 4. Cycle through the conditions and the replicates on the left to see how the XICs have been grouped.
  5. Save fragmentation spectra in mzXML format for X! Tandem. Click the "Label Free > Save mzXML" menu option. Use the file dialog to get to the ./pview/LabelFree/pview_out directory. In the "File name:" box you only need to enter the file prefix tandin. If you did this right, you will see the files tandin_[1-5].mzXML.
  6. Now save fragmentation spectra in MGF format for OMSSA ID. Do the same thing as for X! Tandem, but use the "Label Free > Save MGF..." menu option and the file prefix omin. You might have to wait a few seconds for the output. If you did this right, you should see the files omin_[1-5].mgf in the same directory.
  7. The next step entails running X! Tandem to collect protein IDs. The zip file with the sample data set contains an x86_64 binary of X! Tandem. You may need to run make clean and make from the ./LabelFree/tandem-linux-08-02-01-3/bin directory in order to compile a binary for your platform.
    There are a couple of things to note. First, the Perl script ./LabelFree/tandem-linux-08-02-01-3/fasta/setup_fasta.pl demonstrates how to set up the yeast protein database. Note that we modify the meta data in the FASTA file to use a simpler naming system instead of all of the protein sequence meta data. Second, the directory ./LabelFree/tandem-linux-08-02-01-3/bin contains fticr.xml, which has the parameters for the MS/MS database search, and a script run_ids.sh. run_ids.sh generates the files pview_in/tandout_[1-5].xml.
    In order to generate protein IDs, run the script ./run_ids.sh and make sure that the directory pview_in contains tandout_[1-5].xml.
  8. Next, run the OMSSA database search. The driver script run_ids.sh in ./LabelFree/omssa calls run_omssa.sh. Unlike X! Tandem, the parameters for OMSSA database search are specified at the command line in the run_omssa.sh script. If you ran this correctly the pview_in directory should now also contain the files omout_[1-5].xml.
  9. Now load the X! Tandem IDs by clicking on the "Label Free > Load Tandem" menu option. Change to the directory ./LabelFree/pview_in and highlight all of the tandout_[1-5].xml files. This will load all of the X! Tandem IDs. In order to view them, press the O key to turn on the grouped observation display and press the W key to cycle to X! Tandem IDs. The display should look something like screen shot 5.
  10. Load OMSSA IDs by clicking on the "Label Free > Load OMSSA" menu option. Change to the directory ./LabelFree/pview_in and highlight all of the omout_[1-5].xml files. Press W to cycle between X! Tandem IDs and OMSSA IDs.
  11. Now you can save the quantitative data collected by the algorithm. Click on the menu option "Label Free > Save CSV" to save the data in CSV table format. Save the file as ./LabelFree/pview_out/pview_out.csv.
  12. Load the file ./LabelFree/pview_out/pview_out.csv in a spreadsheet program. Here is a legend for the columns:
  13. The script ./LabelFree/process.R generates heat map output using quantitative data per protein. It selects a single representative XIC group for a protein. This should be a good starting point for data analysis.
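
Two points in the steps above are easy to get wrong: how the directory layout turns into labels (step 2), and whether each stage produced all of its numbered files (steps 5 through 8). The Python sketch below shows both checks. The function names replicate_labels and missing_outputs are hypothetical helpers, not part of Viewer, and the labeling rule is only our reading of step 2: subdirectory name for the replicate set, file name minus .mzXML for the replicate.

```python
from pathlib import PurePosixPath


def replicate_labels(relative_paths):
    """Map each replicate-set directory to its sorted replicate labels.

    relative_paths are paths relative to the mzXML directory, written
    with forward slashes, e.g. "6_3_c/run1.mzXML".
    """
    labels = {}
    for p in map(PurePosixPath, relative_paths):
        if p.suffix != ".mzXML" or len(p.parts) != 2:
            continue  # skip pview.xml and files not directly inside a subdirectory
        condition = p.parts[0]
        if " " in condition:
            raise ValueError(f"directory name contains a space: {condition!r}")
        labels.setdefault(condition, []).append(p.stem)
    return {name: sorted(reps) for name, reps in sorted(labels.items())}


def missing_outputs(present_names, prefix, suffix, indices=range(1, 6)):
    """Return the expected files prefix_N + suffix that are not in present_names."""
    present = set(present_names)
    expected = [f"{prefix}_{i}{suffix}" for i in indices]
    return [name for name in expected if name not in present]
```

For example, missing_outputs(os.listdir("./LabelFree/pview_in"), "tandout", ".xml") returns an empty list once run_ids.sh has produced all five X! Tandem result files. To feed replicate_labels from disk, collect p.relative_to(root).as_posix() for every p in Path(root).rglob("*").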

Stable Isotope Labeled Data

Note that this processing is somewhat involved. This is why we emphasize this alpha version is for developers only. Future versions of this software will have many fewer processing steps.
  1. Click here (350MB) to download a sample isotope-labeled data set. Unzip into the ./pview directory. The file should create a ./pview/SILAC subdirectory.
  2. Examine the contents of the ./SILAC/mzXML directory. You will find four mzXML files in the subdirectories S1 and S2. These correspond to four gel fractions divided into sets S1 and S2 for convenience. Note that you need at least one subdirectory with a set of files. The file pview.xml contains algorithm parameters. We have supplied some parameters that work reasonably with this data set.
  3. Load the data by using the menu option "File > Open..." Click on "Load Data..." The data set should look like screen shot 6. Use the Z key and highlight a region to zoom into the data until you get something like screen shot 7. Press the S key to display XICs and you will see something like screen shot 8. The lines connecting the XICs show paired XICs based on Arg-6 and Lys-6 labels.
  4. Click on "S1" under the "Conditions" list to the left. Go to the menu option "SILAC > Save mzXML..." Save all of the fragmentation spectra for the set "S1" in the ./SILAC/pview_out directory with the file name S1.mzXML. Then, click on "S2" under the "Conditions" list and repeat, saving the file S2.mzXML in the ./SILAC/pview_out directory.
  5. Now run X! Tandem by running the script run_ids.sh in ./SILAC/tandem-linux-08-02-01-3/bin. This will take a little while since we use Arg-6 and Lys-6 as variable modifications. If the script succeeded, you should see the files S[1-2].xml in the ./SILAC/pview_in directory. Note that we supply a simple script ./SILAC/tandem-linux-08-02-01-3/fasta/ipifix.pl that removes some of the meta information from the IPI proteome FASTA file and lists amino acid sequences by IPI number and gene symbol.
  6. Load protein IDs. Note that, unlike the label-free data, you need to do this condition by condition. First click on the "S1" item in the "Conditions" list to the left. Select the menu option "SILAC > Load Tandem..." Load the file ./SILAC/pview_in/S1.xml. If you did this right, you should see MS/MS IDs clearly as you zoom into the data (see screen shot 9). Repeat this for the second set by clicking on the "S2" item in the "Conditions" list and loading the file ./SILAC/pview_in/S2.xml.
  7. Now create the ratio data sets by using the menu option "SILAC > Save CSV..." Save the CSV files for S1 and S2 in ./SILAC/pview_out. Use the default file names. If you did this right, you should have the following two files: ./SILAC/pview_out/p01211.p01212.csv and ./SILAC/pview_out/p01213.p01214.csv.
  8. Load the file ./SILAC/pview_out/p01211.p01212.csv in a spreadsheet program. Here is a legend for the columns:
  9. Process and aggregate data by gene symbol. We supply an R script ./SILAC/process.R that aggregates the ratios based on gene symbol and computes a median ratio per gene symbol. This should be a good starting point for data analysis.
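
For readers who do not use R, the aggregation in step 9 can be approximated in a few lines of Python. This is a minimal sketch of a per-gene median, not a port of process.R. The column names gene_symbol and ratio are placeholders, since the column legend is not reproduced here; substitute the headers from your own CSV file.

```python
import math
from collections import defaultdict
from statistics import median


def median_ratio_by_gene(rows, gene_col="gene_symbol", ratio_col="ratio"):
    """Group rows by gene symbol and return {gene: median ratio}.

    rows is any iterable of dicts, such as csv.DictReader output.
    gene_col and ratio_col are hypothetical column names.
    Rows with an empty gene symbol or a non-numeric or non-finite
    ratio are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        gene = (row.get(gene_col) or "").strip()
        try:
            ratio = float(row[ratio_col])
        except (KeyError, TypeError, ValueError):
            continue
        if gene and math.isfinite(ratio):
            groups[gene].append(ratio)
    return {gene: median(values) for gene, values in groups.items()}
```

A typical call opens ./SILAC/pview_out/p01211.p01212.csv, wraps the file in csv.DictReader, and passes the reader to median_ratio_by_gene. The median is used because it is less sensitive than the mean to a few outlying XIC pairs.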

Additional Notes

Note that the input data must be collected in centroid mode. If your data was collected in base-lined profile mode, we provide a simple centroiding algorithm in our implementation. Note that there is no guarantee that it will work well on your data.
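
To make the idea of centroiding concrete, here is a generic sketch in Python. It is not the algorithm implemented in Viewer, whose details are not described here. It assumes the profile data is already baseline-subtracted, so that peaks are separated by points at or below a threshold, and it collapses each contiguous run of points above the threshold into a single intensity-weighted m/z value.

```python
def centroid_profile(mz, intensity, threshold=0.0):
    """Collapse profile-mode points into (centroid m/z, summed intensity) pairs.

    Each contiguous run of points with intensity above threshold becomes
    one centroid at the intensity-weighted mean m/z of the run.
    Overlapping peaks that never drop to the threshold are NOT split.
    """
    centroids = []
    weighted_sum = 0.0  # sum of m/z * intensity over the current run
    total = 0.0         # sum of intensity over the current run
    for m, i in zip(mz, intensity):
        if i > threshold:
            weighted_sum += m * i
            total += i
        elif total > 0.0:
            # The run just ended; emit its centroid and reset.
            centroids.append((weighted_sum / total, total))
            weighted_sum = total = 0.0
    if total > 0.0:
        centroids.append((weighted_sum / total, total))
    return centroids
```

Because unresolved neighboring peaks are merged into one centroid, a simple scheme like this can behave poorly on dense or noisy spectra, which is one reason to prefer data acquired in centroid mode.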

CVS

Developers are needed to improve the algorithms and the software. We like to keep our software as simple as possible. Improvements that remove code and simplify functionality will get the highest priority. Please email Zia Khan and sign up for a CVS account here.

Planned Improvements

New features and improvements include, but are not limited to, the following:
Zia Khan
Last modified: Thu Jul 30 13:18:45 EDT 2009