Princeton LC-MS/MS Data Viewer v.0.0.1 (alpha)
NOTE: We are currently working on PVIEW2, which has an integrated MS/MS ID algorithm and is
a lot easier to use. Please email Zia Khan if you
want to give it a try.
This software implements the algorithms described in the following paper:
Please note this implementation is an alpha version. In its
current form, it is for developers only. Significant
usability improvements are necessary and will be integrated within the
year. We are looking for collaborators to help improve this software and
make it easier to use.
Currently, only Thermo LTQ-Orbitrap and LTQ-FT-ICR instruments are supported.
X! Tandem is used as the primary MS/MS search
algorithm and OMSSA is used
as a secondary MS/MS search algorithm.
License
The Princeton LC-MS/MS Data Viewer is freely available to academic
researchers. For commercial licensing, please contact Laurie Tzodikov,
Technology Licensing Associate, Princeton University. Please read the copyright
notice below.
Copyright 2009, Princeton University. All rights reserved.
By using this software the USER indicates that he or she has read,
understood and will comply with the following:
Princeton University hereby grants USER nonexclusive permission
to use, copy and/or modify this software for internal, noncommercial,
research purposes only. Any distribution, including commercial sale
or license, of this software, copies of the software, its associated
documentation and/or modifications of either is strictly prohibited
without the prior consent of Princeton University. Title to copyright
to this software and its associated documentation shall at all times
remain with Princeton University. Appropriate copyright notice shall
be placed on all software copies, and a complete copy of this notice
shall be included in all copies of the associated documentation.
No right is granted to use in advertising, publicity or otherwise
any trademark, service mark, or the name of Princeton University.
This software and any associated documentation is provided "as is"
PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING THOSE OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE, OR THAT USE OF THE SOFTWARE, MODIFICATIONS, OR
ASSOCIATED DOCUMENTATION WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS,
TRADEMARKS OR OTHER INTELLECTUAL PROPERTY RIGHTS OF A THIRD PARTY.
Princeton University shall not be liable under any circumstances for
any direct, indirect, special, incidental, or consequential damages
with respect to any claim by USER or any third party on account of
or arising from the use, or inability to use, this software or its
associated documentation, even if Princeton University has been advised
of the possibility of those damages.
Source Code
Please read the copyright notice above before
downloading the software. For commercial licensing, please contact
Laurie Tzodikov,
Technology Licensing Associate, Princeton University.
Click here to download. Unzipping the archive should
create a subdirectory ./pview.
Viewer requires Qt version 4.3.x or greater and the Expat XML Parser version 2.0.1 or greater. The R Statistical Programming Language is highly recommended for data analysis, but optional.
Compile by running qmake to generate a Makefile. You might want to edit the Makefile to use the -O3 compiler option. Then, run make.
Our implementation has been tested on Ubuntu Linux 7.04. It should work on other versions of Linux. We have succeeded in compiling the
application for Windows using the MinGW port of Qt.
Keyboard Interface
All interaction with the GUI occurs using the following shortcut
keys. Planned versions of the Princeton LC-MS/MS Viewer will use a mouse
driven interface. Shortcut keys are shown in bold below:
- Z Press Z and draw a box to zoom into the data.
- X Zoom out.
- F Toggle filtered and raw peak data display.
- 2 Press 2 and draw a box to select MS/MS peaks across replicates and display them in a dialog.
- + Increase size of displayed peaks.
- - Decrease size of displayed peaks.
- R Cycle through range query region used for filtering, XIC construction, etc.
- G Display graph used to build XICs. Only works for filtered peak data.
- A Show region used for aligning to a reference scan or reference set.
- 3 Switch to mode that allows selection of XICs grouped across replicates. Opens dialog that shows MS/MS spectra.
- 4 Same as 3 but opens dialog with MS/MS spectra from groups across conditions.
- 5 Selects XICs across replicates. Shows XICs in a separate dialog.
- W Toggle MS/MS id displayed for label-free data.
- K Cycle through top-k precursor intensity MS/MS ids.
- S Turn XIC display on and off.
- I Display regions used to group XICs in stable-isotope mode or regions used to filter bad XICs in label-free mode.
- O Display positions of grouped XICs across replicates and conditions.
Algorithm Options and Parameters
Algorithm options and parameters are described below, organized by tab in the "Data Load Configuration" dialog box.
Data
- m/z minimum Minimum m/z (Da) to load from data file.
- m/z maximum Maximum m/z (Da) to load from data file.
- retention time minimum Minimum retention time (seconds) to load from data file.
- retention time maximum Maximum retention time (seconds) to load from data file.
- load threads Number of threads to use to load data. One thread per data subdirectory.
Filtering
- filter delta m/z (ppm) Height of orthogonal range query used to filter noise peaks, in ppm. (ppm * 1e-6 * m/z = Da)
- filter delta time Width of orthogonal range query used to filter noise peaks.
- peak threshold Minimum number of peaks the range query must return for a peak to be kept as signal.
- log10(I) threshold Absolute intensity threshold for peaks.
- contaminant ranges m/z (Da) ranges to remove to filter out contaminant signal
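The ppm conversion and the range-query filter described above can be sketched as follows. This is a minimal illustration, not the viewer's actual code: the function names are hypothetical, and it assumes the query box is centred on each peak, extends plus or minus the delta in each dimension, and counts the peak itself.

```python
from bisect import bisect_left, bisect_right


def ppm_to_da(ppm, mz):
    """Convert a ppm tolerance at a given m/z to Daltons: ppm * 1e-6 * m/z."""
    return ppm * 1e-6 * mz


def filter_peaks(peaks, delta_ppm, delta_time, peak_threshold):
    """Keep peaks whose surrounding box holds at least peak_threshold peaks.

    peaks is a list of (retention_time_seconds, mz) tuples.
    Assumption: the box is centred on the peak and spans +/- delta_time
    in time and +/- ppm_to_da(delta_ppm, mz) in m/z; the peak counts itself.
    """
    by_time = sorted(peaks)
    times = [t for t, _ in by_time]
    kept = []
    for t, mz in by_time:
        half_height = ppm_to_da(delta_ppm, mz)
        # Narrow the candidates to the time window with binary search.
        lo = bisect_left(times, t - delta_time)
        hi = bisect_right(times, t + delta_time)
        # Count candidates that also fall inside the m/z window.
        count = sum(1 for _, other in by_time[lo:hi]
                    if abs(other - mz) <= half_height)
        if count >= peak_threshold:
            kept.append((t, mz))
    return kept
```

For example, a 10 ppm tolerance at m/z 500 corresponds to about 0.005 Da, so an isolated peak with no neighbours inside that box would be discarded as noise when the peak threshold is greater than one.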
XICs
- XIC delta time Width of orthogonal range query used to construct XIC graph.
- XIC delta m/z (ppm) Height of orthogonal range query used to construct XIC graph.
- XIC min Minimum length in seconds of an XIC.
- XIC max Maximum length in seconds of an XIC.
- fat XIC tol (ppm) Filter applied to XICs caused by broad peaks.
- XIC width (ppm) Width of an XIC.
- precursor time Time window used to re-estimate precursor intensity.
Grouping
- align dtime Width of query used to align up to translation (label-free only).
- repl group min size Minimum size of a replicate XIC group (label-free only).
- min cond thresh Minimum number of conditions represented by an XIC group (label-free only).
- use SILAC mode Turn on/off SILAC mode. (SILAC only)
- Arg-6 or Lys-6, Lys-8, Arg-10 Turn on/off labels.
- isotope tolerance Tolerance in the m/z dimension used to pair XICs (in ppm, SILAC only).
- isotope shift Shift used to account for isotope label.
- max label count Maximum number of labels possible (e.g. KR, RR, KK, etc.)
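To illustrate how the isotope shift, isotope tolerance and max label count parameters interact when pairing XICs, here is a rough sketch. It is not the viewer's pairing code: the function names are hypothetical, the charge state is assumed to be known, and only a single label type is considered at a time (mixed pairs such as KR are not handled). The mass shifts are the standard values for the 13C and 15N substitutions.

```python
# Monoisotopic mass shifts in Da for common SILAC labels.
LABEL_SHIFT_DA = {
    "Arg-6": 6.020129,    # 13C6
    "Lys-6": 6.020129,    # 13C6
    "Lys-8": 8.014199,    # 13C6 15N2
    "Arg-10": 10.008269,  # 13C6 15N4
}


def expected_heavy_mz(light_mz, charge, label, label_count=1):
    """m/z of the heavy partner: the mass shift is divided by the charge."""
    return light_mz + label_count * LABEL_SHIFT_DA[label] / charge


def match_label_count(light_mz, heavy_mz, charge, label, tol_ppm,
                      max_label_count=2):
    """Return the number of labels that explains the pair, or None.

    Assumption: the tolerance is applied in ppm relative to the
    expected heavy m/z.
    """
    for count in range(1, max_label_count + 1):
        target = expected_heavy_mz(light_mz, charge, label, count)
        if abs(heavy_mz - target) <= tol_ppm * 1e-6 * target:
            return count
    return None
```

A doubly charged peptide with one Arg-6 label is therefore expected about 3.01 m/z units above its light partner, and one with two labels about 6.02 units above.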
Memory
- keep raw peaks Keep raw peak data in memory.
- keep filtered peaks Keep filtered peaks.
- keep XIC peaks Keep chromatogram peaks (2d peaks).
- load MS/MS data Load MS/MS fragmentation spectra.
- Run centroider on profile data For profile data, run a simple centroiding algorithm.
- Run processing algorithms Run algorithms or just serve as a viewer (need to hit F to switch to raw peak data).
Tutorial
Viewer only supports the open mzXML data format. Please convert your Thermo RAW files using ReAdW (requires a working Xcalibur installation). You can download a binary version here (build 4.0.1 Jun 13 2008).
Below is a walk-through tutorial that describes how to use the
software. Stable isotope quantification users, please step through
the label-free tutorial first.
Label-Free Data
Note that this processing is somewhat involved. This is why we
emphasize this alpha version is for developers only. Future
versions of this software will have many fewer processing steps.
- Click here (305MB) to download a sample label-free data set.
Unzip into the ./pview directory. The file should create a
./pview/LabelFree subdirectory.
- Look at the contents of the ./pview/LabelFree/mzXML directory. You will find two
subdirectories 6_3_c and 6_5_d. These are two strains of S. cerevisiae.
Each subdirectory contains two mzXML files. These are 2 replicate runs on an LTQ-FT-ICR for each
strain. Note the directory names do not contain any spaces. The names are used by the program to label
the replicate set. The names of the files (minus the mzXML) are used to label each replicate.
You will also find a file pview.xml. This file saves the last parameter set you used
when you loaded the data. We have supplied some parameters that work well for the sample
label-free data set.
- Run the pview program. Choose the menu option "File > Open..." and open the directory
LabelFree/mzXML. If you did this correctly you will get the "Data Load Configuration" dialog box.
Click "Load data..."
- Once the data is loaded the interface should look like screen shot 1. Press
Z and draw a box in the data area to zoom into the data. Try to get something
that looks like screen shot 2. Press the F key to see
the effect of filtering. Press the S key to see XICs. You should see something like screen shot 3.
The blue dots correspond to the peak in the XIC that is the most intense. Press the O key to
see a symbol corresponding to the XICs that have been grouped across replicates and the two
strains. It should look like screen shot 4. Cycle through the conditions
and the replicates on the left to see how the XICs have been grouped.
- Save fragmentation spectra in mzXML format for X!
Tandem. Click the "Label Free > Save mzXML" menu option. Use the
file dialog to get to the ./pview/LabelFree/pview_out
directory. In the "File name:" box you only need to enter the
file prefix tandin. If you did this right, you will see
the files tandin_[1-5].mzXML.
- Now save fragmentation spectra in MGF format for OMSSA ID. Do the same thing
as for X! Tandem, but use the "Label Free > Save MGF..." menu option. Use the file
prefix omin. You might have to wait a few seconds for the output.
If you did this right you should see the files omin_[1-5].mgf in the directory.
- The next step entails running X! Tandem to collect
protein IDs.
The zip file with the sample data set contains an x86_64 binary of X! Tandem. You may
need to run make clean and make from the ./LabelFree/tandem-linux-08-02-01-3/bin
in order to compile a binary for your platform.
There are a couple of things of note. First, the Perl script ./LabelFree/tandem-linux-08-02-01-3/fasta/setup_fasta.pl
demonstrates how to set up the yeast protein database. Note that we modify the metadata in the FASTA file to
use a simpler naming system instead of all of the protein sequence metadata.
Second, the directory ./LabelFree/tandem-linux-08-02-01-3/bin contains fticr.xml which has the
parameters for the MS/MS database search and a script run_ids.sh. run_ids.sh generates
the files pview_in/tandout_[1-5].xml.
In order to generate protein ids, run the script ./run_ids.sh and make sure that the directory
pview_in contains tandout_[1-5].xml.
- Next, run the OMSSA database search. The driver script run_ids.sh in ./LabelFree/omssa
calls run_omssa.sh. Unlike X! Tandem, the parameters for OMSSA database search are specified at the
command line in the run_omssa.sh script. If you ran this correctly the pview_in directory
should now also contain the files omout_[1-5].xml.
- Now load the X!Tandem IDs by clicking on the "Label Free > Load Tandem" menu option. Change to the directory
./LabelFree/pview_in and highlight all of the tandout_[1-5].xml files. This will load all
of the X! Tandem IDs. In order to view them press the O key to turn on the grouped
observation display and press the W key to cycle to X!Tandem IDs. The display should
look something like screen shot 5.
- Load OMSSA IDs by clicking on the "Label Free > Load OMSSA" menu option. Change to the directory
./LabelFree/pview_in and highlight all of the omout_[1-5].xml files.
Press W to cycle between X!Tandem IDs and OMSSA IDs.
- Now you can save the quantitative data collected by the algorithm. Click
on the menu option "Label Free > Save CSV" to save the data in CSV table format.
Save the file as ./LabelFree/pview_out/pview_out.csv.
- Load the file ./LabelFree/pview_out/pview_out.csv in a spreadsheet program.
Here is a legend for the columns:
- id identifier used by the quantification program for the XIC group.
- [T,O]protein[1-5] protein name assigned to the XIC group by X!Tandem (T) or OMSSA (O).
- [T,O]seq[1-5] corresponding amino acid sequence
- [T,O]logevalue[1-5] corresponding log10(e-value) assigned by the algorithm
- mz m/z position in the data set
- retentionTime retention time in the original data
- ms2cnt number of fragmentation spectra in the XIC group
- xiccnt number of XICs in the XIC group
- cond.[directory name].rep.[mzXML file prefix] All of the columns with quantitative data use the subdirectory
name for the replicate set and the file prefix to identify the actual data set processed.
- The script ./LabelFree/process.R generates heat map output
using quantitative data per protein. It selects a single representative XIC group
for a protein. This should be a good starting point for data analysis.
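For readers who prefer scripting to a spreadsheet, the column legend above can be used to pull out the quantitative columns programmatically. The sketch below is not part of the distribution and makes several assumptions: the CSV has a header row, the identifier column is named id as in the legend, quantitative column names follow the cond.[directory name].rep.[mzXML file prefix] pattern literally, and an empty cell means no measurement. The function names are hypothetical.

```python
import csv
import re

# Matches column names such as cond.6_3_c.rep.<file prefix>.
COLUMN_PATTERN = re.compile(
    r"^cond\.(?P<condition>.+?)\.rep\.(?P<replicate>.+)$")


def quantitative_columns(header):
    """Map each quantitative column name to (condition, replicate)."""
    columns = {}
    for name in header:
        match = COLUMN_PATTERN.match(name)
        if match:
            columns[name] = (match.group("condition"),
                             match.group("replicate"))
    return columns


def read_quantities(csv_lines):
    """Return a list of (id, {(condition, replicate): value or None}).

    csv_lines may be an open file or any iterable of text lines.
    """
    reader = csv.DictReader(csv_lines)
    columns = quantitative_columns(reader.fieldnames)
    rows = []
    for row in reader:
        values = {}
        for name, key in columns.items():
            text = row[name].strip()
            # Assumption: an empty cell means the XIC group was not observed.
            values[key] = float(text) if text else None
        rows.append((row["id"], values))
    return rows
```

Calling read_quantities on an open handle to ./LabelFree/pview_out/pview_out.csv would give one entry per XIC group, keyed by condition and replicate, which can then be passed to whatever analysis you prefer.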
Stable Isotope Labeled Data
Note that this processing is somewhat involved. This is why we
emphasize this alpha version is for developers only. Future
versions of this software will have many fewer processing steps.
- Click here (350MB) to download a sample isotope-labeled data set. Unzip into the
./pview directory. The file should create a ./pview/SILAC subdirectory.
- Examine the contents of the ./SILAC/mzXML directory. You will find four mzXML files
in the subdirectories S1 and S2. These correspond to four gel fractions divided
into sets S1 and S2 for convenience. Note you need at least one subdirectory with a
set of files. The file pview.xml contains algorithm parameters. We
have supplied some parameters that work reasonably with this data set.
- Load the data by using the menu option "File > Open..."
Click on "Load Data..." The data set should look like screen shot 6.
Use the Z key and highlight a region to zoom into the data until
you get something like screen shot 7.
Press the S key to display XICs and you will see something like screen shot 8.
The lines connecting the XICs show paired XICs based on Arg-6 and Lys-6 labels.
- Click on "S1" under the "Conditions" list to the left. Go to the menu option "SILAC > Save mzXML..."
Save all of the fragmentation spectra for the set "S1" in the ./SILAC/pview_out directory
with the file name S1.mzXML. Then, click on "S2" under the "Conditions" list
and repeat, saving the file S2.mzXML in the ./SILAC/pview_out directory.
- Now, run X! Tandem by running the script run_ids.sh in
./SILAC/tandem-linux-08-02-01-3/bin. This will
take a little while since we use Arg-6 and Lys-6 as variable modifications.
If the script succeeded you should see the files
S[1-2].xml in the ./SILAC/pview_in directory. Take note
that we supply a simple script ./SILAC/tandem-linux-08-02-01-3/fasta/ipifix.pl that
removes some of the meta information from the IPI proteome FASTA file and lists amino acid sequences
by IPI number and gene symbol.
- Load protein IDs. Note that, unlike with the label-free data, you need to do this condition by condition.
First click on the "S1" item in the "Conditions" list to the left. Select the menu option
"SILAC > Load Tandem..." Load the file ./SILAC/pview_in/S1.xml. If you did this right
you should see MS/MS ids clearly as you zoom into the data, see screen shot 9.
Repeat this for the second set by clicking on "S2" item in the "Conditions" list
and loading the file ./SILAC/pview_in/S2.xml.
- Now create the ratio data sets by using the menu option "SILAC > Save CSV..." Save
the CSV files for S1 and S2 in ./SILAC/pview_out.
Use the default file names. If you did this right you should have the following two files:
./SILAC/pview_out/p01211.p01212.csv and ./SILAC/pview_out/p01213.p01214.csv
- Load the file ./SILAC/pview_out/p01211.p01212.csv in a spreadsheet program.
Here is a legend for the columns:
- id identifier used by the quantification program for the XIC group.
- mz[H,L] m/z of the heavy (H) and light (L) XICs in the isotope pair
- rt[H,L] retention time of the heavy (H) and light (L) paired XICs
- protein[H,L] heavy (H) and light (L) protein IDs
- seq[L,H] light (L) and heavy (H) amino acid sequences
- logevalue[L,H] corresponding log10(e-value) assigned by X! Tandem
- [H,L]ms2 number of fragmentation spectra in heavy (H) and light (L) XICs
- [H,L]log2ratio log2 ratio computed by trapezoidal integration of the XICs
- Process and aggregate data by gene symbol. We supply an R script ./SILAC/process.R that
aggregates the ratios based on gene symbol and computes a median ratio per
gene symbol. This should be a good starting point for data analysis.
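The two calculations mentioned above, a log2 ratio from trapezoidal integration of paired XICs and a median ratio per gene symbol, can be sketched as follows. This is an illustration only and does not reproduce the viewer or ./SILAC/process.R: the function names are hypothetical, the ratio direction (heavy over light) is an assumption, and the input to the aggregation is assumed to already be a list of (gene symbol, log2 ratio) pairs.

```python
import math
from collections import defaultdict
from statistics import median


def trapezoid_area(times, intensities):
    """Area under an XIC using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def log2_ratio(heavy_xic, light_xic):
    """log2 of heavy area over light area; each XIC is (times, intensities).

    Assumption: the ratio is heavy over light.
    """
    return math.log2(trapezoid_area(*heavy_xic) / trapezoid_area(*light_xic))


def median_ratio_by_gene(rows):
    """Collapse (gene_symbol, log2_ratio) pairs to one median per gene."""
    grouped = defaultdict(list)
    for gene, ratio in rows:
        grouped[gene].append(ratio)
    return {gene: median(values) for gene, values in grouped.items()}
```

Taking the median rather than the mean keeps a single badly paired XIC from dominating the ratio reported for a gene.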
Additional Notes
Note that the input data must be collected in centroid mode. If
your data was collected in base-lined profile mode, we provide a
simple centroiding algorithm in our implementation. Note that
there is no guarantee that it will work well on your data.
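To give a sense of what a simple centroiding step does, here is one possible approach. It is not necessarily the algorithm the viewer uses. It assumes base-lined profile data in which the signal drops to the baseline between peaks, groups each run of consecutive points above a threshold, and reports the intensity-weighted mean m/z with the apex intensity. The function name and the min_intensity parameter are hypothetical.

```python
def centroid_profile(mz_values, intensities, min_intensity=0.0):
    """Collapse each run of points above min_intensity into one centroid.

    Returns a list of (centroid_mz, apex_intensity) tuples.
    """
    centroids = []
    weighted_sum = 0.0  # sum of m/z * intensity over the current run
    total = 0.0         # sum of intensity over the current run
    apex = 0.0          # highest intensity seen in the current run
    for mz, intensity in zip(mz_values, intensities):
        if intensity > min_intensity:
            weighted_sum += mz * intensity
            total += intensity
            apex = max(apex, intensity)
        elif total > 0.0:
            # The run ended at the baseline: emit its centroid and reset.
            centroids.append((weighted_sum / total, apex))
            weighted_sum = total = apex = 0.0
    if total > 0.0:
        centroids.append((weighted_sum / total, apex))
    return centroids
```

Overlapping peaks that never return to the baseline would be merged into a single centroid by this approach, which is one reason data acquired directly in centroid mode is preferable.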
CVS
Developers are needed to improve the algorithms and the
software. We like to keep our software as simple as
possible. Improvements that remove code and simplify functionality
will get the highest priority. Please email Zia Khan and sign up for a CVS
account here.
Planned Improvements
New features and improvements include, but are not limited to, the following:
- Integrated protein MS/MS database search algorithm
- Metabolite data processing
- Lipid data processing
- Improved mouse and menu interface
- Fewer algorithm parameters
Zia Khan
Last modified: Thu Jul 30 13:18:45 EDT 2009