Before you set the pipeline override, consider how your file server is organized. The pipeline override directory is essentially a window into your file system, so make sure that the directories beneath it contain only files that users of your LabKey system should have permission to see. On the LabKey side, subfolders inherit pipeline override settings, so once you set the override, LabKey can upload data files from the override directory tree into the folder and any of its subfolders.
These steps help you set up the pipeline, including an override directory, for use on a single computer. For information on setup for a distributed environment, see the next section.
Notice that you also have the option to override email notification settings at this level if desired.
MS2 projects that set a pipeline override can specify a supplemental, read-only directory, which can be used as a repository for your original data files. If a supplemental directory is specified, LabKey Server will treat both directories as sources for input data to the pipeline, but it will create and change files only in the first, primary directory.
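The primary/supplemental behavior described above can be sketched as follows. This is a minimal illustration, not LabKey Server's implementation: the function names `resolve_input` and `output_path`, and the choice to check the primary directory before the supplemental one, are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_input(relative_path: str, primary: Path,
                  supplemental: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing copy of relative_path.

    Assumption: the primary directory is checked before the
    read-only supplemental directory.
    """
    roots = [primary]
    if supplemental is not None:
        roots.append(supplemental)
    for root in roots:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    return None


def output_path(relative_path: str, primary: Path) -> Path:
    """Outputs are always placed under the primary directory,
    never under the supplemental one."""
    return primary / relative_path
```

With this arrangement the original data files can stay in a location that the pipeline never modifies, while all generated files accumulate under the primary directory.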
By default, pipeline files are not shared. To allow pipeline files to be downloaded or updated via the web server, check the Share files via web site checkbox. Then select appropriate levels of permissions for members of global and project groups.
If you are running LabKey Server on Windows and you are connecting to a remote network share, you may need to configure network drive mapping for LabKey Server so that LabKey Server can create the necessary service account to access the network share. For more information, see labkey.xml Configuration File.
The FASTA root is the directory where the FASTA databases that you will use for peptide and protein searches against MS/MS data are located. FASTA databases may be located within the FASTA root directory itself, or in a subdirectory beneath it.
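To check which databases are reachable from a given FASTA root, a short script along these lines may help. It is a sketch under assumptions: the helper name `list_fasta_databases` and the set of file extensions are hypothetical and are not taken from LabKey Server.

```python
from pathlib import Path
from typing import List

# Assumed extensions; adjust to match the databases on your file server.
FASTA_SUFFIXES = {".fasta", ".fa", ".fas"}


def list_fasta_databases(fasta_root: Path) -> List[str]:
    """List FASTA files in the root itself and in any subdirectory,
    as sorted paths relative to the root."""
    found = []
    for path in fasta_root.rglob("*"):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            found.append(path.relative_to(fasta_root).as_posix())
    return sorted(found)
```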
Selecting the Allow Upload checkbox permits users with admin privileges to upload FASTA files to the FASTA root directory. If this checkbox is selected, the Add FASTA File link appears under MS2 specific settings on the data pipeline setup page. Admin users can click this link to upload a FASTA file from their local computer to the FASTA root on the server.
If you prefer to control which FASTA files are available to users of your LabKey Server site, leave this checkbox unselected. The Add FASTA File link will then not appear on the pipeline setup page, and the network administrator can add FASTA files directly to the root directory on the file server.
By default, all subfolders will inherit the pipeline configuration from their parent folder. You can override this if you wish.
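The inheritance rule can be pictured as a lookup that walks from a folder up through its parents until it finds a configured override. The sketch below only illustrates that idea; the function `effective_pipeline_root` and the dictionary of folder paths are hypothetical and do not reflect how LabKey Server stores its settings.

```python
from typing import Dict, Optional


def effective_pipeline_root(folder: str,
                            overrides: Dict[str, str]) -> Optional[str]:
    """Return the override configured on the nearest ancestor
    (or on the folder itself), or None if none is set."""
    parts = folder.strip("/").split("/")
    while parts:
        key = "/" + "/".join(parts)
        if key in overrides:
            return overrides[key]
        parts.pop()  # move up to the parent folder
    return None
```

For example, with an override set only on `/ProjectA`, a lookup for `/ProjectA/Sub` returns the `/ProjectA` setting; setting a separate override on `/ProjectA/Sub` would take precedence for that subfolder and everything beneath it.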
When you use the pipeline to browse for files, it will remember where you last loaded data for your current folder and bring you back to that location. You can click on a parent directory to change your location in the file system.
You can specify default settings for X! Tandem, Sequest or Mascot for the data pipeline in the current project or folder. On the pipeline setup page, click the Set defaults link under X! Tandem specific settings, Sequest specific settings, or Mascot specific settings.
The default settings are stored in the pipeline override directory in a file named default_input.xml. These settings are copied to the search engine's analysis definition file (named tandem.xml, sequest.xml or mascot.xml by default) for each search protocol that you define for data files beneath the pipeline override. The default settings can be overridden for any individual search protocol. See Search and Process MS2 Data for information about configuring search protocols.
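The relationship between the defaults and a protocol's own settings can be illustrated with a small merge. This sketch assumes the files use X! Tandem-style `<note type="input" label="...">` elements; the helper names `read_settings` and `merge_settings` are hypothetical, and the merge shown here is an illustration of the override rule rather than the code LabKey Server runs.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_settings(xml_text: str) -> Dict[str, str]:
    """Collect label -> value pairs from <note type="input"> elements."""
    root = ET.fromstring(xml_text)
    return {
        note.attrib["label"]: (note.text or "").strip()
        for note in root.iter("note")
        if note.get("type") == "input" and "label" in note.attrib
    }


def merge_settings(defaults: Dict[str, str],
                   protocol: Dict[str, str]) -> Dict[str, str]:
    """Start from the defaults; protocol-specific values win."""
    merged = dict(defaults)
    merged.update(protocol)
    return merged
```

A protocol that sets only one parameter therefore keeps every other value from default_input.xml, which is why it is usually enough to record just the differences in each protocol.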
The pipeline that is installed with a standard LabKey installation runs on a single computer. Since the pipeline's search and analysis operations are resource-intensive, the standard pipeline is most useful for evaluation and small-scale experimental purposes.
For institutions performing high-throughput experiments and analyzing the resulting data, the pipeline is best run in a distributed environment, where the resource load can be shared across a set of dedicated servers. Setting up the LabKey pipeline on a server cluster currently demands some customization as well as a high level of network and server administrative skill. If you wish to set up the LabKey pipeline for use in a distributed environment, contact LabKey.