Expression Matrix Assay Tutorial: /Documentation/Archive/20.7

The expression matrix assay ties expression-level information to sample and feature/probe information. After appropriate files are loaded into the system, users can explore microarray results by building queries and visualizations based on feature/probe properties (such as genes) and sample properties.

Expression data may be extracted manually from the Gene Expression Omnibus (GEO), transformed, and imported into LabKey Server.
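The extraction and transformation step is not spelled out in this tutorial. As a minimal sketch, assuming the GEO series matrix text format, in which the expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end` lines, the following Python function pulls out just that table as plain TSV. `extract_series_matrix_table` is a hypothetical helper, not a LabKey or GEO tool.

```python
import csv
import io

# Assumption: in a GEO series matrix file, the expression table sits between
# these two marker lines; the "!Series_..." and "!Sample_..." lines around it
# are metadata and are not needed for the run file.
TABLE_BEGIN = "!series_matrix_table_begin"
TABLE_END = "!series_matrix_table_end"


def extract_series_matrix_table(text: str) -> str:
    """Return the expression table of a series matrix file as plain TSV.

    Hypothetical helper: keeps only the lines between the marker lines and
    drops the double quotes that surround identifiers such as "ID_REF".
    """
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if TABLE_BEGIN not in stripped or TABLE_END not in stripped:
        raise ValueError("series matrix table markers not found")
    start = stripped.index(TABLE_BEGIN) + 1
    stop = stripped.index(TABLE_END)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # csv.reader removes the surrounding double quotes from each field.
    for row in csv.reader(lines[start:stop], delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()
```

The result starts with an ID_REF header followed by one column per sample, which matches the run file layout this tutorial expects.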

Tutorial steps:

Review File Formats
Set Up the Folder
Define Sample Type
Import Sample Information
Add Feature Annotation Set
Create an Expression Matrix Assay Design
Import a Run
View Run Results

Files loaded include:

Metadata about features/probes (typically at the plate level)
Sample information
Actual expression data (often called a "series matrix" file)

Review File Formats

In order to use the assay, you will need three sets of data: a run file, a sample type, and a feature annotation file.

The run file has one column of probe IDs (ID_REF) followed by a variable number of columns, each named after a sample in your sample type. The probe IDs in the ID_REF column must also appear in the Probe_ID column of your feature annotation file, and every other column header must match a sample in your sample type.

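You must import your sample type and your feature annotation set before you can import run data. The run import will fail if an ID_REF value has no matching Probe_ID in the feature annotation set, or if a sample column has no matching sample in your sample type. If you don't have your own files, you can use these small example files:

sample_expression_matrix.tsv (sample information)
sample_feature_annotation_set.txt (feature annotations)
series_matrix.tsv (run data)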
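Because a run import fails on unmatched identifiers, it can help to check a run file before uploading it. The sketch below is a hypothetical pre-flight check in Python (`check_run_file` is not part of LabKey Server); it assumes you already have the sample names and the Probe_ID values loaded as Python sets.

```python
import csv
from typing import Iterable


def check_run_file(run_lines: Iterable[str], sample_names: set[str],
                   probe_ids: set[str]) -> list[str]:
    """Report mismatches that would make a run import fail.

    Hypothetical check: the first column must be ID_REF with values found
    among the annotation Probe_IDs, and every other header must name a
    sample in the sample type.
    """
    reader = csv.reader(run_lines, delimiter="\t")
    header = next(reader, [])
    problems = []
    if not header or header[0] != "ID_REF":
        problems.append("first column must be ID_REF")
    for sample in header[1:]:
        if sample not in sample_names:
            problems.append(f"unknown sample column: {sample}")
    for row in reader:
        if row and row[0] not in probe_ids:
            problems.append(f"unknown probe id: {row[0]}")
    return problems
```

An empty list means every identifier resolved; otherwise each entry names the header or probe ID that would need to be added to the sample type or annotation set.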

Set Up the Folder

If you don't already have a server where you can create projects, set one up before you begin.
If you don't know how to create projects and folders, review the LabKey documentation on that topic first.

Create a new folder named "Expression Matrix Tutorial". Choose the folder type "Assay."
Select (Admin) > Folder > Management and click the Folder Type tab.
Check the box for Microarray and click Update Folder.
Add a Sample Types web part on the left.
Add a Feature Annotation Sets web part, also on the left.

Define Sample Type

Click New Sample Type.
On the Create Sample Type page, name your sample type. Here we use ExpressionMatrixSamples.
In the Naming Pattern field, enter the following pattern, which tells the server to use the ID_REF column as the unique identifier for samples:
```
${ID_REF}
```
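To illustrate what the pattern does, the sketch below uses Python's `string.Template`, which happens to share the `${...}` placeholder syntax, to turn each sample row into a name. This only mimics simple column substitution; LabKey Server evaluates naming patterns with its own engine, and `sample_names` and the rows are hypothetical.

```python
from string import Template


def sample_names(pattern: str, rows: list[dict[str, str]]) -> list[str]:
    """Expand a naming pattern such as "${ID_REF}" for each sample row.

    Illustration only: string.Template mimics simple column substitution.
    A placeholder with no matching column raises KeyError.
    """
    template = Template(pattern)
    names = [template.substitute(row) for row in rows]
    # Sample names act as unique identifiers, so duplicates are an error.
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample names: {duplicates}")
    return names


# Hypothetical rows shaped like the fields defined in this tutorial.
rows = [
    {"ID_REF": "GSM280331", "SampleA": "1", "SampleB": "2"},
    {"ID_REF": "GSM280332", "SampleA": "3", "SampleB": "4"},
]
print(sample_names("${ID_REF}", rows))  # ['GSM280331', 'GSM280332']
```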

Click the Fields section to open it.
Click manually define fields.
Use Add Field to create a field for each column in your sample spreadsheet.

Three built-in fields of type Text (Name, Description, and Flag) are always created and should not be added to this list.

For each field, enter a name without spaces and select a data type. For the example file, enter:

ID_REF - Text
SampleA - Integer
SampleB - Integer

Click Save.
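Before importing samples, you may want to confirm that the spreadsheet matches the field types just defined. The following is a minimal Python sketch, assuming the three fields above and assuming blank integer cells are acceptable; `validate_samples` is hypothetical, and the server performs its own validation on import.

```python
import csv
import io

# Fields defined above; Name, Description, and Flag are built in.
REQUIRED_COLUMNS = ("ID_REF", "SampleA", "SampleB")
INTEGER_COLUMNS = ("SampleA", "SampleB")


def validate_samples(tsv_text: str) -> list[str]:
    """Return problems found in a tab-separated sample file.

    Hypothetical pre-import check; blank cells are treated as allowed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {missing}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in INTEGER_COLUMNS:
            value = row[column]
            if value in (None, ""):
                continue
            try:
                int(value)
            except ValueError:
                errors.append(f"line {line_no}: {column}={value!r} is not an integer")
    return errors
```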

Import Sample Information

Now that you have created the new sample type, you will see the name in the Sample Types web part.

Click ExpressionMatrixSamples to open it.
Click Import More Samples.
Into the Data area, paste in a TSV of all your samples (or you can click Upload file... and upload the file directly).

The file sample_expression_matrix.tsv you downloaded contains this data:

Click Submit.

Add Feature Annotation Set

Return to the main folder page by clicking the Expression Matrix Tutorial link near the top.
In the Feature Annotation Sets web part, click Import Feature Annotation Set.

Enter the Name: Feature Annotations 1
Enter the Vendor: Vendor 1
For Folder: Select the current folder.
Browse to select the annotation file (or use the provided file sample_feature_annotation_set.txt). Annotation files can come from any manufacturer (e.g., Illumina or Affymetrix), but must be TSV files with the following column headers:
```
Probe_ID 
Gene_Symbol 
UniGene_ID 
Gene_ID 
Accession_ID 
RefSeq_Protein_ID 
RefSeq_Transcript_ID
```
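A quick way to catch a misnamed column before uploading is to compare the file's first line against the required headers. The sketch below assumes the seven headers listed above and ignores stray whitespace around them; `missing_annotation_headers` is a hypothetical helper.

```python
import csv
import io

# Column headers the feature annotation file must contain, as listed above.
REQUIRED_HEADERS = [
    "Probe_ID", "Gene_Symbol", "UniGene_ID", "Gene_ID",
    "Accession_ID", "RefSeq_Protein_ID", "RefSeq_Transcript_ID",
]


def missing_annotation_headers(tsv_text: str) -> list[str]:
    """Return required headers absent from the first line of a TSV file.

    Hypothetical pre-upload check; surrounding whitespace is ignored.
    """
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"), [])
    present = {h.strip() for h in header}
    return [h for h in REQUIRED_HEADERS if h not in present]
```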
Click Upload.

Create an Expression Matrix Assay Design

In the Assay List web part, click New Assay Design.
Select the Expression Matrix assay type.
Scroll down to select the Assay Location (for our samples, use the current folder).
Click Next.
Name your assay and adjust any fields if needed.
Click Save.

Import a Run

Run files are in TSV format and have a variable number of columns.

The first column is always ID_REF, which contains probe IDs matching the Probe_ID column of your feature annotation set.
The remaining columns hold values for samples from your imported sample type (ExpressionMatrixSamples).

An example of column headers:

```
ID_REF GSM280331 GSM280332 GSM280333 GSM280334 GSM280335 GSM280336 GSM280337 GSM280338 ...
```

An example of row data:

```
1007_s_at 7.1722616266753 7.3191207236008 7.32161337343459 7.31420082996567 7.13913363545954 ...
```
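Each cell of a run file pairs one probe with one sample. The following Python sketch, assuming a tab-separated file with ID_REF first and numeric values elsewhere, unpacks the wide layout into one record per cell; `read_run` and `Measurement` are hypothetical names, not LabKey APIs.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Measurement(NamedTuple):
    probe_id: str   # value from the ID_REF column
    sample: str     # column header, i.e. a sample name
    value: float


def read_run(tsv_text: str) -> Iterator[Measurement]:
    """Yield one measurement per (probe, sample) cell of a run file.

    Sketch only: assumes ID_REF is the first column and every other column
    holds numeric values for one sample; empty cells and lines are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    if header[0] != "ID_REF":
        raise ValueError("first column must be ID_REF")
    samples = header[1:]
    for row in reader:
        if not row:
            continue
        probe_id, values = row[0], row[1:]
        for sample, raw in zip(samples, values):
            if raw:
                yield Measurement(probe_id, sample, float(raw))
```

A run with P probes and S samples yields up to P × S records, which is why run imports can grow to millions of rows.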

To import a run:

Navigate to the expression matrix assay you just created. (Click the name in the Assay List from the main page of your folder.)
Click Import Data.
Select the appropriate Feature Annotation Set.
Click Choose File and navigate to your series matrix file (or use the provided example file series_matrix.tsv).
Click Save and Finish to begin the import.

Note: Importing a run can take a long time because a run typically contains millions of rows of data. The Run Properties section includes a checkbox named Import Values. If it is checked, the values for the run are imported normally. If it is unchecked, the values are not imported to the server, but the links between the series matrix, samples, and annotations are preserved.

View Run Results

To view the results after the run is imported:

Click the Assay ID (run or file name) in the runs grid.

There is also an alternative view of the run data, which is pivoted to have a column for each sample and a row for each probe id. To view the data as a pivoted grid:

Select (Admin) > Go to Module > Query
Browse to assay > ExpressionMatrix > [YOUR_ASSAY_NAME] > FeatureDataBySample.
Click View Data.
You can add this query to the dashboard using a Query web part.
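The pivoted grid has one row per probe ID and one column per sample. As a rough sketch of that reshaping, assuming the data are available as (probe_id, sample, value) triples, the Python function below builds the same shape; `pivot_by_sample` is hypothetical and is not the query LabKey Server runs.

```python
def pivot_by_sample(triples):
    """Pivot (probe_id, sample, value) triples into a header and grid rows.

    Columns keep the order in which samples are first seen; a probe with
    no value for a sample gets None in that cell.
    """
    grid: dict[str, dict[str, float]] = {}
    columns: dict[str, None] = {}  # insertion-ordered set of sample names
    for probe_id, sample, value in triples:
        columns[sample] = None
        grid.setdefault(probe_id, {})[sample] = value
    samples = list(columns)
    header = ["ID_REF", *samples]
    rows = [[probe_id, *(cells.get(s) for s in samples)]
            for probe_id, cells in grid.items()]
    return header, rows
```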

LabKey Support
