The LabKey flow module supports importing and exporting analyses as a series of .tsv and supporting files in a zip archive. The format is intended to be simple, so that tools can easily reformat the results of an external analysis engine for import into LabKey. Notably, the analysis definition is not included in the archive; it may be defined elsewhere, for example in a FlowJo workspace gating hierarchy, an R flowCore script, or some other software package.
Export an Analysis Archive
From the flow Runs or FCSAnalysis grid, you can export the analysis results including the original FCS files, keywords, compensation matrices, and statistics.
- Open the analysis and select the runs to export.
- Click the (Export) button.
- Click the Analysis tab.
- Make the selections you need and click Export.
Import an Analysis Archive
To import a flow analysis archive, perhaps after making changes outside the server to add different statistics, graphs, or other information, follow these steps:
- In the Flow Summary web part of the flow folder, click Upload and Import.
- Drag and drop the analysis archive into the upload panel.
- Select the archive and click Import Data.
- In the popup, confirm that Import External Analysis is selected.
Analysis Archive Format
In brief, the archive format contains the following files:
<root directory>
├─ keywords.tsv
├─ statistics.tsv
│
├─ compensation.tsv
├─ <comp-matrix01>
├─ <comp-matrix02>.xml
│
├─ graphs.tsv
│
├─ <Sample Name 01>/
│  ├─ <graph01>.png
│  └─ <graph02>.svg
│
└─ <Sample Name 02>/
   ├─ <graph01>.png
   └─ <graph02>.pdf
All analysis .tsv files are optional. The keywords.tsv file lists the keywords for each sample. The statistics.tsv file contains summary statistic values for each sample in the analysis, grouped by population. The graphs.tsv file is a catalog of graph images for each sample, where each image may be in any image format (pdf, png, svg, etc.). The compensation.tsv file is a catalog of compensation matrices. To keep the directory listing clean, the graphs or compensation matrices may be grouped into subdirectories; for example, the graph images for each sample could be placed in a directory with the same name as the sample.
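As a rough illustration of how an external tool might package results, the sketch below writes a keywords.tsv and an ungrouped statistics.tsv into a zip using only the Python standard library. It assumes the files sit at the top level of the zip (matching the root directory shown above), and the helper names `to_tsv` and `build_archive` are hypothetical; graphs and compensation matrices would be added the same way with `writestr`.

```python
import csv
import io
import zipfile


def to_tsv(rows, columns):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_archive(target, keywords, statistics):
    """Write keywords.tsv and an ungrouped statistics.tsv at the top level of a zip.

    target may be a file name or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("keywords.tsv", to_tsv(keywords, ["Sample", "Keyword", "Value"]))
        zf.writestr(
            "statistics.tsv",
            to_tsv(statistics, ["Sample", "Population", "Statistic", "Value"]),
        )
```

For example, `build_archive("analysis.zip", keyword_rows, stat_rows)` produces a zip that can then be dragged into the upload panel described above.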
ACS Container Format
The ACS container format is not sufficient for direct import into LabKey. The ACS table of contents records only the relationships between files; it does not include, for example, the population name and channel/parameter used to calculate a statistic or render a graph. If the ACS table of contents included that metadata, graphs.tsv would be redundant, although statistics.tsv would still be needed.
If you have analysis result .tsv files bundled inside an ACS container, you may be able to extract them and reformat them into the LabKey flow analysis archive format, but you would need to generate the graphs.tsv file manually.
Statistics File
The statistics.tsv file is a tab-separated file of statistic names and values. The statistic values may be grouped in several ways: (a) no grouping (one statistic value per line), (b) grouped by sample (each column is a separate statistic), (c) grouped by sample and population (the current default encoding), or (d) grouped by sample, population, and channel.
Sample Name
Samples are identified by the value in the Sample column, so sample names must be unique within the analysis. Usually the sample name is just the FCS file name including the ‘.fcs’ extension (e.g., “12345.fcs”).
Population Name
The Population column contains a name, unique within the analysis, that identifies the set of events the statistics were calculated from. A common way to name a population is by its gating path, with gate names separated by a forward slash. If a gate name starts with “(” or contains “/”, “{”, or “}”, it must be escaped by wrapping the entire gate name in curly braces { }. For example, the population “A/{B/C}” is the sub-population “B/C” of population “A”.
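The escaping rule can be sketched as a pair of helpers: one that builds a population path from gate names, escaping where required, and one that splits a path back into gate names. The function names are hypothetical. The format does not say how a brace inside an already-escaped gate name is represented, so the splitter assumes an escaped name contains no “}”, and it does not validate malformed input.

```python
def escape_gate(name):
    """Wrap a gate name in { } if it starts with '(' or contains '/', '{' or '}'."""
    if name.startswith("(") or any(c in name for c in "/{}"):
        return "{" + name + "}"
    return name


def population_path(gates):
    """Join gate names, root first, into a '/'-separated population path."""
    return "/".join(escape_gate(g) for g in gates)


def split_population(path):
    """Split a population path into gate names, unwrapping {...}-escaped gates.

    Assumes an escaped gate name contains no '}' of its own.
    """
    gates = []
    i = 0
    while i < len(path):
        if path[i] == "{":
            end = path.index("}", i)   # first closing brace ends the escaped name
            gates.append(path[i + 1:end])
            i = end + 1
        else:
            end = path.find("/", i)
            if end == -1:
                end = len(path)
            gates.append(path[i:end])
            i = end
        i += 1  # skip the '/' separator
    return gates
```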
Statistic Name
The statistic is encoded in the column header as statistic(parameter:percentile), where the parameter and percentile parts are required or omitted depending on the statistic type.
The statistic part of the column header may be either the short name (“%P”) or the long name (“Frequency_Of_Parent”).
The parameter part is required for the frequency of ancestor statistic and for the other channel-based statistics. The frequency of ancestor statistic uses the name of an ancestor population as the parameter value, while the other statistics use a channel name. To represent a compensated parameter, wrap the channel name in angle brackets, e.g., “<FITC-A>”.
The percentile part is required only by the “Percentile” statistic and is an integer in the range 1-99.
The statistic value is either an integer or a double. Count statistics are integers >= 0, percentage statistics are doubles in the range 0-100, and all other statistics are doubles. If a statistic is not present for a given sample and population, its value is left blank.
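A minimal parser for the statistic(parameter:percentile) header might look like the sketch below. `StatName`, `parse_stat_header`, and `is_compensated` are hypothetical names. It assumes statistic names never contain “(”, and that the percentile is the integer after the last “:” inside the parentheses; it also accepts the parameter-less “%ile(30)” form used when the parameter lives in its own column.

```python
from typing import NamedTuple, Optional


class StatName(NamedTuple):
    statistic: str             # short or long name, e.g. "%ile" or "Percentile"
    parameter: Optional[str]   # channel or ancestor population, if any
    percentile: Optional[int]  # 1-99, Percentile statistic only


PERCENTILE_NAMES = {"%ile", "Percentile"}


def parse_stat_header(header: str) -> StatName:
    """Split a 'statistic(parameter:percentile)' column header into its parts."""
    header = header.strip()
    if "(" not in header:
        return StatName(header, None, None)
    if not header.endswith(")"):
        raise ValueError(f"malformed statistic header: {header!r}")
    # Statistic names never contain '(', so the first '(' opens the argument list.
    statistic, inner = header[:-1].split("(", 1)
    if statistic not in PERCENTILE_NAMES:
        return StatName(statistic, inner, None)
    # The percentile follows the last ':'; the parameter may be absent, as in '%ile(30)'.
    parameter, sep, pct = inner.rpartition(":")
    percentile = int(pct)
    if not 1 <= percentile <= 99:
        raise ValueError(f"percentile out of range 1-99: {percentile}")
    return StatName(statistic, parameter if sep else None, percentile)


def is_compensated(parameter: Optional[str]) -> bool:
    """Compensated channel names are wrapped in angle brackets, e.g. '<FITC-A>'."""
    return bool(parameter) and parameter.startswith("<") and parameter.endswith(">")
```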
Allowed Statistics
Short Name | Long Name | Parameter | Type |
---|---|---|---|
Count | Count | n/a | Integer |
% | Frequency | n/a | Double (0-100) |
%P | Frequency_Of_Parent | n/a | Double (0-100) |
%G | Frequency_Of_Grandparent | n/a | Double (0-100) |
%of | Frequency_Of_Ancestor | ancestor population name | Double (0-100) |
Min | Min | channel name | Double |
Max | Max | channel name | Double |
Median | Median | channel name | Double |
Mean | Mean | channel name | Double |
GeomMean | Geometric_Mean | channel name | Double |
StdDev | Std_Dev | channel name | Double |
rStdDev | Robust_Std_Dev | channel name | Double |
MAD | Median_Abs_Dev | channel name | Double |
MAD% | Median_Abs_Dev_Percent | channel name | Double (0-100) |
CV | CV | channel name | Double |
rCV | Robust_CV | channel name | Double |
%ile | Percentile | channel name and percentile 1-99 | Double (0-100) |
For example, the following are valid statistic names:
- Count
- Robust_CV(<FITC>)
- %ile(<Pacific-Blue>:30)
- %of(Lymphocytes)
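The Allowed Statistics table can be transcribed into a lookup that checks whether a statistic name, parameter, and percentile fit together. This is a sketch: `ALLOWED_STATS` and `validate_stat` are hypothetical names, and the input is assumed to be already split into its parts (for example by a header parser).

```python
# short name -> (long name, parameter kind), transcribed from the Allowed Statistics table
ALLOWED_STATS = {
    "Count": ("Count", None),
    "%": ("Frequency", None),
    "%P": ("Frequency_Of_Parent", None),
    "%G": ("Frequency_Of_Grandparent", None),
    "%of": ("Frequency_Of_Ancestor", "ancestor"),
    "Min": ("Min", "channel"),
    "Max": ("Max", "channel"),
    "Median": ("Median", "channel"),
    "Mean": ("Mean", "channel"),
    "GeomMean": ("Geometric_Mean", "channel"),
    "StdDev": ("Std_Dev", "channel"),
    "rStdDev": ("Robust_Std_Dev", "channel"),
    "MAD": ("Median_Abs_Dev", "channel"),
    "MAD%": ("Median_Abs_Dev_Percent", "channel"),
    "CV": ("CV", "channel"),
    "rCV": ("Robust_CV", "channel"),
    "%ile": ("Percentile", "percentile"),
}
LONG_TO_SHORT = {long_name: short for short, (long_name, _) in ALLOWED_STATS.items()}


def validate_stat(name, parameter=None, percentile=None):
    """Return the long statistic name, or raise ValueError if the parts don't fit."""
    short = name if name in ALLOWED_STATS else LONG_TO_SHORT.get(name)
    if short is None:
        raise ValueError(f"unknown statistic: {name!r}")
    long_name, kind = ALLOWED_STATS[short]
    if kind is None and parameter is not None:
        raise ValueError(f"{long_name} takes no parameter")
    if kind is not None and not parameter:
        raise ValueError(f"{long_name} requires a parameter")
    if kind == "percentile":
        if percentile is None or not 1 <= percentile <= 99:
            raise ValueError(f"{long_name} requires a percentile in 1-99")
    elif percentile is not None:
        raise ValueError(f"{long_name} takes no percentile")
    return long_name
```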
Examples
NOTE: The following examples are for illustration purposes only.
No Grouping: One Row Per Sample and Statistic
The required columns are Sample, Population, Statistic, and Value. No extra columns are present. Each statistic is on a new line.
Sample | Population | Statistic | Value |
---|---|---|---|
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | %P | 0.85 |
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | Count | 12001 |
Sample2.fcs | S/L/Lv/3+/{escaped/slash} | Median(FITC-A) | 23000 |
Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | %ile(<Pacific-Blue>:30) | 0.93 |
Grouped By Sample
The only required column is Sample. The remaining columns are statistic columns whose names contain the population name and statistic name separated by a colon.
Sample | S/L/Lv/3+/4+/IFNg+IL2+:Count | S/L/Lv/3+/4+/IFNg+IL2+:%P | S/L/Lv/3+/4+/IFNg+IL2-:%ile(<Pacific-Blue>:30) | S/L/Lv/3+/4+/IFNg+IL2-:%P |
---|---|---|---|---|
Sample1.fcs | 12001 | 0.93 | 12314 | 0.24 |
Sample2.fcs | 13056 | 0.85 | 13023 | 0.56 |
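Because the statistic part may itself contain a colon (as in “%ile(<Pacific-Blue>:30)”), a reader should split each column header at the first colon rather than the last. The sketch below does that and turns one grouped row into ungrouped (Sample, Population, Statistic, Value) tuples. The helper names are hypothetical, and it assumes a population name contains no “:” outside a { } escape, which the format does not state explicitly.

```python
def split_grouped_header(header):
    """Split a 'population:statistic' header at the first ':' outside {...} escapes."""
    depth = 0
    for i, ch in enumerate(header):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ":" and depth == 0:
            return header[:i], header[i + 1:]
    raise ValueError(f"no ':' separator in column header {header!r}")


def melt_row(row):
    """Yield (sample, population, statistic, value) for each non-blank cell of one row."""
    sample = row["Sample"]
    for header, value in row.items():
        if header == "Sample" or value in ("", None):
            continue  # a blank cell means the statistic is not present
        population, statistic = split_grouped_header(header)
        yield sample, population, statistic, value
```

Rows from `csv.DictReader(f, delimiter="\t")` can be passed straight to `melt_row`.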
Grouped By Sample and Population
The required columns are Sample and Population. The remaining columns are statistic names, including any required parameter part and percentile part.
Sample | Population | Count | %P | Median(FITC-A) | %ile(<Pacific-Blue>:30) |
---|---|---|---|---|---|
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | 12001 | 0.93 | 45223 | 12314 |
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | 12312 | 0.94 | | 12345 |
Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | 13056 | 0.85 | | 13023 |
Sample2.fcs | S/L/Lv/{slash/escaped} | 3042 | 0.35 | 13023 | |
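A tool that produces ungrouped (Sample, Population, Statistic, Value) rows can pivot them into this default encoding. The sketch below is one way to do it with the standard library; `group_by_sample_population` is a hypothetical name, statistic columns appear in first-seen order, and missing statistics are written as blank cells as described above.

```python
import csv
import io


def group_by_sample_population(rows):
    """Pivot (Sample, Population, Statistic, Value) rows into one row per sample and population."""
    grouped = {}     # (sample, population) -> {statistic: value}; dicts keep insertion order
    stat_names = []  # statistic columns in first-seen order
    for sample, population, statistic, value in rows:
        grouped.setdefault((sample, population), {})[statistic] = value
        if statistic not in stat_names:
            stat_names.append(statistic)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", "Population", *stat_names])
    for (sample, population), stats in grouped.items():
        # Statistics missing for this sample and population are left blank.
        writer.writerow([sample, population, *(stats.get(s, "") for s in stat_names)])
    return buf.getvalue()
```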
Grouped By Sample, Population, and Parameter
The required columns are Sample, Population, and Parameter. The remaining columns are statistic names with any required percentile part.
Sample | Population | Parameter | Count | %P | Median | %ile(30) |
---|---|---|---|---|---|---|
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | | 12001 | 0.93 | | |
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | FITC-A | | | 45223 | |
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | <Pacific-Blue> | | | | 12314 |
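In this encoding the parameter moves out of the column header, so a reader has to recombine it with the statistic column to recover the full statistic(parameter:percentile) name. The sketch below assumes, as in the example, that a blank Parameter marks a population-level statistic and that a column like “%ile(30)” carries only the percentile part; the helper names are hypothetical.

```python
def full_stat_name(stat_column, parameter):
    """Rebuild 'statistic(parameter:percentile)' from the grouped-by-parameter form."""
    if not parameter:
        return stat_column  # population-level statistic such as Count or %P
    if stat_column.endswith(")") and "(" in stat_column:
        # A column such as '%ile(30)' carries only the percentile part.
        name, percentile = stat_column[:-1].split("(", 1)
        return f"{name}({parameter}:{percentile})"
    return f"{stat_column}({parameter})"


def melt_parameter_row(row):
    """Yield (sample, population, full statistic name, value) for each non-blank cell."""
    for column, value in row.items():
        if column in ("Sample", "Population", "Parameter") or value in ("", None):
            continue
        yield row["Sample"], row["Population"], full_stat_name(column, row["Parameter"]), value
```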
Graphs File
The graphs.tsv file is a catalog of plot images generated by the analysis. Like the statistics file, it lists the sample name, plot file name, and plot parameters. Currently, the only plot parameters included in graphs.tsv are the population and the x and y axes. The file contains one graph image per row. The Population column is encoded in the same manner as in statistics.tsv. The Graph column holds the x and y axes used to render the plot, joined by a colon; compensated parameters are surrounded with <> angle brackets. (Future formats may split the x and y axes into separate columns to ease parsing.) The Path column is a relative file path to the image (no “.” or “..” is allowed in the path), and the image file name is usually just the MD5 checksum of the graph bytes.
Multi-sample or multi-plot images are not yet supported.
Sample | Population | Graph | Path |
---|---|---|---|
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | <APC-A> | sample01/graph01.png |
Sample1.fcs | S/L/Lv/3+/4+/IFNg+IL2- | SSC-A:<APC-A> | sample01/graph02.png |
Sample2.fcs | S/L/Lv/3+/4+/IFNg+IL2+ | FSC-H:FSC-A | sample02/graph01.svg |
... | | | |
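A sketch of the per-row checks a graphs.tsv writer or reader might apply: splitting the Graph column into axes, rejecting paths with “.” or “..” segments, and naming an image by the MD5 checksum of its bytes with `hashlib.md5`. The helper names are hypothetical, paths are assumed to use “/” separators, and axis names are assumed to contain no “:”.

```python
import hashlib


def parse_graph_axes(graph):
    """Split the Graph column into (x, y); y is None when only one axis is given."""
    x, sep, y = graph.partition(":")
    return x, (y if sep else None)


def check_relative_path(path):
    """Reject absolute paths and any '.' or '..' segment (assumes '/' separators)."""
    if path.startswith("/") or any(part in (".", "..") for part in path.split("/")):
        raise ValueError(f"path must be relative without '.' or '..': {path!r}")
    return path


def graph_file_name(image_bytes, extension):
    """Name an image file by the MD5 checksum of its bytes, e.g. '<md5>.png'."""
    return hashlib.md5(image_bytes).hexdigest() + "." + extension
```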
Compensation File
The compensation.tsv file maps sample names to compensation matrix file paths. The required columns are Sample and Path. The path is a relative file path to the matrix (no “.” or “..” is allowed in the path). Each compensation matrix file is either in the FlowJo compensation matrix file format or a GatingML transforms:spilloverMatrix XML document.
Sample | Path |
---|---|
Sample1.fcs | compensation/matrix1 |
Sample2.fcs | compensation/matrix2.xml |
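Before uploading, it can help to confirm that every path listed in compensation.tsv actually exists in the archive. The sketch below reads compensation.tsv from an open `zipfile.ZipFile` and returns the listed paths that have no matching entry. `missing_comp_matrices` is a hypothetical helper, and it assumes compensation.tsv sits at the top level of the zip and is UTF-8 encoded.

```python
import csv
import io
import zipfile


def missing_comp_matrices(zf: zipfile.ZipFile):
    """Return compensation.tsv paths that are not present as entries in the archive."""
    names = set(zf.namelist())
    with zf.open("compensation.tsv") as fh:
        reader = csv.DictReader(io.TextIOWrapper(fh, encoding="utf-8"), delimiter="\t")
        return [row["Path"] for row in reader if row["Path"] not in names]
```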
Keywords File
The keywords.tsv file lists the keyword names and values for each sample. Its required columns are Sample, Keyword, and Value.
Sample | Keyword | Value |
---|---|---|
Sample1.fcs | $MODE | L |
Sample1.fcs | $DATATYPE | F |
... | | |
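Reading keywords.tsv back amounts to grouping rows by sample. The sketch below, with the hypothetical name `read_keywords`, returns a nested dict of sample to keyword to value, assuming the file text has already been decoded and uses the csv module's default quoting rules.

```python
import csv
import io
from collections import defaultdict


def read_keywords(text):
    """Parse keywords.tsv text into {sample: {keyword: value}}."""
    keywords = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        keywords[row["Sample"]][row["Keyword"]] = row["Value"]
    return dict(keywords)
```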