Calculate Statistics calculates measures of assay quality (V and Z' factors) and dose-response data (EC50) for all features measured from images.
The V and Z' factors are statistical measures of assay quality and are calculated for each per-image measurement and for each average per-object measurement that you have made in the pipeline. Placing this module at the end of a pipeline in order to calculate these values allows you to identify which measured features are most powerful for distinguishing positive and negative control samples, or for accurately quantifying the assay's response to dose. These measurements will be calculated for all measured values (Intensity, AreaShape, Texture, etc.). These measurements can be exported as the "Experiment" set of data.
Available measurements
- Experiment features: Whereas most CellProfiler measurements are calculated for each object (per-object) or for each image (per-image), this module produces per-experiment values; for example, one Z' factor is calculated for each measurement, across the entire analysis run.
- Zfactor: The Z'-factor indicates how well separated the positive and negative controls are. A Z'-factor > 0 is potentially screenable; a Z'-factor > 0.5 is considered an excellent assay. The formula is 1 − 3 × (σp + σn)/|μp − μn| where σp and σn are the standard deviations of the positive and negative controls, and μp and μn are the means of the positive and negative controls.
- Vfactor: The V-factor is a generalization of the Z'-factor, and is calculated as 1 − 6 × mean(σ)/|μp − μn| where σ are the standard deviations of the data, and μp and μn are defined as above.
- EC50: The half maximal effective concentration (EC50) is the concentration of a treatment required to induce a response which is 50% of the maximal response.
- OneTailedZfactor: This measure is an attempt to overcome a limitation of the original Z'-factor formulation (it assumes a Gaussian distribution) and is informative for populations with moderate or high amounts of skewness. In these cases, long tails opposite to the mid-range point lead to a high standard deviation for either population, which results in a low Z' factor even though the population means and samples between the means may be well-separated. Therefore, the one-tailed Z' factor is calculated with the same formula but using only those samples that lie between the positive/negative population means. This is not yet a well established measure of assay robustness, and should be considered experimental.
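The formulas in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's own code: the function names are hypothetical, the sample standard deviation is assumed (the text does not say which estimator is used), the V factor treats the lowest and highest doses as the negative and positive controls, and the one-tailed variant measures spread about each full-population mean using only the samples that lie between the two means.

```python
from math import sqrt
from statistics import mean, stdev


def z_prime_factor(negative, positive):
    """Z' = 1 - 3 * (sd_p + sd_n) / |mean_p - mean_n|."""
    separation = abs(mean(positive) - mean(negative))
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


def v_factor(values_by_dose):
    """V = 1 - 6 * mean(sd per dose) / |mean_p - mean_n|.

    values_by_dose maps each dose to the values measured at that dose.
    Assumption: lowest dose = negative control, highest = positive control.
    """
    doses = sorted(values_by_dose)
    mean_n = mean(values_by_dose[doses[0]])
    mean_p = mean(values_by_dose[doses[-1]])
    mean_sd = mean([stdev(values_by_dose[d]) for d in doses])
    return 1 - 6 * mean_sd / abs(mean_p - mean_n)


def one_tailed_z_prime_factor(negative, positive):
    """Same formula as Z', using only samples between the two means.

    Assumption: spread is the root-mean-square deviation of the retained
    samples about the mean of the full population they came from.
    """
    mean_n, mean_p = mean(negative), mean(positive)
    low, high = min(mean_n, mean_p), max(mean_n, mean_p)

    def one_sided_sd(values, centre):
        between = [v for v in values if low <= v <= high]
        return sqrt(sum((v - centre) ** 2 for v in between) / len(between))

    spread = one_sided_sd(positive, mean_p) + one_sided_sd(negative, mean_n)
    return 1 - 3 * spread / (high - low)
```

The sketch has no guard for identical control means or for an empty set of samples between the means; both would raise a division error. With negative controls [1.0, 2.0, 3.0] and positive controls [11.0, 12.0, 13.0], `z_prime_factor` evaluates to 0.4.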
For both Z' and V factors, the highest possible value (best assay quality) is 1, and they can range into negative values (for assays where distinguishing between positive and negative controls is difficult or impossible). The Z' factor is based only on positive and negative controls. The V factor is based on an entire dose-response curve rather than on the minimum and maximum responses. When there are only two doses in the assay (positive and negative controls only), the V factor will equal the Z' factor.
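The two-dose equivalence follows directly from the two formulas. With only the positive and negative controls present, the mean of the per-dose standard deviations is (σp + σn)/2, so:

```latex
V = 1 - 6 \cdot \frac{\operatorname{mean}(\sigma)}{|\mu_p - \mu_n|}
  = 1 - 6 \cdot \frac{(\sigma_p + \sigma_n)/2}{|\mu_p - \mu_n|}
  = 1 - 3 \cdot \frac{\sigma_p + \sigma_n}{|\mu_p - \mu_n|}
  = Z'
```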
Note: If the standard deviation of a measured feature is zero for a particular set of samples (e.g., all the positive controls), the Z' and V factors will equal 1 despite the fact that the assay quality is poor. This can occur when there is only one sample at each dose. This also occurs for some non-informative measured features, like the number of cytoplasm compartments per cell, which is always equal to 1.
This module can create MATLAB scripts that display the EC50 curves for each measurement. These scripts will require MATLAB and the statistics toolbox in order to run. See Create dose/response plots? below.
References
- Z' factor: Zhang JH, Chung TD, et al. (1999) "A simple statistical parameter for use in evaluation and validation of high throughput screening assays" J Biomolecular Screening 4(2): 67-73.
- V factor: Ravkin I (2004): Poster #P12024 - Quality Measures for Imaging-based Cellular Assays. Society for Biomolecular Screening Annual Meeting Abstracts.
- Code for the calculation of Z' and V factors was kindly donated by Ilya Ravkin. Carlos Evangelista donated his copyrighted dose-response-related code.
Example format for a file to be loaded by LoadData for this module:
LoadData loads information from a CSV file. The first line of this file is a header that names the items. Each subsequent line represents data for one image cycle, so your file should have the header line plus one line per image to be processed. You can also make a file for LoadData to load that contains the positive/negative control and dose designations plus the image file names to be processed, which is a good way to guarantee that images are matched with the correct data. The control and dose information can be designated in one of two ways:
- As metadata (so that the column header is prefixed with the "Metadata_" tag). "Metadata" is the category and the name after the underscore is the measurement.
- As some other type of data, in which case the header needs to be of the form <prefix>_<measurement>. Select <prefix> as the category and <measurement> as the measurement.
Here is an example file:
Image_FileName_CY3,Image_PathName_CY3,Data_Control,Data_Dose
"Plate1_A01.tif","/images",-1,0
"Plate1_A02.tif","/images",1,1E10
"Plate1_A03.tif","/images",0,3E4
"Plate1_A04.tif","/images",0,5E5
See also the Metadata and legacy LoadData modules.
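To illustrate how the control and dose columns in the example file can be read, here is a small sketch using Python's standard `csv` module. It is not how LoadData itself is implemented; the function name `read_controls_and_doses` and the returned dictionary keys are hypothetical, and the example rows are written as plain comma-separated text.

```python
import csv
import io

# The example file from above, written as plain comma-separated text.
EXAMPLE_CSV = '''Image_FileName_CY3,Image_PathName_CY3,Data_Control,Data_Dose
"Plate1_A01.tif","/images",-1,0
"Plate1_A02.tif","/images",1,1E10
"Plate1_A03.tif","/images",0,3E4
"Plate1_A04.tif","/images",0,5E5
'''


def read_controls_and_doses(text):
    """Return one dict per image with its file name, control flag and dose."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "file": record["Image_FileName_CY3"],
            "control": float(record["Data_Control"]),  # -1, 0 or 1 here
            "dose": float(record["Data_Dose"]),        # accepts "1E10" etc.
        })
    return rows
```

Each header has the form `<prefix>_<measurement>`, so `Data_Control` would be selected in the module as category "Data" and measurement "Control".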
Settings:
Select the image measurement describing the positive and negative control status
The Z' factor, a measure of assay quality, is calculated by this
module based on measurements from images that are specified as positive controls
and images that are specified as negative controls. (Images that are neither are
ignored.) The module assumes that
all of the negative controls are specified by a minimum value, all of the
positive controls are specified by a maximum value, and all other images have an
intermediate value; this might allow you to use your dosing information to also
specify the positive and negative controls. If you don't use actual dose
data to designate your controls, a common practice is to designate -1 as a
negative control, 0 as an experimental sample, and 1 as a positive control.
In other words, positive controls should all be specified by a single high
value (for instance, 1) and negative controls should all be specified by a
single low value (for instance, -1). Other samples should have an intermediate value
to exclude them from the Z' factor analysis.
The typical way to provide this information in the pipeline is to create
a comma-delimited text (CSV) file outside of CellProfiler and then load that file into the pipeline
using the Metadata module or the legacy LoadData module. In that case, choose the
measurement that matches the column header of the measurement
in the input file. See the Metadata module help for an example text file.
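The minimum/maximum rule described above can be sketched as follows. This is an illustration only, with a hypothetical function name; it returns the positions of the images treated as negative and positive controls, and every image with an intermediate value is left out.

```python
def split_controls(control_values):
    """Return (negative_indices, positive_indices) for per-image control values.

    Images at the minimum value are negative controls, images at the maximum
    value are positive controls, and all others are ignored.
    """
    low, high = min(control_values), max(control_values)
    negatives = [i for i, v in enumerate(control_values) if v == low]
    positives = [i for i, v in enumerate(control_values) if v == high]
    return negatives, positives
```

For the values [-1, 1, 0, 0, -1], images 0 and 4 are negative controls, image 1 is the positive control, and images 2 and 3 are excluded. The same rule is what makes it possible to reuse a dose column: the lowest dose plays the role of the negative control and the highest dose the positive control.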
Select the image measurement describing the treatment dose
The V and Z' factors, measures of assay quality, and the EC50, indicating
dose/response, are calculated by this module based on each image being
specified as a particular treatment dose. Choose a measurement that gives
the dose of some treatment for each of your images.
The typical way to provide this information in the pipeline is to create
a comma-delimited text (CSV) file outside of CellProfiler and then load that file into the pipeline
using the Metadata module or the legacy LoadData module. In that case, choose the
measurement that matches the column header of the measurement
in the CSV input file. See LoadData help for an example text file.
Log-transform the dose values?
Select Yes if you have dose-response data and you want to log-transform
the dose values before fitting a sigmoid curve.
Select No if your data values indicate only positive vs. negative
controls.
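As a rough illustration of what log-transforming doses and fitting a sigmoid involves, the sketch below applies a base-10 logarithm and evaluates a four-parameter logistic curve. Both choices are assumptions: the text says only that a sigmoid is fit, and the function and parameter names here are hypothetical. No fitting is performed; the code only shows the shape of the curve and where the EC50 sits on it.

```python
import math


def log_transform(doses):
    """Return log10 of each dose. Doses must be strictly positive."""
    return [math.log10(d) for d in doses]


def sigmoid(log_dose, bottom, top, log_ec50, hill_slope):
    """Four-parameter logistic response as a function of log10(dose)."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ec50 - log_dose) * hill_slope))
```

At `log_dose == log_ec50` the exponent is zero, so the response is exactly halfway between `bottom` and `top`, which matches the definition of the EC50 as the concentration giving 50% of the maximal response. A dose of 0 has no logarithm, so this sketch would raise an error for it.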
Create dose/response plots?
Select Yes if you want to create and save
dose/response plots. You will be asked for information on how to save the plots.
Figure prefix
(Used only when creating dose/response plots)
CellProfiler will create a file name by appending the measurement name
to the prefix you enter here. For instance, if you have objects
named "Cells", the "AreaShape_Area" measurement, and a prefix of "Dose_",
CellProfiler will save the figure as Dose_Cells_AreaShape_Area.m.
Leave this setting blank if you do not want a prefix.
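The naming pattern in the example above can be written as a one-line helper. The pattern is inferred from that single example, so treat it as an assumption; the function name is hypothetical.

```python
def figure_file_name(prefix, object_name, measurement):
    """Build a plot script name as <prefix><object>_<measurement>.m."""
    return f"{prefix}{object_name}_{measurement}.m"
```

Under this assumption, `figure_file_name("Dose_", "Cells", "AreaShape_Area")` gives `Dose_Cells_AreaShape_Area.m`, and an empty prefix gives `Cells_AreaShape_Area.m`.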
Output file location
(Used only when creating dose/response plots)
This setting lets you choose the folder for the output
files.
You can choose among the following options which are common to all file input/output
modules:
- Default Input Folder: Use the default input folder.
- Default Output Folder: Use the default output folder.
- Elsewhere...: Use a particular folder you specify.
- Default Input Folder sub-folder: Enter the name of a subfolder of
the default input folder or a path that starts from the default input folder.
- Default Output Folder sub-folder: Enter the name of a subfolder of
the default output folder or a path that starts from the default output folder.
Elsewhere and the two sub-folder options all require you to enter an additional
path name. You can use an absolute path (such as "C:\imagedir\image.tif" on a PC) or a
relative path to specify the file location relative to a directory:
- Use one period to represent the current directory. For example, if you choose
Default Input Folder sub-folder, you can enter "./MyFiles" to look in a
folder called "MyFiles" that is contained within the Default Input Folder.
- Use two periods ".." to move up one folder level. For example, if you choose
Default Input Folder sub-folder, you can enter "../MyFolder" to look in a
folder called "MyFolder" at the same level as the Default Input Folder.
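The two relative-path rules above behave like ordinary path joining and normalization. The sketch below uses Python's standard `posixpath` module with forward-slash paths so that it gives the same result on any platform; the function name and the folder `/data/input` are hypothetical, and this is not the module's own path handling.

```python
import posixpath


def resolve_subfolder(default_folder, relative_path):
    """Join a relative path onto a default folder and collapse '.' and '..'."""
    return posixpath.normpath(posixpath.join(default_folder, relative_path))
```

With a Default Input Folder of `/data/input`, entering `./MyFiles` resolves to `/data/input/MyFiles`, while `../MyFolder` resolves to `/data/MyFolder`, a sibling of the Default Input Folder.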
For Elsewhere..., Default Input Folder sub-folder and
Default Output Folder sub-folder, if you have metadata associated with your
images via the Metadata module, you can name the folder using metadata
tags.
You can insert a previously defined metadata tag in any of the following ways:
- The insert key
- A right mouse button click inside the control
- In Windows, the Context menu key, which is between the Windows key and Ctrl key
The inserted metadata tag will appear in green. To change a previously inserted metadata tag,
navigate the cursor to just before the tag and either:
- Use the up and down arrows to cycle through possible values.
- Right-click on the tag to display and select the available values.
For instance, if you have a metadata tag named
"Plate", you can create a per-plate folder by selecting one of the subfolder options
and then specifying the subfolder name as "\g<Plate>". The module will
substitute the metadata values for the current image set for any metadata tags in the
folder name. Please see the
Metadata module for more details on metadata collection and usage.
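The substitution described above can be mimicked with a short regular-expression sketch. It only illustrates the idea of replacing each `\g<Name>` tag with the value for the current image set; the function name and the metadata dictionary are hypothetical, and the module's real substitution logic may differ.

```python
import re


def substitute_metadata(template, metadata):
    """Replace every \\g<Name> tag in template with metadata[Name]."""
    return re.sub(
        r"\\g<(\w+)>",                        # a literal backslash, then g<Name>
        lambda match: str(metadata[match.group(1)]),
        template,
    )
```

For an image set whose "Plate" metadata value is "Plate1", the folder name `\g<Plate>` becomes `Plate1`, so each plate's output lands in its own subfolder. A tag that is missing from the dictionary raises a `KeyError` in this sketch.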