Module: CalculateStatistics

CalculateStatistics calculates measures of assay quality (V and Z' factors) and dose-response data (EC50) for all features measured from images.
The V and Z' factors are statistical measures of assay quality and are calculated for each per-image measurement and for each average per-object measurement that you have made in the pipeline. Placing this module at the end of a pipeline in order to calculate these values allows you to identify which measured features are most powerful for distinguishing positive and negative control samples, or for accurately quantifying the assay's response to dose. These measurements will be calculated for all measured values (Intensity, AreaShape, Texture, etc.). These measurements can be exported as the "Experiment" set of data.

Available measurements

For both Z' and V factors, the highest possible value (best assay quality) is 1, and they can range into negative values (for assays where distinguishing between positive and negative controls is difficult or impossible). The Z' factor is based only on positive and negative controls. The V factor is based on an entire dose-response curve rather than on the minimum and maximum responses. When there are only two doses in the assay (positive and negative controls only), the V factor will equal the Z' factor.
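
The relationship between the two factors can be checked numerically. The sketch below uses the commonly cited definitions (Z' as one minus three times the summed control standard deviations over the separation of the control means, and V as one minus six times the mean per-dose standard deviation over the range of per-dose means). The function names, the sample values, and the use of the population standard deviation are assumptions made for illustration, not a description of the module's internal code.

```python
from statistics import mean, pstdev


def z_prime_factor(positive, negative):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = pstdev(positive) + pstdev(negative)  # population SD (assumption)
    separation = abs(mean(positive) - mean(negative))
    return 1.0 - 3.0 * spread / separation  # undefined if the means coincide


def v_factor(values_by_dose):
    """V = 1 - 6 * (mean of per-dose SDs) / (range of per-dose means)."""
    means = [mean(values) for values in values_by_dose.values()]
    sds = [pstdev(values) for values in values_by_dose.values()]
    return 1.0 - 6.0 * mean(sds) / (max(means) - min(means))


# Hypothetical per-image values of one feature.
negative = [10.0, 12.0, 11.0, 9.0]
positive = [50.0, 52.0, 48.0, 50.0]

z = z_prime_factor(positive, negative)
v = v_factor({0.0: negative, 1.0: positive})
print(f"Z' = {z:.4f}, V = {v:.4f}")
```

With only two doses, the mean of the two standard deviations times six equals their sum times three, so the two values coincide.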

Note: If the standard deviation of a measured feature is zero for a particular set of samples (e.g., all the positive controls), the Z' and V factors will equal 1 despite the fact that the assay quality is poor. This can occur when there is only one sample at each dose. This also occurs for some non-informative measured features, like the number of cytoplasm compartments per cell, which is always equal to 1.
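
One way to guard against this pitfall is to screen exported measurements for features that do not vary within a group before trusting a factor of 1. The following is a minimal sketch; the function name, the nested-dictionary layout, and the feature names are hypothetical and chosen only for illustration.

```python
from statistics import pstdev


def zero_variance_features(measurements):
    """Return names of features with zero spread in at least one group.

    `measurements` maps feature name -> {group label -> list of values}.
    """
    flagged = []
    for feature, groups in measurements.items():
        # pstdev of a single value is 0, so one-sample groups are flagged too.
        if any(pstdev(values) == 0 for values in groups.values()):
            flagged.append(feature)
    return flagged


# Hypothetical data: the second feature is always 1 and carries no information.
measurements = {
    "Cells_AreaShape_Area": {
        "negative": [210.0, 198.0, 205.0],
        "positive": [320.0, 331.0, 315.0],
    },
    "Cells_Children_Cytoplasm_Count": {
        "negative": [1.0, 1.0, 1.0],
        "positive": [1.0, 1.0, 1.0],
    },
}

flagged = zero_variance_features(measurements)
print(flagged)
```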

This module can create MATLAB scripts that display the EC50 curves for each measurement. These scripts will require MATLAB and the statistics toolbox in order to run. See Create dose/response plots? below.

Example format for a file to be loaded by LoadData for this module:

LoadData loads information from a CSV file. The first line of this file is a header that names the items. Each subsequent line represents data for one image cycle, so your file should have the header line plus one line per image to be processed. You can also make a file for LoadData to load that contains the positive/negative control and dose designations plus the image file names to be processed, which is a good way to guarantee that images are matched with the correct data. The control and dose information can be designated in one of two ways:

Here is an example file:

Image_FileName_CY3,Image_PathName_CY3,Data_Control,Data_Dose
"Plate1_A01.tif","/images",-1,0
"Plate1_A02.tif","/images",1,1E10
"Plate1_A03.tif","/images",0,3E4
"Plate1_A04.tif","/images",0,5E5
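
If the plate layout is already available in a script, a file of this shape can be generated rather than typed by hand. The sketch below uses Python's standard csv module and the column names from the example above. The row values simply repeat that example, and the text is written to an in-memory buffer, where a real script would write to a file path of your choosing. Unlike the example, the writer's default settings do not put quotes around the file names.

```python
import csv
import io

HEADER = ["Image_FileName_CY3", "Image_PathName_CY3", "Data_Control", "Data_Dose"]

# (file name, folder, control designation, dose) for each image cycle.
# Doses are kept as strings so that notation such as 1E10 is preserved.
ROWS = [
    ("Plate1_A01.tif", "/images", -1, "0"),
    ("Plate1_A02.tif", "/images", 1, "1E10"),
    ("Plate1_A03.tif", "/images", 0, "3E4"),
    ("Plate1_A04.tif", "/images", 0, "5E5"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER)  # first line names the items
writer.writerows(ROWS)   # one line per image cycle

csv_text = buffer.getvalue()
print(csv_text)
```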

See also the Metadata and legacy LoadData modules.

Settings:

Select the image measurement describing the positive and negative control status

The Z' factor, a measure of assay quality, is calculated by this module based on measurements from images that are specified as positive controls and images that are specified as negative controls. (Images that are neither are ignored.) The module assumes that all of the negative controls are specified by a minimum value, all of the positive controls are specified by a maximum value, and all other images have an intermediate value; this might allow you to use your dosing information to also specify the positive and negative controls. If you don't use actual dose data to designate your controls, a common practice is to designate -1 as a negative control, 0 as an experimental sample, and 1 as a positive control. In other words, positive controls should all be specified by a single high value (for instance, 1) and negative controls should all be specified by a single low value (for instance, -1). Other samples should have an intermediate value to exclude them from the Z' factor analysis.
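
The minimum/maximum rule described above can be sketched as follows. The function name and the sample column are hypothetical; the sketch only illustrates how images are assigned to the two control groups and how intermediate values are left out.

```python
def split_controls(control_values):
    """Return (negative_indices, positive_indices) for a control column.

    Images holding the minimum value are treated as negative controls,
    images holding the maximum value as positive controls, and all
    images with intermediate values are ignored.
    """
    low, high = min(control_values), max(control_values)
    negatives = [i for i, value in enumerate(control_values) if value == low]
    positives = [i for i, value in enumerate(control_values) if value == high]
    return negatives, positives


# Hypothetical control column: -1 negative, 0 experimental, 1 positive.
control = [-1, 1, 0, 0, -1, 1]
neg, pos = split_controls(control)
print(neg, pos)
```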

The typical way to provide this information in the pipeline is to create a comma-delimited text (CSV) file outside of CellProfiler and then load that file into the pipeline using the Metadata module or the legacy LoadData module. In that case, choose the measurement that matches the column header of the measurement in the input file. See the Metadata module help for an example text file.

Select the image measurement describing the treatment dose

The V and Z' factors, which measure assay quality, and the EC50, which indicates dose/response, are calculated by this module based on each image being specified as a particular treatment dose. Choose a measurement that gives the dose of some treatment for each of your images.
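
To make the meaning of the EC50 concrete, the sketch below evaluates a four-parameter logistic curve, one commonly used sigmoid for dose/response data. The exact model fitted by the module is not stated here, so the functional form and the parameter names (bottom, top, ec50, hill) are assumptions for illustration. The point is that the response at a dose equal to the EC50 lies halfway between the minimum and maximum responses.

```python
def four_parameter_logistic(dose, bottom, top, ec50, hill):
    """Response of a sigmoid dose/response curve at a positive dose."""
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)


# Hypothetical curve parameters.
bottom, top, ec50, hill = 10.0, 50.0, 3e4, 1.5

half_way = four_parameter_logistic(ec50, bottom, top, ec50, hill)
low_dose = four_parameter_logistic(ec50 / 1000.0, bottom, top, ec50, hill)
high_dose = four_parameter_logistic(ec50 * 1000.0, bottom, top, ec50, hill)
print(low_dose, half_way, high_dose)
```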

The typical way to provide this information in the pipeline is to create a comma-delimited text (CSV) file outside of CellProfiler and then load that file into the pipeline using the Metadata module or the legacy LoadData module. In that case, choose the measurement that matches the column header of the measurement in the CSV input file. See the LoadData help for an example text file.

Log-transform the dose values?

Select Yes if you have dose-response data and you want to log-transform the dose values before fitting a sigmoid curve.

Select No if your data values indicate only positive vs. negative controls.
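
The effect of the transform can be previewed outside CellProfiler. The sketch below is only an illustration: the base-10 logarithm and the choice to return None for non-positive doses are assumptions, not a statement of how the module handles them. It does show why a dose of 0, as often used for negative controls, needs attention before log-transforming.

```python
import math


def log_transform_doses(doses):
    """Return log10 of each dose, or None where the logarithm is undefined."""
    return [math.log10(dose) if dose > 0 else None for dose in doses]


# Doses from a hypothetical input file; the first is an untreated control.
doses = [0.0, 1e10, 3e4, 5e5]
logged = log_transform_doses(doses)
print(logged)
```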

Create dose/response plots?

Select Yes if you want to create and save dose/response plots. You will be asked for information on how to save the plots.

Figure prefix

(Used only when creating dose/response plots)
CellProfiler will create a file name by appending the measurement name to the prefix you enter here. For instance, if you have objects named "Cells", the "AreaShape_Area" measurement, and a prefix of "Dose_", CellProfiler will save the figure as Dose_Cells_AreaShape_Area.m. Leave this setting blank if you do not want a prefix.
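
The naming rule in the example above can be written as a one-line helper. The function name and its arguments are hypothetical; the sketch only reproduces the pattern shown in the example (prefix, object name, measurement name, and the .m extension).

```python
def figure_file_name(prefix, object_name, measurement):
    """Build a script file name following the pattern in the example above."""
    return f"{prefix}{object_name}_{measurement}.m"


with_prefix = figure_file_name("Dose_", "Cells", "AreaShape_Area")
without_prefix = figure_file_name("", "Cells", "AreaShape_Area")
print(with_prefix)
print(without_prefix)
```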

Output file location

(Used only when creating dose/response plots)
This setting lets you choose the folder for the output files. You can choose among the following options which are common to all file input/output modules:

Elsewhere and the two sub-folder options all require you to enter an additional path name. You can use an absolute path (such as "C:\imagedir\image.tif" on a PC) or a relative path to specify the file location relative to a directory:

For Elsewhere..., Default Input Folder sub-folder and Default Output Folder sub-folder, if you have metadata associated with your images via the Metadata module, you can name the folder using metadata tags. You can insert a previously defined metadata tag by either using:

The inserted metadata tag will appear in green. To change a previously inserted metadata tag, navigate the cursor to just before the tag and either:

For instance, if you have a metadata tag named "Plate", you can create a per-plate folder by selecting one of the subfolder options and then specifying the subfolder name as "\g<Plate>". The module will substitute the metadata values for the current image set for any metadata tags in the folder name. Please see the Metadata module for more details on metadata collection and usage.
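
The substitution of "\g<Plate>" style tags can be imitated with a regular expression. This is a sketch of the idea only: the function name, the metadata dictionary, and the folder templates are hypothetical, and CellProfiler's own implementation may differ.

```python
import re

# Matches tags of the form \g<Name> and captures the tag name.
TAG_PATTERN = re.compile(r"\\g<(\w+)>")


def expand_folder_name(template, metadata):
    """Replace each \\g<Name> tag with the matching metadata value."""
    return TAG_PATTERN.sub(lambda match: str(metadata[match.group(1)]), template)


# Hypothetical metadata for the current image set.
metadata = {"Plate": "Plate1", "Well": "A01"}

per_plate = expand_folder_name(r"\g<Plate>", metadata)
nested = expand_folder_name(r"figures/\g<Plate>/\g<Well>", metadata)
print(per_plate)
print(nested)
```

A tag that has no entry in the dictionary raises a KeyError in this sketch, which makes a misspelled tag name visible immediately.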