# Data Tools

Data Tools let you plot, view, export, or perform specialized analyses on your measurements. They can be run as modules in a pipeline, or after analysis is complete by using “Data Tools” in CellProfiler’s main menu.

## CalculateMath

CalculateMath takes measurements produced by previous modules and performs basic arithmetic operations.

The arithmetic operations available in this module include addition, subtraction, multiplication, and division. The result can be log-transformed or raised to a power and can be used in further calculations if another CalculateMath module is added to the pipeline.

The module can make its calculations on a per-image basis (for example, multiplying the area occupied by a stain in the image by the total intensity in the image) or on an object-by-object basis (for example, dividing the intensity in the nucleus by the intensity in the cytoplasm for each cell).
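The arithmetic described above can be sketched in plain Python. This is an illustration only, not CellProfiler's implementation: the function name `calculate`, the operation names, the base-10 logarithm, and the order of the post-processing steps (power first, then log) are all assumptions, and the numbers are made up.

```python
import math
import operator

# Hypothetical operation names mapped to the four basic operations.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def calculate(operation, first, second, exponent=1.0, log_transform=False):
    """Combine two numeric measurements, then optionally post-process."""
    result = OPERATIONS[operation](first, second)
    result = result ** exponent
    if log_transform:
        # Base 10 is an assumption made for this sketch.
        result = math.log10(result)
    return result


# Per-image: stained area (in pixels) times total image intensity.
per_image = calculate("multiply", 1500.0, 0.25)

# Per-object: nuclear intensity divided by cytoplasmic intensity, per cell.
nuclear = [0.8, 0.6, 0.9]
cytoplasmic = [0.4, 0.3, 0.3]
ratios = [calculate("divide", n, c) for n, c in zip(nuclear, cytoplasmic)]

# A log-transformed result.
log_ratio = calculate("divide", 100.0, 1.0, log_transform=True)
```

The per-object case pairs values cell by cell, so both lists must describe the same cells in the same order.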

| Supports 2D? | Supports 3D? | Respects masks? |
|---|---|---|
| YES | YES | NO |

### Measurements made by this module

• Image measurements: If both input measurements are whole-image measurements, then the result will also be a whole-image measurement.
• Object measurements: Object measurements can be produced in two ways:
  • If both input measurements are individual object measurements, then the result will also be an object measurement. In these cases, the measurement will be associated with both objects that were involved in the measurement.
  • If one measure is object-based and one image-based, then the result will be an object measurement.

The result of these calculations is a new measurement in the “Math” category.
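The rules for whether a result is an image or an object measurement can be sketched as follows. Here a whole-image measurement is modeled as a single `float` and an object measurement as a `list` with one value per object; that representation, the function name `combine`, and the requirement that two object measurements have the same number of objects are assumptions of this sketch.

```python
import operator


def combine(first, second, operation):
    """Return (result_kind, value) for two image or object measurements."""
    first_is_object = isinstance(first, list)
    second_is_object = isinstance(second, list)

    if not first_is_object and not second_is_object:
        # Image with image gives one whole-image value.
        return "Image", operation(first, second)

    if first_is_object and second_is_object:
        # Object with object: pair the values object by object.
        if len(first) != len(second):
            raise ValueError("object measurements must have equal length")
        return "Object", [operation(a, b) for a, b in zip(first, second)]

    # Mixed case: reuse the single image value for every object.
    if first_is_object:
        return "Object", [operation(a, second) for a in first]
    return "Object", [operation(first, b) for b in second]


image_result = combine(2.0, 4.0, operator.mul)
mixed_result = combine([1.0, 2.0], 2.0, operator.truediv)
```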

## CalculateStatistics

CalculateStatistics calculates measures of assay quality (V and Z’ factors) and dose-response data (EC50) for all features measured from images.

The V and Z’ factors are statistical measures of assay quality and are calculated for each per-image measurement and for each average per-object measurement that you have made in the pipeline. Placing this module at the end of a pipeline lets you identify which measured features are most powerful for distinguishing positive and negative control samples (Z’ factor), or for accurately quantifying the assay’s response to dose (V factor). These statistics are calculated for all measured values (Intensity, AreaShape, Texture, etc.) upstream in the pipeline. The statistics calculated by this module can be exported as the “Experiment” set of data.

| Supports 2D? | Supports 3D? | Respects masks? |
|---|---|---|
| YES | NO | NO |

### What do I need as input?

Example format for a file to be loaded by LoadData for this module:

LoadData loads information from a CSV file. The first line of this file is a header that names the items. Each subsequent line represents data for one image cycle, so your file should have the header line plus one line per image to be processed. You can also make a file for LoadData to load that contains the positive/negative control and dose designations plus the image file names to be processed, which is a good way to guarantee that images are matched with the correct data. The control and dose information can be designated in one of two ways:

• As metadata (so that the column header is prefixed with the “Metadata_” tag). “Metadata” is the category and the name after the underscore is the measurement.
• As some other type of data, in which case the header needs to be of the form <prefix>_<measurement>. Select <prefix> as the category and <measurement> as the measurement.

Here is an example file:

    Image_FileName_CY3, Image_PathName_CY3, Data_Control, Data_Dose
    "Plate1_A01.tif", "/images", -1, 0
    "Plate1_A02.tif", "/images", 1, 1E10
    "Plate1_A03.tif", "/images", 0, 3E4
    "Plate1_A04.tif", "/images", 0, 5E5
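To check such a file before loading it, the sketch below parses the example with Python's standard `csv` module and splits each header into its category and measurement at the first underscore. It does not use LoadData itself, and the helper name `split_header` is hypothetical. The meaning of the control values (which number marks a positive or a negative control) is not stated here, so the code only reads them as integers.

```python
import csv
import io

EXAMPLE = '''Image_FileName_CY3, Image_PathName_CY3, Data_Control, Data_Dose
"Plate1_A01.tif", "/images", -1, 0
"Plate1_A02.tif", "/images", 1, 1E10
"Plate1_A03.tif", "/images", 0, 3E4
"Plate1_A04.tif", "/images", 0, 5E5
'''


def split_header(header):
    """Split '<prefix>_<measurement>' into (category, measurement)."""
    category, measurement = header.split("_", 1)
    return category, measurement


# skipinitialspace drops the blank that follows each comma.
reader = csv.DictReader(io.StringIO(EXAMPLE), skipinitialspace=True)
headers = [split_header(name) for name in reader.fieldnames]

rows = []
for row in reader:
    rows.append({
        "file": row["Image_FileName_CY3"],
        "control": int(row["Data_Control"]),
        "dose": float(row["Data_Dose"]),
    })
```

Each entry in `rows` corresponds to one image cycle, so the number of rows should match the number of images to be processed.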

### Measurements made by this module

• Experiment features: Whereas most CellProfiler measurements are calculated for each object (per-object) or for each image (per-image), this module produces per-experiment values; for example, one Z’ factor is calculated for each measurement, across the entire analysis run.
  • Zfactor: The Z’-factor indicates how well separated the positive and negative controls are. A Z’-factor > 0 is potentially screenable; a Z’-factor > 0.5 is considered an excellent assay. The formula is $1 - 3 \times (\sigma_p + \sigma_n)/|\mu_p - \mu_n|$, where $\sigma_p$ and $\sigma_n$ are the standard deviations of the positive and negative controls, and $\mu_p$ and $\mu_n$ are the means of the positive and negative controls.
  • Vfactor: The V-factor is a generalization of the Z’-factor, and is calculated as $1 - 6 \times \mathrm{mean}(\sigma)/|\mu_p - \mu_n|$, where $\sigma$ are the standard deviations of the data, and $\mu_p$ and $\mu_n$ are defined as above.
  • EC50: The half maximal effective concentration (EC50) is the concentration of a treatment required to induce a response that is 50% of the maximal response.
  • OneTailedZfactor: This measure is an attempt to overcome a limitation of the original Z’-factor formulation (it assumes a Gaussian distribution) and is informative for populations with moderate or high amounts of skewness. In these cases, long tails opposite to the mid-range point lead to a high standard deviation for either population, which results in a low Z’ factor even though the population means and samples between the means may be well-separated. Therefore, the one-tailed Z’ factor is calculated with the same formula but using only those samples that lie between the positive/negative population means. This is not yet a well established measure of assay robustness, and should be considered experimental.
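The Z’ and V factor formulas above can be written out directly. The sketch below is not the module's code: the function names are hypothetical, the sample standard deviation (`statistics.stdev`) is an assumption because the text does not say which estimator is used, and the measurement values are invented for illustration.

```python
from statistics import mean, stdev


def z_factor(positive, negative):
    """Z' = 1 - 3 * (sd_p + sd_n) / |mean_p - mean_n|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


def v_factor(dose_groups, positive, negative):
    """V = 1 - 6 * mean(sd of each dose group) / |mean_p - mean_n|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical")
    mean_sd = mean(stdev(group) for group in dose_groups)
    return 1 - 6 * mean_sd / separation


# Invented per-image values of one feature at three doses.
negative = [1.9, 2.1, 2.0, 2.2, 1.8]
middle = [5.9, 6.2, 6.0, 6.1, 5.8]
positive = [9.8, 10.1, 10.0, 10.3, 9.8]

z_prime = z_factor(positive, negative)
v = v_factor([negative, middle, positive], positive, negative)
print(f"Z' = {z_prime:.3f}, V = {v:.3f}")
```

In a real run, one such pair of values would be produced for every measured feature, and features with the highest scores would be the most useful for the assay.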

For both Z’ and V factors, the highest possible value (best assay quality) is 1, and they can range into negative values (for assays where distinguishing between positive and negative controls is difficult or impossible). The Z’ factor is based only on positive and negative controls. The V factor is based on an entire dose-response curve rather than on the minimum and maximum responses. When there are only two doses in the assay (positive and negative controls only), the V factor will equal the Z’ factor.
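The equality for the two-dose case follows from the formulas, assuming that mean(σ) is the plain average of the standard deviations of the two groups:

$$
V = 1 - 6 \times \frac{\tfrac{1}{2}(\sigma_p + \sigma_n)}{|\mu_p - \mu_n|}
  = 1 - 3 \times \frac{\sigma_p + \sigma_n}{|\mu_p - \mu_n|}
  = Z'
$$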

Note that if the standard deviation of a measured feature is zero for a particular set of samples (e.g., all the positive controls), the Z’ and V factors will equal 1 despite the fact that the assay quality is poor. This can occur when there is only one sample at each dose. This also occurs for some non-informative measured features, like the number of cytoplasm compartments per cell, which is always equal to 1.
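One way to guard against this is to check for zero spread before trusting a score of 1. The sketch below is only an illustration; the function name `zero_spread_groups` and the data are hypothetical and not part of CellProfiler.

```python
from statistics import pstdev


def zero_spread_groups(groups):
    """Return the names of sample groups whose values are all identical.

    A group with a single sample also counts, because its spread
    cannot be estimated.
    """
    return [
        name
        for name, values in groups.items()
        if len(values) < 2 or pstdev(values) == 0
    ]


# Invented example: cytoplasm compartments per cell is always 1.
feature = {
    "positive": [1, 1, 1, 1],
    "negative": [1, 1, 1, 1],
}
flagged = zero_spread_groups(feature)
```

Any feature for which this list is non-empty should not be ranked by its Z’ or V factor.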

This module can create MATLAB scripts that display the EC50 curves for each measurement. These scripts will require MATLAB and the statistics toolbox in order to run. See Create dose-response plots? below.
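For readers without MATLAB, the shape of such a curve can be explored in Python. The sketch below uses a four-parameter logistic (Hill) curve, a common dose-response model; that the module fits exactly this form is an assumption, and the function name and parameter values are hypothetical.

```python
def sigmoid_response(dose, bottom, top, ec50, hill_slope):
    """Four-parameter logistic response at a positive dose.

    At dose == ec50 the response is halfway between bottom and top.
    """
    if dose <= 0:
        raise ValueError("dose must be positive")
    return bottom + (top - bottom) / (1 + (ec50 / dose) ** hill_slope)


# Invented curve parameters.
params = {"bottom": 0.1, "top": 0.9, "ec50": 30.0, "hill_slope": 1.5}

doses = [3.0, 30.0, 300.0]
responses = [sigmoid_response(d, **params) for d in doses]
```

The response at the middle dose equals the midpoint of `bottom` and `top`, which matches the definition of EC50 as the concentration giving half the maximal response.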