LoadData

LoadData loads text or numerical data to be associated with images, and can also load images specified by file names.

This module loads a file that supplies text or numerical data associated with the images to be processed, e.g., sample names, plate names, well identifiers, or even a list of image filenames to be processed in the analysis run.

Disclaimer: Please note that the Input modules (i.e., Images, Metadata, NamesAndTypes and Groups) largely supersede this module. However, old pipelines loaded into CellProfiler that contain this module will provide the option of preserving them; these pipelines will operate exactly as before.

The module currently reads files in CSV (comma-separated values) format. These files can be produced by saving a spreadsheet from Excel as "Windows Comma Separated Values" file format. The lines of the file represent the rows, and each field in a row is separated by a comma. Text values may be optionally enclosed by double quotes. The LoadData module uses the first row of the file as a header. The fields in this row provide the labels for each column of data. Subsequent rows provide the values for each image cycle.
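As an alternative to saving from a spreadsheet, a file in this layout can be written programmatically. The following is a minimal sketch using Python's standard csv module; the column names and values are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical column names and values, for illustration only.
header = ["Image_FileName_DNA", "Metadata_Well", "Treatment_Dose"]
rows = [
    ["img_A01.tif", "A01", 0.5],
    ["img_A02.tif", "A02", 1.0],
]

# Write to an in-memory buffer so the sketch is self-contained.
# To write a real file, use open("data.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)   # first row: the column labels
writer.writerows(rows)    # one row of values per image cycle
csv_text = buffer.getvalue()
print(csv_text)
```

The csv module adds double quotes only where a value needs them (for example, a value containing a comma), which is consistent with quoting being optional.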

There are many reasons why you might want to prepare a CSV file and load it via LoadData. Below, we describe how the column nomenclature allows for special functionality for some downstream modules:

Columns with any name: Any data loaded via LoadData will be exported as a per-image measurement along with CellProfiler-calculated data. This is a convenient way for you to add data from your own sources to the files exported by CellProfiler.

Columns whose name begins with Image_FileName or Image_PathName: A column whose name begins with "Image_FileName" or "Image_PathName" can be used to supply the file name and path name (relative to the base folder) of an image that you want to load. The image's name within CellProfiler appears afterward. For instance, "Image_FileName_CY3" would supply the file name for the CY3-stained image, and choosing the Load images based on this data? option allows the CY3 images to be selected later in the pipeline. "Image_PathName_CY3" would supply the path names for the CY3-stained images. The path name column is optional; if all image files are in the base folder, this column is not needed.

Columns whose name begins with Image_ObjectsFileName or Image_ObjectsPathName: The behavior of these columns is identical to that of "Image_FileName" or "Image_PathName" except that it is used to specify an image that you want to load as objects.

Columns whose name begins with Metadata: A column whose name begins with "Metadata" can be used to group or associate files loaded by LoadData.
For instance, an experiment might require that images created on the same day use an illumination correction function calculated from all images from that day, and furthermore, that the date be captured in the file names for the individual image sets and in a CSV file specifying the illumination correction functions.
In this case, if the illumination correction images are loaded with the LoadData module, the file should have a "Metadata_Date" column which contains the date metadata tags. Similarly, if the individual images are loaded using the LoadImages module, LoadImages should be set to extract the metadata tag from the file names (see LoadImages for more details on how to do so). The pipeline will then match the individual images with their corresponding illumination correction functions based on matching "Metadata_Date" tags. This is useful if the same data is associated with several images (for example, multiple images obtained from a single well).
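Conceptually, the matching described above is a lookup keyed on the shared tag. The sketch below illustrates the idea in plain Python; it is not CellProfiler's own implementation, and the file names, the "Illum" image name and the "FileName" key are hypothetical.

```python
# Rows as they might be read from the LoadData CSV (hypothetical values).
illum_rows = [
    {"Image_FileName_Illum": "illum_2009-07-08.tif", "Metadata_Date": "2009-07-08"},
    {"Image_FileName_Illum": "illum_2009-07-09.tif", "Metadata_Date": "2009-07-09"},
]

# Individual images, with the date tag already extracted from their file names.
images = [
    {"FileName": "2009-07-08_A01.tif", "Metadata_Date": "2009-07-08"},
    {"FileName": "2009-07-08_A02.tif", "Metadata_Date": "2009-07-08"},
    {"FileName": "2009-07-09_A01.tif", "Metadata_Date": "2009-07-09"},
]

# Index the illumination functions by date, then look each image up by its tag.
illum_by_date = {row["Metadata_Date"]: row["Image_FileName_Illum"] for row in illum_rows}
matched = [(img["FileName"], illum_by_date[img["Metadata_Date"]]) for img in images]

for image_name, illum_name in matched:
    print(image_name, "->", illum_name)
```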

Columns whose name begins with Series or Frame: A column whose name begins with "Series" or "Frame" supplies information about image stacks or movies. The name of the image within CellProfiler appears afterward, following an underscore character. For example, "Frame_DNA" would supply the frame number for the movie/image stack file specified by the "Image_FileName_DNA" and "Image_PathName_DNA" columns.
Using a CSV for loading frames and/or series from a movie/image stack allows you more flexibility in assembling image sets for operations that would be difficult or impossible using the Input modules alone. For example, if you wanted to analyze a movie of 1,000 frames by computing the difference between frames, you could create two image columns in a CSV, one for loading frames 1,2,...,999, and the other for loading frames 2,3,...,1000. In this case, CellProfiler would load the frame and its predecessor for each cycle and ImageMath could be used to create the difference image for downstream use.
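A CSV for the frame-difference example can be generated rather than typed by hand. The sketch below is one possible layout under stated assumptions: the movie file name and the image names "Previous" and "Current" are hypothetical, and the frame numbers follow the 1 to 1,000 numbering used in the text, which should be checked against how your files are actually indexed.

```python
import csv
import io

MOVIE_FILE = "movie.avi"  # hypothetical file name
N_FRAMES = 1000

header = [
    "Image_FileName_Previous", "Frame_Previous",
    "Image_FileName_Current", "Frame_Current",
]

# One row per cycle: frame f paired with its successor f + 1,
# giving frames 1..999 in one column and 2..1000 in the other.
rows = [[MOVIE_FILE, f, MOVIE_FILE, f + 1] for f in range(1, N_FRAMES)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
frames_csv = buffer.getvalue()

print(frames_csv.splitlines()[0])
print(frames_csv.splitlines()[1])
```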

Columns that contain dose-response or positive/negative control information: The CalculateStatistics module can calculate metrics of assay quality for an experiment if provided with information about which images represent positive and negative controls and/or what dose of treatment has been used for which images. This information is provided to CalculateStatistics via the LoadData module, using particular formats described in the help for CalculateStatistics. Again, using LoadData is useful if the same data is associated with several images (for example, multiple images obtained from a single well).

Example CSV file:

Image_FileName_FITC, Image_PathName_FITC, Metadata_Plate, Titration_NaCl_uM
"04923_d1.tif", "2009-07-08", "P-12345", 750
"51265_d1.tif", "2009-07-09", "P-12345", 2750

After the first row of header information (the column names), the first image-specific row specifies the file, "2009-07-08/04923_d1.tif" for the FITC image (2009-07-08 is the name of the subfolder that contains the image, relative to the Default Input Folder). The plate metadata is "P-12345" and the NaCl titration used in the well is 750 uM. The second image-specific row has the values "2009-07-09/51265_d1.tif", "P-12345" and 2750 uM. The NaCl titration for the image is available for modules that use numeric metadata, such as CalculateStatistics; "Titration" will be the category and "NaCl_uM" will be the measurement.
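To make the interpretation above concrete, the following sketch parses the example file with Python's standard csv module. It only illustrates how the columns are read; it is not how CellProfiler itself parses the file. The skipinitialspace option is used because the example has a space after each comma.

```python
import csv
import io
import posixpath

example = '''Image_FileName_FITC, Image_PathName_FITC, Metadata_Plate, Titration_NaCl_uM
"04923_d1.tif", "2009-07-08", "P-12345", 750
"51265_d1.tif", "2009-07-09", "P-12345", 2750
'''

# The first row is taken as the header; each later row becomes a dict.
records = list(csv.DictReader(io.StringIO(example), skipinitialspace=True))

# File location relative to the base folder: path name column + file name column.
relative_paths = [
    posixpath.join(r["Image_PathName_FITC"], r["Image_FileName_FITC"])
    for r in records
]

# Split the numeric column name at its first underscore into category and measurement.
category, measurement = "Titration_NaCl_uM".split("_", 1)

print(relative_paths)
print(category, measurement)
```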

Using metadata in LoadData

If you would like to use the metadata-specific settings, please see Help > General help > Using metadata in CellProfiler for more details on metadata usage and syntax. Briefly, LoadData can use metadata provided by the input CSV file for grouping similar images together for the analysis run and for metadata-specific options in other modules; see the settings help for Group images by metadata and, if that setting is selected, Select metadata tags for grouping for details.

Using MetaXpress-acquired images in CellProfiler

To produce a CSV file containing image location and metadata from a MetaXpress imaging run, do the following:

Collect image locations from all files that match the string .tif in the desired image folder, one row per image.
Split up the image pathname and filename into separate data columns for LoadData to read.
Remove data rows corresponding to:
- Thumbnail images (do not contain imaging data)
- Duplicate images (will cause metadata mismatching)
- Corrupt files (will cause failure on image loading)
The image data table may be linked to metadata contained in plate maps. These plate maps should be stored as flat files, and may be updated periodically via queries to a laboratory information management system (LIMS) database.
The complete image location and metadata is written to a CSV file where the headers can easily be formatted to match LoadData's input requirements (see column descriptions above). Single plates split across multiple directories (which often occurs in MetaXpress) are written to separate files and then merged, thereby removing the discontinuity.
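The first steps above (collecting locations, splitting path and file name, and dropping unwanted rows) can be sketched in Python. This is a rough illustration under stated assumptions, not a supported tool: the helper name build_rows, the "_thumb" marker used to recognize thumbnails, the rule that a repeated file name counts as a duplicate, and the "DNA" image name are all hypothetical. Detection of corrupt files and linking to plate maps are omitted.

```python
import posixpath

def build_rows(paths, thumb_marker="_thumb"):
    """Turn a list of file paths into LoadData-style rows (hypothetical helper)."""
    rows = []
    seen_names = set()
    for path in sorted(paths):
        folder, name = posixpath.split(path)  # separate path name and file name
        if not name.lower().endswith(".tif"):
            continue                          # keep only .tif files
        if thumb_marker in name.lower():
            continue                          # assumed thumbnail naming convention
        if name in seen_names:
            continue                          # assumed duplicate: same file name seen before
        seen_names.add(name)
        rows.append({"Image_PathName_DNA": folder, "Image_FileName_DNA": name})
    return rows

# Hypothetical listing of an image folder.
paths = [
    "plate1/TimePoint_1/exp_A01_s1_w1.tif",
    "plate1/TimePoint_1/exp_A01_s1_w1_thumb.tif",
    "plate1/copy/exp_A01_s1_w1.tif",
    "plate1/TimePoint_1/notes.txt",
]
rows = build_rows(paths)
print(rows)
```

In practice the path list would come from walking the image folder (for example with os.walk), and the resulting rows would be written out with csv.DictWriter.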

For a GUI-based approach to performing this task, we suggest using Pipeline Pilot.

For more details on configuring CellProfiler (and LoadData in particular) for a LIMS environment, please see our wiki on the subject.

Available measurements

Pathname, Filename: The full path and the filename of each image, if image loading was requested by the user.
Per-image information obtained from the input file provided by the user.
Scaling: The maximum possible intensity value for the image format.
Height, Width: The height and width of the current image.

See also the Input modules, LoadImages and CalculateStatistics.

Settings:

Input data file location

Select the folder containing the CSV file to be loaded. You can choose among the following options which are common to all file input/output modules:

Default Input Folder: Use the default input folder.
Default Output Folder: Use the default output folder.
Elsewhere...: Use a particular folder you specify.
Default input directory sub-folder: Enter the name of a subfolder of the default input folder or a path that starts from the default input folder.
Default output directory sub-folder: Enter the name of a subfolder of the default output folder or a path that starts from the default output folder.

Elsewhere and the two sub-folder options all require you to enter an additional path name. You can use an absolute path (such as "C:\imagedir\image.tif" on a PC) or a relative path (to specify the file location relative to a directory):

Use one period to represent the current directory. For example, if you choose Default Input Folder sub-folder, you can enter "./MyFiles" to look in a folder called "MyFiles" that is contained within the Default Input Folder.
Use two periods ".." to move up one folder level. For example, if you choose Default Input Folder sub-folder, you can enter "../MyFolder" to look in a folder called "MyFolder" at the same level as the Default Input Folder.
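The effect of the one-period and two-period notations can be checked with ordinary path arithmetic. The sketch below uses Python's posixpath module with a hypothetical Default Input Folder; it illustrates the general path semantics rather than CellProfiler's own resolution code.

```python
import posixpath

default_input = "/data/experiment/input"  # hypothetical Default Input Folder

# "./MyFiles" stays inside the Default Input Folder.
inside = posixpath.normpath(posixpath.join(default_input, "./MyFiles"))

# "../MyFolder" moves up one level, landing next to the Default Input Folder.
sibling = posixpath.normpath(posixpath.join(default_input, "../MyFolder"))

print(inside)
print(sibling)
```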

An additional option is the following:

URL: Use the path part of a URL. For instance, an example .CSV file is hosted at https://svn.broadinstitute.org/CellProfiler/trunk/ExampleImages/ExampleSBSImages/1049_Metadata.csv To access this file, you would choose URL and enter https://svn.broadinstitute.org/CellProfiler/trunk/ExampleImages/ExampleSBSImages as the path location.

Name of the file

Provide the file name of the CSV file containing the data.

Load images based on this data?

Select Yes to have LoadData load images using the Image_FileName field and the Image_PathName fields (the latter is optional).

Base image location

The parent (base) folder where images are located. If images are contained in subfolders, then the file you load with this module should contain a column with path names relative to the base image folder (see the general help for this module for more details). You can choose among the following options:

Default Input Folder: Use the Default Input Folder.
Default Output Folder: Use the Default Output Folder.
None: You have an Image_PathName field that supplies an absolute path.
Elsewhere...: Use a particular folder you specify.

Process just a range of rows?

Select Yes if you want to process a subset of the rows in the CSV file. Rows are numbered starting at 1 (but do not count the header line). LoadData will process up to and including the end row.

Rows to process

(Used only if a range of rows is to be specified)
Enter the row numbers of the first and last row to be processed.
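The numbering convention (rows counted from 1, header excluded, end row included) maps onto list slicing as follows. This is a small sketch; select_rows is a hypothetical helper, not part of CellProfiler.

```python
def select_rows(data_rows, first, last):
    """Return rows first..last, counted from 1 and inclusive of both ends.

    data_rows must already exclude the header line.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return data_rows[first - 1:last]

data_rows = ["row1", "row2", "row3", "row4", "row5"]
subset = select_rows(data_rows, 2, 4)
print(subset)
```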

Group images by metadata?

Select Yes to break the image sets in an experiment into groups that can be processed by different nodes on a computing cluster. Each set of files that share your selected metadata tags will be processed together. See CreateBatchFiles for details on submitting a CellProfiler pipeline to a computing cluster for processing.

Select metadata tags for grouping

(Used only if images are to be grouped by metadata)
Select the tags by which you want to group the image files here. You can select multiple tags. For example, if a set of images had metadata for "Run", "Plate", "Well", and "Site", selecting Run and Plate will create groups containing images that share the same [Run,Plate] pair of tags.
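The grouping rule can be illustrated with a few lines of Python: image sets that share the same values for the selected tags fall into the same group. This is a conceptual sketch with hypothetical data, and group_by_tags is not a CellProfiler function.

```python
from collections import defaultdict

def group_by_tags(image_sets, tags):
    """Group image sets by the tuple of values of the selected metadata tags."""
    groups = defaultdict(list)
    for image_set in image_sets:
        key = tuple(image_set[tag] for tag in tags)
        groups[key].append(image_set)
    return dict(groups)

# Hypothetical metadata for four image sets.
image_sets = [
    {"Run": "1", "Plate": "P1", "Well": "A01", "Site": "1"},
    {"Run": "1", "Plate": "P1", "Well": "A02", "Site": "1"},
    {"Run": "1", "Plate": "P2", "Well": "A01", "Site": "1"},
    {"Run": "2", "Plate": "P1", "Well": "A01", "Site": "1"},
]

groups = group_by_tags(image_sets, ["Run", "Plate"])
for key, members in groups.items():
    print(key, len(members))
```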

Rescale intensities?

This option determines whether image metadata should be used to rescale the image's intensities. Some image formats save the maximum possible intensity value along with the pixel data. For instance, a microscope might acquire images using a 12-bit A/D converter which outputs intensity values between zero and 4095, but stores the values in a field that can take values up to 65535.

Select Yes to rescale the image intensity so that saturated values are rescaled to 1.0 by dividing all pixels in the image by the maximum possible intensity value.

Select No to ignore the image metadata and rescale the image to 0 – 1.0 by dividing by 255 or 65535, depending on the number of bits used to store the image.
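The difference between the two choices can be shown with the 12-bit example above. The sketch below works on a plain list of pixel values; the function name and parameters are hypothetical, and it mirrors only the arithmetic described here, not CellProfiler's image loading code.

```python
def rescale(pixels, bits_stored, metadata_max=None):
    """Rescale raw pixel values toward the 0 to 1.0 range.

    metadata_max: maximum possible intensity recorded in the file, if it is
    to be used (the Yes choice). When None (the No choice), the divisor
    depends only on the number of bits used to store the image.
    """
    if metadata_max is not None:
        divisor = metadata_max
    else:
        divisor = 255 if bits_stored <= 8 else 65535
    return [p / divisor for p in pixels]

raw = [0, 2048, 4095]  # 12-bit data stored in a 16-bit field

with_metadata = rescale(raw, bits_stored=16, metadata_max=4095)
without_metadata = rescale(raw, bits_stored=16)

print(with_metadata)
print(without_metadata)
```

With the metadata, the saturated value 4095 maps to 1.0; without it, the same pixel maps to 4095/65535, roughly 0.06, so the image uses only a small part of the 0 to 1.0 range.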

Module: LoadData