File Processing

File Processing modules handle the input and output of files to and from CellProfiler.

CreateBatchFiles

CreateBatchFiles produces files that allow individual batches of images to be processed separately on a cluster of computers.

This module creates files that can be submitted in parallel to a cluster for faster processing. It should be placed at the end of an image processing pipeline.

If your computer mounts the file system differently than the cluster computers, CreateBatchFiles can replace the necessary parts of the paths to the image and output files.
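To illustrate the kind of path substitution this involves, here is a minimal Python sketch that assumes POSIX-style paths. The helper name remap_path and the mount points /Volumes/imaging and /cluster/imaging are hypothetical. CreateBatchFiles performs the substitution itself based on its settings, and this is not its implementation.

```python
from pathlib import PurePosixPath


def remap_path(path: str, local_root: str, cluster_root: str) -> str:
    """Replace the local mount point at the start of `path` with the cluster's.

    Hypothetical helper for illustration only. Paths that are not under
    `local_root` are returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        # Compares whole path components, not raw string prefixes.
        relative = p.relative_to(local_root)
    except ValueError:
        return path
    return str(PurePosixPath(cluster_root) / relative)


print(remap_path("/Volumes/imaging/plate1/A01.tif", "/Volumes/imaging", "/cluster/imaging"))
# /cluster/imaging/plate1/A01.tif
```

Comparing whole path components avoids accidentally rewriting a sibling folder such as /Volumes/imaging2.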


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

ExportToDatabase

ExportToDatabase exports data directly to a database or in a database-readable format, including, if desired, a CellProfiler Analyst properties file.

This module exports measurements directly to a database or to a SQL-compatible format. It can create MySQL and associated data files and import them into a database, and it can optionally create a properties file for use with CellProfiler Analyst. If you do not have a server on which to run MySQL itself, you can instead create an SQLite database file. This module must be the last module in a pipeline, or second to last if you are using the CreateBatchFiles module. If you forgot to include this module, you can run the ExportDatabase data tool (accessed from CellProfiler’s main menu) after processing is complete; its functionality is the same.

The database is set up with two primary tables, the Per_Image table and the Per_Object table (each of which may be given a prefix, if you specify one):

  • The Per_Image table consists of all the per-image measurements made during the pipeline, plus per-image population statistics (such as mean, median, and standard deviation) of the object measurements. There is one Per_Image row for every “cycle” that CellProfiler processes (a cycle is usually a single field of view, and a single cycle usually contains several image files, each representing a different channel of the same field of view).
  • The Per_Object table contains all the measurements for individual objects. There is one row of object measurements per object identified. The two tables are connected by the primary key column ImageNumber, which indicates the image to which each object belongs. The Per_Object table has a second key column, ObjectNumber, which is unique within each image.
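A minimal sketch of how the two tables relate, using Python's built-in sqlite3 module and an in-memory database. The measurement columns Image_Count_Nuclei and Nuclei_AreaShape_Area and all values are hypothetical, and declaring (ImageNumber, ObjectNumber) as a composite key illustrates the relationship described above rather than the exact schema CellProfiler writes.

```python
import sqlite3

# In-memory stand-in for an exported database. Column names other than
# ImageNumber and ObjectNumber are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Per_Image (
        ImageNumber INTEGER PRIMARY KEY,
        Image_Count_Nuclei INTEGER
    );
    CREATE TABLE Per_Object (
        ImageNumber INTEGER,
        ObjectNumber INTEGER,
        Nuclei_AreaShape_Area REAL,
        PRIMARY KEY (ImageNumber, ObjectNumber)
    );
    INSERT INTO Per_Image VALUES (1, 2), (2, 1);
    INSERT INTO Per_Object VALUES (1, 1, 120.0), (1, 2, 95.5), (2, 1, 130.0);
    """
)

# Attach each object's row to the image it belongs to via ImageNumber.
rows = conn.execute(
    """
    SELECT o.ImageNumber, o.ObjectNumber, i.Image_Count_Nuclei, o.Nuclei_AreaShape_Area
    FROM Per_Object AS o
    JOIN Per_Image AS i ON o.ImageNumber = i.ImageNumber
    ORDER BY o.ImageNumber, o.ObjectNumber
    """
).fetchall()
for row in rows:
    print(row)
```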

Typically, if multiple types of objects are identified and measured in a pipeline, the numbers of those objects are equal to each other. For example, in most pipelines, each nucleus has exactly one cytoplasm, so the first row of the Per_Object table contains all of the information about object #1, including both nucleus- and cytoplasm-related measurements. If this one-to-one correspondence does not hold for all objects in the pipeline (for example, if dozens of speckles are identified and measured for each nucleus), then you must configure ExportToDatabase to export only objects that maintain the one-to-one correspondence (for example, export only Nucleus and Cytoplasm, but omit Speckles).

If you have extracted “Plate” and “Well” metadata from image filenames or loaded “Plate” and “Well” metadata via the Metadata or LoadData modules, you can ask CellProfiler to create a “Per_Well” table, which aggregates object measurements by well. This option outputs a SQL file (regardless of whether you choose to write directly to the database) that can be used to create the Per_Well table. Note that the “Per_Well” mean/median/stdev values are only usable for database type MySQL (and CSV/MySQL), not SQLite.
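Conceptually, a per-well summary groups values by their plate and well metadata and then computes statistics for each group. The sketch below does this in plain Python with hypothetical per-object values. It is not the SQL that CellProfiler generates, and it uses the sample standard deviation, which may differ from the definition used by your database's standard-deviation function.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical per-object values: (Metadata_Plate, Metadata_Well, measurement).
objects = [
    ("P1", "A01", 10.0),
    ("P1", "A01", 14.0),
    ("P1", "A01", 12.0),
    ("P1", "A02", 20.0),
    ("P1", "A02", 22.0),
]

# Group measurement values by (plate, well).
groups = defaultdict(list)
for plate, well, value in objects:
    groups[(plate, well)].append(value)

# Summarize each well as (mean, median, sample standard deviation).
# A well with a single value has no sample stdev; 0.0 is used here by choice.
per_well = {
    key: (mean(vals), median(vals), stdev(vals) if len(vals) > 1 else 0.0)
    for key, vals in groups.items()
}
print(per_well)
```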

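At the secure shell prompt where you normally log in to MySQL, type the following to import these CellProfiler measurements into your database, replacing hostname, username, databasename and pathtoimages/perwellsetupfile.SQL with your own host name, user name, database name and the path to the SQL file written by CellProfiler: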

mysql -h hostname -u username -p databasename < pathtoimages/perwellsetupfile.SQL

The commands written by CellProfiler to create the Per_Well table will then be executed. Oracle is not fully supported at present; you can create your own Oracle database by using the .csv output option and writing a simple script to upload the data to the database.
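A simple upload script of the kind mentioned above might look like the following Python sketch. It uses the standard DB-API pattern with sqlite3 as a stand-in; an Oracle driver would need its own connection call and placeholder style, so consult that driver's documentation. The function name load_csv and the sample columns are hypothetical, and every column is stored as TEXT to keep the sketch short.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every data row.

    Hypothetical helper. Table and column names are interpolated into the
    SQL, so they must come from a trusted file.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)  # sqlite3 uses "?" placeholders
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Stand-in for a CellProfiler per-image CSV (hypothetical columns).
sample = io.StringIO("ImageNumber,Image_Count_Nuclei\n1,42\n2,37\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, "Per_Image", sample)
print(conn.execute('SELECT COUNT(*) FROM "Per_Image"').fetchone())
```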

For details on the nomenclature used by CellProfiler for the exported measurements, see Help > General Help > How Measurements Are Named.


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

See also

See also ExportToSpreadsheet.

ExportToSpreadsheet

ExportToSpreadsheet exports measurements into one or more files that can be opened in Excel or other spreadsheet programs.

This module will convert the measurements to a comma-, tab-, or other character-delimited text format and save them to the hard drive in one or several files, as requested.


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

Using metadata tags for output

ExportToSpreadsheet can write out separate files for groups of images based on their metadata tags. This is controlled by the directory and file names that you enter. For instance, you might have applied two treatments to each of your samples and labeled them with the metadata names “Treatment1” and “Treatment2”, and you might want to create separate files for each combination of treatments, storing the measurements for each “Treatment1” value in its own directory. You can do this by specifying metadata tags for the folder name and file name:

  • Choose “Elsewhere…” or “Default Input/Output Folder sub-folder” for the output file location. Note that regardless of your choice, Experiment.csv is saved to the Default Input/Output Folder and not to individual subfolders. All other per-image and per-object .csv files are saved to the appropriate subfolders. See GitHub issue #1110 for details.

  • Insert the metadata tag of choice into the output path. You can insert a previously defined metadata tag using any of the following:

    • The Insert key
    • A right mouse button click inside the control
    • In Windows, the Context menu key, which is between the Windows key and the Ctrl key

    The inserted metadata tag will appear in green. To change a previously inserted metadata tag, navigate the cursor to just before the tag and either:

    • Use the up and down arrows to cycle through possible values.
    • Right-click on the tag to display and select the available values.

    In this instance, you would select the metadata tag “Treatment1”.

  • Uncheck “Export all measurements?”

  • Uncheck “Use the object name for the file name?”

  • Using the same approach as above, select the metadata tag “Treatment2”, and complete the filename by appending the text “.csv”.

Here’s an example table of the files that would be generated:
Treatment1   Treatment2   Path
1M_NaCl      20uM_DMSO    1M_NaCl/20uM_DMSO.csv
1M_NaCl      40uM_DMSO    1M_NaCl/40uM_DMSO.csv
2M_NaCl      20uM_DMSO    2M_NaCl/20uM_DMSO.csv
2M_NaCl      40uM_DMSO    2M_NaCl/40uM_DMSO.csv
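The routing in the table above can be sketched as follows. This is a conceptual illustration in Python, not ExportToSpreadsheet's code; the helper output_path and the image sets are hypothetical, and the path simply uses the Treatment1 value as the folder and the Treatment2 value plus “.csv” as the file name.

```python
from collections import defaultdict
from itertools import product
from pathlib import PurePosixPath


def output_path(metadata: dict) -> str:
    """Folder named after Treatment1, file named after Treatment2 plus '.csv'."""
    return str(PurePosixPath(metadata["Treatment1"]) / f"{metadata['Treatment2']}.csv")


# Hypothetical image sets covering every combination of the two treatments.
image_sets = [
    {"Treatment1": t1, "Treatment2": t2}
    for t1, t2 in product(["1M_NaCl", "2M_NaCl"], ["20uM_DMSO", "40uM_DMSO"])
]

# Route each image set's measurements to the file chosen by its metadata.
files = defaultdict(list)
for image_number, metadata in enumerate(image_sets, start=1):
    files[output_path(metadata)].append(image_number)

for path in sorted(files):
    print(path, files[path])
```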

Measurements made by this module

For details on the nomenclature used by CellProfiler for the exported measurements, see Help > General Help > How Measurements Are Named.

See also

See also ExportToDatabase.

LabelImages

LabelImages assigns plate metadata to image sets.

LabelImages assigns a plate number, well, and site number to each image set based on the order in which the image sets are processed. You can use LabelImages to add plate and well metadata for images loaded using the Order option for “Image set matching order” in NamesAndTypes.

LabelImages assumes the following are true of the image order:

  • Each well has the same number of images (i.e., sites) per channel.
  • Each plate has the same number of rows and columns, so that the total number of images per plate is the same.

Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

See also

See also the Metadata module.

Measurements made by this module

  • Metadata_Plate: The plate number, starting at 1 for the first plate.
  • Metadata_Well: The well name, e.g., A01.
  • Metadata_Row: The row name, starting with A for the first row.
  • Metadata_Column: The column number, starting with 1 for the first column.
  • Metadata_Site: The site number within the well, starting at 1 for the first site.
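The numbering can be sketched as arithmetic on a 0-based image-set index. This Python version is an illustration under stated assumptions, not LabelImages' code: it defaults to a 96-well layout (8 rows, 12 columns), assumes sites are consecutive within a well, wells are ordered row by row (A01, A02, up to A12, then B01), and each plate follows the previous one. LabelImages may support other orderings, so check its settings. The function name label_image_set is hypothetical.

```python
import string


def label_image_set(index, sites_per_well, rows=8, columns=12):
    """Return (plate, well, row, column, site) for a 0-based image-set index.

    Assumes row-by-row well order and single-letter row names (at most 26 rows).
    """
    wells_per_plate = rows * columns
    site = index % sites_per_well
    well_index = (index // sites_per_well) % wells_per_plate
    plate = index // (sites_per_well * wells_per_plate)
    row, column = divmod(well_index, columns)
    row_name = string.ascii_uppercase[row]
    well = f"{row_name}{column + 1:02d}"
    # Plate, column and site numbers start at 1, as in the measurements above.
    return plate + 1, well, row_name, column + 1, site + 1


print(label_image_set(5, sites_per_well=4))
# (1, 'A02', 'A', 2, 2)
```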

LoadData

LoadData loads text or numerical data to be associated with images, and can also load images specified by file names.

This module loads a file that supplies text or numerical data associated with the images to be processed, e.g., sample names, plate names, well identifiers, or even a list of image file names to be processed in the analysis run. Please note that most researchers will prefer to use the Input modules (i.e., Images, Metadata, NamesAndTypes and Groups) to load images.

Note that 3D images to be analyzed volumetrically CANNOT be loaded with LoadData; they must be loaded with the Input modules.

The module reads files in CSV (comma-separated values) format. These files can be produced by saving a spreadsheet from Excel as “Windows Comma Separated Values” file format. The lines of the file represent the rows, and each field in a row is separated by a comma. Text values may optionally be enclosed in double quotes. The LoadData module uses the first row of the file as a header. The fields in this row provide the labels for each column of data. Subsequent rows provide the values for each image cycle.

There are many reasons why you might want to prepare a CSV file and load it via LoadData. Below, we describe how the column nomenclature allows for special functionality for some downstream modules:

  • Columns whose name begins with Image_FileName or Image_PathName: A column whose name begins with “Image_FileName” or “Image_PathName” can be used to supply the file name and path name (relative to the base folder) of an image that you want to load, which will override the settings in the Input modules (Images, Metadata, NamesAndTypes and Groups). For instance, “Image_FileName_CY3” would supply the file name for the CY3-stained image, and choosing the “Load images based on this data?” option allows the CY3 images to be selected later in the pipeline. “Image_PathName_CY3” would supply the path names for the CY3-stained images. The path name column is optional; if all image files are in the base folder, this column is not needed.

  • Columns whose name begins with Image_ObjectsFileName or Image_ObjectsPathName: The behavior of these columns is identical to that of “Image_FileName” or “Image_PathName” except that they are used to specify an image that you want to load as objects.

  • Columns whose name begins with Metadata: A column whose name begins with “Metadata” can be used to group or associate files loaded by LoadData.

    For instance, an experiment might require that images created on the same day use an illumination correction function calculated from all images from that day, and furthermore, that the date be captured in the file names for the individual image sets and in a CSV file specifying the illumination correction functions.

    In this case, if the illumination correction images are loaded with the LoadData module, the file should have a “Metadata_Date” column which contains the date metadata tags. Similarly, if the individual images are loaded using the LoadImages module, LoadImages should be set to extract the metadata tag from the file names (see LoadImages for more details on how to do so). The pipeline will then match each individual image with its corresponding illumination correction function based on matching “Metadata_Date” tags. This is useful if the same data is associated with several images (for example, multiple images obtained from a single well).

  • Columns whose name begins with Series or Frame: A column whose name begins with “Series” or “Frame” refers to information about image stacks or movies. The name of the image within CellProfiler appears after an underscore character. For example, “Frame_DNA” would supply the frame number for the movie/image stack file specified by the “Image_FileName_DNA” and “Image_PathName_DNA” columns.

    Using a CSV for loading frames and/or series from a movie/image stack gives you more flexibility in assembling image sets for operations that would be difficult or impossible using the Input modules alone. For example, if you wanted to analyze a movie of 1,000 frames by computing the difference between frames, you could create two image columns in a CSV, one for loading frames 1,2,…,999, and the other for loading frames 2,3,…,1000. In this case, CellProfiler would load each frame and its predecessor in every cycle, and ImageMath could be used to create the difference image for downstream use.

  • Columns that contain dose-response or positive/negative control information: The CalculateStatistics module can calculate metrics of assay quality for an experiment if provided with information about which images represent positive and negative controls and/or what dose of treatment has been used for which images. This information is provided to CalculateStatistics via the LoadData module, using particular formats described in the help for CalculateStatistics. Again, using LoadData is useful if the same data is associated with several images (for example, multiple images obtained from a single well).

  • Columns with any other name: Columns whose names begin with any other text will be loaded into CellProfiler and then exported as per-image measurements along with CellProfiler-calculated data. This is a convenient way for you to add data from your own sources to the files exported by CellProfiler.
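As a sketch of the frame-difference example in the list above, the following Python writes a LoadData CSV in which each row pairs a frame with its predecessor. The movie name movie.tif and the image names Current and Previous are hypothetical. The path name columns are omitted on the assumption that the movie is in the base folder, and the frames are numbered 1 to 1,000 as in the description; check whether your CellProfiler version counts frames from 0 or 1 before using such a file.

```python
import csv
import io

N_FRAMES = 1000
MOVIE = "movie.tif"  # hypothetical movie file in the base folder

# Two image columns per row: the current frame and its predecessor.
header = [
    "Image_FileName_Current", "Frame_Current",
    "Image_FileName_Previous", "Frame_Previous",
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
for frame in range(2, N_FRAMES + 1):
    writer.writerow([MOVIE, frame, MOVIE, frame - 1])

lines = buffer.getvalue().splitlines()
print(lines[1])
# movie.tif,2,movie.tif,1
```

To write a real file, open it with open(path, "w", newline="") and pass that handle to csv.writer instead of the in-memory buffer.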


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             YES

See also

See also the Input modules (i.e., Images, Metadata, NamesAndTypes and Groups), LoadImages and CalculateStatistics.

Example CSV file

Image_FileName_FITC,Image_PathName_FITC,Metadata_Plate,Titration_NaCl_uM
"04923_d1.tif","2009-07-08","P-12345",750
"51265_d1.tif","2009-07-09","P-12345",2750

After the first row of header information (the column names), the first image-specific row specifies the file “2009-07-08/04923_d1.tif” for the FITC image (2009-07-08 is the name of the subfolder that contains the image, relative to the Default Input Folder). The plate metadata is “P-12345” and the NaCl titration used in the well is 750 uM. The second image-specific row has the values “2009-07-09/51265_d1.tif”, “P-12345” and 2750 uM. The NaCl titration for each image is available to modules that use numeric metadata, such as CalculateStatistics; “Titration” will be the category and “NaCl_uM” will be the measurement.
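To check how the example file is interpreted, here is a short Python sketch that reads it with the standard csv module, joins each path name and file name, and splits the last column name into category and measurement at its first underscore. Splitting at the first underscore matches this example; treating it as a general rule is an assumption, and the helper describe is hypothetical.

```python
import csv
import io
import posixpath

# The example CSV, written out as text.
EXAMPLE = '''Image_FileName_FITC,Image_PathName_FITC,Metadata_Plate,Titration_NaCl_uM
"04923_d1.tif","2009-07-08","P-12345",750
"51265_d1.tif","2009-07-09","P-12345",2750
'''

rows = list(csv.DictReader(io.StringIO(EXAMPLE)))

# "Titration_NaCl_uM" -> category "Titration", measurement "NaCl_uM".
category, measurement = "Titration_NaCl_uM".split("_", 1)


def describe(row):
    """Return (relative image path, plate, titration) for one cycle."""
    relative = posixpath.join(row["Image_PathName_FITC"], row["Image_FileName_FITC"])
    return relative, row["Metadata_Plate"], float(row["Titration_NaCl_uM"])


for row in rows:
    print(describe(row), category, measurement)
```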

Using metadata in LoadData

If you would like to use the metadata-specific settings, please see Help > General help > Using metadata in CellProfiler for more details on metadata usage and syntax. Briefly, LoadData can use metadata provided by the input CSV file for grouping similar images together for the analysis run and for metadata-specific options in other modules; see the settings help for Group images by metadata and, if that setting is selected, Select metadata tags for grouping for details.
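As a conceptual sketch of what grouping by a metadata tag means, the Python below partitions image cycles by their Metadata_Plate value so that each group could be handled as a unit. The cycle records are hypothetical, and this is not how CellProfiler implements grouping internally.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical image cycles as they might be read from a LoadData CSV.
cycles = [
    {"ImageNumber": 1, "Metadata_Plate": "P-12345"},
    {"ImageNumber": 2, "Metadata_Plate": "P-12345"},
    {"ImageNumber": 3, "Metadata_Plate": "P-67890"},
]

# groupby only merges adjacent items, so sort by the grouping tag first.
key = itemgetter("Metadata_Plate")
groups = {
    plate: [cycle["ImageNumber"] for cycle in members]
    for plate, members in groupby(sorted(cycles, key=key), key=key)
}
print(groups)
# {'P-12345': [1, 2], 'P-67890': [3]}
```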

Using MetaXpress-acquired images in CellProfiler

To produce a CSV file containing image location and metadata from a MetaXpress imaging run, do the following:

  • Collect image locations from all files that match the string .tif in the desired image folder, one row per image.
  • Split up the image pathname and filename into separate data columns for LoadData to read.
  • Remove data rows corresponding to:
    • Thumbnail images (do not contain imaging data)
    • Duplicate images (will cause metadata mismatching)
    • Corrupt files (will cause failure on image loading)
  • The image data table may be linked to metadata contained in plate maps. These plate maps should be stored as flat files, and may be updated periodically via queries to a laboratory information management system (LIMS) database.
  • The complete image location and metadata is written to a CSV file where the headers can easily be formatted to match LoadData’s input requirements (see column descriptions above). Single plates split across multiple directories (which often occurs in MetaXpress) are written to separate files and then merged, thereby removing the discontinuity.
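The collection, splitting and filtering steps above might be sketched in Python as follows. Everything here is an assumption to adapt: the channel name DNA, the “_thumb” marker used to recognize thumbnails, treating a repeated file name as a duplicate, and treating an empty file as corrupt (a real check would try to open each image). Linking to plate maps and merging split plates are not shown, and the function names are hypothetical.

```python
import csv
import os

THUMBNAIL_MARKER = "_thumb"  # assumed thumbnail naming convention


def collect_rows(root, channel="DNA"):
    """Walk `root` and return one LoadData-style row per usable .tif file."""
    path_col = f"Image_PathName_{channel}"
    file_col = f"Image_FileName_{channel}"
    seen = set()
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(".tif"):
                continue
            if THUMBNAIL_MARKER in name.lower():
                continue  # thumbnails carry no imaging data
            if name in seen:
                continue  # duplicate file name: keep only the first copy
            full_path = os.path.join(dirpath, name)
            if os.path.getsize(full_path) == 0:
                continue  # crude stand-in for a corrupt-file check
            seen.add(name)
            # Split the location into separate path and file name columns.
            rows.append({path_col: dirpath, file_col: name})
    return rows


def write_load_data_csv(rows, out_path, channel="DNA"):
    """Write the rows with headers that follow LoadData's column naming."""
    fields = [f"Image_FileName_{channel}", f"Image_PathName_{channel}"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```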

More tips on using LoadData

For a GUI-based approach to creating a proper CSV file for use with LoadData, we suggest using KNIME or Pipeline Pilot.

For more details on configuring CellProfiler (and LoadData in particular) for a LIMS environment, please see our wiki on the subject.

Measurements made by this module

  • Pathname, Filename: The full path and the filename of each image, if you requested image loading.
  • Scaling: The maximum possible intensity value for the image format.
  • Height, Width: The height and width of images loaded by this module.
  • Any additional per-image data loaded from the input file you provided.

SaveCroppedObjects

SaveCroppedObjects exports each object as a binary image. Pixels corresponding to an exported object are assigned the value 255. All other pixels (i.e., background pixels and pixels corresponding to other objects) are assigned the value 0. The dimensions of each image are the same as the original image.

The filename for an exported image is formatted as “{object name}_{label index}.{image_format}”, where object name is the name of the exported objects, label index is the integer label of the exported object in the image (starting from 1), and image_format is the extension of the chosen image file format.
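The masking and naming rule can be sketched in plain Python on a small label matrix, where 0 is background and each positive integer marks one object. This is an illustration, not SaveCroppedObjects' implementation: it does not write image files, and the default format "tiff" and the function name object_masks are hypothetical.

```python
def object_masks(labels, object_name, image_format="tiff"):
    """Yield (filename, mask) for each object in a 2-D label matrix.

    Each mask has the same shape as `labels`: 255 on the object's pixels,
    0 on background and on every other object's pixels.
    """
    indices = sorted({v for row in labels for v in row if v > 0})
    for index in indices:
        mask = [[255 if v == index else 0 for v in row] for row in labels]
        yield f"{object_name}_{index}.{image_format}", mask


labels = [
    [0, 1, 1],
    [0, 0, 2],
]
for filename, mask in object_masks(labels, "Nuclei"):
    print(filename, mask)
```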


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             YES

SaveImages

SaveImages saves image or movie files.

Because CellProfiler usually performs many image analysis steps on many groups of images, it does not save any of the resulting images to the hard drive unless you specifically choose to do so with the SaveImages module. You can save any of the processed images created by CellProfiler during the analysis using this module.

You can choose from many different image formats for saving your files. This allows you to use the module as a file format converter, by loading files in their original format and then saving them in an alternate format.


Supports 2D?   Supports 3D?   Respects masks?
YES            YES            YES

See also

See also NamesAndTypes.

Deprecated

These File Processing modules are considered deprecated and will be removed in a future release of CellProfiler. It is not recommended that you build new pipelines containing these modules.

LoadImages

LoadImages allows you to specify which images or movies are to be loaded and in which order.

This module tells CellProfiler where to retrieve images and gives each image a meaningful name by which other modules can access it. You can also use LoadImages to extract or define the relationships between images and their associated metadata. For example, you could load a group of images (such as three channels that represent the same field of view) together for processing in a single CellProfiler cycle. Finally, you can use this module to retrieve a label matrix and give the collection of objects a meaningful name.

Disclaimer: Please note that the Input modules (i.e., Images, Metadata, NamesAndTypes and Groups) largely supersede this module. However, when you load an old pipeline that contains this module, CellProfiler will offer the option of preserving it; such pipelines will operate exactly as before.

When used in combination with a SaveImages module, you can load images in one file format and save them in another, using CellProfiler as a file format converter.


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

See also

See also the Input modules (Images, NamesAndTypes, Metadata, Groups), LoadData, LoadSingleImage, and SaveImages.

Using metadata in LoadImages

If you would like to use the metadata-specific settings, please see Help > General help > Using metadata in CellProfiler for more details on metadata usage and syntax. Briefly, LoadImages can extract metadata from the image filename using pattern-matching strings, for grouping similar images together for the analysis run and for metadata-specific options in other modules; see the settings help for Where to extract metadata, and if an option for that setting is selected, Regular expression that finds metadata in the file name for the necessary syntax.
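CellProfiler's metadata regular expressions follow Python's named-group syntax, (?P<name>...), where each group name becomes a metadata tag. The Python sketch below shows the idea with a hypothetical file-naming scheme of the form plate_well_s(site)_w(channel).tif; the pattern and file name are illustrations, not a format produced by any particular instrument.

```python
import re

# Hypothetical naming scheme: <plate>_<well>_s<site>_w<channel>.tif
PATTERN = re.compile(
    r"^(?P<Plate>[^_]+)_(?P<Well>[A-P][0-9]{2})_s(?P<Site>[0-9]+)_w(?P<Channel>[0-9])\.tif$"
)

match = PATTERN.match("P-12345_B03_s2_w1.tif")
print(match.groupdict())
# {'Plate': 'P-12345', 'Well': 'B03', 'Site': '2', 'Channel': '1'}
```

File names that do not fit the pattern produce no match, so it is worth testing an expression against a few real file names before running a full analysis.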

Measurements made by this module

  • Pathname, Filename: The full path and the filename of each image.
  • Metadata: The metadata information extracted from the path and/or filename, if requested.
  • Scaling: The maximum possible intensity value for the image format.
  • Height, Width: The height and width of the current image.

LoadSingleImage

LoadSingleImage loads a single image for use in all image cycles.

This module tells CellProfiler where to retrieve a single image and gives the image a meaningful name by which the other modules can access it. The module executes only the first time through the pipeline; thereafter the image is accessible to all subsequent processing cycles. This is particularly useful for loading an image like an illumination correction image for use by the CorrectIlluminationApply module, when that single image will be used to correct all images in the analysis run.

Disclaimer: Please note that the Input modules (i.e., Images, Metadata, NamesAndTypes and Groups) largely supersede this module. However, when you load an old pipeline that contains this module, CellProfiler will offer the option of preserving it; such pipelines will operate exactly as before.


Supports 2D?   Supports 3D?   Respects masks?
YES            NO             NO

See also

See also the Input modules (Images, NamesAndTypes, Metadata, Groups), LoadImages, and LoadData.

Measurements made by this module

  • Pathname, Filename: The full path and the filename of each image.
  • Metadata: The metadata information extracted from the path and/or filename, if requested.
  • Scaling: The maximum possible intensity value for the image format.
  • Height, Width: The height and width of images loaded by this module.

Technical notes

For most purposes, you will probably want to use the LoadImages module, not LoadSingleImage. The reason is that LoadSingleImage does not actually create image sets (or even a single image set). Instead, it adds the single image to every image cycle for an already existing image set. Hence LoadSingleImage should never be used as the only image-loading module in a pipeline; attempting to do so will display a warning message in the module settings.

If you have a single file to load in the pipeline (and only that file), you will want to use LoadImages or LoadData with a single, hardcoded file name.
