Module: ExportToDatabase

Export To Database exports data directly to a database, or in database readable format, including an imported file with column names and a CellProfiler Analyst properties file, if desired.
This module exports measurements directly to a database or to a SQL-compatible format. It allows you to create and import MySQL and associated data files into a database and gives you the option of creating a properties file for use with CellProfiler Analyst. Optionally, you can create an SQLite database file if you do not have a server on which to run MySQL itself.

This module must be run at the end of a pipeline, or second to last if you are using the CreateBatchFiles module. If you forget this module, you can also run the ExportDatabase data tool after processing is complete; its functionality is the same.

The database is set up with two primary tables. These tables are the Per_Image table and the Per_Object table (which may have a prefix if you specify):

Typically, if multiple types of objects are identified and measured in a pipeline, the numbers of those objects are equal to each other. For example, in most pipelines, each nucleus has exactly one cytoplasm, so the first row of the Per-Object table contains all of the information about object #1, including both nucleus- and cytoplasm-related measurements. If this one-to-one correspondence is not the case for all objects in the pipeline (for example, if dozens of speckles are identified and measured for each nucleus), then you must configure ExportToDatabase to export only objects that maintain the one-to-one correspondence (for example, export only Nucleus and Cytoplasm, but omit Speckles).

If you have extracted "Plate" and "Well" metadata from image filenames or loaded "Plate" and "Well" metadata via the Metadata or LoadData modules, you can ask CellProfiler to create a "Per_Well" table, which aggregates object measurements across wells. This option will output a SQL file (regardless of whether you choose to write directly to the database) that can be used to create the Per_Well table. At the secure shell where you normally log in to MySQL, type the following, replacing the italics with references to your database and files:

mysql -h hostname -u username -p databasename < pathtohttp://d1zymp9ayga15t.cloudfront.net/images/perwellsetupfile.SQL

The commands written by CellProfiler to create the Per_Well table will be executed.

Oracle is not fully supported at present; you can create your own Oracle DB using the .csv output option and writing a simple script to upload to the database.

Available measurements

For details on the nomenclature used by CellProfiler for the exported measurements, see Help > General Help > How Measurements Are Named.

See also ExportToSpreadsheet.

Settings:

Database type

Specify the type of database you want to use:
  If running this module on a computing cluster, there are a few considerations to note:
  • The MySQL option is well-suited for cluster use, since multiple jobs can write to the database simultaneously.
  • The SQLite option is not as appropriate; a SQLite database only allows access by one job at a time.

Experiment name

Select a name for the experiment. This name will be registered in the database and linked to the tables that ExportToDatabase creates. You will be able to select the experiment by name in CellProfiler Analyst and will be able to find the experiment's tables through database queries.

Database name

Select a name for the database you want to use

SQL file prefix

(Used if MySQL / CSV is selected as the database type)
Enter the prefix to be used to name the SQL file.

Name the SQLite database file

(Used if SQLite selected as database type)
Enter the name of the SQLite database filename to which you want to write.

Overwrite without warning?

ExportToDatabase creates tables and databases at the start of a run when writing directly to a MySQL or SQLite database. It writes SQL scripts and CSVs when not writing directly. It also can write CellProfiler Analysis property files. In some cases, it is appropriate to run CellProfiler and append to or overwrite the data in existing tables, for instance when running several CellProfiler instances which each process a range of the experiment's image sets. In other cases, such as when the measurements to be written have changed, the data tables must be dropped completely.
You can choose fromm three options to conrtol overwriting behavior:

Add a prefix to table names?

Select whether you want to add a prefix to your table names. The default table names are Per_Image for the per-image table and Per_Object for the per-object table. Adding a prefix can be useful for bookkeeping purposes. Whether you chose to use a prefix or not, CellProfiler will warn you if your choice entails overwriting an existing table.

Table prefix

(Used if Add a prefix to table names? is selected)
Enter the table prefix you want to use.

MySQL has a 64 character limit on the full name of the table. If the combination of the table name and prefix exceeds this limit, you will receive an error associated with this setting.

Create a CellProfiler Analyst properties file?

Select Yes to generate a template properties file that will allow you to use your new database with CellProfiler Analyst (a data exploration tool which can also be downloaded from http://www.cellprofiler.org/). The module will attempt to fill in as many as the entries as possible based on the pipeline's settings, including the server name, username and password if MySQL is used.

Which objects should be used for locations?

(Used only if creating a properties file)
CellProfiler Analyst displays cells during classification. This setting determines which object centers will be used as the center of the cells to be displayed. Choose one of the listed objects and CellProfiler will save that object's location columns in the properties file so that CellProfiler Analyst centers cells using that object's center.

You can manually change this choice in the properties file by edting the cell_x_loc and cell_y_loc properties.

Note that if there are no objects defined in the pipeline (e.g. if only using MeasureImageQuality and/or Illumination Correction modules), a warning will diplay until you choose 'None' for the subsequent setting: 'Export measurements for all objects to the database?'.

Access CPA images via URL?

(Used only if creating a properties file)
The image paths written to the database will be the absolute path the the image files on your computer. If you plan to make these files accessible via the web, you can have CellProfiler Analyst prepend a URL to your file name. Eg: If an image is loaded from the path "/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/" and you use a url prepend of "http://mysite.com/", CellProfiler Analyst will look for your file at "http://mysite.com/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/"

Enter an image url prepend if you plan to access your files via http

(Used only if accessing CellProfiler Analyst images via URL)
The image paths written to the database will be the absolute path the the image files on your computer. If you plan to make these files accessible via the web, you can enter a url prefix here. Eg: If an image is loaded from the path "/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/" and you use a url prepend of "http://mysite.com/", CellProfiler Analyst will look for your file at "http://mysite.com/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/"

If you are not using the web to access your files (i.e., they are locally aceesible by your computer), leave this setting blank.

Select the plate type

(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the plate type here. Supported types in CellProfiler Analyst are 96- and 384-well plates, as well as 5600-spot microarrays. If you are not using a plate or microarray, select None.

Select the plate metadata

(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the metadata corresponding to the plate here. If there is no plate metadata associated with the image set, select None.

Please see the Metadata module for more details on metadata collection and usage.

Select the well metadata

(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the metadata corresponding to the well here. If there is no well metadata associated with the image set, select None.

Please see the Metadata module for more details on metadata collection and usage.

Include information for all images, using default values?

(Used only if creating a properties file)
Select Yes to include information in the properties file for all images. This option will do the following:

Select No to specify which images should be included or to override the automatic values.

Select an image to include

(Used only if creating a properties file and specifiying the image information)
Choose image name to include it in the properties file of images.

The images in the drop-down correspond to images that have been:

If you do not see your desired image listed, check the settings on these modules.

Use the image name for the display?

(Used only if creating a properties file and specifiying the image information)
Select Yes to use the image name as given above for the displayed name.

Select No to name the file yourself.

Image name

(Used only if creating a properties file, specifiying the image information and naming the image)
Enter a name for the specified image

Channel color

(Used only if creating a properties file and specifiying the image information)
Enter a color to display this channel.

Do you want to add group fields?

(Used only if creating a properties file)
Please note that "groups" as defined by CellProfiler Analyst has nothing to do with "grouping" as defined by CellProfiler in the Groups module.

Select Yes to define a "group" for your image data (for example, when several images represent the same experimental sample), by providing column(s) that identify unique images (the image key) to another set of columns (the group key).

The format for a group in CPA is:
group_SQL_<XXX> = <MySQL SELECT statement that returns image-key columns followed by group-key columns> For example, if you wanted to be able to group your data by unique plate names, you could define a group called SQL_Plate as follows:
group_SQL_Plate = SELECT ImageNumber, Image_Metadata_Plate FROM Per_Image

Grouping is useful, for example, when you want to aggregate counts for each class of object and their scores on a per-group basis (e.g., per-well) instead of on a per-image basis when scoring with Classifier. It will also provide new options in the Classifier fetch menu so you can fetch objects from images with specific values for the group columns.

Enter the name of the group

(Used only if creating a properties file and specifiying an image data group)
Enter a name for the group. Only alphanumeric characters and underscores are permitted.

Enter the per-image columns which define the group, separated by commas

(Used only if creating a properties file and specifiying an image data group)
To define a group, enter the image key columns followed by group key columns, each separated by commas.

In CellProfiler, the image key column is always given the name as ImageNumber; group keys are typically metadata columns which are always prefixed with Image_Metadata_. For example, if you wanted to be able to group your data by unique plate and well metadata tags, you could define a group with the following MySQL statement:
group_SQL_Plate = SELECT ImageNumber, Image_Metadata_Plate, Image_Metadata_Well FROM Per_Image
For this example, the columns to enter in this setting would be:
ImageNumber, Image_Metadata_Plate, Image_Metadata_Well

Groups are specified as MySQL statements in the properties file, but please note that the full SELECT and FROM clauses will be added automatically, so there is no need to enter them here.

Do you want to add filter fields?

(Used only if creating a properties file)
Select Yes to specify a subset of the images in your experiment by defining a filter. Filters are useful, for example, for fetching and scoring objects in Classifier or making graphs using the plotting tools that satisfy a specific metadata contraint.

Automatically create a filter for each plate?

(Used only if creating a properties file and specifiying an image data filter)
If you have specified a plate metadata tag, selecting Yes to create a set of filters in the properties file, one for each plate.

Enter a phenotype class table name if using the classifier tool

(Used only if creating a properties file)
If you are using the machine-learning tool in CellProfiler Analyst, you can create an additional table in your database which contains the per-object phenotype labels. This table is produced after scoring all the objects in your data set and will be named with the label given here.

You can manually change this choice in the properties file by edting the class_table field. Leave this field blank if you are not using the classifier or do not need the table written to the database.

Output file location

(Used only when using a CSV or a SQLite database, and/or creating a properties or workspace file)
This setting determines where the CSV files or SQLite database is saved if you decide to write measurements to files instead of writing them directly to the database. If you request a CellProfiler Analyst properties file or workspace file, it will also be saved to this location. You can choose among the following options which are common to all file input/output modules:

Elsewhere and the two sub-folder options all require you to enter an additional path name. You can use an absolute path (such as "C:\imagedir\image.tif" on a PC) or a relative path to specify the file location relative to a directory):

For Elsewhere..., Default Input Folder sub-folder and Default Output Folder sub-folder, if you have metadata associated with your images via Metadata module, you can name the folder using metadata tags. You can insert a previously defined metadata tag by either using:

The inserted metadata tag will appear in green. To change a previously inserted metadata tag, navigate the cursor to just before the tag and either:
For instance, if you have a metadata tag named "Plate", you can create a per-plate folder by selecting one of the subfolder options and then specifying the subfolder name with the "Plate" metadata tag. The module will substitute the metadata values for the last image set processed for any metadata tags in the folder name. Please see the Metadata module for more details on metadata collection and usage.

Create a CellProfiler Analyst workspace file?

Select Yes to generate a workspace file for use with CellProfiler Analyst, a data exploration tool which can also be downloaded from http://www.cellprofiler.org/. A workspace file allows you to open a selected set of measurements with the display tools of your choice. This is useful, for example, if you want examine a standard set of quality control image measurements for outliers.

Select the measurement display tool

(Used only if creating a workspace file)
Select what display tool in CPA you want to use to open the measurements.

Type of measurement to plot on the X-axis

(Used only if creating a workspace file)
You can plot two types of measurements:

Enter the object name

(Used only if creating a workspace file)
Select the object that you want to measure from the list. This should be an object created by a previous module such as IdentifyPrimaryObjects, IdentifySecondaryObjects, or IdentifyTertiaryObjects.

Select the X-axis measurement

(Used only if creating a workspace file)
Select the measurement to be plotted on the desired axis.

Type of measurement to plot on the Y-axis

(Used only if creating a workspace file)
You can plot two types of measurements:

Select the Y-axis measurement

(Used only if creating a workspace file)
Select the measurement to be plotted on the desired axis.

Calculate the per-image mean values of object measurements?

Select Yes for ExportToDatabase to calculate population statistics over all the objects in each image and store the results in the database. For instance, if you are measuring the area of the Nuclei objects and you check the box for this option, ExportToDatabase will create a column in the Per_Image table called "Mean_Nuclei_AreaShape_Area".

You may not want to use ExportToDatabase to calculate these population statistics if your pipeline generates a large number of per-object measurements; doing so might exceed database column limits. These columns can be created manually for selected measurements directly in MySQL. For instance, the following SQL command creates the Mean_Nuclei_AreaShape_Area column:

ALTER TABLE Per_Image ADD (Mean_Nuclei_AreaShape_Area); UPDATE Per_Image SET Mean_Nuclei_AreaShape_Area = (SELECT AVG(Nuclei_AreaShape_Area) FROM Per_Object WHERE Per_Image.ImageNumber = Per_Object.ImageNumber);

Calculate the per-well mean values of object measurements?

Select Yes for ExportToDatabase to calculate statistics over all the objects in each well and store the results as columns in a "per-well" table in the database. For instance, if you are measuring the area of the Nuclei objects and you check the aggregate mean box in this module, ExportToDatabase will create a table in the database called "Per_Well_avg", with a column called "Mean_Nuclei_AreaShape_Area". Selecting all three aggregate measurements will create three per-well tables, one for each of the measurements.

The per-well functionality will create the appropriate lines in a .SQL file, which can be run on your Per-Image and Per-Object tables to create the desired per-well table.

Note: this option is only available if you have extracted plate and well metadata from the filename using the Metadata or LoadData modules. It will write out a .sql file with the statements necessary to create the Per_Well table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage

Calculate the per-well median values of object measurements?

Select Yes for ExportToDatabase to calculate statistics over all the objects in each well and store the results as columns in a "per-well" table in the database. For instance, if you are measuring the area of the Nuclei objects and you check the aggregate median box in this module, ExportToDatabase will create a table in the database called "Per_Well_median", with a column called "Median_Nuclei_AreaShape_Area". Selecting all three aggregate measurements will create three per-well tables, one for each of the measurements.

The per-well functionality will create the appropriate lines in a .SQL file, which can be run on your Per-Image and Per-Object tables to create the desired per-well table.

Note: this option is only available if you have extracted plate and well metadata from the filename using the Metadata or LoadData modules. It will write out a .sql file with the statements necessary to create the Per_Well table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage

Calculate the per-well standard deviation values of object measurements?

Select Yes for ExportToDatabase to calculate statistics over all the objects in each well and store the results as columns in a "per-well" table in the database. For instance, if you are measuring the area of the Nuclei objects and you check the aggregate standard deviation box in this module, ExportToDatabase will create a table in the database called "Per_Well_std", with a column called "Mean_Nuclei_AreaShape_Area". Selecting all three aggregate measurements will create three per-well tables, one for each of the measurements.

The per-well functionality will create the appropriate lines in a .SQL file, which can be run on your Per-Image and Per-Object tables to create the desired per-well table.

Note: this option is only available if you have extracted plate and well metadata from the filename using the Metadata or LoadData modules. It will write out a .sql file with the statements necessary to create the Per_Well table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage

Export measurements for all objects to the database?

This option lets you choose the objects whose measurements will be saved in the Per_Object and Per_Well(s) database tables.

Select the objects

(Used only if Select is chosen for adding objects)
Choose one or more objects from this list (click using shift or command keys to select multiple objects). The list includes the objects that were created by prior modules. If you choose an object, its measurements will be written out to the Per_Object and/or Per_Well(s) tables, otherwise, the object's measurements will be skipped.

Create one table per object, a single object table or a single object view?

ExportToDatabase can create either one table for each type of object exported or a single object table.

Maximum # of characters in a column name

This setting limits the number of characters that can appear in the name of a field in the database. MySQL has a limit of 64 characters per field, but also has an overall limit on the number of characters in all of the columns of a table. ExportToDatabase will shorten all of the column names by removing characters, at the same time guaranteeing that no two columns have the same name.

Write image thumbnails directly to the database?

(Used only if MySQL or SQLite are selected as database type)
Select Yes if you'd like to write image thumbnails directly into the database. This will slow down the writing step, but will enable new functionality in CellProfiler Analyst such as quickly viewing images in the Plate Viewer tool by selecting "thumbnail" from the "Well display" dropdown.

Select the images for which you want to save thumbnails

(Used only if MySQL or SQLite are selected as database type and writing thumbnails is selected)
Select the images that you wish to save as thumbnails to the database. Make multiple selections by using Ctrl-Click (Windows) or Command-Click (Mac);

Auto-scale thumbnail pixel intensities?

(Used only if MySQL or SQLite are selected as database type and writing thumbnails is selected)
Select Yes if you'd like to automatically rescale the thumbnail pixel intensities to the range 0-1, where 0 is black/unsaturated, and 1 is white/saturated.