Export To Database exports data directly to a database, or in database readable format, including an imported file with column names and a CellProfiler Analyst properties file, if desired.
This module exports measurements directly to a database or to a SQL-compatible format. It allows you to create and import MySQL and associated data files into a database and gives you the option of creating a properties file for use with CellProfiler Analyst. Optionally, you can create an SQLite database file if you do not have a server on which to run MySQL itself.
This module must be run at the end of a pipeline, or second to last if you are using the CreateBatchFiles module. If you forget this module, you can also run the ExportDatabase data tool after processing is complete; its functionality is the same.
The database is set up with two primary tables. These tables are the Per_Image table and the Per_Object table (which may have a prefix if you specify):
- The Per_Image table consists of all the per-image measurements made during the pipeline, plus per-image population statistics (such as mean, median, and standard deviation) of the object measurements. There is one per_image row for every "cycle" that CellProfiler processes (a cycle is usually a single field of view, and a single cycle usually contains several image files, each representing a different channel of the same field of view).
- The Per_Object table contains all the measurements for individual objects. There is one row of object measurements per object identified. The two tables are connected with the primary key column ImageNumber, which indicates the image to which each object belongs. The Per_Object table has another primary key called ObjectNumber, which is unique to each image.
Typically, if multiple types of objects are identified and measured in a pipeline, the numbers of those objects are equal to each other. For example, in most pipelines, each nucleus has exactly one cytoplasm, so the first row of the Per-Object table contains all of the information about object #1, including both nucleus- and cytoplasm-related measurements. If this one-to-one correspondence is not the case for all objects in the pipeline (for example, if dozens of speckles are identified and measured for each nucleus), then you must configure ExportToDatabase to export only objects that maintain the one-to-one correspondence (for example, export only Nucleus and Cytoplasm, but omit Speckles).
If you have extracted "Plate" and "Well" metadata from image filenames or loaded "Plate" and "Well" metadata via the Metadata or LoadData modules, you can ask CellProfiler to create a "Per_Well" table, which aggregates object measurements across wells. This option will output a SQL file (regardless of whether you choose to write directly to the database) that can be used to create the Per_Well table. At the secure shell where you normally log in to MySQL, type the following, replacing the italics with references to your database and files:
mysql -h hostname -u username -p databasename < pathtohttp://d1zymp9ayga15t.cloudfront.net/images/perwellsetupfile.SQL
The commands written by CellProfiler to create the Per_Well table will be executed.
Oracle is not fully supported at present; you can create your own Oracle DB using the .csv output option and writing a simple script to upload to the database.
Available measurements
For details on the nomenclature used by CellProfiler for the exported measurements, see
Help > General Help > How Measurements Are Named.
See also ExportToSpreadsheet.
Settings:
Database type
Specify the type of database you want to use:
- MySQL: Writes the data directly to a MySQL
database. MySQL is open-source software; you may require help from
your local Information Technology group to set up a database
server.
- MySQL / CSV: Writes a script file that
contains SQL statements for creating a database and uploading the
Per_Image and Per_Object tables. This option will write out the Per_Image
and Per_Object table data to two CSV files; you can use these files can be
used to import the data directly into an application
that accepts CSV data.
- SQLite: Writes SQLite files directly.
SQLite is simpler to set up than MySQL and
can more readily be run on your local computer rather than requiring a
database server. More information about SQLite can be found
here.
-
If running this module on a computing cluster, there are a few
considerations to note:
- The MySQL option is well-suited for cluster use, since
multiple jobs can write to the database simultaneously.
- The SQLite option is not
as appropriate; a SQLite database only allows access by one job at a time.
Experiment name
Select a name for the experiment. This name will be
registered in the database and linked to the tables that
ExportToDatabase creates. You will be able to select the experiment
by name in CellProfiler Analyst and will be able to find the
experiment's tables through database queries.
Database name
Select a name for the database you want to use
SQL file prefix
(Used if MySQL / CSV is selected as the database type)
Enter the prefix to be used to name the SQL file.
Name the SQLite database file
(Used if SQLite selected as database type)
Enter the name of the SQLite database filename to which you want to write.
Overwrite without warning?
ExportToDatabase creates tables and databases at the start
of a run when writing directly to a MySQL or SQLite database. It
writes SQL scripts and CSVs when not writing directly. It also
can write CellProfiler Analysis property files. In some cases,
it is appropriate to run CellProfiler and append to or overwrite
the data in existing tables, for instance when running several
CellProfiler instances which each process a range of the experiment's
image sets. In other cases, such as when the measurements to be written
have changed, the data tables must be dropped completely.
You can choose fromm three options to conrtol overwriting behavior:
- Never: ExportToDatabase will
ask before dropping and recreating tables unless you are running
headless. CellProfiler will exit if running headless if the tables
exist and this option is chosen.
- Data only: ExportToDatabase will keep
the existing tables if present and will overwrite the data. Choose
Data only if you are breaking your experiment
into ranges of image sets and running each range on a separate
instance of CellProfiler.
- Data and schema: ExportToDatabase will
drop previous versions of tables at the start of a run. This option
is appropriate if you are using the CreateBatchFiles module;
your tables will be created by the run that creates the batch
data file. The actual analysis runs that utilize the
Batch_data
file will use the existing tables without trying to recreate them.
Add a prefix to table names?
Select whether you want to add a prefix to your table names. The
default table names are
Per_Image for the per-image table and
Per_Object for the per-object table. Adding a prefix can be useful
for bookkeeping purposes.
- Select Yes to add a user-specified prefix to the default table names.
If you want to distinguish multiple sets of data written to the same
database, you probably want to use a prefix.
- Select No to use the default table names. For a one-time export of
data, this option is fine.
Whether you chose to use a prefix or not, CellProfiler will warn
you if your choice entails overwriting an existing table.
Table prefix
(Used if Add a prefix to table names? is selected)
Enter the table prefix you want to use.
MySQL has a 64 character limit on the full name of the table.
If the combination of the table name and prefix exceeds this
limit, you will receive an error associated with this setting.
Create a CellProfiler Analyst properties file?
Select
Yes to generate a template properties file that will allow you to use
your new database with CellProfiler Analyst (a data
exploration tool which can also be downloaded from
http://www.cellprofiler.org/).
The module will attempt to fill in as many as the entries as possible
based on the pipeline's settings, including the
server name, username and password if MySQL is used.
Which objects should be used for locations?
(Used only if creating a properties file)
CellProfiler Analyst displays cells during classification. This
setting determines which object centers will be used as the center
of the cells to be displayed. Choose one of the listed objects
and CellProfiler will save that object's location columns in
the properties file so that CellProfiler Analyst centers cells
using that object's center.
You can manually change this choice in the properties file by
edting the cell_x_loc and cell_y_loc properties.
Note that if there are no objects defined in the pipeline (e.g.
if only using MeasureImageQuality and/or Illumination Correction modules),
a warning will diplay until you choose 'None' for the subsequent setting:
'Export measurements for all objects to the database?'.
Access CPA images via URL?
(Used only if creating a properties file)
The image paths written to the database will be the absolute
path the the image files on your computer. If you plan to make these
files accessible via the web, you can have CellProfiler Analyst prepend
a URL to your file name.
Eg: If an image is loaded from the path "/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/" and you use
a url prepend of "http://mysite.com/", CellProfiler Analyst will look
for your file at "http://mysite.com/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/"
Enter an image url prepend if you plan to access your files via http
(Used only if accessing CellProfiler Analyst images via URL)
The image paths written to the database will be the absolute
path the the image files on your computer. If you plan to make these
files accessible via the web, you can enter a url prefix here. Eg:
If an image is loaded from the path "/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/" and you use
a url prepend of "http://mysite.com/", CellProfiler Analyst will look
for your file at "http://mysite.com/cellprofilerhttp://d1zymp9ayga15t.cloudfront.net/images/"
If you are not using the web to access your files (i.e., they are locally
aceesible by your computer), leave this setting blank.
Select the plate type
(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the plate
type here. Supported types in CellProfiler Analyst are 96- and 384-well plates,
as well as 5600-spot microarrays. If you are not using a plate or microarray, select
None.
Select the plate metadata
(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the metadata corresponding
to the plate here. If there is no plate metadata associated with the image set, select
None.
Please see the Metadata module for more details on metadata collection and usage.
Select the well metadata
(Used only if creating a properties file)
If you are using a multi-well plate or microarray, you can select the metadata corresponding
to the well here. If there is no well metadata associated with the image set, select
None.
Please see the Metadata module for more details on metadata collection and usage.
Include information for all images, using default values?
(Used only if creating a properties file)
Select
Yes to include information in the properties file for all images. This
option will do the following:
- All images loaded using the Input modules or saved in SaveImages will be included.
- The CellProfiler image name will be used for the image_name field.
- A channel color listed in the image_channel_colors field will be assigned to the image by default order.
Select No to specify which images should be included or to override the automatic values.
Select an image to include
(Used only if creating a properties file and specifiying the image information)
Choose image name to include it in the properties file of images.
The images in the drop-down correspond to images that have been:
- Loaded using one of the Load modules.
- Saved with the SaveImages module, with the corresponding file and path information stored.
If you do not see your desired image listed, check the settings on these modules.
Use the image name for the display?
(Used only if creating a properties file and specifiying the image information)
Select
Yes to use the image name as given above for the displayed name.
Select No to name the file yourself.
Image name
(Used only if creating a properties file, specifiying the image information and naming the image)
Enter a name for the specified image
Channel color
(Used only if creating a properties file and specifiying the image information)
Enter a color to display this channel.
Do you want to add group fields?
(Used only if creating a properties file)
Please note that "groups" as defined by CellProfiler Analyst has nothing to do with "grouping" as defined by
CellProfiler in the Groups module.
Select Yes to define a "group" for your image data (for example, when several images represent the same experimental
sample), by providing column(s) that identify unique images (the image key) to another set of columns
(the group key).
The format for a group in CPA is:
group_SQL_<XXX> = <MySQL SELECT statement that returns image-key columns followed by group-key columns>
For example, if you wanted to be able to group your data by unique plate names, you could define a group called
SQL_Plate as follows:
group_SQL_Plate = SELECT ImageNumber, Image_Metadata_Plate FROM Per_Image
Grouping is useful, for example, when you want to aggregate counts for each class of object and their scores
on a per-group basis (e.g., per-well) instead of on a per-image basis when scoring with Classifier. It will
also provide new options in the Classifier fetch menu so you can fetch objects from images with specific
values for the group columns.
Enter the name of the group
(Used only if creating a properties file and specifiying an image data group)
Enter a name for the group. Only alphanumeric characters and underscores are permitted.
Enter the per-image columns which define the group, separated by commas
(Used only if creating a properties file and specifiying an image data group)
To define a group, enter the image key columns followed by group key columns, each separated by commas.
In CellProfiler, the image key column is always given the name as ImageNumber; group keys
are typically metadata columns which are always prefixed with Image_Metadata_. For example, if you
wanted to be able to group your data by unique plate and well metadata tags, you could define a
group with the following MySQL statement:
group_SQL_Plate = SELECT ImageNumber, Image_Metadata_Plate, Image_Metadata_Well FROM Per_Image
For this example, the columns to enter in this setting would be:
ImageNumber, Image_Metadata_Plate, Image_Metadata_Well
Groups are specified as MySQL statements in the properties file, but please note that the full SELECT and
FROM clauses will be added automatically, so there is no need to enter them here.
Do you want to add filter fields?
(Used only if creating a properties file)
Select Yes to specify a subset of the images in your experiment by defining a filter.
Filters are useful, for example, for fetching and scoring objects in Classifier or making graphs using the
plotting tools that satisfy a specific metadata contraint.
Automatically create a filter for each plate?
(Used only if creating a properties file and specifiying an image data filter)
If you have specified a plate metadata tag, selecting Yes to create a set of filters
in the properties file, one for each plate.
Enter a phenotype class table name if using the classifier tool
(Used only if creating a properties file)
If you are using the machine-learning tool in CellProfiler Analyst,
you can create an additional table in your database which contains
the per-object phenotype labels. This table is produced after scoring
all the objects in your data set and will be named with the label given here.
You can manually change this choice in the properties file by
edting the class_table field. Leave this field blank if you are
not using the classifier or do not need the table written to the database.
Output file location
(Used only when using a CSV or a SQLite database, and/or creating a properties or workspace file)
This setting determines where the CSV files or SQLite database is saved if
you decide to write measurements to files instead of writing them
directly to the database. If you request a CellProfiler Analyst properties file
or workspace file, it will also be saved to this location.
You can choose among the following options which are common to all file input/output
modules:
- Default Input Folder: Use the default input folder.
- Default Output Folder: Use from the default output folder.
- Elsewhere...: Use a particular folder you specify.
- Default input directory sub-folder: Enter the name of a subfolder of
the default input folder or a path that starts from the default input folder.
- Default output directory sub-folder: Enter the name of a subfolder of
the default output folder or a path that starts from the default output folder.
Elsewhere and the two sub-folder options all require you to enter an additional
path name. You can use an absolute path (such as "C:\imagedir\image.tif" on a PC) or a
relative path to specify the file location relative to a directory):
- Use one period to represent the current directory. For example, if you choose
Default Input Folder sub-folder, you can enter "./MyFiles" to look in a
folder called "MyFiles" that is contained within the Default Input Folder.
- Use two periods ".." to move up one folder level. For example, if you choose
Default Input Folder sub-folder, you can enter "../MyFolder" to look in a
folder called "MyFolder" at the same level as the Default Input Folder.
For Elsewhere..., Default Input Folder sub-folder and
Default Output Folder sub-folder, if you have metadata associated with your
images via Metadata module, you can name the folder using metadata
tags.
You can insert a previously defined metadata tag by either using:
- The insert key
- A right mouse button click inside the control
- In Windows, the Context menu key, which is between the Windows key and Ctrl key
The inserted metadata tag will appear in green. To change a previously inserted metadata tag,
navigate the cursor to just before the tag and either:
- Use the up and down arrows to cycle through possible values.
- Right-click on the tag to display and select the available values.
For instance, if you have a metadata tag named
"Plate", you can create a per-plate folder by selecting one of the subfolder options
and then specifying the subfolder name with the "Plate" metadata tag.
The module will substitute the metadata values for the last image set
processed for any metadata tags in the folder name. Please see the
Metadata module for more details on metadata collection and usage.
Create a CellProfiler Analyst workspace file?
Select
Yes to generate a workspace file for use with
CellProfiler Analyst, a data exploration tool which can
also be downloaded from
http://www.cellprofiler.org/. A workspace file allows you
to open a selected set of measurements with the display tools
of your choice. This is useful, for example, if you want examine a standard
set of quality control image measurements for outliers.
Select the measurement display tool
(Used only if creating a workspace file)
Select what display tool in CPA you want to use to open the
measurements.
- ScatterPlot
- Histogram
- DensityPlot
- PlateViewer
- BoxPlot
Type of measurement to plot on the X-axis
(Used only if creating a workspace file)
You can plot two types of measurements:
- Image: For a per-image measurement, one numerical value is
recorded for each image analyzed.
Per-image measurements are produced by
many modules. Many have MeasureImage in the name but others do not
(e.g., the number of objects in each image is a per-image
measurement made by IdentifyObject
modules).
- Object: For a per-object measurement, each identified
object is measured, so there may be none or many
numerical values recorded for each image analyzed. These are usually produced by
modules with MeasureObject in the name.
Enter the object name
(Used only if creating a workspace file)
Select the object that you want to measure from the list.
This should be an object created by a previous module such as
IdentifyPrimaryObjects, IdentifySecondaryObjects, or
IdentifyTertiaryObjects.
Select the X-axis measurement
(Used only if creating a workspace file)
Select the measurement to be plotted on the desired axis.
Type of measurement to plot on the Y-axis
(Used only if creating a workspace file)
You can plot two types of measurements:
- Image: For a per-image measurement, one numerical value is
recorded for each image analyzed.
Per-image measurements are produced by
many modules. Many have MeasureImage in the name but others do not
(e.g., the number of objects in each image is a per-image
measurement made by IdentifyObject
modules).
- Object: For a per-object measurement, each identified
object is measured, so there may be none or many
numerical values recorded for each image analyzed. These are usually produced by
modules with MeasureObject in the name.
Select the Y-axis measurement
(Used only if creating a workspace file)
Select the measurement to be plotted on the desired axis.
Calculate the per-image mean values of object measurements?
Select
Yes for
ExportToDatabase to calculate population statistics over all the objects in each image
and store the results in the database. For instance, if
you are measuring the area of the Nuclei objects and you check the box for this option,
ExportToDatabase will create a column in the Per_Image
table called "Mean_Nuclei_AreaShape_Area".
You may not want to use ExportToDatabase to calculate these population statistics if your pipeline generates
a large number of per-object measurements; doing so might exceed database
column limits. These columns can be created manually for selected measurements directly in MySQL.
For instance, the following SQL command creates the Mean_Nuclei_AreaShape_Area column:
ALTER TABLE Per_Image ADD (Mean_Nuclei_AreaShape_Area);
UPDATE Per_Image SET Mean_Nuclei_AreaShape_Area =
(SELECT AVG(Nuclei_AreaShape_Area)
FROM Per_Object
WHERE Per_Image.ImageNumber = Per_Object.ImageNumber);
Calculate the per-well mean values of object measurements?
Select
Yes for
ExportToDatabase to calculate statistics over all the objects in each well
and store the results as columns in a "per-well" table in the database. For instance,
if you are measuring the area of the Nuclei objects and you check the aggregate
mean box in this module,
ExportToDatabase will create a table in the database called
"Per_Well_avg", with a column called "Mean_Nuclei_AreaShape_Area". Selecting all three
aggregate measurements will create three per-well tables, one for each of the measurements.
The per-well functionality will create the appropriate lines in a .SQL file, which can be
run on your Per-Image and Per-Object tables to create the desired per-well table.
Note: this option is only available if you have extracted plate and well metadata
from the filename using the Metadata or LoadData modules.
It will write out a .sql file with the statements necessary to create the Per_Well
table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage
Calculate the per-well median values of object measurements?
Select
Yes for
ExportToDatabase to calculate statistics over all the objects in each well
and store the results as columns in a "per-well" table in the database. For instance,
if you are measuring the area of the Nuclei objects and you check the aggregate
median box in this module,
ExportToDatabase will create a table in the database called
"Per_Well_median", with a column called "Median_Nuclei_AreaShape_Area". Selecting all
three aggregate measurements will create three per-well tables, one for each of the measurements.
The per-well functionality will create the appropriate lines in a .SQL file, which can be run on your
Per-Image and Per-Object tables to create the desired per-well table.
Note: this option is only
available if you have extracted plate and well metadata from the filename using
the Metadata or LoadData modules.
It will write out a .sql file with the statements necessary to create the Per_Well
table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage
Calculate the per-well standard deviation values of object measurements?
Select
Yes for
ExportToDatabase to calculate statistics over all the objects in each well
and store the results as columns in a "per-well" table in the database. For instance,
if you are measuring the area of the Nuclei objects and you check the aggregate
standard deviation box in this module,
ExportToDatabase will create a table in the database called
"Per_Well_std", with a column called "Mean_Nuclei_AreaShape_Area". Selecting all
three aggregate measurements will create three per-well tables, one for each of the measurements.
The per-well functionality will create the appropriate lines in a .SQL file, which can be run on your
Per-Image and Per-Object tables to create the desired per-well table.
Note: this option is only
available if you have extracted plate and well metadata from the filename
using the Metadata or LoadData modules.
It will write out a .sql file with the statements necessary to create the Per_Well
table, regardless of the option chosen above. Please see the Metadata module for more details on metadata collection and usage
Export measurements for all objects to the database?
This option lets you choose the objects whose measurements will be saved in the Per_Object and
Per_Well(s) database tables.
- All: Export measurements from all objects.
- None: Do not export data to a Per_Object table. Save only Per_Image or Per_Well
measurements (which nonetheless include population statistics from objects).
- Select...: Select the objects you want to export from a list.
Select the objects
(Used only if Select is chosen for adding objects)
Choose one or more objects from this list (click using shift or command keys to select
multiple objects). The list includes
the objects that were created by prior modules. If you choose an
object, its measurements will be written out to the Per_Object and/or
Per_Well(s) tables, otherwise, the object's measurements will be skipped.
Create one table per object, a single object table or a single object view?
ExportToDatabase can create either one table
for each type of object exported or a single
object table.
- One table per object type creates one
table for each object type you export. The table name will reflect
the name of your objects. The table will have one row for each
of your objects. You can write SQL queries that join tables using
the "Number_ObjectNumber" columns of parent objects (such as those
created by IdentifyPrimaryObjects) with the corresponding
"Parent_... column" of the child objects. Choose
One table per object type if parent objects can have more than
one child object, if you want a relational representation of
your objects in the database,
or if you need to split columns among different
tables and shorten column names because of database limitations.
- Single object table creates a single
database table that records the object measurements.
ExportToDatabase will prepend each column name with the
name of the object associated with that column's measurement.
Each row of the table will have measurements for all objects
that have the same image and object number. Choose
Single object table if parent objects have a single child,
or if you want a simple table structure in your database. You can
combine the measurements for all or selected objects in this way.
- Single object view creates a single
database view to contain the object measurements. A view is a
virtual database table which can be used to package together multiple
per-object tables into a single structure that is accessed just like a
regular table. Choose Single object view if you want to combine multiple
objects but using Single object table would produce a table that hits
the database size limitations.
An important note is that only objects that are related as primary, secondary
or tertiary objects to each other should be combined in a view. This is
because the view expects a one-to-one relationship between the combined objects. If
you are selecting objects for the view, the module will warn you if they are
not related in this way.
Maximum # of characters in a column name
This setting limits the number of characters that can appear
in the name of a field in the database. MySQL has a limit of 64
characters per field, but also has an overall limit on the number of characters
in all of the columns of a table. ExportToDatabase will
shorten all of the column names by removing characters, at the
same time guaranteeing that no two columns have the same name.
Write image thumbnails directly to the database?
(Used only if MySQL or SQLite are selected as database type)
Select Yes if you'd like to write image thumbnails directly
into the database. This will slow down the writing step, but will
enable new functionality in CellProfiler Analyst such as quickly
viewing images in the Plate Viewer tool by selecting "thumbnail"
from the "Well display" dropdown.
Select the images for which you want to save thumbnails
(Used only if MySQL or SQLite are selected as database type
and writing thumbnails is selected)
Select the images that you wish to save as thumbnails to
the database. Make multiple selections by using Ctrl-Click (Windows)
or Command-Click (Mac);
Auto-scale thumbnail pixel intensities?
(Used only if MySQL or SQLite are selected as database type
and writing thumbnails is selected)
Select Yes if you'd like to automatically rescale
the thumbnail pixel intensities to the range 0-1, where 0 is
black/unsaturated, and 1 is white/saturated.