How to use Metadata#

Metadata (the qiime2.metadata.Metadata class, internally) allows users to annotate a QIIME 2 Result with study-specific values: age, elevation, body site, pH, etc. QIIME 2 offers a consistent API for developers to expose their Methods and Visualizers to user-defined metadata. For more details about how users might create and utilize metadata in their studies, check out the Metadata In QIIME 2 tutorial.

Metadata#

Actions may request an entire Metadata object to work on. At its core, Metadata is just a pandas DataFrame (pd.DataFrame), but the Metadata object provides many convenience methods and properties, and unifies the code necessary for handling these data (or metadata). Examples of Actions that consume and operate on Metadata include:

Plugins may work with metadata directly, or they may choose to filter, regroup, partition, pivot, etc. - it all depends on the intended outcome relevant to the method or visualizer in question.

Metadata is subject to framework-level validations, normalization, and verification. We recommend familiarizing yourself with this behavior before utilizing Metadata in your Action. We think having this kind of behavior available via a centralized API helps ensure consistency for all users of Metadata.

import qiime2

def my_viz(output_dir: str, md: qiime2.Metadata) -> None:
    # Materialize the metadata as a pandas DataFrame to work with it directly
    df = md.to_dataframe()
    ...

Metadata Columns#

Plugin Actions may also request one or more MetadataColumns (the qiime2.metadata.MetadataColumn class, internally) to operate on. A good example is identifying which column of metadata contains barcodes when using q2-demux’s emp-single or q2-cutadapt’s demux-paired.

Instances of MetadataColumn exist as one of two concrete classes: NumericMetadataColumn (qiime2.metadata.NumericMetadataColumn) and CategoricalMetadataColumn (qiime2.metadata.CategoricalMetadataColumn).

By default, QIIME 2 will attempt to infer the type of each metadata column: if the column consists only of numbers or missing data, the column is inferred to be numeric. Otherwise, if the column contains any non-numeric values, the column is inferred to be categorical. Missing data (i.e. empty cells) are supported in categorical columns as well as numeric columns.

...
# filter_columns returns a new Metadata object holding only the matching columns
numeric_md_cols = metadata.filter_columns(column_type='numeric')
categorical_md_cols = metadata.filter_columns(column_type='categorical')
...
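The inference rule described above can be modeled in a few lines of plain Python. This is only a sketch of the rule as stated here, not the framework's implementation: the helper name `infer_column_type` is hypothetical, missing cells are assumed to be empty strings or `None`, and Python's `float` parser is used as a stand-in for "is a number" (it accepts spellings such as `nan` and `inf`, which the framework may treat differently).

```python
from typing import Iterable, Optional


def infer_column_type(values: Iterable[Optional[str]]) -> str:
    """Return 'numeric' or 'categorical' for a column of raw cell strings."""
    for value in values:
        if value is None or value.strip() == "":
            continue  # missing cells never force a column to be categorical
        try:
            float(value)
        except ValueError:
            return "categorical"  # a single non-numeric value decides the column
    return "numeric"


# Hypothetical example columns
ph = ["6.8", "7.1", "", "5.9"]
body_site = ["gut", "tongue", "", "gut"]

print(infer_column_type(ph))         # numeric
print(infer_column_type(body_site))  # categorical
```

Note that under this rule a column made up entirely of missing cells comes out as numeric, since nothing in it is non-numeric.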

If your Action always needs one type of column or another, you can simply register that type in your plugin registration:

from qiime2.plugin import MetadataColumn, Numeric

plugin.methods.register_function(
    ...
    parameters={'metadata': MetadataColumn[Numeric]},
    parameter_descriptions={'metadata': 'Numeric metadata column to '
                            'compute pairwise Euclidean distances from'},
    ...
)

This will ensure that all the necessary type-checking is performed by the framework before these data are passed into the Action utilizing it.

Numeric Metadata Columns#

Columns that consist only of numeric (or missing) values are eligible for being instantiated as NumericMetadataColumn (although these values can be loaded as CategoricalMetadataColumn, too).

Categorical Metadata Columns#

Columns of any type can be instantiated as CategoricalMetadataColumn; their values will be cast to strings.

How can the Metadata API Help Me?#

The qiime2.metadata.Metadata API has many interesting features. Here are some of the elements most commonly utilized by the plugins within the Amplicon Distribution.

Merging Metadata#

Interfaces can allow users to specify more than one metadata file at a time. The framework will handle merging the files or objects (qiime2.metadata.Metadata.merge) prior to handing the final merged set to your Action.
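To make the idea of merging concrete, here is a small standard-library model of it. It rests on two assumptions that are worth confirming against the API reference for qiime2.metadata.Metadata.merge: that only IDs shared by every input are kept (an inner join), and that inputs with overlapping column names are rejected. The `merge_tables` helper and the sample data are hypothetical and do not use QIIME 2 itself.

```python
from typing import Dict

Table = Dict[str, Dict[str, str]]  # sample ID -> {column name -> value}


def merge_tables(*tables: Table) -> Table:
    """Inner-join tables on sample ID, refusing duplicate column names."""
    if not tables:
        return {}

    # Reject inputs that define the same column name more than once.
    seen_columns = set()
    for table in tables:
        columns = {name for row in table.values() for name in row}
        overlap = seen_columns & columns
        if overlap:
            raise ValueError(f"Duplicate column names: {sorted(overlap)}")
        seen_columns |= columns

    # Keep only the IDs present in every table.
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)

    merged: Table = {}
    for sample_id in tables[0]:  # preserve the ID order of the first table
        if sample_id in shared_ids:
            row: Dict[str, str] = {}
            for table in tables:
                row.update(table[sample_id])
            merged[sample_id] = row
    return merged


site = {"s1": {"body_site": "gut"}, "s2": {"body_site": "tongue"}}
chem = {"s2": {"ph": "7.1"}, "s3": {"ph": "6.4"}}
merged = merge_tables(site, chem)
print(merged)  # {'s2': {'body_site': 'tongue', 'ph': '7.1'}}
```

Because the framework performs the merge before your Action runs, plugin code normally only ever sees the equivalent of `merged`.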

Dropping Empty Columns#

When working with a single metadata column, plugin code can determine if there are missing values (qiime2.metadata.MetadataColumn.has_missing_values), and then subsequently drop those IDs (qiime2.metadata.MetadataColumn.drop_missing_values) from the column.
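The check-then-drop pattern described above looks roughly like the following when modeled on a plain dictionary. This is an illustrative sketch only: the helpers `column_has_missing` and `without_missing` are hypothetical stand-ins, a missing value is assumed to be represented as `None`, and the real MetadataColumn methods should be consulted for their exact behavior.

```python
from typing import Dict, Optional

Column = Dict[str, Optional[float]]  # sample ID -> value, None when missing


def column_has_missing(column: Column) -> bool:
    """Report whether any ID in the column lacks a value."""
    return any(value is None for value in column.values())


def without_missing(column: Column) -> Column:
    """Return a new column containing only the IDs that have a value."""
    return {sid: value for sid, value in column.items() if value is not None}


ph = {"s1": 6.8, "s2": None, "s3": 5.9}

if column_has_missing(ph):
    # Drop the IDs with no value; the original column is left untouched.
    cleaned = without_missing(ph)
else:
    cleaned = ph

print(cleaned)  # {'s1': 6.8, 's3': 5.9}
```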

Normalizing TSV Files#

By saving (qiime2.metadata.Metadata.save) a materialized Metadata instance, visualizations that want to provide data exports can do so in a consistent manner (e.g. q2-longitudinal’s volatility, and the relevant code).

Advanced Filtering#

The filter_columns method (qiime2.metadata.Metadata.filter_columns) can be used to restrict column types, drop empty columns, or remove columns made entirely of unique values.
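As a rough model of two of these filtering behaviors, the sketch below drops empty columns and all-unique columns from a dictionary of columns. It is not the framework's code: `keep_columns` and its keyword names are hypothetical, missing cells are assumed to be `None`, and a column is treated as "all unique" when it has at least one value and none of its non-missing values repeat, which may differ in detail from the definition the framework uses.

```python
from typing import Dict, List, Optional

Columns = Dict[str, List[Optional[str]]]  # column name -> cells, None when missing


def keep_columns(columns: Columns, *, drop_all_unique: bool = False,
                 drop_all_missing: bool = False) -> Columns:
    """Return the columns that survive the requested filters."""
    kept: Columns = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if drop_all_missing and not present:
            continue  # nothing but missing cells
        if drop_all_unique and present and len(set(present)) == len(present):
            continue  # every value differs, so the column cannot group samples
        kept[name] = values
    return kept


columns = {
    "barcode": ["AAC", "GGT", "TTA"],
    "body_site": ["gut", "gut", "tongue"],
    "notes": [None, None, None],
}

useful = keep_columns(columns, drop_all_unique=True, drop_all_missing=True)
print(list(useful))  # ['body_site']
```

Filters like these are handy before plotting or grouping, where an ID-like column (such as barcodes) or an empty column carries no usable signal.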

SQL Filtering#

Advanced metadata querying is enabled by SQL-based filtering (qiime2.metadata.Metadata.get_ids).
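To give a feel for what SQL-based ID selection means, the following sketch runs a WHERE clause against an in-memory SQLite table using only the standard library. It assumes the filter expression is written in SQLite syntax, which should be confirmed against the get_ids API reference. The table layout, the `select_ids` helper, and the sample values are all hypothetical and are not part of QIIME 2.

```python
import sqlite3

# Hypothetical sample metadata: (id, body_site, ph)
rows = [
    ("s1", "gut", 6.8),
    ("s2", "tongue", 7.1),
    ("s3", "gut", 5.9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (id TEXT PRIMARY KEY, body_site TEXT, ph REAL)")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)


def select_ids(where: str) -> set:
    """Return the set of IDs whose row satisfies the WHERE clause."""
    # The clause is interpolated as-is, so only pass trusted expressions.
    query = f"SELECT id FROM metadata WHERE {where}"
    return {row[0] for row in conn.execute(query)}


acidic_gut = select_ids("body_site = 'gut' AND ph < 6.5")
print(acidic_gut)  # {'s3'}
```

The result is a set of IDs rather than a filtered table, so it can be used afterwards to subset whatever data the Action is working on.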

Making Artifacts Viewable as Metadata#

When you register a transformer from a particular format to qiime2.Metadata, the framework will allow the type represented by that format to be viewed as Metadata. This can open up all kinds of exciting opportunities for plugins!

import pandas as pd
import qiime2

@plugin.register_transformer
def _1(data: cool_project.InterestingDataFormat) -> qiime2.Metadata:
    # Metadata expects the DataFrame index to hold the IDs
    df = pd.DataFrame(data)
    return qiime2.Metadata(df)

A visualizer for free!#

If your type is viewable as Metadata (as in, the necessary transformers are registered), there is a general-purpose metadata visualization in the q2-metadata plugin called tabulate, which renders an interactive (searchable, sortable) table of the metadata in question. Cool!

Generating metadata as output from visualizations#

In most cases, if you want to output something that looks like metadata from a QIIME 2 action, you should assign it an artifact class that has a transformer to Metadata. However, in some cases you may want to output actual metadata. In this case, you can give your action an output of artifact class ImmutableMetadata. This will generate an artifact containing the metadata that your function provides as output.

ImmutableMetadata artifacts can be viewed as Metadata, so they can be used anywhere that a typical metadata .tsv file can be provided as input in QIIME 2. This includes q2-metadata’s tabulate visualizer. Additionally, if you want to obtain a .tsv file representation of an ImmutableMetadata artifact, you can export it.