In the metadata of digital environmental datasets, automated processing is hindered by
the wide variety of representations of units, which may be human-readable but may not be
unambiguous or machine-interpretable (e.g., grams per square meter, gm/m2, g/m2, gm-2,
g/m^2, g.m-2, g m-2 and gramPerMeterSquared). Matching disparate representations of the same
unit to a single unit concept from an ontology assists with interpretation and reuse by
providing a link to a complete unit definition with label, description, and dimensions.
Datasets with shared units can be identified during searches and are more suitable for
automated analysis and potential transformation.
This dataset contains data and code associated with a project to map units in ecological
metadata collected between 2013 and 2022 by DataONE, the Environmental Data Initiative and
the U.S. National Ecological Observatory Network to the QUDT ontology using successive
string transformations. Data entities include:
a) raw metadata as received (355,057 unit instances)
b) integrated raw data
c) substitution tables for string transformations
d) resulting lookup table for 896 distinct units matched to QUDT units
e) associated R code used for QUDT matching plus a web service and R functions for
adding annotation elements to Ecological Metadata Language metadata documents.
Using these substitutions and code, 91% of unit instances in the raw metadata could be
matched to QUDT. Data and results are discussed in: Porter JH, M O’Brien, M Frants, S Earl,
M Martin, C Laney. (in review) “Using a Units Ontology to Annotate Pre-Existing Metadata.”
Submitted to Scientific Data.
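
The idea of successive string transformations followed by a table lookup can be sketched
as below. This is a minimal Python illustration, not the project's R code: the
substitution rules, the lookup table, the function names and the QUDT identifier shown
are all assumptions made for the example, and the real substitution tables and lookup
table are the ones distributed with this dataset.

```python
import re
from typing import Optional

# Hypothetical substitution table: (regex, replacement) pairs applied in order.
# These rules are illustrative only and are not copied from the dataset.
SUBSTITUTIONS = [
    (r"\bsquare met(?:er|re)s?\b", "m2"),
    (r"\bmet(?:er|re)s? squared\b", "m2"),
    (r"\bgrams?\b", "g"),
    (r"\s*\bper\b\s*", "/"),
    (r"\^", ""),
    (r"^gm-2$", "g/m2"),      # special case; must run before the generic "gm" rule
    (r"\bgm\b", "g"),
    (r"[ .]m-2\b", "/m2"),
]

# Hypothetical lookup table from a normalized string to a unit identifier.
# The identifier is an assumption and should be checked against QUDT.
LOOKUP = {"g/m2": "unit:GM-PER-M2"}


def normalize(raw: str) -> str:
    """Split camelCase, lowercase, then apply each substitution in turn."""
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", raw.strip()).lower()
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


def match_unit(raw: str) -> Optional[str]:
    """Return the mapped unit identifier, or None if no match is found."""
    return LOOKUP.get(normalize(raw))


def match_rate(instances: list) -> float:
    """Fraction of unit instances (not distinct units) that were matched."""
    matched = sum(1 for unit in instances if match_unit(unit) is not None)
    return matched / len(instances)


# The variant spellings listed in the description above.
VARIANTS = [
    "grams per square meter", "gm/m2", "g/m2", "gm-2",
    "g/m^2", "g.m-2", "g m-2", "gramPerMeterSquared",
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, "->", match_unit(variant))
```

Rule order matters in a scheme like this: the special case for "gm-2" has to run before
the generic "gm" rule, otherwise the string would be rewritten to something the lookup
table does not contain. Strings that survive all substitutions without reaching a known
key are simply left unmatched.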