How Ontology Engineers Are Unlocking Pandemic Data
When COVID-19 swept across the globe, scientists responded with an unprecedented genomic sequencing effort. By 2021, over one million SARS-CoV-2 sequences filled databases worldwide, but with a critical problem. Laboratories in different countries described viral mutations, patient information, and testing methods in wildly different ways. This data chaos created what researchers called a "semantic interoperability crisis", in which systems technically share data but fail to convey consistent meaning 1 2.
The rapid sequencing created a flood of data with inconsistent terminology across institutions and countries.
Researchers developed a "Rosetta Stone" for pandemic data through ontological unpacking of viral genomics.
Semantic interoperability represents the highest level of data understanding:
"Ontologies transform terminology lists into knowledge maps. They don't just name thingsâthey define what things are and how they relate." 7
In 2022, researchers analyzed the Viral Conceptual Model (VCM), a framework designed to standardize COVID-19 genomic data. Despite its widespread adoption, inconsistencies persisted in how institutions implemented it 1 4.
Through ontological unpacking, the team exposed hidden ambiguities:
The resulting Ontological Viral Conceptual Model (OntoVCM) fixed 491 semantic inconsistencies. Databases using it showed:
reduction in cross-institution mapping errors
faster federated query processing
global databases enabled for AI-driven variant tracking
Original Concept | Ambiguity Type | Resolution Approach |
---|---|---|
Virus Sample | Physical vs. abstract entity | Classified as «MaterialEntity» |
Mutation Impact | Causal vs. correlational link | Defined via «mediation» relation |
Host Organism | Container vs. participant role | Split into «Container» and «Participant» roles |
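
The resolutions in the table above can be pictured as explicit types. The following is a minimal Python sketch, not the OntoVCM model itself: the class names, field names, and identifiers such as `sample-001` and `host-A` are hypothetical and chosen only for illustration. It shows a sample marked as a physical entity and a host whose role must be stated rather than implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class HostRole(Enum):
    """The two roles the table splits 'Host Organism' into."""
    CONTAINER = "Container"      # the host as the place where the virus resides
    PARTICIPANT = "Participant"  # the host as an actor in an event such as sampling


@dataclass(frozen=True)
class VirusSample:
    """A physical specimen, tagged with the stereotype named in the table."""
    sample_id: str
    stereotype: str = "MaterialEntity"


@dataclass(frozen=True)
class HostInvolvement:
    """Ties one host to one sample under exactly one explicit role."""
    host_id: str
    sample: VirusSample
    role: HostRole


def roles_of(host_id: str, involvements: List[HostInvolvement]) -> Set[HostRole]:
    """Return every role a given host plays across the recorded involvements."""
    return {inv.role for inv in involvements if inv.host_id == host_id}


# Hypothetical data: the same host appears once per role, never as a bare "host".
sample = VirusSample("sample-001")
records = [
    HostInvolvement("host-A", sample, HostRole.CONTAINER),
    HostInvolvement("host-A", sample, HostRole.PARTICIPANT),
]
print(sorted(role.value for role in roles_of("host-A", records)))
```

Because the role is a required field, a record cannot mention a host without saying which sense is meant, which is the kind of ambiguity the unpacking is described as removing.
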
Resource | Type | Function | Example Use Case |
---|---|---|---|
OntoUML | Modeling Language | Applies UFO distinctions via UML diagrams | Defining viral mutation inheritance hierarchies |
SNOMED CT | Clinical Terminology | Standardized medical concepts | Mapping "fever" across EHR systems |
LOINC | Laboratory Codes | Universal test identifiers | Unifying RT-PCR assay descriptions |
GenBank | Genomic Database | Reference sequences | Annotating SARS-CoV-2 mutations |
Protégé | Ontology Editor | Builds/maintains OWL ontologies | Creating COVID-19 variant knowledge graphs |
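
To make the "mapping across systems" use cases in the table concrete, here is a small Python sketch of term normalization. It is an illustration under stated assumptions: the identifiers `EX:0001` and `EX:0002` are placeholders, not real SNOMED CT or LOINC codes, and the lookup table and `normalize` function are hypothetical.

```python
from typing import Optional

# Hypothetical lookup table: many local labels point to one shared identifier.
# "EX:0001" and "EX:0002" are placeholders, not real terminology codes.
LOCAL_TO_SHARED = {
    "fever": "EX:0001",
    "pyrexia": "EX:0001",
    "febrile": "EX:0001",
    "rt-pcr": "EX:0002",
    "real-time pcr": "EX:0002",
}


def normalize(label: str) -> Optional[str]:
    """Map a free-text local label to its shared identifier, or None if unknown."""
    key = " ".join(label.lower().split())  # fold case and collapse whitespace
    return LOCAL_TO_SHARED.get(key)


# Two institutions use different words but resolve to the same concept.
print(normalize("Pyrexia") == normalize("fever"))
# An unmapped label is reported as unknown instead of being guessed.
print(normalize("cough"))
```

Returning `None` for unmapped labels keeps gaps visible, so a curator can extend the table rather than have the system silently guess.
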
The OntoVCM framework proves transformative because:
Defines "variant" with genomic and epidemiological criteria, improving outbreak predictions 1
Links viral sequences to patient EHRs, drug databases, and research literature 6
FAIR Principle | Pre-Unpacking Challenge | Post-Unpacking Solution |
---|---|---|
Findable | Inconsistent metadata thwarted searches | Standardized annotation with UFO-aligned terms |
Interoperable | Mappings required manual reconciliation | Automated integration via formal relations |
Reusable | Context-dependent definitions | Meaning travels with data via embedded semantics |
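
The idea that "meaning travels with data" in the table above can be sketched as a record that carries its own term definitions. The Python example below is loosely modelled on the `@context` convention from JSON-LD; the `example.org` identifiers, the field names, and the `meaning_of` helper are assumptions made for illustration and are not taken from OntoVCM.

```python
import json

# A record that embeds the identifiers defining its own fields.
# The example.org IRIs are placeholders, not real ontology terms.
record = {
    "@context": {
        "host": "https://example.org/ontovcm-sketch#HostOrganism",
        "sample": "https://example.org/ontovcm-sketch#VirusSample",
    },
    "host": "host-A",
    "sample": "sample-001",
}


def meaning_of(field: str, rec: dict) -> str:
    """Look up the identifier that defines a field, using only the record itself."""
    return rec["@context"][field]


# Round-trip through JSON to mimic sending the record to another institution.
received = json.loads(json.dumps(record))
print(meaning_of("host", received))
```

The receiver needs no out-of-band documentation to learn what `host` refers to, since the definition arrives inside the same payload as the value.
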
The ontological revolution is spreading:
"We're moving from sending data to understanding data. In an age of pandemics and climate crises, unambiguous knowledge sharing isn't just convenientâit's survival."