Talk:ConferenceCall 2016 03 31

Ontolog Forum

Revision as of 14:15, 26 April 2016 by imported>Garybc (→‎Technical Challenges)

Both syntactic and data-structure-based interoperability among systems and applications have been worked on in recent decades. A general, growing belief is that there is a convergence on "standards" for interoperability components, including catalogs, vocabularies, services and information models. But along with this general movement is the recognition that all such efforts require some degree of semantic agreement and technique. Increasingly these, in turn, have some degree of support from formal ontologies and related activities.

Some Items and Synthesis from our two sessions on Semantic Interoperability (SI) in the Earth Sciences.

Big Science, Big Data and Big Industry provide many motivating challenges to achieve better system and data interoperability. The range of systems, data & semantic content is now broad, but increasingly has to be integrated to be of use to Science & Society. Improved & integrated semantics is part of the resulting effort in the Geo and Earth Sciences, supported by programs such as NSF's EarthCube.

The Status of Ontologies in the Geo-Earth Sciences

  • There are quite a few "ontologies" developed along the spectrum of semantic formality, varying also in comprehensiveness and completeness.

A classic and somewhat of a legacy effort is NASA's [https://sweet.jpl.nasa.gov/ SWEET (Semantic Web for Earth and Environmental Terminology)] ontology, with about 6000 concepts in over 200 separate, modular ontologies. SWEET can provide some basis for semantic tagging; however, it has few axioms to support reasoning and needs to be supplemented whenever used for advanced purposes. Such ontologies build on community efforts to develop standard vocabularies within domains to support data and system interoperability. It is generally recognized that these efforts lack formal semantics and that ontologies capturing domain understanding can help to address this limitation.
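As a rough illustration of what "semantic tagging" against a vocabulary like SWEET amounts to in practice, the sketch below maps local dataset terms to concept URIs. The concept paths shown are illustrative of SWEET's general style, not verified entries in the ontology, and the mapping table is an assumption a curator would maintain.

```python
# Hedged sketch: tagging dataset metadata terms with concept URIs.
# The SWEET concept paths below are illustrative, not verified entries.

SWEET_NS = "http://sweetontology.net/"

# Assumed local-term -> concept mapping (curator-maintained).
TERM_TO_CONCEPT = {
    "sea surface temperature": SWEET_NS + "propTemperature/Temperature",
    "precipitation": SWEET_NS + "phenAtmoPrecipitation/Precipitation",
}

def tag_metadata(terms):
    """Return (term, concept URI) pairs for terms the vocabulary covers;
    unmatched terms are reported separately so a curator can extend the
    mapping or supplement SWEET with a domain ontology."""
    tagged, unmatched = [], []
    for t in terms:
        key = t.strip().lower()
        if key in TERM_TO_CONCEPT:
            tagged.append((t, TERM_TO_CONCEPT[key]))
        else:
            unmatched.append((t, None))
    return tagged, unmatched

tagged, unmatched = tag_metadata(["Sea Surface Temperature", "soil moisture"])
```

The "unmatched" bucket is the important part: it surfaces exactly the gaps where a vocabulary with few axioms must be supplemented.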

Some General Problems

Despite the increasing number and quality of ontologies, there is still what has been graphically described as a "semantic mess." Domain information is heterogeneous and described in:

  1. multiple schemas,
  2. different vocabularies & markup languages and
  3. ontologies with different levels of granularity in the data & different conceptualizations.

In the Earth Sciences we probably don't have the "right" mix of ontologies needed to routinely solve these challenges and systematically achieve interoperability across domains, although we have made some progress with some modules & a reference ontology in a small fraction of the domain space. Within this imperfect but growing collection of ontologies and efforts at controlled vocabularies, we often use them sloppily and informally, and we don't have adequate mappings between concepts. Part of this is because there is not entire agreement on well-founded integration techniques. One cannot just glue and stitch together a very large, all-encompassing master ontology.

In practice interoperability is difficult to achieve, even with the help of ontologies, since applications across domains utilize information in different ways, and the knowledge/ontology conceptualizations and representation formalisms inherent in or used explicitly by these applications can also differ. There is general agreement that we lack small ontological building blocks, and there does seem to be some growing interest in exploring the use of modular, incremental approaches, early agreement on conceptualizations, and well-crafted reference ontologies.

Technical Challenges

  • Existing GeoScience standards, ontologies, models and associated terminologies such as shown in the earlier linked Figure (above) were typically developed in isolation.
  • Vocabulary harmonization is a bottleneck and is impeded by the lack of reference ontologies that might resolve heterogeneity.

Standards

To a large degree we have been converging on the lower and middle levels of what is needed for interoperability - common protocols and data formats to ensure the proper exchange of data. There is some belief that we can ensure perfect syntactic interoperability, e.g., via rigid standardization. But when we consider the higher, semantic level & semantic interoperability, we rely on a common understanding of the messaging and exchanged data, i.e., meaning remains invariant during the exchange between multiple systems. This requires common reference systems, and the problem is that:

    • The various types of standards that do exist are, for the most part, heterogeneous, meaning they:
      • are mostly fragmented and disconnected, describing potentially relatable concepts;
      • lack a grounding in foundational semantics;
      • may use the same or similar terms but with differences in semantics;
      • are described using different formal (or non-formal) languages.
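The "same term, different semantics" problem is easy to show concretely. In the hypothetical sketch below, two standards each define a "depth" field, but one treats it as positive-down in metres and the other as positive-up in feet; the field names and conventions are invented for illustration, not taken from any real standard.

```python
# Illustrative only: two invented standards both define "depth",
# but with incompatible semantics (sign convention and units).

STANDARD_A = {"depth": {"positive": "down", "units": "m"}}
STANDARD_B = {"depth": {"positive": "up", "units": "ft"}}

def semantically_compatible(term, schema_a, schema_b):
    """A shared term name is not enough for interoperability;
    the semantics attached to the term must also match."""
    return schema_a.get(term) == schema_b.get(term)

# semantically_compatible("depth", STANDARD_A, STANDARD_B) is False:
# naively merging records from the two standards would conflate meanings.
```

A reference ontology is one way to make such hidden semantic differences explicit before data are merged.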

As a result, major problems exist when standards-driven efforts and products are combined, reflecting differences in conceptualization and semantic drift upon new problem formulation.

  • Some glue, such as a reference ontology, is needed to integrate and harmonize these.
  • Upper-level and many domain ontologies are important to such SI but there are challenges:
    • There seems to be no one taxonomic hierarchy we can agree on.
    • Many of the upper and domain ontologies are hard to understand: they have too many terms, are too abstract, or have axioms too complicated to understand, and yet remain too far from real data.
      • Or they impose ontological commitments that may not be acceptable to all interested parties who have 'local' vocabularies and meanings.
    • Like AI systems in the past, they may also be too brittle - small changes are not easily incorporated or compromise the semantics.

There are many Social/non-technical issues including:

  • no agreement or controlling body or process to coordinate efforts and/or to validate ontologies and their axioms
  • How do we verify and validate these structures (ontology efficacy)? (i.e., if an ontology is created to do some thing x, who verifies it actually does x?)
  • Who owns the ontology once it is published?

  • Who maintains the ontology once it's released into the wild, i.e. published or... portaled?

  • Where do we put Earth science ontologies (or semantic models; the word ontology has kind of lost its meaning) once they have been created?
    • e.g. LOV, ontology repositories
    • ESIP, BioPortal and OntoHub/OOR
  • How do we handle:
    • inability/unwillingness to participate,
    • fear of unanticipated costs and worry about major changes to local systems,
    • skepticism about scalability?

Tool Issues

  • There is a need for concept search supported by conceptual similarity, as in SEM+.
  • Agent Brokering employs central mechanisms to help resolve such things as disparate vocabularies, support data distribution requests, enforce translatable standards and to enable uniformity of search and access in heterogeneous operating environments.
    • When searching for data, current semantic brokers still yield:
      1. invalid content or responses,
      2. unidentifiable document types,
      3. empty metadata elements - especially required elements.
  • We need integrated conceptual modeling and KE tools to build and bridge ontologies.
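The three broker failure modes listed above can be checked mechanically on the client side. The sketch below is an assumption-laden illustration: the field names (`doc_type`, `title`, `identifier`, `abstract`) and the set of recognizable document types are invented, not a real broker API.

```python
# Hedged sketch of client-side checks mirroring common broker failures:
# invalid content, unidentifiable document types, empty required elements.
# All field names and type labels are assumptions for illustration.

REQUIRED_ELEMENTS = ("title", "identifier", "abstract")  # assumed required
KNOWN_DOC_TYPES = {"iso19115", "dif", "fgdc"}            # assumed types

def validate_record(record):
    """Return a list of problems found in one returned record."""
    if not isinstance(record, dict) or not record:
        return ["invalid content"]
    problems = []
    if record.get("doc_type") not in KNOWN_DOC_TYPES:
        problems.append("unidentifiable document type")
    for elem in REQUIRED_ELEMENTS:
        # Treat missing, None, or whitespace-only values as empty.
        if not str(record.get(elem) or "").strip():
            problems.append("empty required element: " + elem)
    return problems
```

Running such checks over broker responses turns the anecdotal complaints above into measurable quality metrics per data source.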

Solutions

Ontology Design Patterns

One solution approach is the Ontology Design Pattern (ODP): a potentially reusable solution to a frequently occurring modeling problem in a domain. The idea is to use a simple pattern, leveraging concrete domain notions for which there are data, as a building block of a more complex ontology which adds commitments as needed.

To handle different interpretations we make an appropriate mapping/alignment between a "local" vocabulary and the pattern. (Local ontology development is the glue.) A (local) view of the pattern allows us to connect a data source and the patterns via a specific and explicit mapping.

This "local view" employs a very minimalistic schema (class names, property names, simple domain and range axioms). To facilitate this we separate "core conceptualization" from "nomenclature" issues. That is: vocabulary terms in a local view may be data-repository-specific and need not be the same as the terms used in a pattern. Mappings from data to the pattern can be expressed in rules that help populate the patterns. Data providers can populate the global schema (pattern collection) by simply populating a local view.
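A minimal sketch of this local-view idea, under invented names: `obs:` stands in for a hypothetical observation pattern, and the local column names (`site`, `temp_c`) are repository-specific inventions. The only artifact the provider maintains is the explicit term mapping.

```python
# Hedged sketch: populating a pattern via a local view.
# "obs:" terms and local column names are illustrative assumptions.

# The explicit local-view-to-pattern alignment (the provider's only task).
LOCAL_TO_PATTERN = {
    "site": "obs:hasFeatureOfInterest",
    "temp_c": "obs:hasResult",
}

def populate_pattern(rows, subject_prefix="obs:Observation"):
    """Translate local records into pattern-vocabulary triples;
    the local schema never needs to adopt the pattern's own terms."""
    triples = []
    for i, row in enumerate(rows):
        subject = subject_prefix + "/" + str(i)
        for local_term, value in row.items():
            pattern_term = LOCAL_TO_PATTERN.get(local_term)
            if pattern_term:  # unmapped local terms are simply not exported
                triples.append((subject, pattern_term, value))
    return triples

triples = populate_pattern([{"site": "StationA", "temp_c": 17.2}])
```

The design point: nomenclature stays local, conceptualization stays in the pattern, and the mapping rule is the only thing that must be agreed on explicitly.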

Reference Ontologies

Supplementing this is the use of a Reference Ontology.

This supports semantic integration between existing domain ontologies and schemas but requires:

  • Translation between ontology languages.
  • More rigorous specification of the semantics in each ontology.
  • And perhaps deeper semantics.

Such integration can currently be done only by manual integration of the ontologies, but use of a suitable reference ontology may automate this.
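The automation opportunity can be sketched simply: if each domain ontology declares its own mapping into a shared reference ontology, direct correspondences between domains can be derived by composing those mappings, rather than built pairwise by hand. All term names below are invented for illustration.

```python
# Hedged sketch: deriving cross-ontology alignments through a shared
# reference ontology. All "geo:", "hyd:", "ref:" terms are illustrative.

# Each domain ontology maps its terms into the reference ontology once.
GEO_TO_REF = {"geo:WaterBody": "ref:HydrologicFeature"}
HYDRO_TO_REF = {"hyd:SurfaceWater": "ref:HydrologicFeature"}

def align_via_reference(map_a, map_b):
    """Join two domain-to-reference mappings on shared reference
    concepts, yielding direct domain-to-domain correspondences."""
    inverse_b = {}
    for term, ref in map_b.items():
        inverse_b.setdefault(ref, []).append(term)
    return [(a, b)
            for a, ref in map_a.items()
            for b in inverse_b.get(ref, [])]

pairs = align_via_reference(GEO_TO_REF, HYDRO_TO_REF)
# pairs relates geo:WaterBody to hyd:SurfaceWater via ref:HydrologicFeature
```

With n domain ontologies, this replaces up to n(n-1)/2 pairwise manual alignments with n mappings into the reference ontology.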

Common conceptual models are needed, and organizing ontologies like ViVO can also be useful. This allows:

  • some chance of leveraging existing ontologies to reduce modeling effort
  • constraints on ontology needs & possibilities by using information about
    • Particular entity types & relationships
    • Significant legacy dependencies