Ontology classifier: Difference between revisions

From Endeavour Knowledge Base
Line 3: Line 3:
Two styles of axioms are used:
Two styles of axioms are used:


# Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept A is a subtype of concept B. This means that the set of objects described by concept B includes the set of objects described by A. Sometimes a concept A is said to be a subclass of an intersection of concepts B and C. This means that concept A consists of a certain subset of objects with all of the properties of concept B and all of the properties of concept C.  These types of axioms imply less than complete information in that subclass A may have other properties not explicitly articulated.
# Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept C is a subtype of concept S. This means that the set of objects described by concept S includes the set of objects described by C. Sometimes a concept C is said to be a subclass of an intersection of concepts S1 and S2. This means that concept C describes a set of objects with all of the properties of concept S1 AND all of the properties of concept S2.  These types of axioms imply less than complete information in that subclass C may have other properties not explicitly articulated, so to be safe are described as a subclass without explicitly and fully defining the properties.
# Equivalent class axioms. These state that concept A is equivalent to concept B. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. From A, further subclasses can be inferred from the equivalence axioms.
# Equivalent class axioms. These state that concept A is equivalent to concept B. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. From A, further subclasses can be inferred from the equivalence axioms.


Classes are defined by their properties and the values of those properties. Not all properties are not explicitly listed (all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of Equivalence axioms and their properties.
Classes are defined by their properties and the values of those properties. Not all properties are not explicitly listed (all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of stated axioms and their properties.


In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships is a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers. The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions.
A source of confusion is the fact that many classes have names. Whilst this helps humans understand the meaning of the class, (e.g. Diabetes) from a computable perspective, these names are meaningless. It is the properties of these classes that are used in subsumption testing, aided somewhat when a human has specifically stated, via an axiom, that a class is a subclass of, or equivalent to, another class.


The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a " relationship i.e. concept A is either the same as, or a subtype of, concept B . The is a relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept B -> is a subtype of -> concept B, then the statement that concept A->  is a subtype of -> concept C is removed, as it can be inferred from the first relationship. Thus the view is less messy.
In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships as a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers.  The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions. Queries do not require inferred views to work, but they are substantially simpler to implement when an inferred view is used.
 
The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a" relationship i.e. concept C is either the same as, or a subtype of, concept S . The 'is a' relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept A-> is a subtype of -> Concept C, and if concept B -> is a subtype of -> concept C, then the statement that concept A->  is a subtype of -> concept C is removed, as it can be inferred from the A-> B relationship. Thus the view is less messy to navigate as a result.


The algorithm described in this article assumes certain constrained editorial policies, namely
The algorithm described in this article assumes certain constrained editorial policies, namely
Line 21: Line 23:
== Working example ==
== Working example ==


The working example used in this article is a simple ontology of amoxicillin 500mg presented in two syntaxes
The working example used in this article is a simple ontology of amoxicillin 500 mg presented in two syntaxes
The example contains only the main class axioms for brevity
The example contains only the main class axioms for brevity


Line 87: Line 89:
</syntaxhighlight></div></div>
</syntaxhighlight></div></div>


The two main classes are defined by their properties, the names being assigned for readability purposes.  In this case the drugs are said to have a property referred to as a "role group".


A 'role group' is a naming convention for an anonymous property whose value is an object from an anonymous class. The role group's range class has properties of ingredient and strength.


In the inferred view, it is intuitively clear from the above that bicycles and motorbikes are subtypes of wheeled vehicles and wheeled vehicles are subtypes of Vehicles.
As defined the two classes appear independent i.e.
 
Thing
 
ManMadeThing
 
Vehicle
 
WheeledVehicle
 
Bicycle
 
 
The algorithm described below is designed to generate the untitively inferred view above, designed to operate at the level of a large ontology whereby the above intutive relationships may not be apparent to a human.
 
It takes  acccount of complex class defintions that includes object intersections object unions.
 
= 3Inferred view algorithm =
To put it simply the algorithm is to  '''Test each class to see which classes it is a subtype of'''
 
“'''for each named class C in the ontology,'''
 
'''If it is only a primitive subtype, then add the supertypes S as C S'''
 
'''Else for each named class C in the ontology,'''
 
'''check each other named class S in the ontology'''
 
'''IF C is a subtype of S then'''
 
'''add it to the inferred view as C S “'''
 
'''then in the inferred hierarchy remove anonymous classes'''
 
“'''for each subtype C'''
 
'''For each supertype S'''
 
'''If S is anonymous        ---- Removal of anonymous parent classes'''
 
    '''Add parents of S to C”'''
 
'''Then remove redundant parents'''
 
“'''for each child class C in the inferred view'''
 
'''For each parent class S'''
 
'''For each other parent class O'''
 
'''If O is subtype of S'''
 
'''Remove S'''
 
== 3.1The ISA relationship and algorithm ==
 
=== 3.1.1ISA relationship ===
If C ISA S then this means that C is a subtype of S, or C is a subset of the set of  S.
 
Isa may originate as any of the following:  “Subclass”, “SubObjectProperty”,”SubDataProperty”,”EquivalentTo” axioms
 
ISA relationship is transitive. Tjis means that if A ISA B, and B ISA C, then A ISA C
 
=== 3.1.2ISA function ===
ISA (descendent, ancestor) returns true or false
 
The ISA algorithm uses the ISA tree recursively to check a descendant class against an ancestor class i.e. it steps up the tree until either it reaches the top concept or finds a match.
 
'''Algorithm'''


If the descendant = ascendant then ISA is true
* Amoxcillin product is equivalent to a product that has a role group that has an ingredient of the substance amoxicillin


Set child to the descendent
* Amoxicillin 500mg is eqivalent to a product that has a role group that has an ingredient of the substance amoxicillin AND a strength of 500mg.


For each parent of the child
Intuitively, it is easy to see a relationship between them. It i reasonable to infer that the second product appears to be a sub type of the first. However, that is a human inference. A computer has to rely on logical inference.


If the parent = ancestor then result is true
A classifier generates the "is a" relationship by testing whether the second product is subsumed by the first.


Else repeat with parent as descendent and same ancestor
== Classifier algorithm ==
To put it simply the algorithm is in two steps:


For example
# For each class 'S' in the ontology, find all the classes C1,C2.....Cn that are subsumed by S
# Organise the subtypes into the most efficient tree structure so that if class C1 is subsumed by S1 and S2, and if S2 is subsumed by S1, then the downward tree view is S1-> S2- > C1


Is a Bicycle wheel a descendent of a man made thing ?
=== The ISA relationship ===
If C-> ISA> S then this means that C is a subtype of S, or C is a subset of the set of  S.


True  because it is a child of wheel which is a child of man made thing.
The algorithms use logical inference to infer that C -> is a - > S via a number of mechanisms.


== 3.2Initial inference tree ==
In addition in Discovery "is a like" relationships are also generated as a result of semantically inexact mappings "mapped to" as well as the properties "replaced by", and its reciprocal "replaced". These are not strictly 'is a' relationships as they may or may not be subtypes, but they are optionally used in subsumption query to find related concepts in records
By stepping through classes, object properties, and data properties it is possible to generate an initial inference view (parent/child and child/ parent).


This algorithm simply restates that the subclasses or equivalent classes asserted are included in the inferred view. Any superclass or equivalent class that is an Object intersection or Object union containing a named class must mean that the class is a subtype
The  is a relationship is transitive. This means that if A ISA B, and B ISA C, then A ISA C


For example
=== Algorithm Steps ===


If  a Vehicle is a Subclass Of (Direction  &/OR ManMade thing ……), then a vehicle must be a subtype of both a Direction and a Man Made thing, whatever other property expressions are.
==== Step 1 indexing of ontology ====
In order to improve efficiency, first index all properties and ranges to the class definition so that for any property, or property range pair, the set of classes that have that property explicitly modelled can be found


'''Algorithm'''
==== Step 2 - Initial inference tree ====
A very large proportion of an inference tree already exists via the axioms stated by humans, which are:


N.B SubX means either Subclass of, Sub object property of, or Sub data property of or Equivalent to
* Where C is a subclass of S  (whether or not subclass of other classes or having properties) and they are both named classes then C -> is A > S  and S -> infers -> C


For each concept
* Where C is equivalent to S (whether or not equivalent to other classes or having additional properties) then C-> is a -> S, and S-> subsumes- >C


If C has no SubX axioms then
* Where P is a sub property of PS then P -> is a >PS and PS-> subsumes -> P


Tree (Top concept , C)  ISA (C , TopConcept)      C ISA Top concept
Creation of an inference tree results in efficiency savings as an already inferred relationship need not be repeated. In addition, the resulting inference tree is also used when testing for subsumption of properties and property ranges.


If C is subX S then
For example:


Tree (S, C)  ISA(C , S) Simple restatement  as ISA
Amoxicillin product  is Equivalent to a Medicinal product AND property X and Y , therefore  Amoxicillin -> is a> Medicinal product , Medicinal product -> subsumes> Amoxicillin


If C is subX of Intersection or union of (S….  then simple class ISAs
Amoxicillin 500 mg product is equivalent to a medicinal product AND property Y and X, therefore Amoxicillin 500mg -> is a> medicinal product and Medicinal Product -> subsumes -> A


Tree(S, C),  ISA (C , S)
==== Step 3 - Property subsumption testing ====


=== 3.2.1Creation of temporary anonymous classes ===
=== 3.2.1Creation of temporary anonymous classes ===

Revision as of 13:24, 29 June 2020

Ontologies are based on the idea of axioms which are statements made by humans that represent the current knowledge of the authors that make them.

Two styles of axioms are used:

  1. Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept C is a subtype of concept S. This means that the set of objects described by concept S includes the set of objects described by C. Sometimes a concept C is said to be a subclass of an intersection of concepts S1 and S2. This means that concept C describes a set of objects with all of the properties of concept S1 AND all of the properties of concept S2. These types of axioms imply less than complete information in that subclass C may have other properties not explicitly articulated, so to be safe are described as a subclass without explicitly and fully defining the properties.
  2. Equivalent class axioms. These state that concept A is equivalent to concept B. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. From A, further subclasses can be inferred from the equivalence axioms.

Classes are defined by their properties and the values of those properties. Not all properties are not explicitly listed (all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of stated axioms and their properties.

A source of confusion is the fact that many classes have names. Whilst this helps humans understand the meaning of the class, (e.g. Diabetes) from a computable perspective, these names are meaningless. It is the properties of these classes that are used in subsumption testing, aided somewhat when a human has specifically stated, via an axiom, that a class is a subclass of, or equivalent to, another class.

In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships as a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers. The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions. Queries do not require inferred views to work, but they are substantially simpler to implement when an inferred view is used.

The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a" relationship i.e. concept C is either the same as, or a subtype of, concept S . The 'is a' relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept A-> is a subtype of -> Concept C, and if concept B -> is a subtype of -> concept C, then the statement that concept A-> is a subtype of -> concept C is removed, as it can be inferred from the A-> B relationship. Thus the view is less messy to navigate as a result.

The algorithm described in this article assumes certain constrained editorial policies, namely

  1. Object Unions are rarely used. Use of object unions can result in exponential times for subsumption testing.
  2. Object complement of is not supported. It is assumed that negation is considered only in the concept of closed world query.

The classifier is NOT a reasoner in that it does not check for inconsistencies

Working example

The working example used in this article is a simple ontology of amoxicillin 500 mg presented in two syntaxes The example contains only the main class axioms for brevity

Manchester syntax:

Class: im:AmoxicillinProduct
    EquivalentTo: 
        im:MedicinalProduct
         and (im:hasRoleGroup some (im:hasIngredient some im:Amoxicillin))
 
Class: im:Amoxicillin500mg
    EquivalentTo: 
        im:MedicinalProduct
         and (im:hasRoleGroup some 
            ((im:hasIngredient some im:Amoxicillin)
             and (im:hasStrength some im:Fivehundredmg)))
 

In Discovery json syntax this can be represented as

Discovery syntax:

{"Class" : [ {
      "iri" : "im:AmoxicillinProduct",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "ObjectSome" : {
              "Class" : "im:Amoxicillin",
              "Property" : "im:hasIngredient"
            },
            "Property" : "im:hasRoleGroup"
          }
        } ]
      } ]
    }, {
      "iri" : "im:Amoxicillin500mg",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "Intersection" : [ {
              "ObjectSome" : {
                "Class" : "im:Amoxicillin",
                "Property" : "im:hasIngredient"
              }
            }, {
              "ObjectSome" : {
                "Class" : "im:Fivehundredmg",
                "Property" : "im:hasStrength"
              }
            } ],
            "Property" : "im:hasRoleGroup"
          }
        } ]
      } ]
    }
}

The two main classes are defined by their properties, the names being assigned for readability purposes. In this case the drugs are said to have a property referred to as a "role group".

A 'role group' is a naming convention for an anonymous property whose value is an object from an anonymous class. The role group's range class has properties of ingredient and strength.

As defined the two classes appear independent i.e.

  • Amoxcillin product is equivalent to a product that has a role group that has an ingredient of the substance amoxicillin
  • Amoxicillin 500mg is eqivalent to a product that has a role group that has an ingredient of the substance amoxicillin AND a strength of 500mg.

Intuitively, it is easy to see a relationship between them. It i reasonable to infer that the second product appears to be a sub type of the first. However, that is a human inference. A computer has to rely on logical inference.

A classifier generates the "is a" relationship by testing whether the second product is subsumed by the first.

Classifier algorithm

To put it simply the algorithm is in two steps:

  1. For each class 'S' in the ontology, find all the classes C1,C2.....Cn that are subsumed by S
  2. Organise the subtypes into the most efficient tree structure so that if class C1 is subsumed by S1 and S2, and if S2 is subsumed by S1, then the downward tree view is S1-> S2- > C1

The ISA relationship

If C-> ISA> S then this means that C is a subtype of S, or C is a subset of the set of S.

The algorithms use logical inference to infer that C -> is a - > S via a number of mechanisms.

In addition in Discovery "is a like" relationships are also generated as a result of semantically inexact mappings "mapped to" as well as the properties "replaced by", and its reciprocal "replaced". These are not strictly 'is a' relationships as they may or may not be subtypes, but they are optionally used in subsumption query to find related concepts in records

The is a relationship is transitive. This means that if A ISA B, and B ISA C, then A ISA C

Algorithm Steps

Step 1 indexing of ontology

In order to improve efficiency, first index all properties and ranges to the class definition so that for any property, or property range pair, the set of classes that have that property explicitly modelled can be found

Step 2 - Initial inference tree

A very large proportion of an inference tree already exists via the axioms stated by humans, which are:

  • Where C is a subclass of S (whether or not subclass of other classes or having properties) and they are both named classes then C -> is A > S and S -> infers -> C
  • Where C is equivalent to S (whether or not equivalent to other classes or having additional properties) then C-> is a -> S, and S-> subsumes- >C
  • Where P is a sub property of PS then P -> is a >PS and PS-> subsumes -> P

Creation of an inference tree results in efficiency savings as an already inferred relationship need not be repeated. In addition, the resulting inference tree is also used when testing for subsumption of properties and property ranges.

For example:

Amoxicillin product is Equivalent to a Medicinal product AND property X and Y , therefore Amoxicillin -> is a> Medicinal product , Medicinal product -> subsumes> Amoxicillin

Amoxicillin 500 mg product is equivalent to a medicinal product AND property Y and X, therefore Amoxicillin 500mg -> is a> medicinal product and Medicinal Product -> subsumes -> A

Step 3 - Property subsumption testing

3.2.1Creation of temporary anonymous classes

If the pattern is C is equivalent to Intersection of A union of Property P.R Property P.R1 with R being a union of (S1 or S2) then generate an anonymous class as follows

For example, if a Wheel is Equivalent to ( a man made thing & ( (is made of some steel) or (is made of some Aluminium)) then an anonymous class of

_a1 Equivalent to “Made of some steel or made of some Aluminium” is created

Tree( _a1, C) ISA (C , _a1)

Tree(S1, _a1) ISA (_a1 ISA S1)

This enables the class _a1 to operate as if it was a named class, (although removed at the end) and thus copes with any level of nesting of property ranges.

3.3Simple class inheritance

As all classes and properties are now in the tree.

Step DOWN the tree in order to infer each class. Stepping down assures that as the lower levels are computed the superclasses will also have been computed already. as then the inferences accumulate as the lower depths are examined.

For any class that only has subclass axioms

And whose superclasses are either simple class expressions, or intersections of simple class expressions

Then they are marked as ISA relationships and no further inference is needed.

Else – complex class inference with entailment tests are required.

3.4Complex class Inference with entailment tests

For each class in the tree, check every other class in the ontology to see if it a subclass.

for each named class C in the ontology,

check each other named class S in the ontology

IF C is a subtype of S then

add it to the inferred view as C S “

3.4.1Is C a subclass of S?

Each class C has a set of axioms (or if not then it is a child of Top concept) and move on

Each test class S has a set of axioms (or not) and therefore the algorithm checks the axioms associated with S and the axioms associated with C.

Algorithm

IF C is a subtype of S then

Tree (S,C) ISA (C,S)

Algorithm :Is subtype Of (class C, testclass S) Return true or false

First a quick check to C if C is a subclass of S, equivalent as intersection or union, as in above then it is already set.

For each Subclass and each Equivalent Class axiom

Compare S ObjectIntersections

Compare S ObjectUnions

3.4.1.1Compare C against S Object Intersections

In summary, all of the S properties must be compatible with the C properties. C may have additional properties.

It should be noted that ‘S’ must be fully defined i.e. via an ‘Equivalent to’ axiom. This is because if S is simply a subclass of Y’ and C is a subclass of Y it does not mean that C is a subclass of S even if all the properties of S are properties of C i.e. it may be a different subclass of Y than S is. Therefore this algorithm only works on sufficiently defined classes.

The below assumes that if a statement is made about the test class it must be true about C. Therefore the logic sets a provisional result variable to true or false depending whether a test class statement is found

Algorithm : IsAndSubtypeOf (class C, test class S)

Set provisional result as true // At this point there may be no simple superclasses for the test class

With S, For each Equivalent intersected equivalent class ‘T’

If C ISA T then result is true

Else

Set provisional result to false // Found a superclass therefore MUST be in C

For each “equivalent to” and “subclass of” axiom of C

For ANY intersected or union of C superclass or equivalent class ‘D’

If ‘D’ ISA ‘T’ then provisional result = true //Found it

Else continue to next intersection class

If provisional result is false then the return false //There were superclasses of S but non were found to be compatible.

The next step is to check each property expression for test class S

The syntax used here is

P = Property

R= Property range class, either found in the property expression OR deduced from the property’s range axiom.

Q = Qualified cardinality (e.g. some, only, max, min, exact)

Algorithm IsandSubPropertyExpression (class C, test, class S)

With S, for Each Equivalent intersected equivalent property expression (SP.SR.SQ)

For each “equivalent to” and/or “subclass of” axiom of C

Provisional result = false // Found a property expression

FOR ANY intersected or union property expression (CP.CR.CQ)

IF CP ISA SP

IF CR IS A SR

IF CQ is compatible with SQ

Provisional result is true

Else continue o next property expression

Else continue to next property expression

Return provisional result as result.

3.4.1.2Compare C against S Object Unions

This is similar to the above except that the test class may use object unions

In summary, ANY ONE of the S properties must be compatible with the C properties. C may have additional properties.

It should be noted that ‘S’ must be fully defined i.e. via an ‘Equivalent to’ axiom. This is because if S is simply a subclass of Y’ and C is a subclass of Y it does not mean that C is a subclass of S even if all the properties of S are properties of C i.e. it may be a different subclass of Y than S is. Therefore this algorithm only works on sufficiently defined classes.

The below assumes that if a statement is made about the test class it must be true about C. Therefore the logic sets a provisional result variable to true or false depending whether a test class statement is found

Algorithm : IsorSubtypeOf (class C, test class S)

Set provisional result as false // At this point there may be no simple superclasses for the test class

With S, For each Equivalent UNION class ‘T’

If C ISA T then result is true

Else

For each “equivalent to” and “subclass of” axiom of C

For ANY intersected or union of C superclass or equivalent class ‘D’

If ‘D’ ISA ‘T’ then result = true //Found it

Else continue to next intersection class

If result is true return true

With S, for Each Equivalent union equivalent property expression (SP.SR.SQ)

For each “equivalent to” and/or “subclass of” axiom of C

FOR ANY intersected or property expression (CP.CR.CQ)

IF CP ISA SP

IF CR IS A SR

IF CQ is compatible with SQ

Provisional result is true

Else continue o next property expression

Else continue to next property expression

Return provisional result as result.

3.5Removal of redundant superclasses

Following the above the following have now been inferred:

Vehicle isa ManMadeThing

WheeledVehicle isa ManMadeMachine

WheeledVehicle isa Vehicle

Bicycle isa WheeledVehicle

Bicycle isa ManMadeMachine

Human isa PowerSource

etc

For each child for each parent, check whether its own parent is also a parent of the child. If it is then remove it. Resulting in

Vehicle isa ManMadeMachine

WheeledVehicle isa Vehicle

Bicycle isa WheeledVehicle

etc

Which is the inferred hierarchy.

3.6Generating outputs

3.6.1Relationship table

To generate a relationship table this can use the ISA table and the axioms where the axioms use property expressions

For each class, property

For each ISA

If named class

Create ISA row

Else if anonymous class

Create role group

For each property expression of anonymous class

Create property row

For each Equivalent or subclass, property expression

Create property row role group 0

3.6.2Class view

A class view is generated by stepping up the ISA tree. It also deals with anonymous classes

For each child in the ISA tree

If a child is an anonymous class

Skip

Else Process (child, child as root)

Algorithm view (child c, root r)

For each parent of c

If a parent is an anonymous class then

View (parent, child) ;passes the named child up a level thus skipping the anonymous class

If child is anonymous class

If parent= “TopConcept”

Skip

Set Tree (parent, root)