Ontology classifier: Difference between revisions

From Endeavour Knowledge Base
(Created page with "Ontologies are based on the idea of axioms which are statements made by humans that represent the current knowledge of the authors that make them. Two styles of axioms are us...")
 
 
(16 intermediate revisions by the same user not shown)
Line 1: Line 1:
Ontologies are based on the idea of axioms which are statements made by humans that represent the current knowledge of the authors that make them.
Ontologies are based on the idea of axioms, which are statements about things made by humans that are said to be probably true, and represent the current knowledge of the authors that make them. The statements do not require proof, but are used by systems, as if they were true and are unchallenged.


Two styles of axioms are used:
Ontology reasoners test the contents of an ontology to assure their consistency. As part of that process a reasoner categorises or "classifies" the concepts such that a concept that appears in a tree view below another concept, is said to be subsumed by that concept. For example, type II diabetes is a subclass of Diabetes Mellitus.


# Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept A is a subtype of concept B. This means that the set of objects described by concept B includes the set of objects described by A. Sometimes a concept A is said to be a subclass of an intersection of concepts B and C. This means that concept A consists of a  certain subset of objects with all of the properties of concept B and all of the properties of concept C.  These types of axioms imply less than complete information in that subclass A may have other properties not explicitly articulated.
A reasoner will take a set of objects of certain types and classify them into the appropriate category. The classifier does this within the T-Box of the ontology as well as the A-Box i.e. undertakes a [[subsumption test]] of one concept against another. In practice this process occurs during authoring but is also used to generate and output that can be used in more straightforward conventional query engines.
# Equivalent class axioms. These state that concept A is equivalent to concept B. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. From A, further subclasses can be inferred from the equivalence axioms.


Classes are defined by their properties and the values of those properties. Not all properties are not explicitly listed (all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of Equivalence axioms and their properties.
__TOC__
== Axioms and classifier ==


In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships is a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers.  The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions.
Two styles of axioms are used:


The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a " relationship i.e. concept A is either the same as, or a subtype of, concept B . The is a relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept B -> is a subtype of -> concept B, then the statement that concept A->  is a subtype of -> concept C is removed, as it can be inferred from the first relationship. Thus the view is less messy.
# Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept C is a subtype of concept S. This means that the set of objects described by concept S includes the set of objects described by C. Sometimes a concept C is said to be a subclass of an intersection of concepts S1 and S2. A Venn diagram illustrated this. This means that concept C describes a set of objects with all of the properties of concept S1 AND all of the properties of concept S2.  These types of axioms imply less than complete information, in that subclass C may have other properties not explicitly articulated. It is not possible to mathematically infer that another concept c-sub is a subclass of C, if C is itself a subclass because of the potential of other properties not yet stated. Subclass axioms provide the bare minimum and are said to be "necessary but not sufficient" to define a concept.  The second style of axiom represents a more complete approach.
# Equivalent class axioms. These state that concept A is equivalent to concept B. Usually concept B is defined by being a combination of concept C and a set of properties P. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. They are said to be "sufficiently defined". From A, further subclasses can be inferred.


The algorithm described in this article assumes certain constrained editorial policies, namely
Classes are further defined by their properties and the type of values (ranges) of those properties. Not all properties are explicitly listed (indeed, all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of stated axioms and their properties.


# Object Unions are rarely used. Use  of object unions can result in exponential times for subsumption testing.
A source of confusion is the fact that many classes have names. Whilst this helps humans understand the meaning of the class, (e.g. Diabetes) from a computable perspective, these names are meaningless. It is the properties of these classes that are used in subsumption testing, aided somewhat when a human has specifically stated, via an axiom,  that a class is a subclass of, or equivalent to, another class.
# Object complement of is not supported. It is assumed that negation is considered only in the concept of closed world query.


The classifier is NOT a reasoner in that it does not check for inconsistencies
In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships as a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers.  The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions. Queries do not require inferred views to work, but they are substantially simpler to implement when an inferred view is used.


== Working example ==
The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a" relationship i.e. "is a" means that concept C is "either the same as, or a subtype of", concept S . The 'is a' relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept A-> is a subtype of -> Concept C, and if concept B -> is a subtype of -> concept C, then the statement that concept A->  is a subtype of -> concept C is removed, as it can be inferred from the A-> B relationship. Thus the view is less messy to navigate as a result.
A simple ontology of 2 wheeled vehicles is as follows:


ManMadeThing
The algorithm described in this article assumes certain constrained editorial policies, namely:


Vehicle
# 'Object Union of'  are rarely used. Use  of object unions can result in exponential times for subsumption testing.
# 'Object complement of' is not supported. It is assumed that negation is considered only in the concept of closed world query e.g. for generating value sets whereby concepts are automatically filtered out, whereas the use of object complement often rules everything else out also as the reasoner does not know whether an otherwise included concept is negated or not.
#General concept inclusion axioms (GCI axioms) are ignored in generating the inferred view. This is based on the assumption that if these axioms are to be used then a reasoner would know how o use them without creating an "is a" relationship.


WheeledVehicle
The classifier is NOT a reasoner in that it does not check for inconsistencies


Bicycle
== Working example ==
 
Metal
 
Wheel
 
In the inferred view, it is intuitively clear from the above that bicycles and motorbikes are subtypes of wheeled vehicles and wheeled vehicles are subtypes of Vehicles.
 
Thing
 
ManMadeThing
 
Vehicle
 
WheeledVehicle
 
Bicycle
 
 
The algorithm described below is designed to generate the untitively inferred view above, designed to operate at the level of a large ontology whereby the above intutive relationships may not be apparent to a human.
 
It takes  acccount of complex class defintions that includes object intersections object unions.
 
= 3Inferred view algorithm =
To put it simply the algorithm is to  '''Test each class to see which classes it is a subtype of'''
 
“'''for each named class C in the ontology,'''
 
'''If it is only a primitive subtype, then add the supertypes S as C S'''
 
'''Else for each named class C in the ontology,'''
 
'''check each other named class S in the ontology'''
 
'''IF C is a subtype of S then'''
 
'''add it to the inferred view as C S “'''
 
'''then in the inferred hierarchy remove anonymous classes'''
 
“'''for each subtype C'''
 
'''For each supertype S'''
 
'''If S is anonymous        ---- Removal of anonymous parent classes'''
 
    '''Add parents of S to C”'''
 
'''Then remove redundant parents'''
 
“'''for each child class C in the inferred view'''
 
'''For each parent class S'''
 
'''For each other parent class O'''
 
'''If O is subtype of S'''
 
'''Remove S'''
 
== 3.1The ISA relationship and algorithm ==
 
=== 3.1.1ISA relationship ===
If C ISA S then this means that C is a subtype of S, or C is a subset of the set of  S.
 
Isa may originate as any of the following:  “Subclass”, “SubObjectProperty”,”SubDataProperty”,”EquivalentTo” axioms
 
ISA relationship is transitive. Tjis means that if A ISA B, and B ISA C, then A ISA C
 
=== 3.1.2ISA function ===
ISA (descendent, ancestor) returns true or false
 
The ISA algorithm uses the ISA tree recursively to check a descendant class against an ancestor class i.e. it steps up the tree until either it reaches the top concept or finds a match.
 
'''Algorithm'''
 
If the descendant = ascendant then ISA is true
 
Set child to the descendent
 
For each parent of the child
 
If the parent = ancestor then result is true
 
Else repeat with parent as descendent and same ancestor
 
For example
 
Is a Bicycle wheel a descendent of a man made thing ?
 
True  because it is a child of wheel which is a child of man made thing.
 
== 3.2Initial inference tree ==
By stepping through classes, object properties, and data properties it is possible to generate an initial inference view (parent/child and child/ parent).
 
This algorithm simply restates that the subclasses or equivalent classes asserted are included in the inferred view. Any superclass or equivalent class that is an Object intersection or Object union containing a named class must mean that the class is a subtype
 
For example
 
If  a Vehicle is a Subclass Of (Direction  &/OR ManMade thing ……),  then a vehicle must be a subtype of both a Direction and a Man Made thing, whatever other property expressions are.
 
'''Algorithm'''
 
N.B SubX means either Subclass of, Sub object property of, or Sub data property of or Equivalent to
 
For each concept
 
If C has no SubX axioms then
 
Tree (Top concept , C)  ISA (C , TopConcept)      C ISA Top concept
 
If C is subX S then
 
Tree (S, C)  ISA(C , S) Simple restatement  as ISA
 
If C is subX of Intersection or union of (S….  then simple class ISAs
 
Tree(S, C),  ISA (C , S)
 
=== 3.2.1Creation of temporary anonymous classes ===
If the pattern is  C is equivalent to Intersection of A union of Property P.R Property P.R1 with R being a union of (S1 or S2) then generate an anonymous class as follows
 
For example, if a Wheel is Equivalent to ( a man made thing & ( (is made of some steel) or (is made of some Aluminium)) then an anonymous class of
 
_a1 Equivalent to “Made of some steel or made of some Aluminium” is created
 
Tree( _a1,  C)  ISA (C , _a1)
 
Tree(S1, _a1) ISA (_a1  ISA S1)
 
This enables the class _a1 to operate as if it was a named class, (although removed at the end) and thus copes with any level of nesting of property ranges.
 
== 3.3Simple class inheritance ==
As all classes and properties are now in the tree.
 
'''Step DOWN the tree in order to infer each class'''. Stepping down assures that as the lower levels are computed the superclasses will also have been computed already.  as then the inferences accumulate as the lower depths are examined.
 
'''For any class that only has subclass axioms'''
 
'''And whose superclasses are either simple class expressions, or intersections of simple class expressions'''
 
'''Then they are marked as ISA relationships and no further inference is needed.'''
 
'''Else – complex class inference with entailment tests are required.'''
 
== 3.4Complex class Inference with entailment tests ==
For each class in the tree, check every other class in the ontology to see if it a subclass.
 
“'''for each named class C in the ontology,'''
 
'''check each other named class S in the ontology'''
 
'''IF C is a subtype of S then'''
 
'''add it to the inferred view as C S “'''
 
=== 3.4.1Is C a subclass of S? ===
Each class C has a set of axioms (or if not then it is a child of Top concept) and move on
 
Each test class S has a set of axioms (or not) and therefore the algorithm checks the axioms associated with S and the axioms associated with C.
 
'''Algorithm'''
 
IF C is a subtype of S then
 
Tree (S,C)  ISA (C,S)
 
'''Algorithm :Is subtype Of (class C, testclass S)''' Return true or false
 
First a quick check to C if C is a subclass of S, equivalent as intersection or union,  as in above then it is already set.
 
For each Subclass and each Equivalent Class axiom
 
Compare S ObjectIntersections
 
Compare S ObjectUnions
 
==== 3.4.1.1Compare C against S Object Intersections ====
In summary, all of the S properties must be compatible with the C properties. C may have additional properties.
 
It should be noted that ‘S’ must be fully defined i.e. via an ‘Equivalent to’ axiom. This is because if S is simply a subclass of Y’ and C is a subclass of Y it does not mean that C is a subclass of S even if all the properties of S are properties of C i.e. it may be a different subclass of Y than S is. Therefore this algorithm only works on sufficiently defined classes.
 
The below assumes that if a statement is made about the test class it must be true about C. Therefore the logic sets a provisional result variable to true or false depending whether a test class statement is found
 
'''Algorithm : IsAndSubtypeOf (class C, test class S)'''
 
Set provisional result as true            // At this point there may be no simple superclasses for the test class
 
With S, For each '''''Equivalent''''' intersected equivalent class ‘T’
 
If C ISA T  then result is true
 
Else
 
Set provisional result to false      // Found a superclass therefore MUST be in C
 
For each “equivalent to” and “subclass of” axiom of C
 
For ANY intersected or union of C superclass or equivalent class ‘D’
 
If ‘D’  ISA ‘T’ then provisional result = true          //Found it
 
Else continue to next intersection class
 
If provisional result is false then the return false                  //There were superclasses of S but non were found to be compatible.
 
The next step is to check each property expression for test class S
 
The syntax used here is
 
P = Property
 
R= Property range class, either found in the property expression OR deduced from the property’s range axiom.
 
Q = Qualified cardinality (e.g. some, only, max, min, exact)
 
'''Algorithm IsandSubPropertyExpression (class C, test, class S)'''
 
With S, for Each '''''Equivalent''''' intersected equivalent property expression (SP.SR.SQ)
 
For each “equivalent to” and/or “subclass of” axiom of C
 
Provisional result = false // Found a property expression
 
FOR ANY intersected or union property expression (CP.CR.CQ)
 
IF CP ISA SP
 
IF CR IS A SR
 
IF CQ is compatible with SQ
 
''Provisional result is true''
 
Else continue o next property expression
 
Else continue to next property expression
 
Return provisional result as result.
 
==== 3.4.1.2Compare C against S Object Unions ====
This is similar to the above except that the test class may use object unions
 
In summary, ANY ONE of the S properties must be compatible with the C properties. C may have additional properties.
 
It should be noted that ‘S’ must be fully defined i.e. via an ‘Equivalent to’ axiom. This is because if S is simply a subclass of Y’ and C is a subclass of Y it does not mean that C is a subclass of S even if all the properties of S are properties of C i.e. it may be a different subclass of Y than S is. Therefore this algorithm only works on sufficiently defined classes.
 
The below assumes that if a statement is made about the test class it must be true about C. Therefore the logic sets a provisional result variable to true or false depending whether a test class statement is found
 
'''Algorithm : IsorSubtypeOf (class C, test class S)'''
 
Set provisional result as false            // At this point there may be no simple superclasses for the test class
 
With S, For each '''''Equivalent''''' UNION class ‘T’
 
If C ISA T  then result is true


Else
The working example used in this article is a simple ontology of amoxicillin 500 mg presented in two syntaxes. The example contains only the main class axioms for brevity


For each “equivalent to” and “subclass of” axiom of C
<div class="toccolours mw-collapsible mw-collapsed">
Manchester syntax:
<div class="mw-collapsible-content"><pre>
Class: im:AmoxicillinProduct
    EquivalentTo:
        im:MedicinalProduct
        and (im:hasRoleGroup some (im:hasIngredient some im:Amoxicillin))
Class: im:Amoxicillin500mg
    EquivalentTo:
        im:MedicinalProduct
        and (im:hasRoleGroup some
            ((im:hasIngredient some im:Amoxicillin)
            and (im:hasStrength some im:Fivehundredmg)))
</pre></div> </div>


For ANY intersected or union of C superclass or equivalent class ‘D’
In Discovery json syntax this can be represented as


If ‘D’  ISA ‘T’ then result = true           //Found it
<div class="toccolours mw-collapsible mw-collapsed">
Discovery syntax:
<div class="mw-collapsible-content"><syntaxhighlight lang="JSON">
{"Class" : [ {
      "iri" : "im:AmoxicillinProduct",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "Property" : "im:hasRoleGroup",
            "ObjectSome" : {
              "Class" : "im:Amoxicillin",
              "Property" : "im:hasIngredient"
            }
          }
        } ]
      } ]
    }, {
      "iri" : "im:Amoxicillin500mg",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "Property" : "im:hasRoleGroup",
            "Intersection" : [ {
              "ObjectSome" : {
                "Class" : "im:Amoxicillin",
                "Property" : "im:hasIngredient"
              }
            }, {
              "ObjectSome" : {
                "Class" : "im:Fivehundredmg",
                "Property" : "im:hasStrength"
              }
            } ]
           }
        } ]
      } ]
    }
}
</syntaxhighlight></div></div>


Else continue to next intersection class
The two main classes are defined by their properties, the names being assigned for readability purposes.  In this case the drugs are said to have a property referred to as a "role group".


If result is true return true
A 'role group' is a naming convention for an anonymous property whose value is an object from an anonymous class. in this case the different role group's have class with different properties.


With S, for Each '''''Equivalent''''' union equivalent property expression (SP.SR.SQ)
As defined the two classes appear independent i.e.


For each “equivalent to” and/or “subclass of” axiom of C
* Amoxcillin product is equivalent to a product, that has a role group, that has an ingredient of the substance amoxicillin


FOR ANY intersected or property expression (CP.CR.CQ)
* Amoxicillin 500mg is equivalent to a product, that has a role group ,that has an ingredient of the substance amoxicillin AND a strength of 500mg.


IF CP ISA SP
Intuitively, it is easy to see a relationship between them. It is reasonable to infer that the second product appears to be a sub type of the first. However, that is a human inference. A computer has to rely on logical inference.


IF CR IS A SR
A classifier generates the "is a" relationship by testing whether the second product is subsumed by the first.


IF CQ is compatible with SQ
== Classifier algorithm ==
To put it simply the algorithm is in two steps:


''Provisional result is true''
# For each class 'S' in the ontology, find all the classes C1,C2.....Cn that are subsumed by S
# Organise the subtypes into the most efficient tree structure so that if class C1 is subsumed by S1 and S2, and if S2 is subsumed by S1, then the downward tree view is S1-> S2- > C1


Else continue o next property expression
=== The ISA relationship ===
If C-> ISA> S then this means that C is a subtype of S, or the set of C is a subset of the set of  S.


Else continue to next property expression
The algorithms use logical inference to infer that C -> is a - > S via a number of mechanisms.


Return provisional result as result.
In addition, in Discovery "is a like" relationships are also generated as a result of semantically inexact mappings "mapped to" as well as the properties "replaced by", and its reciprocal "replaced". These are not strictly 'is a' relationships as they may or may not be subtypes, but they are optionally used in subsumption query to find related concepts in records.


== 3.5Removal of redundant superclasses ==
The 'is a' relationship is transitive. This means that if A ISA B, and B ISA C, then A ISA C. This is used in subsumption testing.
Following the above the following have now been inferred:


Vehicle isa ManMadeThing
=== Algorithm Steps ===


WheeledVehicle isa ManMadeMachine
==== Step 1 indexing of ontology and anonymous class generation ====
In order to improve efficiency, it is useful to have an index of all properties and ranges to the class definitions so that for any property, or property range pair, the set of classes that have that property explicitly modelled can easily and quickly be found. In effect this is a simple inverted index.


WheeledVehicle isa Vehicle
Account must be taken of nested or complex ranges in class expressions. In the working example, the property "role group" appears in two different classes, each with a different range. Each range is further defined by a complex class expression consisting in one case of a property/ class pair, and in the other as an intersection of 2 property/class pairs.


Bicycle isa WheeledVehicle
The indexer may at this point creates two anonymous classes:


Bicycle isa ManMadeMachine
_a1        which has property of ingredient with a range of amoxicillin


Human isa PowerSource
_a2        which is an intersection of an ingredient with a range of amoxicillin, and a strength with a range of 500 mg


etc
Thus amoxicillin 500 mg product  would have a property of "role group" with a range of a2.  As far as the computer is concerned _a1, _a2 is of the same significance as amoxicillin product.


For each child for each parent, check whether its own parent is also a parent of the child. If it is then remove it. Resulting in
==== Step 2 - Initial inference tree ====
A very large proportion of an inference tree already exists, via the axioms stated by humans, which are:


Vehicle isa ManMadeMachine
* C :  '[https://www.w3.org/TR/owl2-syntax/#Subclass_Axioms SubclassOf] ' S  they are both named classes then C -> is A > S  and S -> infers -> C


WheeledVehicle isa Vehicle
* C  :'[https://www.w3.org/TR/owl2-syntax/#Equivalent_Classes EquivalentTo]' S then C-> is a -> S, and S-> subsumes- >C


Bicycle isa WheeledVehicle
* P:  '[https://www.w3.org/TR/owl2-syntax/#Object_Subproperties SubObjectPropertyO]f' PS then P -> is a >PS and PS-> subsumes -> P  (likewise sub data property)


etc
It does not matter whether the subclass or equivalent axioms use intersections or unions with additional properties in their definitions. They are still "is a" relationships.


Which is the inferred hierarchy.
Creation of an inferred tree results in performance improvements as an already stated 'is a' relationship need not be examined. In addition, the resulting inferred view can also be used when testing for subsumption of properties and property ranges used in the rest of the algorithm.


== 3.6Generating outputs ==
For example:


=== 3.6.1Relationship table ===
Amoxicillin product  is Equivalent to a Medicinal product AND property X and Y , therefore  Amoxicillin -> is a> Medicinal product , or to put it another way: Medicinal product -> subsumes> Amoxicillin
To generate a relationship table this can use the ISA table and the axioms where the axioms use property expressions


For each class, property
Amoxicillin 500 mg product is equivalent to a medicinal product AND property Y and X, therefore Amoxicillin 500mg -> is a> medicinal product, or the other way round:  Medicinal Product -> subsumes -> Amoxicillin 500 mg product.


For each ISA
==== IsA function ====
The isa function - isa(sub, super) is a function used throughout that steps up the populated inferred tree to indicate whether ''sub'' is the same as, or a subtype of, ''super''.


If named class
It should be noted for this function to work it assumes both sub, and super have already been processed as subtypes. In other words the 'isa' function only works after a particular class and super-class has already been processed. This means that eventually the system falls back on primitive concepts authored as equivalent or subclasses by a human i.e. without a human stating that something is the same as or a subclass of, something else then the classifier can rely ONLY on properties to produce an inference tree. In practice, most classes and properties are placed somewhere in a subclass hierarchy so the reasoner tends to have a leg up when operating.


Create ISA row
==== Step 3 - Property expression subsumption testing ====
Subsumption can be inferred from the testing of a class's properties and the property ranges, against other classes that  also have the properties and ranges.


Else if anonymous class
Account must be taken of the boolean logic of Intersection and Union e.g. via the use of a truth tree. When testing intersections (AND) all the expressions must be true. When testing union (OR) at least one of the expressions must be true. A truth tree branches on union and goes back to try again on another branch if one branch ends up as untrue.


Create role group
In theory each expression could be tested against all other classes in the ontology, (except the ones that have already been subsumed as above). However, this results in poor performance. One means of improving performance is to use the property indexes to identify candidate concepts to test.


For each property expression of anonymous class
The subsumption testing can thus be divided into two steps for each concept:


Create property row
a) Look for at least one of properties to access the property index with


For each Equivalent or subclass, property expression
b) Test each class that has the property (or a subproperty), and optionally narrow down further by range testing also.


Create property row role group 0
In essence this uses the following pseudo code logic:<pre>
For each class C in the ontology
          InferClass(C)


=== 3.6.2Class view ===
InferClass(c class)                  //Looks for classes to subsume
A class view is generated by stepping up the ISA tree. It also deals with anonymous classes
  IF equivalentClass is Object Intersection (or single) then
      For the FIRST c.property              // No Branch in truth tree
      for each c.p.range r
          if r not inferred then
            Infer(r)
          else null
         
          SubClasstest (c, p, r)          //found candidate
          SubRangetest (c,p,r)            //looks for candidates with subranges
          SubPropertytest(c,p,r)      //looks for candidates with subproperties


For each child in the ISA tree
  ELSE
    IF Equivalent class is Object Union then
      for EACH c.property                  //Branching OR truth tree
      for each c.p.range r
        if r not inferred then
            Infer(r)
        else null
               
        SubClasstest (c, p, r)          //found candidate
        SubRangetest (c,p, r)            //looks for candidates with subranges
        SubPropertytest(c,p,r)      //looks for candidates with subproperties
</pre>
----The above provides a property to use against the property index to find other classes to test.


If a child is an anonymous class
As well as using the property index, potential candidates may have sub properties, or properties with ranges that are subtypes of the range. All need to be examined and therefore the above takes 3 approaches: The property and the range, a set of subclasses of the range, and a set of sub properties and the range. As follows:<pre>
SubRangeTest(s superclass, op operator, p property, c candidate)
  For each subtype of c  sub
        SubclassTest(s,op, p, sub)  // finding a subtype index
        SubRangeTest(s, op, p, sub)  // steps down the sub types of the range


Skip
SubPropertyTest(s superclass, op operator, p property, c candidate)
  For each subproperty of p  subprop
        SubclassTest(s,op, subprop, sub)  // finding a subtype index
        SubRangeTest(s,op, subprop, sub)  //Looks for subtypes of the range
        SubPropertyTest(s, op, subprop,sub)  //Steps down the sub properties
</pre>Having found a property and range to find in the index, find all of the classes that have that property and range and for each of them, test whether it is subsumed or not.
<pre>


Else Process (child, child as root)
SubClassTest (s superclass, op operator, p property r range)
  for each class c with property p and range r
      SubTest(s, c)                   //Is c a subsumed by s?


'''Algorithm view (child c, root r)'''
SubTest (s superclass, c candidate)   //Gets to do a subsumption test
  if s not defined as equivalent then
          END test                    // cannot subsume from a subclass
                                        // unless already stated as such


For each parent of c
    If class defined by intersection or single then
  (
      For each equivalent class e of s
        if e not yet inferred
          infer(e)                      //e must be processed first
        if c is not a subtype of s then
            END TEST  // tests          // c must also be a subclass of


If a parent is an anonymous class then
      For each property p of s
        for each property p1 of c
            if isa(p1,p) then  
              for each range r of s.p
                if r not yet inferred infer(r)
                for each range r1 of c.p1
                  if isa(r1,r) then
                  IS SUBTYPE
                NOT SUBTYPE
        NOT SUBTYPE


View (parent, child) ;passes the named child up a level thus skipping the anonymous class
    )
    else (must be union)
    Logic as above but only one of the properties must match
</pre>


If child is anonymous class
==== Removal of redundant super classes ====
The algorithm described above would result in the following tree entries<pre>
Medicinal Product
      Amoxicillin Product
        Amoxicillin 500 mg
      Amoxicillin 500 mg
</pre>Because amoxicillin 500 mg is a subtype of amoxicillin it can now be removed from the tree as it is subsumed by both the medicinal product and the amoxicillin product.


If parent= “TopConcept”
== The IsA axiom ==
The OWL language explicitly seperates a sub class from a sub property from an equivalent property.


Skip
However, for convenience it is often easier to treat these all the same.


Set Tree (parent, root)
Therefore in Discovery information modelling language, the inferred axiom 'IsA' is created as part of the language to be used in tree views, subsumption testing, or generation of transitive closure tables.
<br />

Latest revision as of 16:28, 29 June 2020

Ontologies are based on the idea of axioms, which are statements about things made by humans that are said to be probably true, and represent the current knowledge of the authors that make them. The statements do not require proof, but are used by systems, as if they were true and are unchallenged.

Ontology reasoners test the contents of an ontology to assure their consistency. As part of that process a reasoner categorises or "classifies" the concepts such that a concept that appears in a tree view below another concept, is said to be subsumed by that concept. For example, type II diabetes is a subclass of Diabetes Mellitus.

A reasoner will take a set of objects of certain types and classify them into the appropriate category. The classifier does this within the T-Box of the ontology as well as the A-Box i.e. undertakes a subsumption test of one concept against another. In practice this process occurs during authoring but is also used to generate and output that can be used in more straightforward conventional query engines.

Axioms and classifier

Two styles of axioms are used:

  1. Simple subtype axioms, such as a subclass or sub property statement, which explicitly state that concept C is a subtype of concept S. This means that the set of objects described by concept S includes the set of objects described by C. Sometimes a concept C is said to be a subclass of an intersection of concepts S1 and S2. A Venn diagram illustrated this. This means that concept C describes a set of objects with all of the properties of concept S1 AND all of the properties of concept S2. These types of axioms imply less than complete information, in that subclass C may have other properties not explicitly articulated. It is not possible to mathematically infer that another concept c-sub is a subclass of C, if C is itself a subclass because of the potential of other properties not yet stated. Subclass axioms provide the bare minimum and are said to be "necessary but not sufficient" to define a concept. The second style of axiom represents a more complete approach.
  2. Equivalent class axioms. These state that concept A is equivalent to concept B. Usually concept B is defined by being a combination of concept C and a set of properties P. These types of axioms imply more complete information in that A, being equivalent, will not have more properties than B. They are said to be "sufficiently defined". From A, further subclasses can be inferred.

Classes are further defined by their properties and the type of values (ranges) of those properties. Not all properties are explicitly listed (indeed, all objects have an infinite number of properties) and only those of relevance to the current use of the the knowledge base are articulated. The use of properties enables a reasoner to infer that another concept may be a subtype of a concept by using a combination of stated axioms and their properties.

A source of confusion is the fact that many classes have names. Whilst this helps humans understand the meaning of the class, (e.g. Diabetes) from a computable perspective, these names are meaningless. It is the properties of these classes that are used in subsumption testing, aided somewhat when a human has specifically stated, via an axiom, that a class is a subclass of, or equivalent to, another class.

In order to enable an ontology to be used in practice, the general approach is to generate inferred subtype relationships from the ontological source. An inferred view is a view of these relationships as a hierarchy of concepts, from the top level down of the kind used by Snomed-CT browsers. The inferred output contains all of the necessary constructs to be able to perform standard queries such as subsumption testing or queries across post coordinated expressions. Queries do not require inferred views to work, but they are substantially simpler to implement when an inferred view is used.

The approach in this article can be considered as a simplified subset of an OWL2 reasoner, focusing on classification. A classifier essentially creates a set of statements that link concepts together by an "is a" relationship i.e. "is a" means that concept C is "either the same as, or a subtype of", concept S . The 'is a' relationships are further pruned to make the view simpler by making sure that if concept A -> is a subtype of -> concept B, and if concept A-> is a subtype of -> Concept C, and if concept B -> is a subtype of -> concept C, then the statement that concept A-> is a subtype of -> concept C is removed, as it can be inferred from the A-> B relationship. Thus the view is less messy to navigate as a result.

The algorithm described in this article assumes certain constrained editorial policies, namely:

  1. 'Object Union of' are rarely used. Use of object unions can result in exponential times for subsumption testing.
  2. 'Object complement of' is not supported. It is assumed that negation is considered only in the concept of closed world query e.g. for generating value sets whereby concepts are automatically filtered out, whereas the use of object complement often rules everything else out also as the reasoner does not know whether an otherwise included concept is negated or not.
  3. General concept inclusion axioms (GCI axioms) are ignored in generating the inferred view. This is based on the assumption that if these axioms are to be used then a reasoner would know how o use them without creating an "is a" relationship.

The classifier is NOT a reasoner in that it does not check for inconsistencies

Working example

The working example used in this article is a simple ontology of amoxicillin 500 mg presented in two syntaxes. The example contains only the main class axioms for brevity

Manchester syntax:

Class: im:AmoxicillinProduct
    EquivalentTo: 
        im:MedicinalProduct
         and (im:hasRoleGroup some (im:hasIngredient some im:Amoxicillin))
 
Class: im:Amoxicillin500mg
    EquivalentTo: 
        im:MedicinalProduct
         and (im:hasRoleGroup some 
            ((im:hasIngredient some im:Amoxicillin)
             and (im:hasStrength some im:Fivehundredmg)))
 

In Discovery json syntax this can be represented as

Discovery syntax:

{"Class" : [ {
      "iri" : "im:AmoxicillinProduct",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "Property" : "im:hasRoleGroup",
            "ObjectSome" : {
              "Class" : "im:Amoxicillin",
              "Property" : "im:hasIngredient"
            }
          }
        } ]
      } ]
    }, {
      "iri" : "im:Amoxicillin500mg",
      "EquivalentTo" : [ {
        "Intersection" : [ {
          "Class" : "im:MedicinalProduct"
        }, {
          "ObjectSome" : {
            "Property" : "im:hasRoleGroup",
            "Intersection" : [ {
              "ObjectSome" : {
                "Class" : "im:Amoxicillin",
                "Property" : "im:hasIngredient"
              }
            }, {
              "ObjectSome" : {
                "Class" : "im:Fivehundredmg",
                "Property" : "im:hasStrength"
              }
            } ]
          }
        } ]
      } ]
    }
}

The two main classes are defined by their properties, the names being assigned for readability purposes. In this case the drugs are said to have a property referred to as a "role group".

A 'role group' is a naming convention for an anonymous property whose value is an object from an anonymous class. in this case the different role group's have class with different properties.

As defined the two classes appear independent i.e.

  • Amoxcillin product is equivalent to a product, that has a role group, that has an ingredient of the substance amoxicillin
  • Amoxicillin 500mg is equivalent to a product, that has a role group ,that has an ingredient of the substance amoxicillin AND a strength of 500mg.

Intuitively, it is easy to see a relationship between them. It is reasonable to infer that the second product appears to be a sub type of the first. However, that is a human inference. A computer has to rely on logical inference.

A classifier generates the "is a" relationship by testing whether the second product is subsumed by the first.

Classifier algorithm

To put it simply the algorithm is in two steps:

  1. For each class 'S' in the ontology, find all the classes C1,C2.....Cn that are subsumed by S
  2. Organise the subtypes into the most efficient tree structure so that if class C1 is subsumed by S1 and S2, and if S2 is subsumed by S1, then the downward tree view is S1-> S2- > C1

The ISA relationship

If C-> ISA> S then this means that C is a subtype of S, or the set of C is a subset of the set of S.

The algorithms use logical inference to infer that C -> is a - > S via a number of mechanisms.

In addition, in Discovery "is a like" relationships are also generated as a result of semantically inexact mappings "mapped to" as well as the properties "replaced by", and its reciprocal "replaced". These are not strictly 'is a' relationships as they may or may not be subtypes, but they are optionally used in subsumption query to find related concepts in records.

The 'is a' relationship is transitive. This means that if A ISA B, and B ISA C, then A ISA C. This is used in subsumption testing.

Algorithm Steps

Step 1 indexing of ontology and anonymous class generation

In order to improve efficiency, it is useful to have an index of all properties and ranges to the class definitions so that for any property, or property range pair, the set of classes that have that property explicitly modelled can easily and quickly be found. In effect this is a simple inverted index.

Account must be taken of nested or complex ranges in class expressions. In the working example, the property "role group" appears in two different classes, each with a different range. Each range is further defined by a complex class expression consisting in one case of a property/ class pair, and in the other as an intersection of 2 property/class pairs.

The indexer may at this point creates two anonymous classes:

_a1 which has property of ingredient with a range of amoxicillin

_a2 which is an intersection of an ingredient with a range of amoxicillin, and a strength with a range of 500 mg

Thus amoxicillin 500 mg product would have a property of "role group" with a range of a2. As far as the computer is concerned _a1, _a2 is of the same significance as amoxicillin product.

Step 2 - Initial inference tree

A very large proportion of an inference tree already exists, via the axioms stated by humans, which are:

  • C : 'SubclassOf ' S they are both named classes then C -> is A > S and S -> infers -> C
  • C  :'EquivalentTo' S then C-> is a -> S, and S-> subsumes- >C
  • P: 'SubObjectPropertyOf' PS then P -> is a >PS and PS-> subsumes -> P (likewise sub data property)

It does not matter whether the subclass or equivalent axioms use intersections or unions with additional properties in their definitions. They are still "is a" relationships.

Creation of an inferred tree results in performance improvements as an already stated 'is a' relationship need not be examined. In addition, the resulting inferred view can also be used when testing for subsumption of properties and property ranges used in the rest of the algorithm.

For example:

Amoxicillin product is Equivalent to a Medicinal product AND property X and Y , therefore Amoxicillin -> is a> Medicinal product , or to put it another way: Medicinal product -> subsumes> Amoxicillin

Amoxicillin 500 mg product is equivalent to a medicinal product AND property Y and X, therefore Amoxicillin 500mg -> is a> medicinal product, or the other way round: Medicinal Product -> subsumes -> Amoxicillin 500 mg product.

IsA function

The isa function - isa(sub, super) is a function used throughout that steps up the populated inferred tree to indicate whether sub is the same as, or a subtype of, super.

It should be noted for this function to work it assumes both sub, and super have already been processed as subtypes. In other words the 'isa' function only works after a particular class and super-class has already been processed. This means that eventually the system falls back on primitive concepts authored as equivalent or subclasses by a human i.e. without a human stating that something is the same as or a subclass of, something else then the classifier can rely ONLY on properties to produce an inference tree. In practice, most classes and properties are placed somewhere in a subclass hierarchy so the reasoner tends to have a leg up when operating.

Step 3 - Property expression subsumption testing

Subsumption can be inferred from the testing of a class's properties and the property ranges, against other classes that also have the properties and ranges.

Account must be taken of the boolean logic of Intersection and Union e.g. via the use of a truth tree. When testing intersections (AND) all the expressions must be true. When testing union (OR) at least one of the expressions must be true. A truth tree branches on union and goes back to try again on another branch if one branch ends up as untrue.

In theory each expression could be tested against all other classes in the ontology, (except the ones that have already been subsumed as above). However, this results in poor performance. One means of improving performance is to use the property indexes to identify candidate concepts to test.

The subsumption testing can thus be divided into two steps for each concept:

a) Look for at least one of properties to access the property index with

b) Test each class that has the property (or a subproperty), and optionally narrow down further by range testing also.

In essence this uses the following pseudo code logic:

For each class C in the ontology 
          InferClass(C)

InferClass(c class)                  //Looks for classes to subsume
   IF equivalentClass is Object Intersection (or single) then
      For the FIRST c.property              // No Branch in truth tree
       for each c.p.range r
          if r not inferred then
             Infer(r)
          else null
           
          SubClasstest (c, p, r)           //found candidate
          SubRangetest (c,p,r)            //looks for candidates with subranges
          SubPropertytest(c,p,r)      //looks for candidates with subproperties

   ELSE
    IF Equivalent class is Object Union then
      for EACH c.property                   //Branching OR truth tree
       for each c.p.range r
         if r not inferred then
            Infer(r)
         else null
                
         SubClasstest (c, p, r)           //found candidate
         SubRangetest (c,p, r)            //looks for candidates with subranges
         SubPropertytest(c,p,r)      //looks for candidates with subproperties

The above provides a property to use against the property index to find other classes to test. As well as using the property index, potential candidates may have sub properties, or properties with ranges that are subtypes of the range. All need to be examined and therefore the above takes 3 approaches: The property and the range, a set of subclasses of the range, and a set of sub properties and the range. As follows:

SubRangeTest(s superclass, op operator, p property, c candidate)
  For each subtype of c  sub
        SubclassTest(s,op, p, sub)  // finding a subtype index
        SubRangeTest(s, op, p, sub)  // steps down the sub types of the range

SubPropertyTest(s superclass, op operator, p property, c candidate)
  For each subproperty of p  subprop
        SubclassTest(s,op, subprop, sub)  // finding a subtype index
        SubRangeTest(s,op, subprop, sub)  //Looks for subtypes of the range
        SubPropertyTest(s, op, subprop,sub)  //Steps down the sub properties

Having found a property and range to find in the index, find all of the classes that have that property and range and for each of them, test whether it is subsumed or not.


SubClassTest (s superclass, op operator, p property r range)
  for each class c with property p and range r
      SubTest(s, c)                    //Is c a subsumed by s?

SubTest (s superclass, c candidate)    //Gets to do a subsumption test
   if s not defined as equivalent then
           END test                     // cannot subsume from a subclass
                                        // unless already stated as such

    If class defined by intersection or single then
   (
      For each equivalent class e of s 
        if e not yet inferred 
           infer(e)                       //e must be processed first
        if c is not a subtype of s then 
            END TEST   // tests           // c must also be a subclass of 

      For each property p of s
         for each property p1 of c
            if isa(p1,p) then 
               for each range r of s.p
                 if r not yet inferred infer(r)
                 for each range r1 of c.p1
                   if isa(r1,r) then
                   IS SUBTYPE
                 NOT SUBTYPE
        NOT SUBTYPE

    )
    else (must be union)
     Logic as above but only one of the properties must match

Removal of redundant super classes

The algorithm described above would result in the following tree entries

Medicinal Product
      Amoxicillin Product
         Amoxicillin 500 mg
      Amoxicillin 500 mg

Because amoxicillin 500 mg is a subtype of amoxicillin it can now be removed from the tree as it is subsumed by both the medicinal product and the amoxicillin product.

The IsA axiom

The OWL language explicitly seperates a sub class from a sub property from an equivalent property.

However, for convenience it is often easier to treat these all the same.

Therefore in Discovery information modelling language, the inferred axiom 'IsA' is created as part of the language to be used in tree views, subsumption testing, or generation of transitive closure tables.