Supporting the analysis of ontology evolution processes through the combination of static and dynamic scaling functions in OQuaRE

Definitions

In this section, we define a series of concepts related to ontology evolution from the OQuaRE perspective.

Definition 1

Versioned corpus of an ontology ($vC_\theta$): a list of versions $\{v_i\}$ of the same ontology $\theta$, where $i$ represents the chronological position of $v_i$ in $vC_\theta$.

The comparison of different versions of the same ontology highlights changes and commonalities between the versions [5]. The comparison can be done using metrics of different kinds (real-valued metrics, factors, ordered factors, etc.). To include all of them in a common context, the method requires the metrics to be adapted so that they satisfy the constraints described in Definition 2.

Definition 2

Comparison criteria ($f_\theta$): a discretisation framework that, for every version $v_i \in vC_\theta$, provides a vector $s_i$ of integers that can be used to rank the versions in $vC_\theta$.

The number of components of the vector $s_i$ is $r$. For example, if we use TMOnto as the only comparison criterion, $f_\theta$ discretises its real value, using the quality score, to the range [1,5]. In this case the integers correspond to the qualitative levels defined by OQuaRE, although different levels could be used. Then, given two versions $v_i$ and $v_j$, if $f_\theta$ produces the scores 5 and 1 respectively, $v_j$ is more tangled than $v_i$. Similarly, the remaining 13 metrics can be added to the comparison criteria, and this is what we propose as a means to analyse the evolution of ontologies. Therefore, the application of $f_\theta$ to $v_i$ generates a vector $s_i$ of 14 components. The more components the vector $s_i$ has, the harder it is to compare and interpret the changes. For this reason we provide the user with some definitions whose aim is to describe different types of changes. Hence, given two consecutive versions $v_{i-1}, v_i \in vC_\theta$, with $i > 1$, and given the vectors $s_{i-1}$ and $s_i$ obtained by the application of the comparison criteria $f_\theta$, a change in scale of version $v_i$ from version $v_{i-1}$ is described in Definition 3.

Definition 3

Change in scale: vector of change associated with different values of the components of the vector $s_i$ with respect to $s_{i-1}$. The vector $l_i$, which is calculated as $s_i - s_{i-1}$, represents the levels, in size and direction, of the changes from version $v_{i-1}$ to version $v_i$, with $i > 1$.

It should be pointed out that the change in scale applies to all the versions of an ontology except the first one, which corresponds to $i = 1$ in $vC_\theta$. Since the OQuaRE quality scores are the comparison criteria, the levels range over $[-4, 4]$, so the direction can be positive or negative. For example, let us suppose a $vC_\theta$ that contains six elements $v_1, \ldots, v_6$. The application of $f_\theta$ to $vC_\theta$ generates a matrix with 6 rows, like the one shown in Expression 1. Row $i$ represents the vector $s_i$ and has 14 components, with $i = 1, \ldots, 6$.
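As a minimal sketch of Definition 3, the change in scale can be computed row by row from the score matrix. The function name `change_in_scale` and the small score matrix below are hypothetical illustrations (three versions, four metrics); they are not the data of Expression 1.

```python
from typing import List


def change_in_scale(scores: List[List[int]]) -> List[List[int]]:
    """Return the vectors l_i = s_i - s_{i-1} for every version after the first.

    `scores` holds one row s_i per version, in chronological order.
    The first version has no predecessor, so the result has one row fewer.
    """
    return [
        [current - previous for current, previous in zip(scores[i], scores[i - 1])]
        for i in range(1, len(scores))
    ]


# Hypothetical quality scores in [1, 5]: 3 versions x 4 metrics.
scores = [
    [3, 4, 2, 5],
    [3, 5, 1, 5],
    [4, 5, 1, 1],
]

changes = change_in_scale(scores)
for version, l_i in enumerate(changes, start=2):
    print(f"l_{version} = {l_i}")
```

Because every score lies in [1, 5], each component of a change vector necessarily falls in [-4, 4].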

We propose to use a summarised representation of the change in scale of the $r$ metrics between $v_{i-1}$ and $v_i$ by using the frequency distribution $F_i$ associated with the change in scale $l_i$, which is defined in the following way:

Definition 4

Frequency distribution of the change in scale ($F_i$): an ordered list of the frequencies $f_l$ associated with the different change levels $l$ in the vector $l_i$.
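The frequency distribution of Definition 4 can be sketched as a count of how often each level occurs in a change vector. The helper `frequency_distribution`, the choice to list every level from -4 to 4 (including those with frequency 0), and the sample vector are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Tuple


def frequency_distribution(change: List[int], max_level: int = 4) -> List[Tuple[int, int]]:
    """Return (level, frequency) pairs ordered from -max_level to +max_level."""
    counts = Counter(change)
    return [(level, counts.get(level, 0)) for level in range(-max_level, max_level + 1)]


# Hypothetical change vector l_i with 8 components.
l_i = [0, 1, -1, 0, 0, 2, -1, 0]
F_i = frequency_distribution(l_i)
print(F_i)
```

The frequencies always sum to the number of components of the change vector, which gives a quick consistency check.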

Hence the frequency distribution $F_i$ can be used for describing different types of changes between two consecutive versions $v_{i-1}$ and $v_i$ with respect to the set of OQuaRE quality scores. Next, we define some associated statistics such as weighted means.

To avoid possible undefined values of the forward or backward means, we also use the sizes of the forward and backward changes, defined as the numerators of the previous definitions, but taking absolute values $|l|$ in the backward case. Definition 7 then defines the global mean change.

In our running example, the frequency distribution $F_3$ does not provide a determined finite value for the forward mean change, whereas the backward mean change is −1 and the mean change is −0.5. The sizes of the forward and backward changes are 0 and 2, respectively.
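The forward and backward statistics can be sketched as follows. The formulas themselves are not reproduced in this section, so the code assumes the usual weighted-mean reading: the sum of $l \cdot f_l$ over the levels of one sign, divided by the sum of the corresponding $f_l$, with the size taken as the same numerator in absolute value. The function names and the sample distribution are hypothetical; the distribution was chosen only because, under this assumption, it yields the figures quoted above for $F_3$.

```python
from typing import Dict, Optional


def directional_mean(freq: Dict[int, int], forward: bool) -> Optional[float]:
    """Weighted mean of the positive (forward) or negative (backward) levels.

    Assumed form: sum(l * f_l) / sum(f_l) over the levels of one sign.
    Returns None when no metric changed in that direction (undefined mean).
    """
    selected = [(l, f) for l, f in freq.items() if l != 0 and (l > 0) == forward]
    total = sum(f for _, f in selected)
    if total == 0:
        return None
    return sum(l * f for l, f in selected) / total


def directional_size(freq: Dict[int, int], forward: bool) -> int:
    """Assumed size of the change: sum(|l| * f_l) over the levels of one sign."""
    return sum(abs(l) * f for l, f in freq.items() if l != 0 and (l > 0) == forward)


# Hypothetical distribution over 14 metrics: two metrics drop one level.
freq = {-1: 2, 0: 12}

forward_mean = directional_mean(freq, forward=True)
backward_mean = directional_mean(freq, forward=False)
forward_size = directional_size(freq, forward=True)
backward_size = directional_size(freq, forward=False)
print(forward_mean, backward_mean, forward_size, backward_size)
```

Returning `None` for an undefined mean keeps it distinct from a genuine value of zero, whereas the sizes are always defined.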

The value of the mean change can be interpreted as follows:

  • It takes a positive value when the forward mean change is greater than the backward one, and a negative value in the opposite case.

  • It becomes zero when forward and backward mean changes take equal and finite values.

  • It becomes zero if $v_i$ and $v_{i-1}$ are identical. In this case the forward and backward mean changes do not take a determined finite value (undefined value).

The mean change provides information about changes in quality scores. To analyse the number of metrics that have changed regardless of the direction of the change, we next define the concept of magnitude of change.

In our example, the magnitude of change of version $v_2$ is 50 %. The largest number of metrics with changes happens in $v_6$ (see $F_6$ in Expression 3), having a magnitude of change of 100 %, but the mean change is 0.0. The major increase in quality scores happens in $v_4$ (see $F_4$ in Expression 3) with mean change 0.75.
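A minimal sketch of the magnitude of change follows. Its formal definition is not reproduced in this section, so the code assumes it is the percentage of components of the change vector whose level is non-zero. The function name and the sample vector are hypothetical and do not come from Expression 3.

```python
from typing import List


def magnitude_of_change(change: List[int]) -> float:
    """Percentage of metrics whose score changed, regardless of direction (assumed form)."""
    if not change:
        raise ValueError("change vector must not be empty")
    changed = sum(1 for level in change if level != 0)
    return 100.0 * changed / len(change)


# Hypothetical change vector with 14 components, 7 of them non-zero.
l_i = [1, 0, -1, 0, 2, 0, -2, 0, 1, 0, -1, 0, 1, 0]
print(f"{magnitude_of_change(l_i):.0f} %")
```

Under this reading, a version can show a magnitude of 100 % while improvements and deteriorations cancel out in the mean change, which is the situation described for $v_6$.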