Research Center for Advanced Studies from the National Polytechnic Institute
Information Technology Laboratory

Technical Report:
Ontology learning from text: Method for learning axioms

Ana Rios-Alvarado and Ivan Lopez-Arevalo

CINVESTAV UNIDAD TAMAULIPAS. Parque Científico y Tecnológico TECNOTAM – Km. 5.5 carretera Cd. Victoria-Soto La Marina. C.P. 87130 Cd. Victoria, Tamps.

LTI-TR-2012-07
Cd. Victoria, Tamaulipas, Mexico. December, 2012




Abstract:

Ontologies provide a structured organization of knowledge and support the exchange and sharing of information. Moreover, one of the main benefits of using ontologies is the ability to infer new knowledge, which allows the development of more realistic applications. The need to overcome the knowledge acquisition bottleneck caused by the manual construction of ontologies has motivated studies on semi-automatic and automatic methods for building ontologies. Text resources are one of the main sources of knowledge created by humans, but analyzing texts and extracting the elements of an ontology from them is a very hard task. This report presents a method for learning axioms from text based on named entity recognition. Our proposal exploits corpora with a high occurrence of named entities, which give information about the individuals in the specific knowledge domain expressed by the corpus. Given the set of identified named entities, axiomatic relations such as subClassOf, disjointWith, and equivalentClass are identified. For this purpose, a named entity recognition tool was used and the linguistic context where classes co-occur was extracted. This report of activities corresponds to the third year of doctoral studies.

KEYWORDS: Ontology learning, axioms, named entity recognition

Corresponding authors: Ana Rios-Alvarado <[email protected]> and Ivan Lopez-Arevalo <[email protected]>

© Copyright by CINVESTAV-Tamaulipas. All rights reserved

Date of submission: December 5th, 2012

Ana B. Rios-Alvarado and Ivan Lopez-Arevalo. Ontology learning from text: Method for learning axioms. CINVESTAV Tamaulipas. 2012 Dec. 20 pp. Technical Report No. LTI-TR-2012-07

Place and date of publication: Ciudad Victoria, Tamaulipas, MEXICO. December 5th, 2012


Contents

1 Introduction
    1.1 Background
    1.2 Goals
    1.3 Overview of this document
2 Related work
3 The method - Axiom learning
4 Experiments and Results
    4.1 Named Entity Recognition
    4.2 Identification of instances
    4.3 SameIndividualAs/differentFrom relation
    4.4 SubClassOf relation
    4.5 DisjointWith relation
    4.6 EquivalentClass relation
5 Conclusions

Bibliography


1 Introduction

Access to new technologies on the Web has led to the creation of a huge amount of unstructured information. This information currently comes in different formats, such as news, e-mails, blogs, and tweets, and it represents a significant source of collective expertise (know-how). In order to store, retrieve, or infer knowledge from this information, it is necessary to represent it using a conceptual structure. This can be achieved by means of ontologies.

1.1 Background

According to Studer et al. [9], an ontology is a formal, explicit specification of a shared conceptualization. Conceptualization refers to an abstract model of some phenomenon in the world. Explicit means that the types of concepts used and the constraints on their use are explicitly defined. Formal means that the ontology should be machine-readable. Shared means that an ontology captures consensual knowledge, that is, knowledge accepted by a group of experts in the domain. Neches et al. [4] describe an ontology as an element that defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions of a conceptualization. Staab et al. [8] define an ontology as a formal description of concepts and relationships that can exist for a community of human and/or machine agents.

The notion of ontology is crucial for enabling knowledge sharing and reuse. WordNet¹ (a lexical database for English) gives the following definition: “an ontology (in Computer Science) is a rigorous and exhaustive organization of some knowledge domain that is usually hierarchical and contains all the relevant entities and their relations”. Thus, an ontology should: 1) capture a shared understanding and 2) enable logical inference on facts through axioms.

¹ http://wordnet.princeton.edu/

An ontology can be built manually by knowledge engineers and domain experts, but this results in long and tedious development stages and becomes a knowledge acquisition bottleneck [3]. For this reason, ontology learning is nowadays an important research area. Ontology learning is the set of methods and techniques used for building an ontology from scratch, or for enriching or adapting an existing ontology, in a semi-automatic fashion using several knowledge and information sources [5]. Shamsfard and Barforoush [7] give an overview that classifies ontology learning approaches from different points of view (see Figure 1).

Figure 1: Classification of ontology learning approaches

Maedche and Staab [3] consider the life cycle of ontology building and identify four parts in the ontology learning process: extract, prune, refine, and import or reuse. Weng et al. [15] emphasize extraction methods and consider four categories: dictionary-based, text clustering, association rules, and knowledge base. Cimiano [1] describes the process of building an ontology based on the so-called cake model. The cake model treats the building of an ontology as a stack of layers, where each layer corresponds to a task that yields one component of the ontology. From the bottom to the top, the layers are: terms, synonyms, concepts, concept hierarchy, relations, relation hierarchy, axiom schemata, and general axioms. The methods that can perform these tasks are classified into four groups based on [14]: lexical-syntactic patterns, information extraction, machine learning, and co-occurrence analysis. Natural language processing techniques are typically used for recognizing relevant terms and their relationships. The text requires a processing phase in which the following tasks are applied: 1) extraction of plain text, 2) splitting of the text into sentences, 3) elimination of stopwords, 4) tagging of sentences, and 5) parsing of the sentences.
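As a rough illustration of the sentence splitting and stopword elimination tasks, the following sketch uses only the Python standard library. The stopword list, the regular expressions, and the function names are assumptions made for this example and are not taken from the report; tagging and parsing would need an external NLP toolkit and are left out.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "is", "a", "an", "in", "of", "and", "on", "to"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def remove_stopwords(sentence):
    """Lowercase the sentence, tokenize it, and drop the stopwords."""
    tokens = re.findall(r"[a-z0-9'-]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

text = ("In Wexford the November Opera Festival is an international event. "
        "The Elephanta Festival is a classical dance and music event.")
sentences = split_sentences(text)
tokens = [remove_stopwords(s) for s in sentences]
print(tokens[0])
# ['wexford', 'november', 'opera', 'festival', 'international', 'event']
```

A production pipeline would likely replace the naive splitter, since abbreviations such as "Cd." or "Km." would be split incorrectly by this rule.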

In this context, text mining plays an important role in ontology learning. Text mining is the process of discovering patterns and new knowledge in collections of text. In particular, text clustering techniques are a good option for ontology learning because they can find relations between words that appear in different places in the text. For the taxonomy extraction task, techniques based on the distribution of information on the Web are a good option because they exploit the whole of the information contained in the Web. Even though a large number of ontology learning methods have been proposed and there are many lightweight ontologies, one of the main challenges in ontology learning is building more expressive ontologies that contain elements such as axioms.

An axiom is an assertion in logical form; together, the axioms comprise the overall theory that the ontology describes in its domain of application. The acquisition of axiom schemata and general axioms are the tasks that give an ontology its high level of expressiveness, and these elements make a big difference between an ontology and other models for representing knowledge. Some ontology learning approaches that address the axiom extraction task (automatically or semi-automatically) have used pattern-based [2, 6], transformation-rule-based [11, 12, 13], or heuristic-based [10] techniques. These proposals have shown that there is an association between lexical relations and axioms.

In this research work, ontology learning techniques are proposed and developed to automatically discover the vocabulary, relations, and axioms from textual resources.

1.2 Goals

The main objective of this research work is to obtain an approach for ontology learning that extracts the vocabulary, taxonomic relationships, and key axioms from textual resources.

Thus, the particular goals of our research are:

• Obtain a clustering algorithm to get the vocabulary from text.

• Propose a method to extract the taxonomic relationships between concepts.

• Propose a method to extract axioms from text, learning the axiomatic relations between classes and instances.

In particular, this report presents an approach for identifying class expression axioms based on named entity recognition from unstructured text.

For learning axiomatic relationships such as sameIndividualAs, differentFrom, subClassOf, disjointWith, and equivalentClass, we propose taking into account the evidence of named entities in domain-specific text. Several Named Entity Recognition (NER) tools were tested and the tool with the best precision was selected. Axiomatic relations can then be established based on the named entities identified by the best NER tool.

Figure 2: Example of a manually built ontology. The instance level in an ontology characterises a class³

Following a bottom-up approach, the idea is first to identify individuals, which are instances of some class. Such classes belong to a taxonomic structure, which is at the core of the ontology. Figure 2 shows that the instance level corresponds to the leaves of a taxonomic tree structure and the class level to the branches. What distinguishes one class from another is its set of leaves: if two classes have different sets of leaves, they can be characterised as separate (disjoint) classes, whereas if their sets of leaves are very similar, they can be characterised as equivalent classes. For example, at the instance level, the set of leaves for the country class includes the instances France, Ireland, and Brazil, whereas the set of leaves for the city class contains the instances Brussels, Iraklio, and Belfast. Therefore, the country class and the city class are disjoint (disjointWith(country, city)). Thus, the collection of named entities provides the instances for a specific class and defines the class in an extensional manner. The proposed approach was implemented and tested with an input corpus from the Tourism domain.
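The comparison of leaf sets described above can be sketched as follows. The report does not say how "very similar" is measured, so the Jaccard coefficient and the 0.8 threshold used here are assumptions chosen only for illustration, as are the function names.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two instance sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def compare_classes(inst_a, inst_b, eq_threshold=0.8):
    """Label a pair of classes from their leaves (instances).

    eq_threshold is a hypothetical cut-off for "very similar";
    the report does not state a numeric value.
    """
    sim = jaccard(inst_a, inst_b)
    if sim == 0.0:
        return "disjointWith"      # no shared instance
    if sim >= eq_threshold:
        return "equivalentClass"   # almost the same instances
    return "undecided"             # partial overlap, no axiom proposed

country = {"France", "Ireland", "Brazil"}
city = {"Brussels", "Iraklio", "Belfast"}
print(compare_classes(country, city))  # disjointWith
```

Returning "undecided" for partial overlap is a design choice of this sketch; it keeps the two axioms apart instead of forcing every pair of classes into one of them.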


1.3 Overview of this document

The rest of the document is organized as follows. Section 2 gives a brief description of work related to the generation of axioms. Section 3 describes the approach and the method to identify class expression axioms. Section 4 presents the experiments carried out. Finally, Section 5 gives some conclusions and future work.

2 Related work

In order to provide a higher level of expressiveness to learned ontologies, several approaches have been proposed for extending ontology learning tools. The first approaches were manual; frameworks and tools such as Protege⁴, OntoEdit⁵, NEON⁶, and KAON⁷ allow users or domain knowledge engineers to add axioms. More recent approaches include some degree of automation for adding axioms under the evaluation and supervision of a knowledge expert. Approaches such as LExO [11], LeDA [13], and ReLExO [12] use a sequence of linguistic analyzers. LExO starts by analyzing the syntactic structure of an input sentence. The resulting dependency tree is transformed into a set of OWL axioms (concept inclusion, transitivity, role inclusion, role assertions, concept assertions, and individual equalities) by means of manually engineered transformation rules. ReLExO supports the acquisition and refinement of complex class descriptions in order to identify passages of text that indicate the validity of certain knowledge. Given that text can contain inconsistencies, LeDA allows the automatic generation of disjointness axioms based on machine learning classification. The classifier, which determines disjointness for any given pair of classes, is trained on a manually created gold standard of disjointness axioms.

In other cases, such as [2, 10], the methods are completely automated. In [2], an automatic axiom-learning algorithm starts from a set of non-taxonomic relations. It uses the Web as a corpus and linguistic techniques based on text patterns and statistical analysis of the distribution of web information. In the YAGO project [10], facts are extracted from the category system and the infoboxes of Wikipedia and are combined with taxonomic relations from WordNet.

⁴ http://protege.stanford.edu/
⁵ http://www.semtalk.com/semnet files/POntoEdit.htm
⁶ http://www.neon-project.org/nw/Welcome to the NeOnP roject
⁷ http://kaon.semanticweb.org/


3 The method - Axiom learning

Disjointness of classes guarantees that an individual that is a member of one class cannot simultaneously be an instance of a specified other class. Similarly, equivalence of classes indicates that two classes have precisely the same instances. Obtaining the instances of each class is therefore a key step in identifying disjointness or equivalence relations.
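In set-theoretic terms, the two definitions above can be written as follows, where $I(C)$ denotes the set of instances of a class $C$. This notation is introduced here for illustration only and does not appear in the report.

```latex
\begin{align*}
\mathrm{disjointWith}(C, D) &\iff I(C) \cap I(D) = \emptyset \\
\mathrm{equivalentClass}(C, D) &\iff I(C) = I(D)
\end{align*}
```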

The proposed method starts at the instance level, where a NER tool extracts the named entities from text. Then, at the class level, each class has an associated set of instances that characterise it. The NER tool provides a set of types (type/subtype) associated with each named entity. Using the type and the linguistic context of each class, an axiomatic relation is identified. Figure 3 shows a general overview of the proposed steps to extract axioms. The method is a bottom-up approach and consists of the following steps:

1. Extraction of named entities. A NER tool obtains the named entities from text. The named entities can correspond to one of the following types (defined by the tool): Person, Organization, State, City, or Holiday, among others. NER tools that use Linked Data principles provide additional information describing the named entities identified in text. According to Linked Data⁸ principles, a unique global identifier defines an entity. Dereferencing such an identifier provides useful information about the corresponding resource and links to other relevant identifiers. NER tools such as AlchemyAPI and OpenCalais exploit the Linked Data principles.

2. Identification of instances. Relations of type instanceOf(named entity, class) between a named entity and a class are obtained by two methods: 1) the type given by the NER tool and 2) the context in which the named entity and its class co-occur.

3. Building context. The sentences in which a set of instances and its corresponding class occur are grouped to determine whether there exists a relation between the contexts of two classes. A part-of-speech (POS) tagger and a syntactic parser are used to get the linguistic context (i.e., representative elements such as nouns, verbs, or adjectives and their grammatical relations).

⁸ T. Berners-Lee, Linked Data - Design Issues, http://www.w3.org/DesignIssues/LinkedData.html (2006)


The linguistic context supports the identification of relations between entities, which are used to derive one of the following axioms: sameIndividualAs, differentFrom, subClassOf, disjointWith, or equivalentClass.

To illustrate the method, consider the following set of sentences. They provide evidence for the subClassOf relation between festival and event, based on the lexical pattern <is a> and the instanceOf relation:

• In Wexford the November Opera Festival is an international event.

• The Elephanta Festival is a classical dance and music event on Elephanta Island usually held in February.

• The Grenada National Museum in the center of town incorporates an old French barracks dating from 1704.

The festival class does not share a context (a sentence) with museum, theatre, or church. In contrast, the festival class co-occurs with the class time or event, which indicates some relation between festival and event. From the given sentences, the proposed method lets us infer the following relations for the festival class:

• instanceOf(November Opera Festival, festival),

• instanceOf(Elephanta Festival, festival),

• subClassOf(festival, event),

• disjointWith(festival, museum)

because: i) the named entities November Opera Festival and Elephanta Festival are instances of the festival class; ii) the event class is more general than the festival class; and iii) the festival and museum classes have different instances.
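The subClassOf inference in this example can be sketched in a few lines. The sketch assumes that the instanceOf relations are already known and that a small vocabulary of candidate class names is available; both dictionaries, the regular expression, and the function name are hypothetical and only approximate the <is a> pattern without a real parser.

```python
import re

# instanceOf relations assumed to be already known (e.g. from the NER type).
instance_of = {
    "November Opera Festival": "festival",
    "Elephanta Festival": "festival",
    "Grenada National Museum": "museum",
}
# Hypothetical vocabulary of candidate class names.
known_classes = {"festival", "event", "museum"}

sentences = [
    "In Wexford the November Opera Festival is an international event.",
    "The Elephanta Festival is a classical dance and music event on "
    "Elephanta Island usually held in February.",
    "The Grenada National Museum in the center of town incorporates an "
    "old French barracks dating from 1704.",
]

def infer_subclass_of(sentences, instance_of, known_classes):
    """Return (subclass, superclass) pairs supported by '<NE> is a <NP>'."""
    pairs = set()
    for sent in sentences:
        for entity, cls in instance_of.items():
            match = re.search(re.escape(entity) + r" is an? (.+)", sent)
            if not match:
                continue
            # Words of the predicate, lowercased and stripped of punctuation.
            words = re.findall(r"[a-z]+", match.group(1).lower())
            for word in words:
                if word in known_classes and word != cls:
                    pairs.add((cls, word))
    return pairs

print(infer_subclass_of(sentences, instance_of, known_classes))
# {('festival', 'event')}
```

The museum sentence contributes nothing because it contains no <is a> construction, which matches the observation that festival and museum do not share a context.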

4 Experiments and Results

Figure 3: Method for learning axioms

For the experiments, we used the Lonely Planet⁹ dataset, which consists of 1801 files about the Tourism domain. It covers a list of 96 classes, 278 named entities, and a taxonomy with 103 manually annotated hierarchical relations. The measures used for the evaluation are precision (Equation 1), recall (Equation 2), and F-measure (Equation 3). The precision score is the number of retrieved knowledge entities that are accepted by a human team divided by the total number of knowledge entities retrieved. The recall score is the number of retrieved knowledge entities that are accepted by a human team divided by the total number of knowledge entities contained in the Lonely Planet dataset. The F-measure score is the harmonic mean of precision and recall.

⁹ http://www.cimiano.de/doku.php?id=olp

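$$\mathrm{Precision} = \frac{\mathrm{CorrectlySelectedEntities}}{\mathrm{TotalSelectedEntities}} \tag{1}$$

$$\mathrm{Recall} = \frac{\mathrm{CorrectlySelectedEntities}}{\mathrm{TotalDomainEntities}} \tag{2}$$

$$F = \frac{2 \cdot P \cdot R}{P + R} \tag{3}$$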


4.1 Named Entity Recognition

In the first step, we analyzed eight different NER tools. The aim was to test named entity recognition tools for the ontology population task, so the results retrieved by each tool were compared against a list of named entities provided by a benchmark for ontology population. The purpose of this test was to find the best NER tool for the automatic process of axiom learning. The classical tools evaluated were: 1) OpenNLP, 2) PythonNLTK, and 3) StanfordNER; the tools based on Linked Data were: 4) AlchemyAPI, 5) OpenCalais, 6) DBpedia Spotlight, 7) Zemanta, and 8) Extractiv.

The second group of tools addresses the disambiguation problem in named entity detection: they analyse the input content to detect named entities, assign each entity a type weighted by a confidence score, and provide a list of URIs for disambiguation. In addition, these tools are able to associate every entity with a type in a taxonomy of types. Table 1 shows the evaluation results in terms of precision, recall, and F-measure; AlchemyAPI presents the best precision. The obtained named entities were compared against a list of 278 named entities annotated manually in a sample corpus of 30 files.

| Tool              | Precision | Recall | F-measure |
|-------------------|-----------|--------|-----------|
| AlchemyAPI        | 0.6648    | 0.4512 | 0.5376    |
| OpenCalais        | 0.6384    | 0.4079 | 0.4977    |
| StanfordNER       | 0.5478    | 0.6389 | 0.5900    |
| Zemanta           | 0.5404    | 0.4584 | 0.4960    |
| OpenNLP           | 0.4873    | 0.2279 | 0.3540    |
| PythonNLTK        | 0.4853    | 0.9061 | 0.6346    |
| Extractiv         | 0.2767    | 0.5703 | 0.3726    |
| DBpedia Spotlight | 0.1168    | 0.4981 | 0.1893    |

Table 1: Evaluation of NER tools
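The three measures in Equations 1 to 3 are simple enough to compute directly. The sketch below uses hypothetical function names and, as a check, recomputes the F-measure of AlchemyAPI from the precision and recall reported in Table 1.

```python
def precision(correct, selected):
    """Equation 1: correctly selected entities / all selected entities."""
    return correct / selected if selected else 0.0

def recall(correct, total_domain):
    """Equation 2: correctly selected entities / all entities in the domain."""
    return correct / total_domain if total_domain else 0.0

def f_measure(p, r):
    """Equation 3: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Precision and recall reported for AlchemyAPI in Table 1.
f = f_measure(0.6648, 0.4512)
print(round(f, 4))  # 0.5376
```

The guards against a zero denominator are an addition of this sketch; the report does not discuss the case where a tool returns no entities.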

4.2 Identification of instances

In this stage, the objective was to evaluate the identification of the instanceOf relation using AlchemyAPI and OpenCalais, as these tools define a taxonomy of types. Table 2 presents the precision, recall, and F-measure values on 42 instanceOf relations that were manually annotated. According to the evaluation, AlchemyAPI had better precision than OpenCalais in this task. In more detail, Table 3 presents the performance of AlchemyAPI and OpenCalais in identifying instances belonging to the following classes: City, Continent, Country, Holiday, Person, Organization, and Region. The obtained results were compared manually with 81 instanceOf relations from the Lonely Planet dataset, of which 12 correspond to City, 2 to Continent, 35 to Country, 10 to Holiday, 6 to Person, 2 to Organization, and 14 to Region. In most cases, AlchemyAPI showed the best precision; only for the Person class did OpenCalais have better precision than AlchemyAPI.

| Tool       | Precision | Recall | F-measure |
|------------|-----------|--------|-----------|
| AlchemyAPI | 0.4667    | 0.1667 | 0.2456    |
| OpenCalais | 0.4117    | 0.1667 | 0.2376    |

Table 2: Performance of NER tools - identified instances

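| Class        | Tool       | Precision | Recall | F-measure |
|--------------|------------|-----------|--------|-----------|
| City         | AlchemyAPI | 0.4000    | 0.5000 | 0.4444    |
| City         | OpenCalais | 0.3529    | 0.5000 | 0.4137    |
| Country      | AlchemyAPI | 0.7631    | 0.8285 | 0.7945    |
| Country      | OpenCalais | 0.7000    | 0.8000 | 0.7486    |
| Holiday      | AlchemyAPI | 0.4285    | 0.3000 | 0.3529    |
| Holiday      | OpenCalais | 0.4000    | 0.4000 | 0.4000    |
| Person       | AlchemyAPI | 0.0667    | 0.3333 | 0.1111    |
| Person       | OpenCalais | 0.2000    | 0.1667 | 0.1818    |
| Organization | AlchemyAPI | 0.1000    | 0.5000 | 0.1667    |
| Organization | OpenCalais | -         | -      | -         |
| Region       | AlchemyAPI | 0.4000    | 0.3333 | 0.1111    |
| Region       | OpenCalais | 0.2000    | 0.0714 | 0.1052    |

Table 3: Performance of NER tools - identified instances by class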

Looking at the context, it can be seen that, in some cases, instances of different classes appear in the same sentence, i.e., they co-occur. To extract relations, the linguistic context of each extracted named entity was analysed. Examples of patterns that identify the instanceOf relation are:

• <NE> is a <NP>. Example 1: The Donia is a traditional music festival, it is held on Nosy Be in May-June.

• <NP>, <NE>. Example 2: Crete is Greece’s most southerly point, with its largest city, Iraklio, situated in the middle of the north side of the island.

• <NE>: <NP>. Example 3: South Africa: the country offers everything from ostrich riding to the world’s highest bungee jump!

• <NP> like <NE>, <NE>, ... and <NE>. Example 4: The usual Christian holidays like Easter and Christmas are celebrated...

where NE is a named entity and NP is a noun phrase. In example 1, the Donia is an instance of the festival class and the associated pattern is <NE is a NP>. In example 2, Iraklio is an instance of the city class and the pattern is <NP, NE>. In example 3, South Africa is an instance of country and the pattern is <NE: NP>. Finally, in example 4, Easter and Christmas are instances of holiday; the identified pattern is <NP like NE>, and a more general pattern is <NP like NE, NE, ... and NE>.
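As an illustration of how such a pattern could be matched, the sketch below handles only the "like" pattern of example 4 with a regular expression. It approximates a named entity as a run of capitalised words and takes the single word before "like" as the class; both are simplifying assumptions of this sketch, since the report relies on a NER tool and a parser for these steps.

```python
import re

# A named entity is approximated as a run of capitalised words (assumption).
NE = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"
# '<NP> like <NE>, <NE>, ... and <NE>', with the class word before 'like'.
LIKE = re.compile(rf"(\w+) like ({NE}(?:, {NE})*(?:,? and {NE})?)")

def singular(noun):
    """Crude singularisation, enough for 'holidays' -> 'holiday'."""
    return noun[:-1] if noun.endswith("s") else noun

def instances_from_like(sentence):
    """Return (named entity, class) pairs found by the 'like' pattern."""
    match = LIKE.search(sentence)
    if not match:
        return []
    cls = singular(match.group(1).lower())
    entities = re.split(r",? and |, ", match.group(2))
    return [(entity, cls) for entity in entities]

s = "The usual Christian holidays like Easter and Christmas are celebrated..."
print(instances_from_like(s))
# [('Easter', 'holiday'), ('Christmas', 'holiday')]
```

Each returned pair corresponds to one instanceOf(named entity, class) relation; the other three patterns could be added as further regular expressions in the same style.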

It is important to note that context analysis can help deal with the problem of ambiguous named entities. For example, given the Country class and the individual Australia, if both elements co-occur in the same sentence, then Australia is resolved as an instance of Country, as in the sentence “A highly developed country, Australia is the world’s 13th-largest economy”. In contrast, if Australia co-occurs with the Continent class, as in the sentence “Australia is the smallest continent and it is also an island”, then the instanceOf(Australia, Continent) relation is established.

4.3 SameIndividualAs/differentFrom relation

At the instance level, two (or sometimes more) different named entities may identify the same resource. Such named entities refer to the same individual, so they can be related through the sameIndividualAs constructor. Some examples of linguistic contexts in which this relation occurs are:

• <NE> (<NE>). Example 1: Beit al-Sahel (Palace Museum) served as the Sultan’s residence until 1964 when the dynasty was overthrown.

• <NE> (also known as <NE>). Example 2: North-eastern Libya, the Jebel Akhdar area (also known as the Green Mountains), is the most verdant and arguably the most beautiful part of the country.

• <NE>, also called <NE>. Example 3: Dominica’s national bird, the Sisserou, also called the Imperial Parrot, is about 20in (50cm) long when full grown, the largest of all Amazon parrots.

In example 1, the Sultan’s residence is named Beit al-Sahel or Palace Museum. In example 2, the North-eastern Libya area is also called Jebel Akhdar or Green Mountains. Finally, in example 3, Dominica’s national bird corresponds to Sisserou and Imperial Parrot. The identified patterns are <NE (NE)>, <NE (also known as NE)>, and <NE, also called NE>, respectively. Conversely, when the sameIndividualAs relation is not found between two or more named entities, the differentFrom relation is established. For example, the relation sameIndividualAs(Beit al-Sahel, Palace Museum) is found in example 1, whereas the relation differentFrom(Beit al-Sahel, Jebel Akhdar) is established.
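The three contexts can be checked mechanically once the named entities of a sentence are known. The following sketch assumes the entity list comes from a NER tool and tests every ordered pair against the three patterns; the function name, the optional "the" article, and the allowance for one extra word (such as "area") before the parenthesis are assumptions of this sketch.

```python
import re

def same_individual_pairs(sentence, entities):
    """Return (a, b) pairs of entities linked by one of the three patterns."""
    pairs = []
    for a in entities:
        for b in entities:
            if a == b:
                continue
            ea, eb = re.escape(a), re.escape(b)
            templates = [
                rf"{ea} \({eb}\)",                                     # NE (NE)
                rf"{ea}(?: \w+)? \(also known as (?:the )?{eb}\)",    # NE (also known as NE)
                rf"{ea}, also called (?:the )?{eb}",                   # NE, also called NE
            ]
            if any(re.search(t, sentence) for t in templates):
                pairs.append((a, b))
    return pairs

s1 = ("Beit al-Sahel (Palace Museum) served as the Sultan's residence "
      "until 1964 when the dynasty was overthrown.")
print(same_individual_pairs(s1, ["Beit al-Sahel", "Palace Museum"]))
# [('Beit al-Sahel', 'Palace Museum')]
```

Pairs of entities for which no pattern fires would then be candidates for differentFrom, following the rule stated above.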

4.4 SubClassOf relation

At the class level, the subClassOf relation represents one of the main axioms; it structures the set of classes into a taxonomy in which a higher class is more general than a lower class. Hearst patterns [13] are the most commonly used means of extracting subClassOf relationships. Many other approaches for learning subClassOf relationships exploit hyponymy relationships from WordNet. However, as this approach has been shown to be limited, we propose the use of NER as an additional approach for identifying subClassOf relations in text. The NER tool used for this was AlchemyAPI, because it showed the best precision in obtaining instances. AlchemyAPI obtains 15 types of instances and 54 subtypes on a sample corpus of 30 files from the Lonely Planet corpus. Table 4 shows the types of instances, some examples of the subtypes obtained for each of them, the total number of subtypes, and the number of correct subtype relations for each type. For example, Location, CityTown, River, BodyOfWater, AdministrativeDivision, TouristAttraction, Island, Mountain, and Lake are correct subtypes of the GeographicFeature type, and MilitaryPerson, Actor, FilmActor, Monarch, MemberOfParlament, OperaCharacter, and Politician are correct subtypes of the Person type. In other cases, the extracted subtype relation is not correct, such as Saint for Person, or MeteorologicalService, GeographicFeature, HumanLanguage, FilmDirector, FilmArtDirector, Organization, and CompanyDivision as subtypes of Country. A team of humans was asked to evaluate all extracted subtype relations, which gave a precision of 70.37% for the extracted relations based on the subtype-type pairs identified by AlchemyAPI. For the complete Lonely Planet corpus, the total number of classes identified was 441, from which 415 subtype-type relations were obtained. According to the human team evaluation, 222 relations were correct, which gave a precision of 47.22%.

Types              Subtypes (examples)                                       Total of subtypes  Correct subtypes
Organization       -                                                         0                  -
GeographicFeature  Location, Island, CityTown, BodyOfWater, ...              9                  9
Country            Location, GovermentalJurisdiction, Kingdom, ...           19                 8
City               AdministrativeDivision, GovermentalJurisdiction, ...      3                  3
Region             Location                                                  1                  0
Facility           -                                                         0                  -
Holiday            -                                                         0                  -
Sport              -                                                         0                  -
Continent          Location                                                  1                  0
StateOrCounty      Location, PoliticalDistrict, AdministrativeDivision, ...  5                  4
Company            -                                                         0                  -
NaturalDisaster    -                                                         0                  -
Person             MilitaryPerson, Actor, Monarch, Politician, ...           8                  7
HealthCondition    DiseaseOrMedicalCondition, CauseOfDeath, ...              8                  7
FieldTerminology   -                                                         0                  -

Table 4: Types/Subtypes identified by AlchemyAPI in 30 files of “Lonely Planet”
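The precision reported for the sample corpus can be recomputed from the counts in Table 4. The snippet below is only a worked check of that arithmetic; the variable and function names are ours, and the rows with no subtypes are omitted because they contribute nothing to either total.

```python
# (type, total subtypes, correct subtypes) for the Table 4 rows that have subtypes.
TABLE_4 = [
    ("GeographicFeature", 9, 9),
    ("Country", 19, 8),
    ("City", 3, 3),
    ("Region", 1, 0),
    ("Continent", 1, 0),
    ("StateOrCounty", 5, 4),
    ("Person", 8, 7),
    ("HealthCondition", 8, 7),
]


def precision(correct, total):
    """Precision as a percentage: correct relations over extracted relations."""
    return 100.0 * correct / total


total_subtypes = sum(total for _, total, _ in TABLE_4)
total_correct = sum(correct for _, _, correct in TABLE_4)

print(total_subtypes, total_correct)                       # 54 38
print(f"{precision(total_correct, total_subtypes):.2f}%")  # 70.37%
```

The totals agree with the 54 subtypes and the 70.37% precision given in the text.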

4.5 DisjointWith relation

A disjointWith relation states that two classes have no instance in common. To learn the disjointness relation between two classes, we consider named entities that co-occur in the same context. For each duple (class1, class2) of NER classes, the lists of instances are compared. If the two classes share no named entity, then the disjointWith(class1, class2) relation is established.
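The comparison of instance lists can be sketched as below. This is a minimal illustration under stated assumptions: the NER output is represented as a hypothetical dictionary from class name to a set of named entities, the entities are invented toy data, and the function name is ours rather than part of any NER tool.

```python
from itertools import combinations


def learn_disjoint(instances_by_class):
    """Return (class1, class2) duples whose instance sets share no named entity."""
    disjoint = []
    # Visit each unordered pair of classes once, in alphabetical order.
    for c1, c2 in combinations(sorted(instances_by_class), 2):
        if instances_by_class[c1].isdisjoint(instances_by_class[c2]):
            disjoint.append((c1, c2))
    return disjoint


# Hypothetical NER output (toy data, not taken from the corpus).
ner_output = {
    "Country": {"Oman", "Tanzania"},
    "Holiday": {"Ramadan"},
    "Region": {"Zanzibar", "Dhofar"},
    "City": {"Muscat", "Zanzibar"},
}

for c1, c2 in learn_disjoint(ner_output):
    print(f"disjointWith({c1}, {c2})")
```

With this toy input, City and Region are not reported as disjoint because they share one entity, whereas Country and Region are reported as disjoint; the rule only sees instance overlap, so classes that are related in other ways can still be paired.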

To evaluate the extraction of disjointWith relations, a sample corpus of 30 files was used; here 15 types of instances were obtained, with no overlap between their sets of instances. Thus, 105 duples (class1, class2) were obtained. According to the evaluation of the human team, 88 of the relationships correspond correctly to disjointWith(class1, class2) and the remaining 17 hold some other relation. Figure 4 shows a fragment of the obtained duples and indicates which of them hold a disjoint relation between class1 and class2. For example, the Region and Holiday classes are disjoint, as are the Country and Organization classes, the Person and Facility classes, the Country and Holiday classes, and the City and Holiday classes. However, Region and GeographicFeature are not necessarily disjoint classes. Even though, according to the NER results, the sets of instances of Region and GeographicFeature were very different, the classes stand in a subClassOf relation. The same occurs with the Region and Country classes. As a result, the precision was 83.80% for the learned disjoint relations.

Figure 4: Example of disjoint classes

Using the complete Lonely Planet corpus, a total of 325 disjointWith relations were identified, of which 299 were judged correct, which gave a precision of 92.0%.
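The counts in this evaluation can be checked with a few lines of arithmetic. This is only a sanity check, assuming that the 105 duples are the unordered pairs of the 15 types that AlchemyAPI returned on the sample corpus; the helper name is ours.

```python
from math import comb


def precision(correct, total):
    """Precision as a percentage: correct relations over extracted relations."""
    return 100.0 * correct / total


# Unordered duples over the 15 AlchemyAPI types found in the 30-file sample.
n_duples = comb(15, 2)
print(n_duples)                          # 105

# Sample corpus: 88 of the 105 duples were judged correct.
print(f"{precision(88, n_duples):.2f}")  # 83.81

# Complete corpus: 299 of the 325 relations were judged correct.
print(f"{precision(299, 325):.2f}")      # 92.00
```

The sample value rounds to 83.81, which the report gives truncated as 83.80.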

4.6 EquivalentClass relation

The equivalentClass relation is established between two classes when the class descriptions include the same set of individuals. It is important to mention that class equality means that the classes have the same intensional meaning, i.e., denote the same concept; equivalentClass, in contrast, only states that the two classes have the same extension. To learn the equivalentClass relation, two ontologies were considered, and for each ontology class the sets of instances obtained by two different NER tools were compared. If the sets of instances of two different classes are highly similar, then an equivalentClass(class1, class2) relation can be established. Highly similar means that almost all of the named entities detected by the NER tools are the same in both classes; exact equality is not required because the identification of instances depends on the precision of the NER tool.

Figure 5: Example of equivalent classes
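One way to make "highly similar" concrete is a set-overlap score with a cut-off. The sketch below uses Jaccard similarity and a threshold of 0.8; both are assumptions chosen for illustration, since the report does not name a specific measure or value. The dictionaries stand in for the output of the two NER tools and contain invented toy data; neither service is actually called.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of named entities."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def learn_equivalent(classes_a, classes_b, threshold=0.8):
    """Return (class_a, class_b, similarity) for cross-tool pairs at or above the threshold."""
    pairs = []
    for name_a, inst_a in sorted(classes_a.items()):
        for name_b, inst_b in sorted(classes_b.items()):
            sim = jaccard(inst_a, inst_b)
            if sim >= threshold:
                pairs.append((name_a, name_b, sim))
    return pairs


# Hypothetical instance sets per class for each tool (toy data).
alchemy = {
    "Country": {"Oman", "Tanzania", "Kenya", "Yemen", "Egypt"},
    "Person": {"Vasco da Gama"},
}
opencalais = {
    "Country": {"Oman", "Tanzania", "Kenya", "Yemen"},
    "Holiday": {"Ramadan"},
}

for a, b, sim in learn_equivalent(alchemy, opencalais):
    print(f"equivalentClass(AlchemyAPI:{a}, OpenCalais:{b})  similarity={sim:.2f}")
```

A lower threshold admits more candidate pairs at the cost of precision, which is the trade-off the human evaluation below measures.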

In this case, using the same sample corpus of 30 files, the AlchemyAPI and OpenCalais tools identified 15 and 17 classes, respectively. However, only 16 of the 255 possible duples (AlchemyAPI:class1, OpenCalais:class2) have overlap between their sets of instances. According to the evaluation of the human team, 11 of these relationships correspond correctly to equivalentClass(class1, class2) and the rest hold some other relation. As a result, the precision was 68.75% for the learned equivalentClass relations.

Figure 5 shows some examples of duples. For pairs such as AlchemyAPI:Organization / OpenCalais:Organization, AlchemyAPI:Country / OpenCalais:Country, AlchemyAPI:Sport / OpenCalais:SportsGame, and AlchemyAPI:HealthCondition / OpenCalais:MedicalCondition, an equivalence relationship can clearly be determined. In contrast, other classes that belong to different ontologies have similar individuals but are not in an equivalence relationship, for example AlchemyAPI:Organization / OpenCalais:Company or AlchemyAPI:Person / OpenCalais:Holiday.

The complete Lonely Planet corpus was also evaluated: 21 equivalentClass(class1, class2) duples were identified as correct by the human team, which gave a precision of 80.73%.

Table 5 shows the precision (%) obtained in the axiom extraction task. This evaluation was carried out on two corpora, one for the Tourism domain and one for the Sport Event domain.


Axiom            Tourism  Sport Event
instanceOf       46.67    23.52
subClassOf       70.37    64.44
disjointWith     83.80    85.00
equivalentClass  92.00    93.33

Table 5: Results of axiom extraction task

5 Conclusions

The main goal of this research is to obtain a method for ontology learning from textual resources in English about a specific domain. This implies several challenges:

• obtain representative concepts
• find hierarchical relationships
• achieve a higher level of expressiveness (axioms)

Learning axiomatic relations is an important task in ontology learning, and a very hard one. In this report, an approach to discover axiomatic relations from text was described. The approach is based on identifying named entities as class instances and comparing their textual context to establish axiomatic relations such as sameIndividualAs, differentFrom, instanceOf, subClassOf, disjointWith, and equivalentClass. New NER tools based on Linked Data can be useful in the process of extracting axioms. Of the tools tested, AlchemyAPI showed the best performance in the identification of instances; as a consequence, it was selected for verifying our hypothesis on learning axioms.

According to the experiments, we observed that the instances identified as belonging to a specific class can be considered the extensional definition of that class, which is thus described by the named entities associated with it. However, the method must take into account that the incorrect identification of instances can lead to erroneous disjointness or equivalence relations. For example, pairs of classes that actually hold other relations, such as subClassOf and partOf, were learned as disjointWith relations. Such is the case of the Organization/Company and Region/Country classes, which stand in a subClassOf relation. Also, a specific object property between Person and HealthCondition (Person has HealthCondition) was wrongly derived as a disjointness relation. In other cases, the AlchemyAPI:GeographicFeature / OpenCalais:NaturalFeature and AlchemyAPI:Person / OpenCalais:Holiday classes, which stand in a subClassOf or disjointWith relation, were wrongly derived as an equivalentClass relation.

Figure 6: Schedule of activities for doctoral studies

The main activities achieved were obtaining and implementing a technique to extract axioms from text. The proposed approach was evaluated with two corpora, from the Tourism and SportEvent domains. According to the schedule of activities (see Figure 6), these activities correspond to the third phase of our research work.

The state of the art on discovering axiomatic relations has been reported in one book chapter:

• Ana B. Rios-Alvarado, Ivan Lopez-Arevalo, and Victor Sosa-Sosa. “An overview on ontology learning methods from textual resources towards the acquisition of axioms”, Innovative Ways of Knowledge Representation and Management, Universidad de Medellín, Colombia, 2012.

In addition, the next phase of the doctoral studies comprises the following activities:

• Adapt the method to extract relations as axioms
• Integrate the methods to obtain the ontology learning approach
• Write the thesis
• Submit the dissertation


Bibliography

[1] Philipp Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and

Applications. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[2] Luis Del Vasto Terrientes, Antonio Moreno, and David Sánchez. Discovery of relation axioms from the web. In Proceedings of the 4th International Conference on Knowledge Science, Engineering and Management, KSEM’10, pages 222–233, Berlin, Heidelberg, 2010. Springer-Verlag.

[3] A. Maedche and S. Staab. Ontology learning for the semantic web. Intelligent Systems, IEEE,

16(2):72–79, mar-apr 2001.

[4] Robert Neches, Richard Fikes, Tim Finin, Tom Gruber, Ramesh Patil, Ted Senator, and William R.

Swartout. Enabling technology for knowledge sharing. AI Mag., 12:36–56, September 1991.

[5] David Sánchez Ruenes. Domain ontology learning from the web. PhD thesis, Universitat Politècnica de Catalunya, 2007.

[6] Alexander Schutz and Paul Buitelaar. Relext: A tool for relation extraction from text in ontology

extension. In International Semantic Web Conference, pages 593–606, 2005.

[7] Mehrnoush Shamsfard and Ahmad Abdollahzadeh Barforoush. Learning ontologies from natural

language texts. Int. J. Hum.-Comput. Stud., 60:17–63, January 2004.

[8] Steffen Staab and Rudi Studer. Handbook on Ontologies. Springer Publishing Company, Incorporated,

2nd edition, 2009.

[9] Rudi Studer, V. Richard Benjamins, and Dieter Fensel. Knowledge engineering: principles and methods.

Data Knowl. Eng., 25:161–197, March 1998.

[10] F.M. Suchanek, G. Kasneci, and G. Weikum. Yago: A large ontology from wikipedia and wordnet.

Web Semantics: Science, Services and Agents on the World Wide Web, 6(3):203–217, 2008.


[11] Johanna Völker, Pascal Hitzler, and Philipp Cimiano. Acquisition of OWL DL axioms from lexical resources. In Enrico Franconi, Michael Kifer, and Wolfgang May, editors, The Semantic Web: Research and Applications, volume 4519 of Lecture Notes in Computer Science, pages 670–685. Springer Berlin / Heidelberg, 2007. 10.1007/978-3-540-72667-8_47.

[12] Johanna Völker and Sebastian Rudolph. Lexico-logical acquisition of OWL DL axioms. In Raoul Medina and Sergei Obiedkov, editors, Formal Concept Analysis, volume 4933 of Lecture Notes in Computer Science, pages 62–77. Springer Berlin / Heidelberg, 2008. 10.1007/978-3-540-78137-0_5.

[13] Johanna Völker, Denny Vrandečić, York Sure, and Andreas Hotho. Learning disjointness. In Proceedings of the 4th European Conference on The Semantic Web: Research and Applications, ESWC ’07, pages 175–189, Berlin, Heidelberg, 2007. Springer-Verlag.

[14] W. Wang, P.M. Barnaghi, and A. Bargiela. Probabilistic topic models for learning terminological

ontologies. IEEE Transactions on Knowledge and Data Engineering, 2009.

[15] Sung-Shun Weng, Hsine-Jen Tsai, Shang-Chia Liu, and Cheng-Hsin Hsu. Ontology construction for

information classification. Expert Systems with Applications, 31(1):1–12, 2006.