Learning Lightweight Ontologies from Text across Different Domains using the Web as Background Knowledge
The ability to provide abstractions of documents in the form of important concepts and their relations is a key asset, not only for bootstrapping the Semantic Web, but also for relieving us from the pressure of information overload. At present, the only viable solution for arriving at these abstractions is manual curation. In this research, ontology learning techniques are used to automatically discover terms, concepts and relations from the text of documents. Ontology learning techniques rely on extensive background knowledge, ranging from unstructured data such as text corpora to structured data such as semantic lexicons. Manually curated background knowledge is a scarce resource for many domains and languages, and the effort and cost required to keep such resources up to date are often high. More importantly, the size and coverage of manually curated background knowledge are often inadequate to meet the requirements of most ontology learning techniques. This thesis investigates the use of the Web as the sole source of dynamic background knowledge across all phases of ontology learning for constructing term clouds (i.e. visual depictions of terms) and lightweight ontologies from documents. To demonstrate the significance of term clouds and lightweight ontologies, a system for ontology-assisted document skimming and scanning is developed.