A method for exploring implicit concept relatedness in biomedical knowledge network

Most existing biomedical knowledge repositories fall into two categories: non-structural (e.g. research papers) and structural (e.g. semantic networks, knowledge graphs, and ontologies). Research on knowledge representation and discovery with these two types of knowledge has made encouraging progress in recent years.

Non-structural biomedical knowledge discovery
Literature, including research publications, clinical guidelines, clinical trials, and case study reports, is the main form of non-structural knowledge. Increasing efforts have been made to extract various types of disease-related knowledge from these relatively unstructured materials. Liu and Hu [9] developed a distantly supervised model to extract gene expression relationships between genes and brain regions from the literature. Marwah et al. [10] implemented a context-specific Bayesian framework that computes functional relationships as links between ontologies, based on co-occurrence statistics of terms in the literature. Xu et al. [11] focus on extracting disease-manifestation relationships from the literature, while De la Iglesia et al. [12] address ontology concept extraction for the classification of clinical trial information. According to Seyfang et al. [13] and Isern et al. [14], ontologies can be developed to represent formal guidelines. Cheng et al. [15] have also made progress in establishing semantic associations among disease-related databases to provide a more global view of human diseases.
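Several of the approaches above rest on co-occurrence statistics of terms in the literature. The following is a minimal sketch of that idea, assuming term recognition has already reduced each document to a set of concept names; the toy corpus and concept names are hypothetical, and pointwise mutual information (PMI) is used here as a generic association score. It is not the Bayesian framework of Marwah et al. [10].

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to the set of
# concept terms it mentions (term recognition is assumed done upstream).
documents = [
    {"gene_A", "disease_B"},
    {"gene_A", "disease_B", "symptom_C"},
    {"gene_A", "symptom_C"},
    {"disease_B"},
]

def cooccurrence_pmi(docs):
    """Return {(t1, t2): PMI} for every term pair that co-occurs in some document."""
    n = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        term_counts.update(doc)
        # sorted() gives each unordered pair a single canonical key
        pair_counts.update(combinations(sorted(doc), 2))
    scores = {}
    for (t1, t2), n12 in pair_counts.items():
        p12 = n12 / n                 # fraction of documents containing both terms
        p1 = term_counts[t1] / n      # fraction containing t1
        p2 = term_counts[t2] / n      # fraction containing t2
        # PMI > 0: the pair co-occurs more often than independence predicts
        scores[(t1, t2)] = math.log2(p12 / (p1 * p2))
    return scores

for pair, score in sorted(cooccurrence_pmi(documents).items()):
    print(pair, round(score, 3))
```

A score of this kind only captures direct co-occurrence: two terms that never appear in the same document receive no score at all.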

Semantic network and semantic web
A semantic network [16] represents knowledge in terms of concepts and the semantic relations between them. WordNet [17] is one of the best-known examples of a semantic network. The Non-Axiomatic Reasoning System (NARS) [18] also represents knowledge in the form of a network. The Semantic Web [19], on the other hand, provides a common framework over the Web for sharing and reusing knowledge across applications, enterprises, and community boundaries. Chen et al. [20] have conducted productive research on semantic-web-based biomedical data analysis.

Knowledge graph
The Knowledge Graph (KG), proposed by Google, is a representational model that captures the semantics of real-world entities and their relationships and represents them graphically [21]; it supports more informative keyword-based search. A number of knowledge graphs have been built, such as YAGO [22], DBpedia [23], NELL [24], and Freebase [25]. Efforts have also been made to build biomedical knowledge bases in the form of KGs [26].

Artificial Neural Network
Artificial neural networks and deep learning have substantially improved the performance of AI systems. For example, the 152-layer neural network developed by Microsoft Research Asia achieves a top-5 error rate of 3.57% on the ImageNet test set [27]. More recently, AlphaGo, a computer Go program that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves, defeated a human world Go champion [28]. However, as a recent cover story in Science pointed out, people learning new concepts can often generalize successfully from a single example by drawing on previously learned knowledge, whereas machine learning algorithms typically require tens or hundreds of examples to reach similar accuracy [29].

Biomedical knowledge representation and discovery in ontology
An ontology, defined as a formal, explicit specification of a shared conceptualization [30], is the main form of structural knowledge system. Its main function is to enable the sharing and reuse of knowledge [31]. Many biomedical ontologies have been built, such as the Gene Ontology [4], Disease Ontology [5], Human Phenotype Ontology [32], Environment Ontology [33], and Protein Ontology [34]. Mohammed et al. [35] align the Disease Ontology with the Symptom Ontology by exploring links between diseases and symptoms. Concepts in these and other biomedical ontologies are organized primarily by hierarchical “is-a” relationships, whereas other valuable relationships, such as “may-have-complication” and “may-have-side-effect”, are mostly missing because they are usually weak and statistical in nature. For knowledge reuse, techniques such as ontology mapping [36] and ontology alignment [37] bridge different biomedical ontologies by identifying concepts that share the same meaning. Some studies analyze ontology systems for specific domains by applying network structure analysis. Wang et al. [38] propose a Network Ontology Analysis (NOA) method that performs Gene Ontology enrichment analysis on biological networks. Weng and Chang [39] apply ontology network analysis to document recommendation. Other studies, such as Chen [8] and Liu et al. [40], analyze ontology networks with methods developed for complex networks or social networks.

The research mentioned above, and many similar studies in structural knowledge representation and discovery, focus mostly on developing new biomedical knowledge systems and improving existing ones. To date, these systems remain independent of, or even isolated from, one another. Furthermore, most existing work with multiple ontologies explores direct and explicit relationships between concepts by mapping and integrating different ontologies. Much less attention has been paid to developing a unified knowledge representation framework that semantically links all biomedical ontologies. In recent years, work that integrates different knowledge repositories (both structural and non-structural) to explore indirect relatedness between concepts has started to emerge. Vehlow et al. [41] developed a method to visualize and analyze existing knowledge (from databases and the literature) together with experimental data in a network model. Spangler et al. and Han et al. [42, 43] focus on mining relevance between heterogeneous biomedical entities from the literature. These studies mostly use statistical methods (e.g. co-occurrence) to explore relationships between concepts. However, some important relationships between concepts can be revealed only by examining indirect relatedness.
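The gap between direct co-occurrence and indirect relatedness can be illustrated with a small example. The sketch below assumes a toy concept network built from hypothetical co-occurrence pairs, in which "drug_X" and "disease_Z" never co-occur yet share two intermediate concepts. The neighbourhood Jaccard overlap used here is a generic two-hop measure chosen for illustration only; it is not the computational model proposed in this paper.

```python
from collections import defaultdict

# Hypothetical co-occurrence edges (concept pairs seen together in text).
# "drug_X" and "disease_Z" are never linked directly.
edges = [
    ("drug_X", "protein_Y"),
    ("protein_Y", "disease_Z"),
    ("drug_X", "pathway_W"),
    ("pathway_W", "disease_Z"),
    ("drug_X", "symptom_V"),
]

def build_graph(pairs):
    """Build an undirected adjacency map: concept -> set of linked concepts."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def direct_relatedness(graph, a, b):
    """1 if a and b are directly linked (i.e. they co-occur), else 0."""
    return 1 if b in graph.get(a, set()) else 0

def shared_intermediates(graph, a, b):
    """Concepts linked to both a and b: the two-hop paths a - m - b."""
    return sorted(graph.get(a, set()) & graph.get(b, set()))

def indirect_relatedness(graph, a, b):
    """Jaccard overlap of the two neighbourhoods (0.0 if both are empty)."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

g = build_graph(edges)
print("direct:", direct_relatedness(g, "drug_X", "disease_Z"))
print("indirect:", round(indirect_relatedness(g, "drug_X", "disease_Z"), 3))
print("via:", shared_intermediates(g, "drug_X", "disease_Z"))
```

A purely co-occurrence-based method scores this pair as unrelated, whereas the neighbourhood overlap exposes the connection and also names the intermediate concepts that explain it.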

In this paper, we propose a network-based biomedical knowledge representation framework and a corresponding computational model to address the problem of computing implicit relatedness between concepts.