A common grievance of most enterprises is that while data is abundant, there is not enough knowledge. Data is the symbolic representation of the observable properties of real-world entities and, on its own, yields limited practical value. Knowledge, on the other hand, is ‘meaningful data’ created through cognitive processing mechanisms. Generating actionable knowledge from raw data is a complex exercise fraught with multiple challenges. This is where Knowledge Graphs can play an important role, especially in this age of Big Data and Artificial Intelligence.
Introduction to Knowledge Graphs
Knowledge Graphs are a special type of information network that generally represent knowledge as RDF (Resource Description Framework)-based Semantic Triples of the form (h, r, t), each of which links two entities through a relation. Here, ‘h’ refers to the head entity or subject, ‘t’ to the tail entity or object, and ‘r’ to the relation (the predicate) between the head and tail entities. For instance, “Shakespeare is the author of Macbeth” can be represented as (Shakespeare, authorOf, Macbeth). Knowledge, broadly implemented as multi-relational data, is represented through directed graphs in which the nodes represent real-world entities and the edges represent the relationships among those entities.
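To make the triple model concrete, here is a minimal in-memory sketch, assuming plain strings for entities and relations. The `Triple` and `TripleStore` names are hypothetical and this is not an RDF library; production systems would use an RDF store or a graph database. It shows how a set of (h, r, t) triples already forms a directed, labeled graph that can be queried by pattern.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    head: str      # subject entity (h)
    relation: str  # predicate (r)
    tail: str      # object entity (t)


class TripleStore:
    """Toy knowledge graph: entities are nodes, each triple is a directed labeled edge."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, head: str, relation: str, tail: str) -> None:
        self.triples.add(Triple(head, relation, tail))

    def query(self, head: Optional[str] = None, relation: Optional[str] = None,
              tail: Optional[str] = None) -> list[Triple]:
        """Return the triples matching a pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (head is None or t.head == head)
            and (relation is None or t.relation == relation)
            and (tail is None or t.tail == tail)
        )


kg = TripleStore()
kg.add("Shakespeare", "authorOf", "Macbeth")
kg.add("Shakespeare", "authorOf", "Hamlet")
kg.add("Macbeth", "setIn", "Scotland")

print(kg.query(head="Shakespeare", relation="authorOf"))
print(kg.query(tail="Scotland"))
```

Wildcard pattern matching of this kind is roughly what triple-pattern queries in SPARQL generalize; the linear scan here is only for readability.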
Knowledge Graphs are not new, but the focus on them significantly increased after Google announced its Knowledge Graph to enhance the search experience of users. Cyc is another well-known example, and is often regarded as the longest-running AI project. Other well-known examples include DBpedia, DeepDive, FIBO, Facebook Entities, Freebase, Knowledge Vault, NELL, Open IE, Satori, WebChild, WordNet, Wikidata and YAGO. While the expression of Knowledge Graphs is limited to triples in practical use, more complex forms of expression are also being developed. Some researchers further categorize Knowledge Graphs as Data Graphs, Information Graphs, and Wisdom Graphs to highlight the progressive growth of knowledge creation/capture.
Knowledge Graphs are deployed for a wide array of use cases today – building improved recommendation engines; developing greater context in conversational AI platforms; encoding domain knowledge in expert systems; enhancing enterprise data assets by adding new information and integrating fragmented data; harmonizing data from diverse sources; increasing AI explainability in predictive modeling; identifying comprehensive fraud and security-related patterns; and so on.
The Relevance of Knowledge Graphs in Artificial Intelligence
Machine Learning and other AI technologies are powerful and disruptive, but they suffer from four key limitations (among several others) in their current forms.
- Despite recent advances in AI technologies, many AI systems still do not adequately address ‘data context’. Data assets, even in high volumes, have limited value without their context. As a result, actual knowledge creation from data is still a challenge in many applications.
- Modern AI use cases are generally complex in nature, and cannot be addressed through traditional (e.g., simple or linear) models. The creation of complex learning systems necessitates the development of complex models – this requires advanced levels of expertise that may not always be available to organizations. Moreover, increasing the model complexity also creates the risk of model instability.
- AI models, particularly the ones based on Deep Learning architectures, require significant amounts of data to be trained and optimized. This over-reliance on data is a big impediment in AI development, as many critical use cases cannot be efficiently addressed due to a lack of adequate training data.
- The lack of explainability is becoming a major operational bottleneck for many AI applications. While there are techniques available to address this issue ‘to some extent’, there is still a long way to go.
Knowledge Graphs offer the potential to address the above limitations, partly or fully. For instance, they can enhance the quality of data by making it more context-based; reduce the need for complex models by making simple models more effective through improved data representation; augment sparse training data with additional information; and enable higher explainability by augmenting the quality of model inference.
Furthermore, four elements are critical to the development of efficient AI systems: Learning, Knowledge Representation, Reasoning and Memory. While Learning is primarily the domain of Machine Learning, Knowledge Graphs offer a compelling option to address Knowledge Representation, Reasoning, and Long-Term Memory Development. Complex tasks like active learning, bi-directional information extraction, data embeddings and transfer learning benefit from the knowledge that is encoded within Knowledge Graphs.
Key AI-centered Applications of Knowledge Graphs
- Recommender Systems: Knowledge Graph-based embedding approaches learn the semantic representations of entities (e.g., items and users) and the relationships among them to accurately understand user preferences for different items, and produce high-quality recommendations.
- Computer Vision: Knowledge Graphs enable modern object detectors to leverage external knowledge instead of merely relying on features that are detected within images. This leads to improved optimization and higher semantic consistency – for instance, understanding the semantic relationships among objects significantly improves the ability to understand video scenes. Other use cases include interaction detection, region classification, semantic segmentation, and visual reasoning.
- Natural Language Processing (NLP): Knowledge Graphs are increasingly used in developing NLP solutions, especially for dialog management, question-answering, and text classification use cases. Examples include AMR-to-text generation, encoding-decoding questions, event and relation extraction, multi-hop reading comprehension, neural machine translation, relational reasoning, semantic labeling, sentiment classification, etc.
- Anomaly Detection: Several industries (e.g. finance, healthcare & technology) have started to leverage Knowledge Graphs for anomaly detection applications, such as accounting inefficiencies, credit card/insurance fraud, data center monitoring, false advertising, network intrusion detection, rare event analysis, supply chain leakages, tax evasion, video surveillance, etc.
- Information Search & Retrieval: Knowledge Graphs enable intelligent discovery of online information, such as product searches. Their ability to capture data context enables systems to go beyond the limitations of traditional search and retrieval, and provide a ‘more conversational-type experience’ to a ‘search’ user.
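The embedding-based recommenders described above score candidate triples in a vector space. As an illustration (TransE is one well-known Knowledge Graph embedding technique, not one named in this article), the sketch below ranks items by how plausible the triple (user, likes, item) is, where a triple is plausible when the vector h + r lies close to t. The 2-D vectors and the `likes` relation are hand-picked assumptions; a real system would learn them from the graph.

```python
import math

# Hand-picked 2-D embeddings for illustration only; a real system learns these
# from observed triples (e.g., with a margin-based ranking loss).
entity_emb = {
    "alice": (0.0, 0.0),
    "bob": (1.0, 0.0),
    "inception": (0.0, 1.0),
    "titanic": (1.0, 1.0),
    "interstellar": (0.1, 1.0),
}
relation_emb = {"likes": (0.0, 1.0)}


def transe_score(head: str, relation: str, tail: str) -> float:
    """TransE-style plausibility: the negative distance between h + r and t (higher is better)."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.dist([hi + ri for hi, ri in zip(h, r)], t)


def recommend(user: str, candidates: list[str], relation: str = "likes", k: int = 2) -> list[str]:
    """Rank candidate items by the plausibility of the (user, relation, item) triple."""
    return sorted(candidates, key=lambda item: transe_score(user, relation, item), reverse=True)[:k]


items = ["inception", "titanic", "interstellar"]
print(recommend("alice", items))
print(recommend("bob", items, k=1))
```

Because users, items and relations share one space, the same scoring function can also rank other relations in the graph, which is what lets side information (genres, actors, and so on) influence the recommendations.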
Knowledge Graph Architecture
Knowledge Graphs are generally ontology-driven with two primary approaches to constructing them.
- Top-Down Approach, where the ontology and schema are first defined, and the knowledge instances are then added into the knowledge base.
- Bottom-Up Approach, where the knowledge instances are extracted from different sources, fused together and then the ontology is developed.
While the Top-Down approach is easier to implement, it cannot effectively address most of today's use cases, especially those that involve Big Data or where the data structures are not always known in advance. The Bottom-Up approach has greater applicability: here, knowledge construction happens through an iterative process of Knowledge Extraction (Entity, Attribute, Relation) and Knowledge Fusion (Data Integration, Ontology Construction). Below is a modified version of the architecture proposed by Zhao et al. (2018).
A Generic Knowledge-Graph Architecture
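To make the bottom-up loop more concrete, here is a toy sketch of its first two steps under strong simplifying assumptions: two hypothetical text sources, hand-written regular-expression patterns standing in for a relation extractor, and a hand-written alias table standing in for entity resolution. It is not the architecture of Zhao et al. and it omits ontology construction.

```python
import re

# Hypothetical raw inputs from two sources; real pipelines would use NER and
# relation-extraction models instead of fixed sentences and patterns.
SOURCES = {
    "source_a": ["Shakespeare wrote Macbeth.", "Shakespeare was born in Stratford."],
    "source_b": ["W. Shakespeare wrote Hamlet.", "William Shakespeare wrote Macbeth."],
}

# Naive surface patterns acting as a stand-in relation extractor (an assumption).
PATTERNS = [
    (re.compile(r"^(?P<h>.+?) wrote (?P<t>.+?)\.$"), "authorOf"),
    (re.compile(r"^(?P<h>.+?) was born in (?P<t>.+?)\.$"), "bornIn"),
]

# Toy alias table used during fusion to map surface forms to one canonical entity.
ALIASES = {"W. Shakespeare": "Shakespeare", "William Shakespeare": "Shakespeare"}


def extract(sentence):
    """Knowledge extraction: return a (head, relation, tail) triple if a pattern matches, else None."""
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return (match["h"], relation, match["t"])
    return None


def fuse(triples):
    """Knowledge fusion: normalize entity aliases, then drop duplicate facts."""
    return {(ALIASES.get(h, h), r, ALIASES.get(t, t)) for h, r, t in triples}


raw = []
for sentences in SOURCES.values():
    for sentence in sentences:
        triple = extract(sentence)
        if triple is not None:
            raw.append(triple)

knowledge = fuse(raw)
print(len(raw), "raw triples ->", len(knowledge), "fused triples")
for fact in sorted(knowledge):
    print(fact)
```

Four extracted triples collapse to three once the aliases are resolved, which is the kind of integration the fusion stage is responsible for; in the bottom-up approach, the ontology would then be derived from the fused facts.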
Knowledge Graphs are best stored using graph databases, particularly those that possess context-based search capabilities and can enable efficient data mining. Examples of graph databases include Neo4j, Amazon Neptune, AnzoGraph and JanusGraph.
Current Challenges and Future Direction
While Knowledge Graphs are powerful tools for representing knowledge with an expression mechanism that is close to natural language, there are several challenges to designing and developing them. Some of these challenges are highlighted below, along with the future direction of research and development in this field.
1. Complex Input Streams and Graph Neural Networks
Machine Learning is primarily designed for input data in the form of flat arrays, grids and simple sequences. However, when data is represented as graphs (e.g. geographical maps, molecular structures, social networks, web-link data, etc.), they often possess complex topological structures. Traditional algorithms and neural network architectures cannot optimally process these kinds of complex inputs.
Graph Neural Networks emerged as a solution to the complex input problem and have witnessed reasonable progress in the past few years. However, there are several challenges with Graph Neural Networks. For instance, Node Classification and Link Prediction are two critical aspects of graph networks, but these are highly complex tasks that often suffer from technical limitations. [Node Classification aims at determining the labels of nodes based on information about other labeled nodes and the network topology. Link Prediction aims at predicting missing links or identifying new links that may emerge in the future.]
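The two tasks defined above can be illustrated with classical, non-neural heuristics before any Graph Neural Network is involved. The sketch below is a swapped-in simplification, not a GNN: node classification by majority vote over labeled neighbours, and link prediction by counting common neighbours. The graph and its labels are hypothetical; a GNN would instead learn these signals from node features and topology.

```python
from collections import Counter

# Small undirected toy graph; edges and labels are hypothetical.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
graph: dict[str, set[str]] = {}
for u, v in EDGES:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

labels = {"a": "red", "b": "red", "e": "blue", "f": "blue"}


def classify_node(node):
    """Node classification: majority label among labeled neighbours (None if there are none)."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None


def common_neighbors(u, v):
    """Link-prediction heuristic: more shared neighbours suggests a missing or future link."""
    return len(graph[u] & graph[v])


def predict_links(k=1):
    """Rank all currently unlinked node pairs by their common-neighbour score."""
    nodes = sorted(graph)
    candidates = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:] if v not in graph[u]]
    return sorted(candidates, key=lambda pair: common_neighbors(*pair), reverse=True)[:k]


print(classify_node("c"), classify_node("d"))
print(predict_links(k=1))
```

Heuristics like these only look one hop away and ignore node attributes, which is part of why learned approaches were developed for both tasks.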
2. Knowledge Extraction and Fusion Challenges
Extraction of knowledge from huge or diverse corpora of unstructured and semi-structured real-world data is a complex task. Advanced semi-supervised and unsupervised algorithms need to be deployed. Furthermore, there are practical challenges in addressing the issue of ‘tail data’, i.e. the data related to less-used or less popular entities and attributes.
The primary challenges in Knowledge Fusion are (i) to integrate the knowledge extracted from multiple sources using multiple extractors, and (ii) to correct the erroneous data that gets cascaded from the original sources as well as remediate errors that get generated during the extraction process. While both are complex tasks, the latter is much more difficult. Addressing the issue of missing knowledge links is a major challenge.
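One simple way to integrate conflicting outputs from multiple extractors is confidence-weighted voting, sketched below as an assumed strategy rather than a method prescribed by this article. The extractor names, confidence scores and the `min_support` threshold are hypothetical. Slots where no value clearly wins are returned as unresolved instead of being silently accepted, which reflects the error-correction concern raised above.

```python
from collections import defaultdict

# Hypothetical extractor outputs: (extractor, entity, attribute, value, confidence).
EXTRACTIONS = [
    ("extractor_1", "Macbeth", "author", "William Shakespeare", 0.9),
    ("extractor_2", "Macbeth", "author", "William Shakespeare", 0.7),
    ("extractor_3", "Macbeth", "author", "Christopher Marlowe", 0.8),
    ("extractor_1", "Hamlet", "setIn", "Denmark", 0.4),
    ("extractor_3", "Hamlet", "setIn", "Norway", 0.3),
]


def fuse_by_weighted_vote(records, min_support=0.6):
    """Resolve conflicting values per (entity, attribute) slot by summed confidence.

    A slot is accepted only if its winning value carries at least `min_support`
    of the slot's total confidence; otherwise it is returned as unresolved
    (e.g., to be routed to a human reviewer).
    """
    votes = defaultdict(lambda: defaultdict(float))
    for _extractor, entity, attribute, value, confidence in records:
        votes[(entity, attribute)][value] += confidence

    fused, unresolved = {}, []
    for slot, by_value in votes.items():
        best_value, best_score = max(by_value.items(), key=lambda kv: kv[1])
        if best_score / sum(by_value.values()) >= min_support:
            fused[slot] = best_value
        else:
            unresolved.append(slot)
    return fused, unresolved


fused, unresolved = fuse_by_weighted_vote(EXTRACTIONS)
print(fused)
print(unresolved)
```

Note that the threshold can also hold back a correct value (Denmark here) when the evidence is thin; a natural refinement would be to weight each extractor by its measured reliability instead of trusting raw confidence scores.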
3. Issues in Quality and Accuracy Evaluation
Evaluating the quality and accuracy of graph construction and operationalization is a major challenge today. First of all, there is no holistic evaluation metric that captures all the critical assessment needs. Most metrics focus on a few key factors: availability, interlinking, performance and security. Other important factors like scalability and maintainability are sometimes ignored. Additionally, some metrics have inherent design flaws, such as over-reliance on human judgment for evaluation.
Moreover, manual evaluation of Knowledge Graphs is very expensive and not feasible for most use cases. In such situations, the norm is to conduct sample accuracy or quality tests and extrapolate the results of these tests over the entire data. However, the selected sample may not consistently represent the population for several reasons, such as data or graph complexity. Finally, it is often observed that error detection is not followed by error correction, which defeats the purpose of having evaluations in the first place.
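The sample-and-extrapolate practice described above can be sketched as follows, assuming a simple random sample and a function `is_correct` that stands in for a human or gold-standard judgment (both hypothetical). The estimate comes with a normal-approximation (Wald) interval, which makes the uncertainty of the extrapolation explicit; the synthetic graph with a known 90% accuracy exists only so the estimate can be checked.

```python
import math
import random


def estimate_accuracy(triples, is_correct, sample_size, z=1.96, seed=0):
    """Estimate the share of correct triples from a simple random sample.

    Returns the point estimate and a normal-approximation (Wald) interval, which
    becomes unreliable for very small samples or accuracies near 0 or 1.
    """
    rng = random.Random(seed)
    sample = rng.sample(triples, sample_size)
    correct = sum(1 for t in sample if is_correct(t))
    p = correct / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, (max(0.0, p - margin), min(1.0, p + margin))


# Synthetic graph with a known error rate: 900 correct triples and 100 wrong ones.
triples = ([(f"e{i}", "rel", "ok") for i in range(900)]
           + [(f"e{i}", "rel", "WRONG") for i in range(900, 1000)])


def judge(triple):
    """Stand-in for a human annotator's verdict on one triple."""
    return triple[2] != "WRONG"


p, (low, high) = estimate_accuracy(triples, judge, sample_size=100)
print(f"estimated accuracy {p:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

A simple random sample treats all triples alike; when the graph mixes very different relation types or sources, stratifying the sample along those lines is one common way to address the representativeness problem mentioned above.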
4. Addressing the Non-Static Nature of Knowledge
Most Knowledge Graphs store data at a specific point-in-time, and cannot efficiently address dynamic and fast-evolving knowledge assets. Firstly, defining and standardizing the concept of evolution itself is a challenge. Secondly, capturing and managing the provenance of high-volume data are complex tasks. Thirdly, there are several technical limitations to achieving changes in schema or type systems without creating inconsistencies with existing knowledge. Finally, computation and storage limitations also come in the way of creating dynamically evolving graphs that can represent real-world systems.
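A common way to move beyond point-in-time snapshots is to attach a validity interval and a provenance record to each fact. The sketch below is a minimal illustration of that idea, not a standard schema: the class names, the `hasCEO` relation and the company data are hypothetical, and it assumes the relation is single-valued, so asserting a new value closes the previous one instead of deleting it.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    head: str
    relation: str
    tail: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"
    source: str = "unknown"          # minimal provenance record

    def holds_at(self, when: date) -> bool:
        """True if the fact was valid on the given date (interval [valid_from, valid_to))."""
        return self.valid_from <= when and (self.valid_to is None or when < self.valid_to)


class TemporalKG:
    def __init__(self) -> None:
        self.facts: list[TemporalFact] = []

    def assert_fact(self, head: str, relation: str, tail: str, start: date,
                    source: str = "unknown") -> None:
        """Add a fact, closing (not deleting) any open fact for the same (head, relation).

        Assumes the relation is single-valued, e.g., a company has one CEO at a time.
        """
        for i, f in enumerate(self.facts):
            if f.head == head and f.relation == relation and f.valid_to is None:
                self.facts[i] = replace(f, valid_to=start)
        self.facts.append(TemporalFact(head, relation, tail, start, None, source))

    def query(self, head: str, relation: str, when: date) -> list[str]:
        """Return the tail entities that held for (head, relation) on the given date."""
        return [f.tail for f in self.facts
                if f.head == head and f.relation == relation and f.holds_at(when)]


kg = TemporalKG()
kg.assert_fact("AcmeCorp", "hasCEO", "Alice", date(2015, 1, 1), source="press_release_1")
kg.assert_fact("AcmeCorp", "hasCEO", "Bob", date(2020, 6, 1), source="press_release_2")
print(kg.query("AcmeCorp", "hasCEO", date(2018, 3, 1)))
print(kg.query("AcmeCorp", "hasCEO", date(2021, 1, 1)))
```

Keeping closed facts preserves history and provenance at the cost of storage growth, which is exactly the computation and storage trade-off noted above.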
5. Other Challenges
Apart from the above, there are several other challenges in designing, developing and deploying Knowledge Graphs. Some of them are mentioned below.
- Practical challenges in entity disambiguation where the objective is to assign unique normalized identities and appropriate types such that entities are linked to their own facts, and not to those of others with similar surface forms.
- Designing multi-lingual knowledge systems has its own challenges, particularly when there are variations in how different cultures conceptualize real-world entities and their inter-relationships.
- Security and privacy issues in the case of ‘personalized’ Knowledge Graphs, or when the graph deployment/implementation is on Edge devices.
- Absence of standard graph construction frameworks and universally agreed best practices.
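The entity disambiguation problem from the list above can be illustrated with a deliberately simple context-overlap heuristic (in the spirit of the Lesk algorithm), which is an assumed technique rather than one proposed here. The candidate identifiers and keyword sets are hypothetical and are not real Wikidata IDs; practical systems typically combine learned context embeddings with entity-popularity priors.

```python
import re

# Hypothetical candidates sharing the surface form "Jaguar", each described by context keywords.
CANDIDATES = {
    "Jaguar": [
        ("Q_jaguar_animal", {"cat", "rainforest", "predator", "species"}),
        ("Q_jaguar_cars", {"car", "vehicle", "engine", "manufacturer"}),
    ],
}


def tokenize(text):
    """Lower-case word tokens of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention, context):
    """Link a mention to the candidate whose keywords overlap most with the context.

    Returns None when there are no candidates or no keyword overlaps at all,
    rather than guessing an identity.
    """
    words = tokenize(context)
    candidates = CANDIDATES.get(mention, [])
    if not candidates:
        return None
    best_id, best_keywords = max(candidates, key=lambda c: len(c[1] & words))
    return best_id if best_keywords & words else None


print(disambiguate("Jaguar", "The jaguar is a large cat found in the rainforest."))
print(disambiguate("Jaguar", "The new Jaguar has a powerful engine."))
print(disambiguate("Jaguar", "I saw it yesterday."))
```

Returning None for weak evidence keeps the mention unlinked, which avoids the failure mode described above where an entity ends up attached to another entity's facts.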
R&D and Future Direction
Graph Neural Networks are a major focus area of AI R&D today. Significant emphasis is on four classes of architectures: (i) Graph-based Autoencoders, (ii) Convolutional Graph Networks (Spectral-based or Spatial-based), (iii) Recurrent Graph Networks (Echo State, Gated, LSTM, etc.) and (iv) Spatio-temporal Graph Networks. Several frameworks, such as the Message Passing Neural Network, are being developed to integrate different models into a single model framework. Graph networks often have shallow structures because deeper networks tend to cause over-smoothing (node representations becoming nearly indistinguishable), and efforts are in progress to overcome this limitation.
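Over-smoothing can be observed even in the simplest form of message passing. In the sketch below, each layer replaces every node's feature with the mean of its own and its neighbours' features; there are no learned weights or nonlinearities, which is an intentional simplification of GCN-style layers. The toy graph of two loosely connected clusters and its features are hypothetical. Stacking many such layers on a connected graph pushes all node features towards the same value, so the clusters become indistinguishable.

```python
# Toy undirected graph: two triangles joined by a single edge (2-3).
GRAPH = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
# One scalar feature per node: the first cluster starts at 1.0, the second at 0.0.
features = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0}


def message_passing_layer(h):
    """One parameter-free layer: average each node's feature with its neighbours' features."""
    return {v: (h[v] + sum(h[u] for u in GRAPH[v])) / (1 + len(GRAPH[v])) for v in GRAPH}


def spread(h):
    """Gap between the largest and smallest node feature (0 means fully smoothed)."""
    return max(h.values()) - min(h.values())


h = features
for layer in range(1, 31):
    h = message_passing_layer(h)
    if layer in (1, 2, 5, 30):
        print(f"after {layer:2d} layers: spread = {spread(h):.4f}")
```

The shrinking spread is the over-smoothing effect in miniature: after a few dozen layers, the features no longer reveal which cluster a node came from, which is why deeper graph networks need countermeasures such as residual connections or normalization.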
Significant attention is paid to dynamic graph network design, extraction-error remediation, missing link prediction, node classification, graph network partitioning and simplification. Moreover, graph embedding techniques are shifting away from the classical factorization methods and random walk approaches towards neural network-based architectures. Another area of focus is symbolic reasoning to achieve better generalization and interpretability through the generation of additional information from existing facts.
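Generating additional information from existing facts can be sketched with forward chaining: rules are applied repeatedly until no new triple appears. The two rules below (transitivity of a location relation and an inverse relation) and the relation names are assumptions chosen for illustration; real reasoners use richer rule or ontology languages and far more efficient evaluation.

```python
def forward_chain(facts, rules):
    """Apply every rule until no new triples are produced (a fixed point)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def transitive(relation):
    """Rule: (a, rel, b) and (b, rel, d) imply (a, rel, d)."""
    def rule(kg):
        return {(a, relation, d)
                for (a, r1, b) in kg if r1 == relation
                for (c, r2, d) in kg if r2 == relation and c == b}
    return rule


def inverse(relation, inverse_relation):
    """Rule: (h, relation, t) implies (t, inverse_relation, h)."""
    def rule(kg):
        return {(t, inverse_relation, h) for (h, r, t) in kg if r == relation}
    return rule


facts = {
    ("Stratford", "locatedIn", "Warwickshire"),
    ("Warwickshire", "locatedIn", "England"),
    ("Shakespeare", "authorOf", "Macbeth"),
}
inferred = forward_chain(facts, [transitive("locatedIn"), inverse("authorOf", "writtenBy")])
for triple in sorted(inferred - facts):
    print(triple)
```

Each inferred triple can be traced back to the rule and facts that produced it, which is the interpretability benefit of symbolic reasoning mentioned above.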
In my opinion, the future direction in this field will involve improvements and new development around three broad themes:
- Innovations in Knowledge Extraction and Integration/Fusion – for instance, dynamic extraction of entities and relationships; designing neural networks for heterogeneous graphs; integration of multi-modal entities that evolve over time; enabling temporal and domain context-awareness throughout the graph construction process; etc.
- Enhancing the scalability, operability and production performance of graph networks and storage systems for building and maintaining Enterprise Knowledge Graphs (such as those operating on distributed or Big Data streaming architectures) on an industrial scale.
- Improving the generalizability and causal inference capabilities of Knowledge Graphs for highly complex use cases, such as hidden patterns in dark data, anomaly attribution (i.e. root cause of detected anomalies), etc.
Closing Comments
Artificial Intelligence, as a set of multiple disruptive technologies, has started to massively influence and transform businesses, societies and personal lives. Major progress is witnessed in several AI domains such as Computer Audition & Vision, Decision Sciences, Natural Language Processing, and Motion & Manipulation. However, limited progress has been made in an important aspect of intelligence – the ability to generalize the learning from one domain (or experience), and efficiently apply it to another.
Structured representations, intelligent data extraction & fusion, and dynamic computational processing are critical to addressing the above gap. Hence, innovations in Knowledge Representation Learning or Graph Embedding, Graph Neural Networks and Graph-centric Databases, in conjunction with massively parallel processing systems, become increasingly relevant. Heterogeneous information, dynamic processing, higher semantics, rich ontologies, and greater reasoning capabilities enable the development of complex AI systems with high context and knowledge awareness.
In the foreseeable future, Knowledge Graphs are expected to continue to be explored as a means of connecting diverse units of information in a unified global space. In the longer-term future, they will likely serve as a means of enabling combinatorial generalization by generating new inferences and actions from existing blocks of knowledge. This will be the next step towards advanced machine cognition and human-level general intelligence.
Acknowledgments:
Architecture of Knowledge Graph Construction Techniques – Zhao et al. (2018), International Journal of Pure and Applied Mathematics
Industry-scale knowledge graphs: lessons and challenges – Noy et al. (2019), Communications of the ACM
Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web – Bonatti et al. (2018), Dagstuhl Seminar