Ontolog Forum

Knowledge Graph Definition Discussion

Paul Tyson

Jul 2, 2020 at 11:02 AM

How about simply:

“A knowledge graph (KG) is a directed labeled graph that is used to represent concepts and propositions.”

Ravi Sharma

Jul 2, 2020, at 11:09 AM

Paul

Yes getting closer but do "concepts and propositions" imply Knowledge Base or Knowledge representation?

John F. Sowa

Jul 2, 2020, at 11:24

Ravi,

The word 'knowledge' in "knowledge graph" is a meaningless buzzword that is intended to make it sound "knowledgeable". Philosophers used to define "knowledge" as "justified true belief". But even that phrase is problematical.

Recommended definition:

A knowledge graph (KG) is a directed labeled graph that is used to represent knowledge derived from various documents. A KG may be used at either of two levels: (1) factual level or (2) definitional level. A factual KG (level 1) represents facts derived from some set of documents. A definitional KG (level 2) represents definitions of the words or other symbols used in factual KGs. For software that uses Semantic Web tools, a factual KG may be mapped to or from RDF; a definitional KG may be mapped to or from RDF Schema (RDFS).

Following that definition, I suggest the diagram hexgon1.png (attached below). This diagram is discussed in Section 1 of http://jfsowa.com/talks/eswc.pdf . But in that section, the corner labeled "knowledge graph" is labeled "symbolic model". For simplicity, I suggest hexagon1 instead of the diagram in eswc.pdf. Ram extracted some information from the eswc slides, and I suggest that you start with the file that Ram produced.

John
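
A minimal sketch of the factual/definitional split described above, assuming a KG is held as a plain set of (subject, predicate, object) triples. The `ex:` names, the choice of which predicates count as definitional, and the `split_kg` helper are illustrative assumptions, not part of John's proposal or of any RDF library.

```python
# Hypothetical sketch: a KG as a set of (subject, predicate, object) triples.
# Predicates treated here as definitional (level 2), loosely modeled on RDFS.
DEFINITIONAL_PREDICATES = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}

triples = {
    ("ex:Mozart", "rdf:type", "ex:Composer"),         # factual (level 1)
    ("ex:Mozart", "ex:bornIn", "ex:Austria"),         # factual (level 1)
    ("ex:Composer", "rdfs:subClassOf", "ex:Person"),  # definitional (level 2)
    ("ex:bornIn", "rdfs:domain", "ex:Person"),        # definitional (level 2)
}

def split_kg(kg):
    """Return (factual, definitional) subsets of a set of triples."""
    definitional = {t for t in kg if t[1] in DEFINITIONAL_PREDICATES}
    return kg - definitional, definitional

factual, definitional = split_kg(triples)
```

Under these assumptions the factual part would map to plain RDF statements and the definitional part to RDFS statements; a real system would need a more careful criterion than a fixed predicate list.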

Sriram, Ram D. (Fed)

Thursday, July 2, 2020 3:41 PM

As I mentioned, I would add semantics to labels. For example, if A and B have a part-of relation to C then if I do something with A or B it will impact C in an appropriate manner (as defined by the semantics of part-of).

Ram
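
One way to read Ram's point is that an edge label such as part-of carries behavior, not just a name. The sketch below assumes, purely for illustration, that each part has at most one whole and that "impact" simply propagates upward along part-of; node D and the `impacted_by` helper are hypothetical additions.

```python
# Hypothetical sketch: attaching a semantics (upward impact) to the label "partOf".
# A and B are parts of C, as in the message above; C partOf D is an added assumption.
part_of = {"A": "C", "B": "C", "C": "D"}  # part -> whole

def impacted_by(node, part_of):
    """Return every whole affected when `node` changes, following partOf upward."""
    affected = []
    seen = {node}
    # Stop when there is no enclosing whole, or when a cycle would be revisited.
    while node in part_of and part_of[node] not in seen:
        node = part_of[node]
        seen.add(node)
        affected.append(node)
    return affected
```

Here `impacted_by("A", part_of)` walks from A to C and then to D. Other labels would need their own rules, which is the kind of semantics a KG schema would have to state.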

Janet Singer

Thursday, July 2, 2020 5:43 PM

Ravi, Paul, John, Mike, Matthew, Ed,

The problem (and opportunity) is that ‘knowledge graph’ has been loosely used as all three of 1) the form of presentation, 2) the organized content (knowledge base?) and 3) the physical storage mechanism. An implementation-neutral definition could focus on the first and second aspects, using KG and KB together:

“A knowledge graph is a labeled directed graph that presents data in a systematic form usable by an automated or human reasoner.

A large-scale knowledge graph draws data from a knowledge base (i.e., an organized, curated store of facts, definitions, rules, links to multimedia content, etc., derived from documents and other sources).”

Janet

Jack Hodges

On Thu, Jul 2, 2020

I have not been taking part in this discussion but suddenly see a bunch of definitions of knowledge graph being bandied about. I certainly agree that different people and interests have defined knowledge graph in ways that suit their interests, so an agreement on the definition is certainly needed.

It is also true that the term knowledge graph, as it is currently being used, refers to a data graph. That said, the way we have been using it at Siemens CT is that the data are instances of ontologies, and the labeled arcs are the properties representing the class dependencies in an ontology.

As a result, our definition of a knowledge graph would be an instance graph associated with a collection of ontologies. That is, the data are all compliant with the associated ontologies. As such it is much more than just a labeled graph.

Respectfully submitted,

Jack Hodges
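
The idea of an instance graph that is compliant with its ontologies can be sketched as a domain/range check. Everything below (the property names, classes, instances, and the `violations` helper) is a made-up illustration and does not reflect any actual Siemens ontology; each instance is assumed to have exactly one class.

```python
# Hypothetical schema: each property maps to an assumed (domain class, range class).
schema = {
    "hasPart": ("Machine", "Component"),
    "locatedIn": ("Machine", "Plant"),
}

# Hypothetical instance data: instance -> class.
types = {"press1": "Machine", "valve7": "Component", "plantA": "Plant"}

edges = [
    ("press1", "hasPart", "valve7"),
    ("press1", "locatedIn", "plantA"),
    ("valve7", "locatedIn", "press1"),  # violates the assumed domain and range
]

def violations(edges, types, schema):
    """Return the edges that use an unknown property or break its domain/range."""
    bad = []
    for s, p, o in edges:
        if p not in schema:
            bad.append((s, p, o))
            continue
        domain, rng = schema[p]
        if types.get(s) != domain or types.get(o) != rng:
            bad.append((s, p, o))
    return bad
```

A graph for which `violations` returns an empty list would count as compliant in this simplified sense; real ontologies add subclassing, cardinality, and other constraints.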

Pascal Hitzler

Jack,

Your terminology is what I use as well. Just that in some quarters I now resort to the term "knowledge graph schema" instead of ontology.

Pascal Hitzler

Krzysztof Janowicz

I am for:

"Knowledge Graph (KG): A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. The term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph. The largest publicly available knowledge graph is the so-called Linked Data cloud based on the RDF/Semantic Web technology stack."

https://github.com/C-Accel2019/Glossary/blob/master/glossary.md

Jano

Krzysztof Janowicz

Jul 3, 2020, 12:25 PM

to ontology-summit Hi Evan,

Glad you like the definition! I think it is important to make sure such definitions include both the formal as well as the cultural/contextual part.

> How should I attribute this definition?

You can point to the github glossary we did (https://github.com/C-Accel2019/Glossary/blob/master/glossary.md). I am pretty sure that I wrote (most of) the text for the KG entry. In this case, you can simply take the definition and use it without any attribution.

All the best, Jano


Wallace, Evan K. (Fed)

On 7/3/20 6:59 AM Jano,

This is very good and helpful. Much better than any definition for knowledge graph I have seen before (though we need a more concise version as well). My experience has been that folks tend to use Knowledge Graph for simplified representations of conceptual models, usually in the form of a computational ontology + instances, but they never define the term, so it comes across as just rebranding of something, without being clear what that something is. This is O.K. in the short term, but eventually people need to know what you are really doing or if there is any real shared meaning for the term “Knowledge Graph” (rather than simply a means of buzzword compliance). We should get the description below for KG out “there”, and try and get some community using it. I will bring it into OMG Ontology subgroup discussions and send it to the Enterprise Knowledge Graph Foundation group spun off from the FIBO effort to see if they would find it useful. How should I attribute this definition?

-Evan

Alessandro Oltramari

Fri, Jul 3, 2:08 PM

to ontology-summit Hello Everyone. I endorse Jano's comprehensive definition, but I've always considered knowledge graphs to be semantic networks "on steroids" (paraphrasing John Sowa's comments, if you want). For instance, I wouldn't call the RDF representation of WordNet a knowledge graph, but rather an RDF (lexical) semantic network; ConceptNet is often considered (and was originally defined as) a semantic network, but these days you find it referred to as a knowledge graph in scientific papers. From an operational standpoint, Jack Hodges' account of what a knowledge graph is captures well the practitioner's viewpoint. My two cents, Alessandro

Ram D. Sriram

Let us look at the following part of your definition:

"More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph." Are you saying a KG involves multiple graphs? What is the role of the label? What makes it different from Semantic Nets that were in vogue in the 1980s? What about Judea Pearl type graphs?

Ram

Krzysztof Janowicz

Fri, Jul 3, 2:59 PM

to ontology-summit Dear Ram,


On 7/3/20 11:49 AM, 'Sriram, Ram D. (Fed)' via ontology-summit wrote:

> Janowicz

> Let us look at the following part of your definition

> More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph. Are you saying a KG involves multiple graphs?

No, I mean multigraph in the mathematical sense: https://mathworld.wolfram.com/Multigraph.html .

> What is the role of the label?

The edge labels are your predicate labels such as 'partOf'. The node labels can be class labels or labels of individuals.

> What makes it different from Semantic Nets that were in vogue in the 1980s? What about Judea Pearl type graphs?

All the other parts of the definition. Semantic Nets and KG are certainly highly related but in my reading SN work was less focused on having a formal and shared specification for the schema, i.e., the semantics of the used labels. Also, they did not really focus on Web-scale systems and all the important aspects this brings to the game. The same, btw, goes for the distinction between a 'network' and a 'graph'. It is often the communities that use these terms and the ways in which these communities bring about change using these terms that matter more than the formal differences between them.

Jano
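
To make the "node and edge labeled directed multigraph" reading concrete, here is a small sketch in which several differently labeled edges may connect the same ordered pair of nodes. The `LabeledMultigraph` class and the example nodes and labels (other than 'partOf', which Jano mentions) are assumptions made for illustration.

```python
from collections import defaultdict

class LabeledMultigraph:
    """Directed graph with labeled nodes and multiple labeled edges per node pair."""

    def __init__(self):
        self.node_labels = {}            # node -> class or individual label
        self.edges = defaultdict(list)   # (source, target) -> list of edge labels

    def add_node(self, node, label):
        self.node_labels[node] = label

    def add_edge(self, source, label, target):
        # Parallel edges are allowed: each call appends another label.
        self.edges[(source, target)].append(label)

    def labels_between(self, source, target):
        return list(self.edges.get((source, target), []))

g = LabeledMultigraph()
g.add_node("engine", "Individual")
g.add_node("car", "Individual")
g.add_edge("engine", "partOf", "car")
g.add_edge("engine", "manufacturedFor", "car")  # second edge, same node pair
```

Because edges are directed, `labels_between("engine", "car")` and `labels_between("car", "engine")` differ. This illustrates one graph with parallel edges, not a collection of several graphs.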

Krzysztof Janowicz

Fri, Jul 3, 3:11 PM

On 7/3/20 11:08 AM, Alessandro Oltramari wrote:

> Hello Everyone.

> I endorse Jano's comprehensive definition,

Thanks a lot. IMHO, having these discussions using the github's issue tracker or change requests would be perfect as it would invite many outside contributors that do not subscribe to this list here. There have been several other teams from NSF's convergence accelerator that contributed to our initial definitions. Another great aspect of doing this over github is that it is easier to reach convergence as we would all 'own' (so to speak) the resulting definition. If people are jointly *invested* in something, they are often more *interested* in reaching agreement. I am adding Chaitan here as the new Track D of NSF's 2020 solicitation will also be closely aligned to knowledge graphs and semantics, and this means that many other researchers may have interest in contributing to these definitions early on.

> I wouldn't call the RDF representation of WordNet a knowledge graph

This is actually something that we struggled with as well. Our approach to this was to lead with the non-technological part of the definition. I saw others saying that KG have to be about real-world entities, but, IMHO, this is too narrow.

Jano

Janet Singer

Jul 3, 2020, 6:16 PM

Jano – Note that the Wolfram section on ‘multigraph’ you linked contains cautions against use due to ambiguity : (

The term multigraph refers to a graph in which multiple edges between nodes are either permitted (Harary 1994, p. 10; Gross and Yellen 1999, p. 4) or required (Skiena 1990, p. 89, Pemmaraju and Skiena 2003, p. 198; Zwillinger 2003, p. 220). West (2000, p. xiv) recommends avoiding the term altogether on the grounds of this ambiguity.

Some references require that multigraphs possess no graph loops (Harary 1994, p. 10; Gross and Yellen 1999, p. 4; Zwillinger 2003, p. 220), some explicitly allow them (Hartsfield and Ringel 1994, p. 7; Cormen et al. 2001, p. 89), and yet others do not include any explicit allowance or disallowance (Skiena 1990, p. 89; Gross and Yellen 1999, p. 351; Pemmaraju and Skiena 2003, p. 198). Worse still, Tutte (1998, p. 2) uses the term "multigraph" to mean a graph containing either loops or multiple edges.

As a result of these many ambiguities, use of the term "multigraph" should be deprecated, or at the very least used with extreme caution.

Janet

Krzysztof Janowicz

Fri, Jul 3, 6:31 PM

Thanks a lot, Janet. I meant the first part 'The term multigraph refers to a graph in which multiple edges between nodes are either permitted', compare also to Wikipedia's 'In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called parallel edges[1]), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge.' I see the term used often these days in the literature. Maybe our definition could be rephrased as follows:

"A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. The term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph, thereby allowing multiple, heterogeneous edges for the same nodes. The largest publicly available knowledge graph is the so-called Linked Data cloud based on the RDF/Semantic Web technology stack."

Thanks again, all the best, Jano

John F. Sowa

Jul 5, 2020, 12:29 AM

Jano and Janet,

I agree that the kind of information in the following paragraph should be mentioned in the communique:

Jano>"A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. The term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph, thereby allowing multiple, heterogeneous edges for the same nodes. The largest publicly available knowledge graph is the so-called Linked Data cloud based on the RDF/Semantic Web technology stack."

But it confirms my preference for the simplest possible first line: "A knowledge graph (KG) is a notation for representing information derived from structured or unstructured sources."

Next: "A KG may represent two kinds of information: (1) simple facts with no negations or Boolean operators other than conjunction (AND); (2) definitions of the words or symbols that may occur in factual KGs. A collection of one or more definitional KGs is called a KG schema."

Then: A discussion of the open-ended variety of notations that may be used to represent KGs for various purposes: (1) human readability in linear or graphic forms; (2) human writability with various aids for entering, checking, organizing, and finding the information; (3) various data structures that are optimized for storage, transmission, computation, security, interoperability, reliability, recovery...

Finally, the open-ended variety of high-powered technology for generating, processing, and using the information in KGs.

John

Ravi Sharma

Sun, Jul 5, 4:46 AM

to ontology-summit John has provided us with a comprehensive sentence defining knowledge graph. If Janet and Ken agree, we will put it in What (current) and also lead up to it at the end of Whence. In the follow-on sentences we will take nuggets from various inputs from speakers, courses, and the last few days' emails. It is time to move on with these two aspects and agree on supplemental follow-on material next. Regards

Sjir Nijssen

Jul 5, 2020, 7:22 AM

to John, ontology-summit@googlegroups.com Indeed John has provided an excellent step towards a description for KG.

However I propose that the description is first validated with a representative example as a test before it is adopted.

I offer to provide what John calls Factual KGs and Definitional KGs for the Nobel Prize process and use this example to illustrate all the terms used by John and check that the set is powerful enough to describe the factual and definitional KGs of the Nobel Prize.

Regards

Sjir Nijssen

Ravi Sharma

Are you going to apply John's definition to the Nobel Prize Process? Assuming that you have access to the process, either documented text or BPMN (UML2) diagram that would mean

definitional inputs for KG,

then from various years and domains of Nobel Prize winners' selection data as instances you will create KGs as

Factual inputs for KGs

Even if you apply to a couple of domains such as Physics as factual KGs these would be valuable? Others please chime in. In the meantime we plan to proceed with these top level sentences and include results of your findings as use case for the communique' - right Ken and Janet? Regards

Kingsley Idehen

Hi Sjir,


Yes!


Definitions are easier to understand when reinforced by easy to understand examples.


We should pursue a very simple example that's self demonstrating.


Here are some examples that I use [1][2].


For maximum effect, you should install our Structured Data Sniffer Extension [3][4] to your browser as it reveals Structured Data Islands embedded in HTML using <script/>. The idea being, you simply click on a hyperlink that denotes a concept and then follow-your-nose across the massive LODCloud Knowledge Graph for additional insights.


Links:


[1] https://www.openlinksw.com/data/turtle/general/knowledge-graph-manifestation-turtle-jsonld.html -- An HTML document that's a Knowledge Graph in its own right


[2] https://www.openlinksw.com/data/turtle/general/covid-19-knowedge-graph.html -- A Knowledge Graph about COVID-19


[3] https://chrome.google.com/webstore/detail/openlink-structured-data/egdaiaihbdoiibopledjahjaihbmjhdj?hl=en -- Structured Data Browser Extension for Chrome and other Web Extensions compliant Browsers (Edge, Brave, Opera, and others)


[4] https://addons.mozilla.org/en-US/firefox/addon/openlink-structured-data-sniff/ -- Firefox


Kingsley

Kingsley Idehen

Jul 5, 2020, 3:18 PM

On 7/5/20 12:29 AM, John F. Sowa wrote:

"A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. The term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph, thereby allowing multiple, heterogeneous edges for the same nodes. The largest publicly available knowledge graph is the so-called Linked Data cloud based on the RDF/Semantic Web technology stack."


Hi John,

Little name correction:

It is the Linked Open Data (LOD) Cloud :)

Yes, it is by far the largest publicly available Knowledge Graph on earth at the current time.

Links:

[1] https://medium.com/virtuoso-blog/what-is-the-linked-open-data-cloud-and-why-is-it-important-1901a7cb7b1f -- What is the Linked Open Data Cloud, and why is it important?

[2] https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-important-d306b5324f90 -- What is DBpedia, and why is it important?

-- Regards,

Kingsley Idehen

Janet Singer

Jul 5, 2020, 5:40 PM

to ontology-summit Ravi — Given the many good ideas re KG definition and validation plus the suggestion that the discussion be opened up to the broader KG community, one path forward is:

1) This year’s Communiqué can use one or more definitions, but postpone presenting one summary sentence as ‘authoritative’ over the others;

2) The development of a well-vetted definition (or family of definitions) for KG(s) could be the starting issue for the 2021 Ontology Summit. Crafting and testing definitions to be (technically and culturally, broadly and deeply) rigorous, robust and relevant is arguably the core of ontology work, so this would fit well with the ‘methodology’ focus that has been mentioned a few times as a possible topic for 2021;

3) The Communiqué could proceed now without short-changing time to get requisite variety of input and community buy-in from opening up a broader discussion, validation tests of various proposals, etc.


Janet

Krzysztof Janowicz

Sun, Jul 5, 7:00 PM

to John, ontology-summit Dear John, Kingsley, all,

I tried to combine feedback from the past days in this novel version:

"Knowledge Graph (KG): A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. As a notation for representing statements derived from structured or unstructured sources, the term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) is a node and edge labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A collection of definitional statements specifying the meaning of the vocabulary used in a knowledge graph is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."

Jano

Kingsley Idehen

Hi Jano,

The last sentence regarding LOD would be a little clearer if it stated as follows :

The largest publicly available Knowledge Graph is the Linked Open Data (LOD) Cloud comprising RDF sentence collections deployed using Linked Data Principles [1].

Linked Data is simply about:

[1] Naming anything unambiguously using a resolvable identifier (e.g., a hyperlink)

[2] Describing everything using sentences where the subject, predicate, and object (optionally) are denoted using a resolvable identifier.

The LOD Cloud is a massive collection of RDF sentences, from a variety of sources, deployed using Linked Data principles.

Links:

[1] https://www.w3.org/DesignIssues/LinkedData.html -- Linked Data Principles Design Issues doc by TimBL

[2] https://medium.com/openlink-software-blog/simple-linked-data-deployment-tutorial-a532e568c82f -- Simple Linked Data Deployment Tutorial

Kingsley
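
The two Linked Data points above can be approximated by a purely syntactic check: the subject and predicate must look like resolvable HTTP(S) identifiers, while the object may be either an identifier or a literal. This is only a sketch; the function names and the `example.org` identifiers are assumptions, and the check does not attempt to dereference anything.

```python
from urllib.parse import urlparse

def is_resolvable_identifier(term):
    """Syntactic test only: an http(s) URI with a host part."""
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def follows_linked_data_naming(triple):
    """Subject and predicate must be identifiers; the object is optional in that respect."""
    subject, predicate, _obj = triple
    return is_resolvable_identifier(subject) and is_resolvable_identifier(predicate)

good = ("http://example.org/Mozart", "http://example.org/bornIn", "Austria")
bad = ("Mozart", "http://example.org/bornIn", "Austria")
```

In this sketch `good` passes and `bad` fails because its subject is a bare word rather than an identifier. Whether the identifiers actually resolve to useful descriptions is a deployment question the code does not cover.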

Krzysztof Janowicz

Sun, Jul 5, 8:12 PM

> The largest publicly available Knowledge Graph is the Linked Open Data (LOD) Cloud comprising RDF sentence collections deployed using Linked Data Principles [

Should we say RDF statements instead of sentences?

Jano

Doug Foxvog

John, Why do you insist on this restriction? If a kb is simply a "node and edge labeled directed multigraph", that allows nodes to represent sentences. Once that is allowed, then the sentence could have OR edges to other sentences or a NOT edge to another sentence.

A =NOT=> B =OR=> C =OR=> D

C =ISA=> giving

C =GIVER=> Juan

C =GIVEE=> Markka

C =GIFT=> BrooklynBridge

C =DATE=> 1987/6/5

D - some sentence

With concerns about completeness you could label OR edges by the number of disjuncts; in the above case using OR2 since it is an OR of two disjuncts.

-- doug foxvog
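
Doug's example can be sketched as an edge list in which some nodes stand for sentences. The reading below is an assumption: A is taken to be the negation of B, and B the disjunction of C and D. The `targets` and `truth` helpers, and the truth values assigned to C and D, are hypothetical.

```python
# Edge list for the example above; sentence nodes are A, B, C, D.
edges = [
    ("A", "NOT", "B"),
    ("B", "OR", "C"),
    ("B", "OR", "D"),
    ("C", "ISA", "giving"),
    ("C", "GIVER", "Juan"),
    ("C", "GIVEE", "Markka"),
    ("C", "GIFT", "BrooklynBridge"),
    ("C", "DATE", "1987/6/5"),
]

def targets(node, label):
    """All targets reachable from `node` over edges with the given label."""
    return [t for s, l, t in edges if s == node and l == label]

def truth(node, atomic):
    """Evaluate a sentence node given assumed truth values for atomic sentences."""
    negated = targets(node, "NOT")
    if negated:
        return not truth(negated[0], atomic)
    disjuncts = targets(node, "OR")
    if disjuncts:
        return any(truth(d, atomic) for d in disjuncts)
    return atomic[node]
```

With C assumed true and D assumed false, B evaluates to true and A to false. The point of the sketch is only that negation and disjunction fit in a labeled multigraph once nodes may denote sentences.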

Azamat Abdoullaev

Mon, Jul 6, 5:16 AM

to ontology-summit Somehow I missed such a hot discussion. To the point: it is hardly a definition of any kind, whether extensional, intensional, or ostensive; real or nominal; operational or theoretical. It looks like a commentary with some explanations, comments, and definitions. Any good definition makes one simple statement of the essence of a thing, construct, term, or name, and thus has some type of validity: construct, content, logical, or criterion validity. Consider the following examples: KG is a graphical representation of knowledge, facts, primitives, definitions, axioms, rules, laws, etc.; KG is an extensive network of entity descriptions; KG is to acquire and integrate the world's information into ontologies intelligently processed by human minds or machines... A true definition must underpin all possible KG scope and applications: from mental schemas to enterprise KGs to bigtech KGs to a Global KG. Again, OntoGraph might look more original than KG, such an overused buzzword.

Jack Hodges

Mon, Jul 6, 11:35 AM

to ontology-summit@googlegroups.com You guys, for whatever reason, are diving into the semantics underlying the facts in a knowledge graph. Jano's definition was trying to stay above that. I think we would all like to see Jano's definition be shorter but can it be and still be clear? For example, he could elaborate on what "reasonable" means and it would be even longer.

No disrespect intended here...

Jack

Krzysztof Janowicz

Mon, Jul 6, 12:27 PM

> You guys, for whatever reason, are diving into the semantics underlying the facts in a knowledge graph. Jano's definition was trying to stay above that. I think we would all like to see Jano's definition be shorter but can it be and still be clear? For example, he could elaborate on what "reasonable" means and it would be even longer.

Exactly.

The point of a definition (as compared to an encyclopedic entry) is to be brief and reach a broad audience. Below is the newest version including feedback from the past days, but in a more condensed form. More specifically, I managed to include John's text snippets without breaking the flow of the text.

"Knowledge Graph (KG): A combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. More formally, a knowledge graph (as a set of statements) forms a node and edge labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A collection of definitional statements specifying the meaning of the knowledge graph's vocabulary is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."

Are we converging?

Jano

Kingsley Idehen

Hi Jano,

Statements is fine. At the end of the day, when you deploy RDF using Linked Data Principles you do have statements ("parts of speech" comprising terms) rather than sentences ("parts of speech" comprising words) in play :)

Entities denoted using words aren't implicitly connected to their connotation.

Entities denoted using terms are implicitly connected to their connotation.

Linked Data is all about entity denotation implicitly connected to connotation via indirection facilitated by a resolvable identifier (e.g., an HTTP URI).

Kingsley

Krzysztof Janowicz

Exactly. This is why I suggested to say statements instead of sentences. Also, and IMHO, key to the success of LOD is that we focus on statements and not facts (like in 'true' statements).

Jano

Kingsley Idehen

You can look up a description (connotation) of whatever is denoted when you apply Linked Data principles to RDF statement collection deployment. Naturally, you can also deploy RDF without adhering to said principles, but at significant cost to discourse clarity, access, etc.

Doug Foxvog

The term "knowledge graph" should be implementation independent. It should not be wedded to (restrictive) RDF sentences. (imho)

Krzysztof Janowicz

Jul 6, 2020, 2:20 PM

to ontology-summit, doug Dear Doug,

> The term "knowledge graph" should be implementation independent. It should not be wedded to (restrictive) RDF sentences. (imho)

I agree, check the definition. Previously it was:

"A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. The term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) can be thought of as a node and edge labeled directed multigraph, thereby allowing multiple, heterogeneous edges for the same nodes. The largest publicly available knowledge graph is the so-called Linked Data cloud based on the RDF/Semantic Web technology stack."


So, we addressed the implementation independence explicitly.


The new version still does so, but implicitly:


"Knowledge Graph (KG): A combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. More formally, a knowledge graph (as a set of statements) forms a node and edge labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A collection of definitional statements specifying the meaning of the knowledge graph's vocabulary is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."

The only part that mentions RDF is the example sentence at the very end which talks about one specific KG.

Best, Jano

Kingsley Idehen

Hi Jano,

To keep the notion of a Knowledge Graph distinct from RDF, which is very important, I would encourage you to consider:

"..The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on statements deployed using Linked Data principles."

Why?

Due to the following:

[1] RDF isn't always deployed using Linked Data principles

[2] Graphs aren't always constructed with RDF in mind i.e., vertices (nodes) and edges (connections) aren't always denoted using IRIs (which is what RDF adds to the good old Entity-Attributes-Values paradigm, alongside its formalized semantics )

Ultimately, we need a definition that is generic, succinct, and uncontroversial (to the degree that is possible) :)


Kingsley

Krzysztof Janowicz

Here is the new version:

"Knowledge Graph (KG): A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way. As a notation for representing statements derived from structured or unstructured sources, the term knowledge graph itself does not prescribe any particular technology stack. More formally, a knowledge graph (as a set of statements) is a node and edge labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A collection of definitional statements specifying the meaning of the vocabulary used in a knowledge graph is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on statements deployed using Linked Data principles."

Best, Jano

Janet Singer

Ravi — Validation of definition proposals is a great idea and there clearly is interest in this topic, but I think it should motivate next year’s summit rather than be tacked on to this year’s.

I suggested this in my 2:40 email to the group (copied here since Sjir wouldn’t have seen it), but responses so far have only continued the content debate.

Given the many good ideas re KG definition and validation plus the suggestion that the discussion be opened up to the broader KG community, one path forward is:

1) This year’s Communiqué can use one or more definitions, but postpone presenting one summary sentence as ‘authoritative’ over the others;

2) The development of a well-vetted definition (or family of definitions) for KG(s) could be the starting issue for the 2021 Ontology Summit. Crafting and testing definitions to be (technically and culturally, broadly and deeply) rigorous, robust and relevant is arguably the core of ontology work, so this would fit well with the ‘methodology’ focus that has been mentioned a few times as a possible topic for 2021;

3) The Communiqué could proceed now without short-changing time to get requisite variety of input and community buy-in from opening up a broader discussion, validation tests of various proposals, etc.


Janet

Kingsley Idehen

Hi Jano,

One little suggestion re (Web-scale), if I may:

"Knowledge Graph (KG): A combination of technologies, specifications, and data cultures for densely interconnecting (Web-scale) data across domains in a human and machine readable and reasonable way."

becomes:

Knowledge Graph (KG): A combination of technologies, specifications, and data cultures for densely interconnecting data across domains in a human and machine readable and reasonable way, that scales to the Web.

Why?

A lot of knowledge graphs are intranet-scale solely due to design for deployment behind enterprise firewalls :)

Kingsley

Krzysztof Janowicz

Thanks a lot Kingsley. I changed this by moving it to the first sentence in the version circulated today:

"...A combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. ..."

Does this work for you?

Jano

Matthew West

+1

Sjir Nijssen

To all interested in an actionable definition of KGs

In recent days there have been several good suggestions wrt KGs.

I propose to start with John's proposal and go forward step by step, using illustrations and continuous validation.

John wrote:

Next: "A KG may represent two kinds of information: (1) simple facts with no negations or Boolean operators other than conjunction (AND); (2) definitions of the words or symbols that may occur in factual KGs. A collection of one or more definitional KGs is called a KG schema."

I like this distinction in factual and definitional KGs as it is very useful in practice.

I. Simple facts with no negation. Examples: Mozart was born in Austria. Verdi was born in Italy. I propose to use the term elementary to refer to such facts. It is a sentence with one verb and one or more instantiated variables.

II. simple facts with no … Boolean operators other than conjunction (AND); Examples: Mozart was born in 1756 and died in 1791. Verdi was born in 1813 and died in 1901. Proposed term for such facts: (nicely) combined fact.

III. "(2) definitions of the words or symbols that may occur in factual KGs. A collection of one or more definitional KGs is called a KG schema."

Above we have the following patterns of fact types: <famous composer> was born in <country> and <famous composer> was born in <year-born> and died in <year-died>. Between < …> is a variable. Besides fact types we need definitions of

a. famous composer,

b. country

c. was born in

d. was born in and died in.

In a definitional KG we also need data quality or integrity rules. Examples of such rules: Each famous composer has exactly one birth country. Each famous composer has exactly one birthyear and may have exactly one deathyear. The deathyear of a famous composer is at least 10 years after his or her birthyear. Proposal: use the terms "data quality rules" and "integrity rules" as synonyms.
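As a rough illustration of how the three example rules could be checked against a factual KG, here is a minimal Python sketch. The triple layout, the verb phrases split by object kind (country vs. year), and the function name `check_integrity` are assumptions made for this sketch, not part of the proposal.

```python
from collections import defaultdict

# Elementary facts from the examples above, one (subject, verb, object) triple each.
# Splitting "was born in" by object kind (country vs. year) is an assumption
# made only for this sketch.
facts = [
    ("Mozart", "was born in country", "Austria"),
    ("Mozart", "was born in year", 1756),
    ("Mozart", "died in year", 1791),
    ("Verdi", "was born in country", "Italy"),
    ("Verdi", "was born in year", 1813),
    ("Verdi", "died in year", 1901),
]


def check_integrity(facts):
    """Return (subject, violated rule) pairs; an empty list means all rules hold."""
    # Group the objects asserted for each subject under each verb phrase.
    by_subject = defaultdict(lambda: defaultdict(set))
    for subject, verb, obj in facts:
        by_subject[subject][verb].add(obj)

    violations = []
    for subject, by_verb in by_subject.items():
        countries = by_verb["was born in country"]
        birth_years = by_verb["was born in year"]
        death_years = by_verb["died in year"]

        # Rule 1: each famous composer has exactly one birth country.
        if len(countries) != 1:
            violations.append((subject, "exactly one birth country"))
        # Rule 2: exactly one birthyear, and at most one deathyear.
        if len(birth_years) != 1:
            violations.append((subject, "exactly one birthyear"))
        if len(death_years) > 1:
            violations.append((subject, "at most one deathyear"))
        # Rule 3: the deathyear is at least 10 years after the birthyear.
        if len(birth_years) == 1 and len(death_years) == 1:
            if min(death_years) < min(birth_years) + 10:
                violations.append((subject, "deathyear at least 10 years after birthyear"))
    return violations
```

Calling `check_integrity(facts)` on the six facts above returns an empty list. The point of the sketch is only that the rules live alongside the definitions, separate from the facts they constrain.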

Let us find out later which other constructs are needed in definitional KGs if any.

The proposal by John: A collection of one or more definitional KGs is called a KG schema. I believe this has many interesting aspects when combining existing KGs, something that occurs often in practice.

Let me stop here for this increment. Various other aspects need to come in the next increments, as suggested by the various contributors in recent days.

I welcome feedback.

Kind regards

Sjir Nijssen

Doug Foxvog

The suggested definition below starts with a non-definitional description of properties of the technology. It then "more formally" gives a definition. Is an individual Knowledge Graph actually

> "A combination of scalable technologies, specifications, and data cultures "?

I would say not. Rather, a KG may use such a combination.

I suggest starting with the definition, following that by a description of properties. E.g., "Knowledge Graph - A representation of a set of statements in the form of a node- and edge-labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A Knowledge Graph may use a combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way.

A collection of definitional statements specifying the meaning of the knowledge graph's vocabulary is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."
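To make the graph-theoretic part of this definition concrete, here is a minimal Python sketch of a node- and edge-labeled directed multigraph. The class name `LabeledMultigraph`, its methods, and the "lived in" edge are hypothetical and chosen only for illustration; this is not a reference to any particular KG library.

```python
class LabeledMultigraph:
    """A directed multigraph whose nodes and edges both carry labels."""

    def __init__(self):
        self.node_labels = {}   # node id -> label
        self.edges = []         # (source id, edge label, target id), duplicates allowed

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_edge(self, source, label, target):
        # A list (not a set or dict) keeps parallel edges between the same nodes.
        self.edges.append((source, label, target))

    def edges_between(self, source, target):
        """Return the labels of all edges running from source to target."""
        return [label for s, label, t in self.edges if s == source and t == target]


kg = LabeledMultigraph()
kg.add_node("n1", "Mozart")
kg.add_node("n2", "Austria")
kg.add_edge("n1", "was born in", "n2")
kg.add_edge("n1", "lived in", "n2")   # second, heterogeneous edge (illustrative)
```

Here `kg.edges_between("n1", "n2")` yields both labels, while `kg.edges_between("n2", "n1")` is empty because the edges are directed. Storing edges as a list rather than a mapping is what permits "multiple, heterogeneous edges for the same nodes".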

-- doug foxvog

Kingsley Idehen

On 7/6/20 8:57 PM, Krzysztof Janowicz wrote:

> Thanks a lot Kingsley. I changed this by moving it to the first sentence in the version circulated today:

> "...A combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. ..."

> Does this work for you?

> Jano

You might have pasted the wrong excerpt.

Please paste the entire revised paragraph for clarity.

Kingsley

Ravi Sharma

Janet and Ken, I agree with you that we should not stray too far from what happened; after all, we are reporting what happened, and we will use these additional thoughts and inputs towards the detailed papers that you are also contemplating for later in the year. I have a comment on next year's topic. While I in particular like continuing with a KG-related summit for 2021, I think we need wider (our forum) community support and selection of next year's topic; I am open. The only pain is that while this year's talks, and even the definitions being suggested, revolve around the respective authors', speakers' and colleagues' points of view, no one other than John has so emphatically included ontology in the What part of KG! For example, even the latest LOD statement ignored ontology. A related point is overlooking the importance of how knowledge is embedded in a KG: does it become useful only through inference and reasoning (query), or is it a combination of how nodes and edges use vocabularies and derive information from data, as well as the operators on data, that brings value in results from KGs? I will start editing and circulating to our smaller group the Whence, current and a bit about problems and whither, but I am afraid time is not available till after July 15. Thanks Ravi

Kingsley Idehen

Hi Sjir,

Regarding:

"Above we have the following patterns of fact types: <famous composer> was born in <country> and <famous composer> was born in <year-born> and died in <year-died>. Between < …> is a variable."

How about: Above we have the following patterns of fact types: ?FamousComposer was born in ?Country and ?FamousComposer was born in ?BirthYear and died in ?DeathYear. Where "?" indicates a variable.

Why? It makes the role of a variable a little more obvious without distraction.

This is also consistent with a broadly used declarative query language (i.e., SPARQL) for operating on Knowledge Graphs constructed from RDF statements i.e., it creates consistent flow as the subject-matter expands.
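The role of the "?" variables can be sketched in a few lines of Python. This is not SPARQL and not any RDF library's API; the function names `is_variable` and `match` and the tuple layout are assumptions made for this sketch, which only shows how a fact-type pattern binds variables against concrete facts.

```python
def is_variable(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, facts):
    """Return one {variable: value} binding per fact that fits the pattern."""
    results = []
    for fact in facts:
        bindings = {}
        matched = True
        for p_term, f_term in zip(pattern, fact):
            if is_variable(p_term):
                # A variable used twice in one pattern must bind to the same value.
                if bindings.setdefault(p_term, f_term) != f_term:
                    matched = False
                    break
            elif p_term != f_term:
                # A constant in the pattern must equal the fact's term.
                matched = False
                break
        if matched:
            results.append(bindings)
    return results


facts = [
    ("Mozart", "was born in", "Austria"),
    ("Verdi", "was born in", "Italy"),
]

composers = match(("?FamousComposer", "was born in", "?Country"), facts)
```

With the two facts from Sjir's examples, `composers` holds one binding per composer, pairing `?FamousComposer` with `?Country`. Replacing a variable with a constant, as in `("Verdi", "was born in", "?Country")`, narrows the result to the matching fact.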


Kingsley

Krzysztof Janowicz

Tue, Jul 7, 2:04 PM

On 7/7/20 5:47 AM, doug foxvog wrote:

> The suggested definition below starts with a non-definitional description of properties of the technology. It then "more formally" gives a definition. Is an individual Knowledge Graph actually

>> "A combination of scalable technologies, specifications, and data cultures "?

> I would say not. Rather, a KG may use such a combination.

> I suggest starting with the definition, following that by a description of properties. E.g., "Knowledge Graph - A representation of a set of statements in the form of a node- and edge-labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A Knowledge Graph may use a combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. A collection of definitional statements specifying the meaning of the knowledge graph's vocabulary is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."


Thanks Doug,

> I suggest starting with the definition, following that by a description of properties. E.g., "Knowledge Graph - A representation of a set of statements in the form of a node- and edge-labeled directed multigraph allowing multiple, heterogeneous edges for the same nodes. A Knowledge Graph may use a combination of scalable technologies, specifications, and data cultures for representing densely interconnected statements derived from structured or unstructured sources across domains in a human and machine-readable and reasonable way. A collection of definitional statements specifying the meaning of the knowledge graph's vocabulary is called its (KG) schema. The largest publicly available knowledge graph is the so-called Linked Open Data cloud based on the RDF/Semantic Web technology stack."

I actually like this a lot. I will update the text (as it has changed slightly based on other feedback). It looks like we are converging towards a final version.

Ed Barkmeyer

Sun, Jul 12, 12:37 AM

to ontology-summit@googlegroups.com

Because this contribution is about textual notations rather than graphical notations, it is a bit off-topic, and I would have preferred to take the following points offline.

Without regard to notation, Sjir is trying to identify the basic elements of knowledge that can be conveyed, regardless of expression form.

>Above we have the following patterns of fact types:

<famous composer> was born in <country> and

<famous composer> was born in <year-born> and died in <year-died>. Between < …> is a variable.


SBVR refers to the above bracketed expressions as “placeholders”, and I agree with Sjir that they do represent “variables”, in one sense. I would have said, again a la SBVR, that they represent “roles” in a “verb concept” aka “fact type”. OTOH, I can also agree that they represent “arguments” to a “predicate”. Regardless of what we call it, and how it is represented, it is a basic feature of KGs, and indeed of pretty much any mechanism for representing ‘relationships’, ‘attributes’, and ‘simple facts’ in concept systems.


I would point out that a concept like: <person> is present in <location> at


That said, I do have a problem with this example:


> d. was born in and died in


As used in the example, this is not a ‘fact type’, but rather a conjunction of two fact types:

(person) was born in (calendar year)

(person) died in (calendar year)

In stating the fact, Sjir used an English *convention* to eliminate the subject from the second conjunct: [Mozart] died in 1791. That is a natural language trick, which carries over the subject of the first fact to be the implied subject of the second fact. Yes, this is nit-picking, but we don’t want to muddle the concept ‘fact type’ or ‘verb concept’ with a dubious example. Elements should be elementary.
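The decomposition can be sketched in Python as follows. The function name `split_combined_fact` and the tuple layout are hypothetical; the sketch only shows the implied subject being made explicit so that each resulting fact is elementary.

```python
def split_combined_fact(subject, birth_year, death_year):
    """Decompose '<subject> was born in <y1> and died in <y2>' into two facts.

    The subject that English leaves implicit in the second conjunct is
    repeated explicitly, so each fact instantiates exactly one fact type.
    """
    return [
        (subject, "was born in", birth_year),
        (subject, "died in", death_year),
    ]


elementary = split_combined_fact("Mozart", 1756, 1791)
```

After the call, `elementary` contains two separate triples that share the subject "Mozart", one for each of the two fact types listed above.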


I reply to Kingsley’s response, because ?FamousComposer or ?calendarYear is a typical formal language approach to concept representation, as are Sjir’s brackets or the parentheses above. The point of the placeholder expressions is not to name the variable, but rather to clarify the class of the object playing the role that the placeholder represents (again, per SBVR). In KG terms, it identifies the class symbol to which the verb concept symbol is linked. And we may further note that in many KGs the class of the role/argument provides a namespace for the interpretation of the verb concept symbol. That is, the same verb term with a different subject class may have an entirely unrelated interpretation.


Note also that disallowing spaces in names/terms is another formal language convention, and Sjir clearly did not intend that. The doubly delimited placeholder <famous composer> is notationally comparable to the predicate ‘was born in’, which Kingsley did not spell ‘wasBornIn’ or ‘was_born_in’, and there is even less reason for KGs to use such a convention, since terms are typically delimited by the graphical notation conventions. (People often use formal language naming conventions, because they think of rendering the knowledge model directly into some implementation language. Or, as my colleague Jim Odell once observed, “software people tend to forget that Analysis and Design are two different activities.”)


-Ed

John F. Sowa

Ed, Sjir, Ravi, Kingsley, ...

This discussion of SBVR shows the power of hype and buzz in selling technology. Many of us have devoted years to issues of interoperability by standardizing notations, formats, schemas, logics, ontologies, frameworks, patterns...

Then a hot new buzz word comes along, and the people with money jump on it. They pump new money into the same old, same old without any regard for the previous 50 years of work.

I have no vested interest in KGs as a buzz word. But I believe that it's better to leap to the front of any herd of buzzers and try to steer them than to nip at their heels.

PS: For anybody who hasn't been around for the past 50 years, or even the past 20 years, see https://en.wikipedia.org/wiki/Semantics_of_Business_Vocabulary_and_Business_Rules

Ed> Without regard to notation, Sjir is trying to identify the basic elements of knowledge that can be conveyed, regardless of expression form.

Sjir> we have the following patterns of fact types: <famous composer> was born in <country> and <famous composer> was born in <year-born> and died in <year-died>. Between < …> is a variable.

Ed> SBVR refers to the above bracketed expressions as “placeholders”, and I agree with Sjir that they do represent “variables”, in one sense. I would have said, again a la SBVR, that they represent “roles” in a “verb concept” aka “fact type”. OTOH, I can also agree that they represent “arguments” to a “predicate”. Regardless of what we call it, and how it is represented, it is a basic feature of KGs, and indeed of pretty much any mechanism for representing ‘relationships’, ‘attributes’, and ‘simple facts’ in concept systems.

All this discussion is the main reason why I like DOL. It is based on solid math & logic, and nobody with any money has a clue about what it means. Therefore, my proposed strategy is to slip it into the foundation of any KG policy. The people who don't know math or logic won't notice it. But anybody who knows what they're doing can use it as a place to stand when they push the lever that moves the earth.

Current versions of KGs are at the level of RDF and RDFS. But the biggest and grandest implementations use a huge amount of additional technology to process the KGs. But none of that technology is standardized. Sooner or later, people will start talking about standardizing it.

That's when the DOL foundation can be unleashed. It's already an OMG standard. It not only defines the logic for business rules, it also has a software stack of open-source tools for them. And it can link to an open-ended variety of other logic-based tools -- or even legacy systems for which the logic and ontology are implicit.

John

Janet Singer

Sun, Jul 12, 8:01 PM

to ontology-summit

John,

Great – this is an insightful, historically grounded vision for the trajectory of KGs that can guide the ‘Whence’ section in the Communiqué. The goal there is to sketch the contextual elements of the 50-year history in a way that supports (or at a minimum avoids obstructing) future developments. DOL is a good lodestar for that.

The next 90% of effort outside of the Whence section is in the details of framing, e.g., a pithy general definition, explanatory text/notes, and examples for various audiences. But the vision should make working on those details easier, whether now in the other Communiqué sections or as part of next year’s Summit.

Janet

Ravi Sharma

Mon, Jul 13, 12:31 AM

Janet, I fully agree with you except for the last sentence, where I agree only to the extent of the future direction of KGs; but again, are we assuming that a KG-related subject will be the topic of next year's summit? What do you have in mind? I also fully agree with you that various scholars are dwelling only on what a KG is, whereas the Communiqué should at least report on what was discussed and presented over the past ONE YEAR. I will send my inputs, as assigned, after 15th July. Regards Ravi

Alex Shkotin

Wed, Jul 15, 8:16 AM

John,

It looks like a fragment of an interesting discussion outside of the ontolog-forum Google group. Did you make any progress on the KG definition? Todd Schneider had the idea of putting the definition in the Whither part of the Communiqué.

Alex

John F. Sowa

Alex> Did you get any progress on the KG definition?

The DOL standard says everything that can be said precisely: it shows how the many versions of logic are related to Common Logic and to each other.

For the semantics of any particular KG, there is only one serious question: Which of those logics represents its semantics?

For the many versions of KGs that have been implemented, the term 'knowledge graph' or the abbreviation 'KG' must be defined in the same way as typical words in any dictionary for ordinary language: provide a list of word senses for each of the many variations, each of which is defined by some logic listed in the DOL standard.

That is all that can be said about definitions. After that, the report can discuss various implementations in the past, the present, and possibilities for the future.

John

Alex Shkotin

Link to online document that I downloaded as whither.odt.

Michael DeBellis

Wed, Jul 15, 4:38 PM

This is something I've experienced recently and I thought it was such a perfect example of how hype dominates the business side of IT. For a long time I thought "knowledge graph" referred to some proprietary graph db technology that was similar to RDF/RDFS. But while there is another standard that Amazon (in Neptune, their hosted knowledge graph service) and some others support, the majority of the work (at least based on my limited experience) done in this area is all with the same technologies as the Semantic Web: RDF/RDFS, SPARQL, OWL, and SHACL (not much SWRL which is too bad because SWRL is awesome but I think it just doesn't scale for large graphs). People probably know this but those answer boxes you get in Google are done using knowledge graphs and using the W3C standard technologies.

For the longest time the colleagues I still am in touch with in the business world had no interest at all in the Semantic Web and I had kind of given up on it having a big business impact which I thought was a shame. But Google and Facebook embrace it and call it knowledge graphs and suddenly there is lots of interest. There is a good book on real world applications called The Knowledge Graph Cookbook: https://www.poolparty.biz/the-knowledge-graph-cookbook/ I've also attached an ACM article that is a good overview of some of the commercial applications. Ultimately, I don't care what people call it, just happy the ideas are finally starting to make an inroad.

Michael

Alex Shkotin

Jul 16, 2020, 4:20 AM

Michael,

Interesting links. KG looks like a response of industry to SW;-) compare Codd's R-algebra and SQL. And look here https://link.springer.com/chapter/10.1007%2F978-3-030-37439-6_1 "This section attempts to derive a definition for Knowledge Graphs by compiling existing definitions made in the literature and considering the distinctive characteristics of previous efforts for tackling the data integration challenge we are facing today." With amazing conclusion "Summing up the discussion we could state that Knowledge Graphs are very large semantic nets that integrate various and heterogeneous information sources to represent knowledge about certain domains of discourse."

Alex

John F. Sowa

Thu, Jul 16, 12:44 PM

Michael.

I again agree with your concerns, and I like the ACM article you included in your note. But this is just one more reason why logic, not notation, is essential for interoperability.

MDB> For the longest time the colleagues I still am in touch with in the business world had no interest at all in the Semantic Web and I had kind of given up on it having a big business impact which I thought was a shame. But Google and Facebook embrace it and call it knowledge graphs and suddenly there is lots of interest.

But Google only embraces the Semantic Web for exchanging *data* that can be expressed in RDF and RDFS. Those are the two simplest logics supported by DOL. In your previous note, you mentioned SWRL. None of the KGs used by Google and Facebook can express SWRL or any other rule-based logics. The technology they use to process the RDF and RDFS goes far beyond anything available for the Semantic Web.

On June 3, I presented a keynote address at ESWC20 -- that is the European Semantic Web Conference, for which the E is also used to represent *Extended*. The organizers and most of the participants in ESWC recognize that the tools developed in 2005 are too limited. A huge amount of R & D in AI has been done in the past 15 years. And the notations of 2005 cannot begin to deal with it.

For my talk on June 3, I presented 34 slides. Since then, I extended my slides to address more issues that came up in the Q/A sessions in my talk and other talks that were presented. See http://jfsowa.com/talks/eswc.pdf .

Section 1 of eswc.pdf summarizes the issues. Section 2 describes DOL and includes the references. Sections 3 to 6 discuss issues that were not addressed in 2005. And Section 7 discusses current R & D projects that go far beyond the current SW tools.

Motto for the future: Before you can sell the solution (logic), you have to convince people that they have a problem (inadequate notations). That was the goal of my talk.

John

Michael DeBellis

Jul 16, 2020, 1:57 PM

John, Interesting, thanks. On another thread I asked you for refs to DOL because I hadn't seen this comment yet. I will check out your presentation. I agree with everything you said, the only thing I might quibble with is that I always expect there to be a gap between the ideal technical solution and what is currently embraced by industry. I still remember in the late 80's and early 90's as a consultant how difficult it could be to sell IT groups on OOP.

Michael

John F. Sowa

Michael,

MDB> I always expect there to be a gap between the ideal technical solution and what is currently embraced by industry. I still remember in the late 80's and early 90's as a consultant how difficult it could be to sell IT groups on OOP.

General principle: "If you want people to cooperate, make cooperation the path of least resistance."

In eswc.pdf, I discussed ways of making logic-based tools the path of least resistance:

1. Provide tools that enable people to get started on the right path while continuing to use their preferred notations as long as possible. DOL helps by defining conversion paths and providing free, open-source software that does the conversions.

2. Make the logic-based notations easier to learn. CLIP is a notation for Common Logic that is easily readable and writable for the RDF, RDFS, and OWL subsets of logic.

3. Proving that hand-coded specifications are consistent is difficult or even undecidable. But it's much easier to design automated and semi-automated tools that generate specifications that are guaranteed to be consistent. See section 7 of http://jfsowa.com/talks/eswc.pdf .

4. Take advantage of improved AI methods for processing language to provide better help and explanations for software developers.

John

Kingsley Idehen

Mon, Jul 20, 4:01 PM

On 7/12/20 9:31 AM, John F. Sowa wrote:

> Ed, Sjir, Ravi, Kingsley, ...

> This discussion of SBVR shows the power of hype and buzz in selling technology. Many of us have devoted years to issues of interoperability by standardizing notations, formats, schemas, logics, ontologies, frameworks, patterns...

> Then a hot new buzz word comes along, and the people with money jump on it. They pump new money into the same old, same old without any regard for the previous 50 years of work.

> I have no vested interest in KGs as a buzz word. But I believe that it's better to leap to the front of any herd of buzzers and try to steer them than to nip at their heels.

> PS: For anybody who hasn't been around for the past 50 years, or even the past 20 years, see https://en.wikipedia.org/wiki/Semantics_of_Business_Vocabulary_and_Business_Rules

> Ed> Without regard to notation, Sjir is trying to identify the basic elements of knowledge that can be conveyed, regardless of expression form.

> Sjir> we have the following patterns of fact types: <famous composer> was born in <country> and <famous composer> was born in <year-born> and died in <year-died>. Between < …> is a variable.

> Ed> SBVR refers to the above bracketed expressions as “placeholders”, and I agree with Sjir that they do represent “variables”, in one sense. I would have said, again a la SBVR, that they represent “roles” in a “verb concept” aka “fact type”. OTOH, I can also agree that they represent “arguments” to a “predicate”. Regardless of what we call it, and how it is represented, it is a basic feature of KGs, and indeed of pretty much any mechanism for representing ‘relationships’, ‘attributes’, and ‘simple facts’ in concept systems.

> All this discussion is the main reason why I like DOL. It is based on solid math & logic, and nobody with any money has a clue about what it means. Therefore, my proposed strategy is to slip it into the foundation of any KG policy. The people who don't know math or logic won't notice it. But anybody who knows what they're doing can use it as a place to stand when they push the lever that moves the earth.

> Current versions of KGs are at the level of RDF and RDFS. But the biggest and grandest implementations use a huge amount of additional technology to process the KGs. But none of that technology is standardized. Sooner or later, people will start talking about standardizing it.

> That's when the DOL foundation can be unleashed. It's already an OMG standard. It not only defines the logic for business rules, it also has a software stack of open-source tools for them. And it can link to an open-ended variety of other logic-based tools -- or even legacy systems for which the logic and ontology are implicit.

> John


Hi John and others,

I've recently published an article about Linked Data, Ontologies, and Knowledge Graphs [1].

Goal:

A collection of simple definitions accompanied by use-case and instance demonstrations.

Links:

[1] https://www.linkedin.com/pulse/linked-data-ontologies-knowledge-graphs-kingsley-uyi-idehen/ -- LinkedIn Edition

[2] https://medium.com/virtuoso-blog/linked-data-ontologies-and-knowledge-graphs-a3d0ad6d6f66 -- Medium Edition