
Workshop notes: Federation and Open Knowledge Networks

Workshop on federating knowledge base construction

Published on May 22, 2019

Workshop and submission details (from the conference site)

Provocations

Please answer these questions while listening to talks:

  • What original knowledge do you want others to use?

  • What refinements can you contribute upstream?

  • What conversations about models and approaches can you make visible?

  • What tools are you making that can augment human curation?

Schedule + notes

This is an editable document; feel free to add notes below

Opening remarks, welcome

The OKN framework and concept were launched this year in collaboration w/ NSF. We hope to form collaborations here among groups that want to contribute to such a network.

François Scharffe - Open Knowledge Embeddings (abstract)

Example OKN components: Datalift, 2010-14, mandated by France

François mentioned the KG workshop at Columbia. Juan’s trip report on that is up at: http://www.juansequeda.com/blog/2019/05/11/2019-knowledge-graph-conference-trip-report/

More on Knowledge Graph Embeddings: https://mnick.github.io/project/knowledge-graph-embeddings/
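
For readers new to the idea, here is a minimal sketch of how a translation-style embedding scores a triple. It uses the TransE scoring rule (head + relation should land near tail) as one well-known example; the talk did not say which model was used, and the entity names and 2-d vectors below are hand-picked, hypothetical values rather than learned embeddings.

```python
import math

# Hypothetical 2-d embeddings, hand-picked for illustration only.
entities = {
    "Paris": (1.0, 2.0),
    "France": (2.0, 3.0),
    "Berlin": (4.0, 1.0),
}
relations = {"capital_of": (1.0, 1.0)}


def transe_score(head, relation, tail):
    """Return -||h + r - t||; scores closer to 0 mean a more plausible triple."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every known entity as a candidate tail for (head, relation, ?)."""
    return sorted(
        entities,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )
```

Calling rank_tails("Paris", "capital_of") orders the candidate tails by plausibility, which is the basic link-prediction use of such embeddings.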

Looking for collaborators on graph embedding this summer; has some model problems to share that would make good next steps. RPI (McCusker, McGuinness) is particularly interested in those problems (a PhD student in my group is interested).

Daniel Garijo - Towards an OKN of Reusable Software Descriptions

Focused on software attribution: we want to credit all sorts of work, and software needs this keenly. You also need to know your software tools well to understand and reproduce data. (cf. Whole Tale, &c.)

Say you find a github repo and docker instance. That’s not enough — I need other details for transformations. Data don’t map directly into the software, and need to be massaged in.

Consider harder reuse cases: dependencies, sample runs, invocations, folder context, defaults, volumes, logins? Capture semantic structure of invocation!
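
One way to picture "semantic structure of invocation" is a small record that keeps the image, command, volumes, defaults, and folder context together, so a run can be rebuilt from metadata. This is only a sketch under assumptions: the `Invocation` class, its field names, and the `--name value` parameter style are hypothetical and not taken from OntoSoft or the Model Metadata Registry. Only `docker run` with `--rm`, `-w`, and `-v` is real CLI syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """Hypothetical description of how to run one piece of software."""

    image: str                                    # container image to run
    command: list                                 # executable plus fixed arguments
    volumes: dict = field(default_factory=dict)   # host path -> container path
    defaults: dict = field(default_factory=dict)  # parameter -> default value
    workdir: str = "/"                            # folder context inside the container

    def to_docker_args(self, **overrides):
        """Build a `docker run` argument list, letting callers override defaults."""
        params = {**self.defaults, **overrides}
        args = ["docker", "run", "--rm", "-w", self.workdir]
        for host, container in self.volumes.items():
            args += ["-v", f"{host}:{container}"]
        args.append(self.image)
        args += self.command
        # Assumption: the wrapped tool accepts "--name value" style parameters.
        for name, value in params.items():
            args += [f"--{name}", str(value)]
        return args
```

With a record like this, a sample run is just a set of overrides on top of the stored defaults, which is easier to describe in a KG than a free-text README.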

We started w/ the Model Metadata Registry (OntoSoft [pdf, website])
Can be extended to schema.org, data cubes, NASA QUDT, DockerPedia, SciVar Ontology.
Work done so far: making a KG for software via this registry; adding auto unit transforms and software image descriptions. Requirements?? Automated software composition: find out what software is available (with a possible data transformation).

If we generate new metadata, we can push updates to WD, other specialized metadata repositories. How far are we from OKN?
> Get licenses right! resources, metadata.
> Automate generation (now mostly manual) — build the botnets
> Decentralized creation
> Enrich prov traces
> Better interfaces for lay users? Faceted search?

Enterprise OKN: A Federated KG for Financial Data

Avoid a future recession or depression.
Defragment corporate knowledge bases, for research and policy analysis.

Ex: how do patent filings impact long-term profitability?
Which companies are most likely to cause the next fin crisis?

Looking at patents: entity recognition, resolution, and linking, using Dedupe.io
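
To make the resolution step concrete, here is a toy sketch that groups company-name variants by string similarity. It is not how Dedupe.io works (that tool learns matching rules from labeled examples); it uses only Python's standard `difflib`, and the suffix list, threshold, and sample names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "company"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a, b):
    """Character-level similarity of two normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster(names, threshold=0.85):
    """Greedily assign each name to the first cluster whose first member it matches."""
    clusters = []
    for name in names:
        for group in clusters:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

A greedy pass like this is order-sensitive and compares only against the first member of each cluster, so it is a starting point for discussion rather than a production approach.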

We’re looking for ways to capture sequences over time to help embed/classify items that change (companies, entities, terms). See a bonus from getting a few expert signals vs lots of weak ones. Combine w/ statistical (and weak?) models

OKN proposal: workshops for regulators+policymakers; tech contributors; corporate integration!

  • USPTO suggested sale of patents is relevant. Corps don’t have to tell you when they sell patents. How can we guess at this?

  • Beyond patents: could also look into R&D data, visa applications. [David C? has visa data]

  • People mined patent lit for emerging tech — FUSE (via a gov grant?) Worth following up. [Andrew also worked on this!]

FoodKG: A Semantic KG for Food Recommendation

bit.ly/akbc-foodkg (An RPI project using WhyIs)

https://foodkg.github.io/

  • Brings together Recipes, nutrition, food taxonomies

  • Links to existing ontologies

  • Straightforward to use, modular and reusable

  • Provenance of facts in FoodKG

Licensing is a bottleneck. In many cases, expensive (for a collab)
But interested in research collabs — check in w/ Chef App! and yummy.com and more. What sources are open? [dbpedia was too noisy]

—> checkout https://github.com/DHLab-nl/historical-recipe-web

break + stretch

Vicki Tardif - Challenges in KG Construction for Q & A

FreeBase —> GKG: 500M entities + 3B facts ('12) —> 1B/70B (‘19)

"How do we make computers as predictably inconsistent as people?" For example: How long is Harry Potter? The book or the movie? In what units?

How do we interpret facets of a single “entity”?

How can we get better knowledge out of what is essentially a fact-graph?

Q: Don’t you find meaningful consensus? e.g. in Wikidata?
Q: Does the community propose schemas often? A: yes, it’s no longer just participant companies.
A: schema.org is not particularly good at specialized schemas; this is the tension to resolve (communities often have well-defined specialized needs).

Varun Embar - Collective Alignment of Large-scale Ontologies

4 Challenges:

  • Informally defined relations between classes and instances

  • Multiple textual representations

  • Different Semantics for same textual relationship

  • Ambiguity / Entity Resolution

Collective alignment using data and PSL (Probabilistic Soft Logic). PSL gives a probability distribution over the set of all possible alignments.
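
As a rough illustration of where that distribution comes from: PSL relaxes truth values to [0, 1], measures how far each weighted rule is from being satisfied (using Łukasiewicz logic), and assigns each candidate assignment a density proportional to exp(-weighted sum of those distances). The sketch below follows that recipe for a single rule; the rule itself, its weight, and the observed values are hypothetical and not taken from the talk.

```python
import math


def l_and(*values):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """How far the soft rule `body -> head` is from being satisfied."""
    return max(0.0, body - head)


def unnormalized_density(weighted_distances, squared=False):
    """exp(-sum of w * d^p) over (weight, distance) pairs, with p = 1 or 2."""
    p = 2 if squared else 1
    return math.exp(-sum(w * d ** p for w, d in weighted_distances))


# Hypothetical rule, weight 2.0:
#   SimilarName(A, B) & SameParent(A, B) -> Align(A, B)
RULE_WEIGHT = 2.0
SIMILAR_NAME = 0.9  # assumed observed value
SAME_PARENT = 0.8   # assumed observed value


def score_alignment(align_value):
    """Unnormalized density of setting Align(A, B) to `align_value`."""
    body = l_and(SIMILAR_NAME, SAME_PARENT)
    d = distance_to_satisfaction(body, align_value)
    return unnormalized_density([(RULE_WEIGHT, d)])
```

Here an alignment value at or above the rule body (about 0.7) incurs no penalty, while lower values are penalized in proportion to the rule weight. Inference in a full model looks for the assignment that maximizes this density across all rules at once.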

James McCusker - A Provenance-Aware KG Framework for OKNs

(full author list: James P. McCusker, Sabbir M. Rashid, Nkechinyere Agu, Neha Keshan, Deborah L. McGuinness)

WhyIs - http://tetherless-world.github.io/whyis/

Importance of Provenance in KG

Deduction - Inference engine using SPARQL where & construct
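
The where/construct pattern can be sketched without any SPARQL engine: match a list of triple patterns against the graph (the "where"), fill a template with each set of bindings (the "construct"), and repeat until nothing new appears. This is a toy in plain Python, not WhyIs code; the function names and the example triples are hypothetical.

```python
def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def solve(where, graph):
    """Return every variable binding that satisfies all `where` patterns."""
    bindings = [{}]
    for pattern in where:
        bindings = [
            b2
            for b in bindings
            for triple in graph
            if (b2 := match(pattern, triple, b)) is not None
        ]
    return bindings


def infer(graph, where, construct):
    """Apply one where/construct rule repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        new = {
            tuple(b.get(term, term) for term in template)
            for b in solve(where, graph)
            for template in construct
        } - graph
        if not new:
            return graph
        graph |= new
```

For example, with where = [("?a", "subClassOf", "?b"), ("?b", "subClassOf", "?c")] and construct = [("?a", "subClassOf", "?c")], infer adds the transitive subclass links to the graph. A provenance-aware system would also record which rule and which input triples produced each new triple.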

bit.ly/get-whis

Systems bio graph with bio interactions: we can infer a probability that an interaction occurs. That let us find 26 potential treatments for melanoma that were discarded b/c they weren’t working in the existing patients (but were specific to certain mutations). https://peerj.com/articles/cs-106/

Also computing trust score on network data for similar content.

Prov: capturing time of access? what does that look like?
A: creation for each nanopub helps.

Sam Klein - Federating Wikidata: enhancing current OKNs

http://listen.hatnote.com/

Separate underlays from overlays:

  • Capture observations & assertions

Define shared services:

  • Disambiguation, resolution

  • Event feeds, decentralized archiving

  • Standard cloud services

Closing comments, next steps, thanks

Please share/link summaries of your own notes here!
