Workshop: Provocations for OKN development

Published on Jul 31, 2019
What original knowledge do you want others to use?

Indicate which project you’re part of after each item

  • Layers of analysed metadata extracted + inferred from patents — KFG, III

  • Metadata + inferences from scholarship + citation graphs — Scholia, SemScholar

  • Public identifiers (and a list of multi-identifiers) for entities — Wikidata, GRID

  • ..

What refinements can you contribute upstream?

Your sources may welcome corrections/refinement. Imagine there is a global event stream for these updates, and you don’t have to negotiate with sources about whether or how to accept your refinements — just share that they happened, let curators find them & apply them at some point in the future.

  • Article disambiguation: Article names, authors, titles — from fatcat, MAG;

  • Patent disambiguation: from III, Lens;

  • Citation graph refinement — from MAG, SemSchol?;

  • Share embeddings and related libraries

  • Publish preprocessed Wikipedia data : extend
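
One way to picture the "global event stream" idea above is as an append-only log of refinement events that curators read whenever they like. The sketch below is only an illustration under stated assumptions: the `RefinementEvent` fields, the `EventStream` class, and the record identifiers are all hypothetical, and an in-memory list stands in for whatever shared infrastructure the network might actually use.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class RefinementEvent:
    """One proposed correction to an upstream record (field names are hypothetical)."""
    source: str        # upstream dataset the refinement targets, e.g. "MAG"
    record_id: str     # identifier of the record within that source (made up here)
    kind: str          # type of refinement, e.g. "article-disambiguation"
    proposed: dict     # the suggested change itself
    contributor: str   # project that is sharing the refinement
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventStream:
    """Append-only in-memory log; a stand-in for a shared, network-wide stream."""

    def __init__(self):
        self._lines = []

    def publish(self, event):
        # Contributors only append; nothing is negotiated with the source.
        self._lines.append(json.dumps(asdict(event), sort_keys=True))
        return len(self._lines) - 1  # offset of the newly appended event

    def read(self, source, since=0):
        # Curators pull the events aimed at their source, at a time of their choosing.
        events = [json.loads(line) for line in self._lines[since:]]
        return [e for e in events if e["source"] == source]
```

A contributor would call `publish(...)` once per refinement, and a curator for a given source would call `read("MAG")` (or `read("MAG", since=offset)` to resume) and decide independently what to apply.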

What conversations about models, audiences, frames can you make visible?

This can be as part of provenance of a collection, or as hooks / spaces for conversation among parties.

  • Property crosswalks: LD4L + Wikidata — property discussions on Wikidata

  • Authority file crosswalks: LOC + Wikidata — via LOC ID addition to Wikidata (as of May 20) [but conversations are the exception]

  • Bot/script evaluation for AKBC: Wikidata/WP bot approval discussions (address false positive issues + ways to resolve them)

What tools + services could be shared across the network, to augment humans + machines?

And what's needed to integrate these into existing popular workflows or world models?

  • Tool templates for high-volume human curation: Toolforge (Wikidata)

  • Embedding services: (OKE/Data Chefs)

    • model embedding with learned vocabulary,

    • compute embedding over a dataset, or over the entire OKN

    • publish + store embedding : an archival store, a reference and index, so it may be found and consumed

  • An OKN search index? web crawling + indexing infra.

    • Include Common Crawl or better. Currently very unwieldy

      • Current monthly dumps are ~ ok.

    • Connect wiki draft spaces (e.g. for autogenerated articles!)

    • Consider a federated space for articles!

  • Build a fair-use + -nc friendly draft space, for GLAMs + research on Common Crawl &c.?
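
The embedding-service bullets above (compute an embedding over a dataset, then publish and store it with "a reference and index, so it may be found and consumed") could be sketched roughly as follows. Everything here is an assumption made for illustration: `embed` is a toy hashed bag-of-words function standing in for a learned model, and `EmbeddingStore`, its method names, and the dataset and model labels are hypothetical.

```python
import hashlib
import json


def embed(text, dim=8):
    """Toy deterministic embedding (hashed bag of words); a placeholder for a learned model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token by the first byte of its hash
    return vec


class EmbeddingStore:
    """In-memory stand-in for an archival store plus a findable index."""

    def __init__(self):
        self._blobs = {}  # content hash -> serialized vectors (the archival store)
        self._index = {}  # (dataset, model) -> content hash (the reference and index)

    def publish(self, dataset, model, vectors):
        # Serialize deterministically so identical embeddings get the same reference.
        blob = json.dumps(vectors, sort_keys=True)
        key = hashlib.sha256(blob.encode("utf-8")).hexdigest()
        self._blobs[key] = blob
        self._index[(dataset, model)] = key
        return key

    def lookup(self, dataset, model):
        # Consumers find an embedding by dataset and model name, not by file location.
        key = self._index[(dataset, model)]
        return json.loads(self._blobs[key])
```

A producer would build `vectors` as a mapping from record identifier to `embed(...)` output and call `publish("some-dataset", "toy-hash-v0", vectors)`; the returned hash serves as a stable reference that others can cite or use to check what they retrieved.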

Potential next steps:

+ Write up OKN improvements, from different perspectives
+ Write to ML enthusiasts from the conf, to get feedback on what formats would make their work easier
++ Getting more data into UCI? (via N.Monath)
+ Think about how to get these data slices and services into popular tools
+ List challenges: it’s fine to use shared idents. When you start talking about similar things you’ll have arguments over standard representations…

