Validating Shapes Survey

SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption

Abstract

Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and within organizations. Various methods exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and Shape Expressions (ShEx) are the two state-of-the-art languages for defining validating shapes for KGs. As the use of these constraint languages has grown, new needs have arisen. One such need is the efficient generation of these shapes. Yet, since these languages are relatively new, there is little understanding of how they are actually employed for existing KGs. Therefore, in this work, we answer the question: how are validating shapes being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (from both industry and academia) who generate validating shapes. Then, we cross-referenced our results with an extensive survey of existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for semi-automatic methods that can help users generate shapes from large KGs.
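To make "validating shape" concrete, here is a small, hypothetical Java 8 sketch of checking one SHACL-style constraint: every instance of a target class must have at least n values for a given property (the idea behind sh:targetClass combined with sh:minCount). The class MinCountSketch, its violations method, the in-memory triple list, and the prefixed strings such as ex:Person are all invented for illustration. This is not a SHACL engine: it handles a single constraint and ignores rdfs:subClassOf, which real SHACL class targeting follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of checking one constraint:
//   sh:targetClass C ; sh:property [ sh:path p ; sh:minCount n ]
// Simplification: only direct rdf:type matches count (no rdfs:subClassOf reasoning).
public class MinCountSketch {

    static final String RDF_TYPE = "rdf:type";

    // Returns the focus nodes (instances of targetClass) with fewer than minCount values for path.
    static List<String> violations(List<String[]> triples, String targetClass, String path, int minCount) {
        // 1. Collect focus nodes: subjects typed with the target class.
        Set<String> focusNodes = new LinkedHashSet<>();
        for (String[] t : triples) {
            if (t[1].equals(RDF_TYPE) && t[2].equals(targetClass)) focusNodes.add(t[0]);
        }
        // 2. Count the distinct values of the path for each focus node.
        List<String> result = new ArrayList<>();
        for (String node : focusNodes) {
            Set<String> values = new LinkedHashSet<>();
            for (String[] t : triples) {
                if (t[0].equals(node) && t[1].equals(path)) values.add(t[2]);
            }
            if (values.size() < minCount) result.add(node);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[] {"ex:alice", RDF_TYPE, "ex:Person"},
            new String[] {"ex:alice", "ex:name", "\"Alice\""},
            new String[] {"ex:bob", RDF_TYPE, "ex:Person"});
        // ex:bob has no ex:name, so it violates minCount 1.
        System.out.println(violations(data, "ex:Person", "ex:name", 1));
    }
}
```

Running main prints [ex:bob]. Writing such checks by hand for every class and property of a large KG is exactly the effort that shape generation tools aim to reduce.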

Cite

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja. SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption. In Companion Proceedings of the Web Conference 2022 (WWW'22 Companion), April 25-29, 2022, Virtual Event, Lyon, France. ACM.


Poster Presented at TheWebConf-2022


EXPERIMENTS

We have used the following datasets:

  1. DBpedia: We used the DBpedia script to download all the DBpedia files listed here.
  2. YAGO-4: We downloaded the English version of YAGO-4 from https://yago-knowledge.org/data/yago4/en/.
  3. LUBM: We generated the LUBM dataset following the guidelines on LUBM's official website.

You can download a copy of all three datasets as a single archive.

SHACL Shapes


We have published the SHACL shapes extracted from all three datasets on Zenodo. We have also made available on Zenodo an executable Jar file of our application, which extracts SHACL shapes from RDF datasets in N-Triples (.nt) format.
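The algorithm inside the published Jar is not described on this page, so the following is only a hypothetical Java 8 sketch of one simple, common extraction strategy: group subjects by their rdf:type classes, collect the predicates used by instances of each class, and emit one sh:NodeShape per class with one sh:property entry per predicate. The class ShapeExtractionSketch and its methods are invented for illustration, and the line parser is deliberately naive (single spaces between terms, no validation); it is not a full N-Triples parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of type-based SHACL shape extraction; NOT the algorithm of the published Jar.
public class ShapeExtractionSketch {

    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    // Naive N-Triples line splitter: assumes single spaces between terms and well-formed input.
    // Returns {subject, predicate, object}, or null for blank lines and comments.
    static String[] parseLine(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("#")) return null;
        if (t.endsWith(".")) t = t.substring(0, t.length() - 1).trim(); // drop the terminating dot
        int s1 = t.indexOf(' ');
        int s2 = s1 < 0 ? -1 : t.indexOf(' ', s1 + 1);
        if (s2 < 0) return null;
        // The object is everything after the predicate, so literals containing spaces stay intact.
        return new String[] {t.substring(0, s1), t.substring(s1 + 1, s2), t.substring(s2 + 1).trim()};
    }

    // Maps each class IRI to the sorted set of predicates used by its (directly typed) instances.
    static Map<String, Set<String>> extract(List<String> lines) {
        List<String[]> triples = new ArrayList<>();
        Map<String, Set<String>> typesOf = new TreeMap<>();
        for (String line : lines) {
            String[] tr = parseLine(line);
            if (tr == null) continue;
            triples.add(tr);
            if (tr[1].equals(RDF_TYPE)) typesOf.computeIfAbsent(tr[0], k -> new TreeSet<>()).add(tr[2]);
        }
        Map<String, Set<String>> shapes = new TreeMap<>();
        // Every class gets a shape, even if its instances have no other properties.
        for (Set<String> classes : typesOf.values()) {
            for (String c : classes) shapes.computeIfAbsent(c, k -> new TreeSet<>());
        }
        for (String[] tr : triples) {
            if (tr[1].equals(RDF_TYPE)) continue;
            Set<String> classes = typesOf.get(tr[0]);
            if (classes == null) continue; // untyped subject: no target class to attach it to
            for (String c : classes) shapes.get(c).add(tr[1]);
        }
        return shapes;
    }

    // Serializes the shapes as Turtle, using blank-node NodeShapes.
    static String toTurtle(Map<String, Set<String>> shapes) {
        StringBuilder sb = new StringBuilder("@prefix sh: <http://www.w3.org/ns/shacl#> .\n\n");
        for (Map.Entry<String, Set<String>> e : shapes.entrySet()) {
            sb.append("[] a sh:NodeShape ;\n   sh:targetClass ").append(e.getKey());
            for (String p : e.getValue()) {
                sb.append(" ;\n   sh:property [ sh:path ").append(p).append(" ]");
            }
            sb.append(" .\n\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "<http://ex.org/alice> " + RDF_TYPE + " <http://ex.org/Person> .",
            "<http://ex.org/alice> <http://ex.org/name> \"Alice\" .",
            "<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .");
        System.out.print(toTurtle(extract(sample)));
    }
}
```

To try it on a real .nt file, the inline sample could be replaced with java.nio.file.Files.readAllLines(Paths.get(...)). Note that this sketch keeps every triple in memory, so for KGs the size of DBpedia or YAGO-4 a streaming approach (for example Files.lines) and far more compact data structures would be required. It also emits no cardinality or datatype constraints, which practical extractors typically add.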

Source Code

To be released soon! Keep visiting our GitHub repository.

Requirements

Java 8 or newer.