Lothbrok

Optimizing SPARQL Queries over Decentralized Knowledge Graphs

Abstract

While the Web of Data in principle offers access to a wide range of interlinked data, the architecture of the Semantic Web today relies mostly on the data providers to maintain access to their data through SPARQL endpoints. Several studies, however, have shown that such endpoints often experience downtime, meaning that the data they maintain becomes inaccessible. While decentralized systems based on Peer-to-Peer (P2P) technology have previously shown to increase the availability of knowledge graphs, even when a large proportion of the nodes fail, processing queries in such a setup can be an expensive task since data necessary to answer a single query might be distributed over multiple nodes. In this paper, we therefore propose an approach to optimizing SPARQL queries over decentralized knowledge graphs, called Lothbrok. While there are potentially many aspects to consider when optimizing such queries, we focus on three aspects: cardinality estimation, locality awareness, and data fragmentation. We empirically show that Lothbrok is able to achieve significantly faster query processing performance compared to the state of the art when processing challenging queries as well as when the network is under high load.

EXPERIMENTS

We ran our experiments on a server with 128 vCPU cores at 2.5GHz, 64KB L1 cache, 512KB L2 cache and 8192KB L3 cache each, and 2TB RAM. We use 128 nodes on the same server with resources split up evenly between them. We compare Lothbrok to ColChain and PIQNIC.

We have the following metrics: We use three differently sized WatDiv datasets; WatDiv-10M, WatDiv-100M, and WatDiv-1000M, as well as LargeRDFBench for data and queries for tests.

The WatDiv query loads are over WatDiv-100M; statistics for remaining query loads will follow shortly.

Triple Pattern count

Join vertex count (#JV)

Join vertex degree

Result cardinality

Mean TP selectivity

Scalability of Lothbrok when creating an equal number of characteristic set fragments and predicate-based fragments.

Throughput over WatDiv-10M (log scale)

Throughput over WatDiv-100M (log scale)

Throughput over WatDiv-1000M (log scale)

Number of timeouts over WatDiv-10M (log scale)

Number of timeouts over WatDiv-100M (log scale)

Number of timeouts over WatDiv-1000M (log scale)

Average workload time over WatDiv-10M (log scale)

Average workload time over WatDiv-100M (log scale)

Average workload time over WatDiv-1000M (log scale)

Results for each query load over each dataset.

Query execution time (QET) over WatDiv-10M (log scale)

Query execution time (QET) over WatDiv-100M (log scale)

Query execution time (QET) over WatDiv-1000M (log scale)

Query response time (QRT) over WatDiv-10M (log scale)

Query response time (QRT) over WatDiv-100M (log scale)

Query response time (QRT) over WatDiv-1000M (log scale)

Number of requests (REQ) over WatDiv-10M (log scale)

Number of requests (REQ) over WatDiv-100M (log scale)

Number of requests (REQ) over WatDiv-1000M (log scale)

Number of transferred bytes (NTB) over WatDiv-10M (log scale)

Number of transferred bytes (NTB) over WatDiv-100M (log scale)

Number of transferred bytes (NTB) over WatDiv-1000M (log scale)

Number of relevant fragments (NRF) over WatDiv-10M

Number of relevant fragments (NRF) over WatDiv-100M

Number of relevant fragments (NRF) over WatDiv-1000M

Number of relevant nodes (NRN) over WatDiv-10M

Number of relevant nodes (NRN) over WatDiv-100M

Number of relevant nodes (NRN) over WatDiv-1000M

Experimental results for each query group in LargeRDFBench.
Query group S

Query execution time (QET) for query group S (log scale)

Query response time (QRT) for query group S (log scale)

Query optimization time (QOT) for query group S (log scale)

Number of requests (REQ) for query group S (log scale)

Number of transferred bytes (NTB) for query group S (log scale)

Number of relevant fragments (NRF) for query group S

Number of relevant nodes (NRN) for query group S



Query group C

Query execution time (QET) for query group C (log scale)

Query response time (QRT) for query group C (log scale)

Query optimization time (QOT) for query group C (log scale)

Number of requests (REQ) for query group C (log scale)

Number of transferred bytes (NTB) for query group C (log scale)

Number of relevant fragments (NRF) for query group C

Number of relevant nodes (NRN) for query group C



Query group L

Query execution time (QET) for query group L (log scale)

Query response time (QRT) for query group L (log scale)

Query optimization time (QOT) for query group L (log scale)

Number of requests (REQ) for query group L (log scale)

Number of transferred bytes (NTB) for query group L (log scale)

Number of relevant fragments (NRF) for query group L

Number of relevant nodes (NRN) for query group L



Query group CH

Query execution time (QET) for query group CH (log scale)

Query response time (QRT) for query group CH (log scale)

Query optimization time (QOT) for query group CH (log scale)

Number of requests (REQ) for query group CH (log scale)

Number of transferred bytes (NTB) for query group CH (log scale)

Number of relevant fragments (NRF) for query group CH

Number of relevant nodes (NRN) for query group CH

The following figures show the experimental results in the setup where we did not create an equal number of fragments for Lothbrok. In other words, in this setup, we included a fragment for each characteristic set spanning at least 50 subjecet values.

Throughput over WatDiv-10M (log scale)

Throughput over WatDiv-100M (log scale)

Throughput over WatDiv-1000M (log scale)

Number of timeouts over WatDiv-10M (log scale)

Number of timeouts over WatDiv-100M (log scale)

Number of timeouts over WatDiv-1000M (log scale)

Average workload time over WatDiv-10M (log scale)

Average workload time over WatDiv-100M (log scale)

Average workload time over WatDiv-1000M (log scale)


Query execution time (QET) over WatDiv-10M (log scale)

Query execution time (QET) over WatDiv-100M (log scale)

Query execution time (QET) over WatDiv-1000M (log scale)

Query response time (QRT) over WatDiv-10M (log scale)

Query response time (QRT) over WatDiv-100M (log scale)

Query response time (QRT) over WatDiv-1000M (log scale)

Number of requests (REQ) over WatDiv-10M (log scale)

Number of requests (REQ) over WatDiv-100M (log scale)

Number of requests (REQ) over WatDiv-1000M (log scale)

Number of transferred bytes (NTB) over WatDiv-10M (log scale)

Number of transferred bytes (NTB) over WatDiv-100M (log scale)

Number of transferred bytes (NTB) over WatDiv-1000M (log scale)

Number of relevant fragments (NRF) over WatDiv-10M

Number of relevant fragments (NRF) over WatDiv-100M

Number of relevant fragments (NRF) over WatDiv-1000M

Number of relevant nodes (NRN) over WatDiv-10M

Number of relevant nodes (NRN) over WatDiv-100M

Number of relevant nodes (NRN) over WatDiv-1000M

The following figures show additional experimental results of optimization.

Number of relevant fragments (NRF) before optimization over WatDiv-10M

Number of relevant fragments (NRF) before optimization over WatDiv-100M

Number of relevant fragments (NRF) before optimization over WatDiv-1000M

Number of relevant nodes (NRN) before optimization over WatDiv-10M

Number of relevant nodes (NRN) before optimization over WatDiv-100M

Number of relevant nodes (NRN) before optimization over WatDiv-1000M

Number of pruned nodes (NRP) for each query over WatDiv-10M

Number of pruned nodes (NRP) for each query over WatDiv-100M

Number of pruned nodes (NRP) for each query over WatDiv-1000M

Number of nodes involved in the query processing (NIQ) for each query over WatDiv-10M

Number of nodes involved in the query processing (NIQ) for each query over WatDiv-100M

Number of nodes involved in the query processing (NIQ) for each query over WatDiv-1000M

INSTALLATION

Requirements

Java 8 or newer, Jetty server.

Installation

To install a Lothbrok node or reproduce our experiments with Lothbrok, please download the sources on our GitHub as well as the resources used on Zenodo, and follow the instructions there.

DOWNLOADS

  • Download the sources from our GitHub.
  • Download the resources, including data fragments, indexes, queries, setup files, and scripts, necessary to run our experiments on Zenodo.