Triple pattern count (#TP)
Join vertex count (#JV)
Join vertex degree (mean)
Result cardinality (#Results)
Triple pattern selectivity (mean)
Triple pattern selectivity (stdev)
SPARQL endpoints offer access to a vast amount of interlinked information. While they offer a well-defined interfacefor efficiently retrieving results for complex SPARQL queries, complex query loads can easily overload or crash endpoints asall the computational load of answering the queries resides entirely with the server hosting the endpoint. Recently proposedinterfaces, such as Triple Pattern Fragments, have therefore shifted some of the query processing load from the server to theclient at the expense of increased network traffic in the case of non-selective triple patterns. This paper therefore proposes StarPattern Fragments (SPF), an RDF interface enabling a better load balancing between server and client by decomposing SPARQLqueries into star-shaped subqueries, evaluating them on the server side. Experiments using synthetic data (WatDiv), as well asreal data (DBpedia), show that SPF does not only significantly reduce network traffic, it is also up to two orders of magnitudefaster than the state-of-the-art interfaces under high query load.
EXPERIMENTS
To run the clients, a virtual machine (VM) running all 128 clients concurrently was used. The VM had 128 vCPU cores with a clock speed of 2.5GHz, 64KB L1 cache, 512KB L2 cache, 8192KB L3 cache, and 2TB main memory. Each client was limited to use just one vCPU core and 15GB RAM. The LDF server and the SPARQL endpoint were run, at all times, on a server with 32 vCPU cores, with a clock speed of 3GHz, 64KB L1 cache, 4096KB L2 cache, and 16384KB L3 cache, and a main memory of 128GBTriple pattern count (#TP)
Join vertex count (#JV)
Join vertex degree (mean)
Result cardinality (#Results)
Triple pattern selectivity (mean)
Triple pattern selectivity (stdev)
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Number of Requests to the Server over query loads for 64 concurrent clients (w/o timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/o timeouts)
Number of Requests to the Server over query loads for 64 concurrent clients (w/ timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/ timeouts)
Query Execution Time over query loads for 64 concurrent clients
Query Response Time over query loads for 64 concurrent clients
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Number of Requests to the Server over query loads for 64 concurrent clients (w/o timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/o timeouts)
Number of Requests to the Server over query loads for 64 concurrent clients (w/ timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/ timeouts)
Query Execution Time over query loads for 64 concurrent clients
Query Response Time over query loads for 64 concurrent clients
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Number of Requests to the Server over query loads for 64 concurrent clients (w/o timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/o timeouts)
Number of Requests to the Server over query loads for 64 concurrent clients (w/ timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/ timeouts)
Query Execution Time over query loads for 64 concurrent clients
Query Response Time over query loads for 64 concurrent clients
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Throughput over configurations (with timeouts)
Avg. server CPU load over configurations
Number of timeouts over configurations
Number of Requests to the Server over query loads for 64 concurrent clients (w/o timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/o timeouts)
Number of Requests to the Server over query loads for 64 concurrent clients (w/ timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/ timeouts)
Query Execution Time over query loads for 64 concurrent clients
Query Response Time over query loads for 64 concurrent clients
Throughput over configurations
Avg. server CPU load over configurations
Number of timeouts over configurations
Number of Requests to the Server over query loads for 64 concurrent clients (w/o timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/o timeouts)
Number of Requests to the Server over query loads for 64 concurrent clients (w/ timeouts)
Number of Transferred Bytes over query loads for 64 concurrent clients (w/ timeouts)
Query Execution Time over query loads for 64 concurrent clients
Query Response Time over query loads for 64 concurrent clients
SPECIFICATION
SPF is an extension of brTPF, and thus also TPF. The hypermedia specification is therefore adapted from TPF found on this link. Expand below to view the hypermedia specification.A Star Pattern Fragment is a Linked Data Fragment of a dataset that consists of three parts:
?s ?p1 ?o1 . ?s ?p2 ?o2 . ... . ?s ?pn ?on
”.
void:triples
predicate
that expresses the estimated total number of matches for the star pattern.
A Star Pattern Fragment of a dataset is fully determined
by its star pattern selector.
A star pattern selector consists of a subject
component and an arbitrary number of predicate
and object
components.
The number of predicate
and object
components MUST be the same. Henceforth we will refer to pi
as the i'th predicate
and oi
as the i'th object
.
The subject
MUST either be a variable or an IRI;
all predicates
MUST either be variables or IRIs;
the objects
MUST either be variables, IRIs, or literals.
These components MUST NOT be blank nodes.
A Star Pattern Fragment is considered empty if it does not contain any data triples (regardless of metadata and controls).
Star Pattern Fragments are not bound to a specific syntax, since the data, metadata, and controls can be represented in different ways. The server MUST, however, support at least one RDF-based representation. Servers MUST indicate the corresponding MIME type when responding with a Star Pattern Fragment, so clients can correctly parse it.
The server SHOULD choose to group triples that combined answer the entire star pattern when answering answering a star pattern request so clients can more efficiently parse the results.
For RDF syntaxes without multiple graph support (such as Turtle or N-Triples), the data, metadata, and controls MUST necessarily be serialized to the same graph. For RDF syntaxes with multiple graph support (such as JSON-LD, TriG or N-Quads), the data MUST be serialized to the default graph and the metadata and controls MUST be serialized to one or multiple non-default graphs.
NOTE: A Star Pattern Fragments server MAY be backwards compatible with a Triple Pattern Fragments, and/or a Bindings-Restricted Triple Pattern Fragments server, in that if a star pattern with only one triple pattern is received, the server MAY choose to invoke the triple pattern selector or bindings-restricted triple pattern selector. In this case, the fragment is still obtained by issuing a star pattern fragments request, and thus it follows the same specification for hypermedia controls and metadata, i.e. the ones described in this document.
The data of a Star Pattern Fragment is obtained by selecting all triples of a dataset that match the fragment's star pattern selector. These triples SHOULD be ordered in some consistent way, such that Star Pattern Fragments can be paged consistently. Data triples SHOULD be grouped such that each group of data triples represent a full answer to the star pattern. Data triples MUST NOT contain blank nodes, since these are scoped to a document, whereas clients need to be able to consistently refer to star components across different Star Pattern Fragments. The RECOMMENDED way of eliminating blank nodes is skolemization.
If the RDF syntax supports multiple graphs, data triples MUST be serialized to the default graph.
Each Star Pattern Fragment, and each page of a Star Pattern Fragment, MUST contain the estimated total number of results (grouped triple patterns that answer the star pattern) that match the fragment's selector. This MUST be expressed as a triple with the following components:
void:triples
predicateThe estimate MUST be a non-negative, finite, integer number with the following properties:
The metadata MAY additionally contain variations of the above triple.
For instance,
it is RECOMMENDED to add a triple with the same subject and object
and the hydra:totalItems
predicate.
If the RDF syntax supports multiple graphs,
metadata triples MUST be serialized to a non-default graph.
This non-default graph SHOULD be explicitly related to the Star Pattern Fragment
(for instance, using the foaf:primaryTopic
predicate),
so clients can interpret what resource this metadata belongs to.
This graph MAY be the same as the graph containing the hypermedia controls.
Each Star Pattern Fragment, and each page of a Star Pattern Fragment, MUST contain a hypermedia control that can generate the IRI of each other Star Pattern Fragment of the same dataset.
This control MUST act as a function that accepts the following parameters:
This control MUST be expressed as a form in the Hydra Core Vocabulary using triples with the following structure:
<http://example.org/dbpedia#dataset>
a
void:Dataset , hydra:Collection ;
void:subset <http://example.org/dbpedia> ;
hydra:search [ hydra:mapping [ hydra:property rdf:subject ; hydra:variable "s" ] ;
hydra:mapping [ hydra:property xsd:integer ; hydra:variable "triples" ] ;
hydra:mapping [ hydra:property xsd:string ; hydra:variable "star" ] ;
hydra:mapping [ hydra:property xsd:string ; hydra:variable "values" ] ;
hydra:template "http://example.org/dbpedia{?s,triples,star,values}" ] .
The IRI template to retrieve Star Pattern Fragments of the dataset is
http://example.org/dbpedia{?s,triples,star,values}
.
All of these MUST be adjusted to fit the configuration of a specific server.
Note that the form MUST be attached to the dataset,
as the form filters the dataset and not the fragment,
and the fragment MUST explicitly be listed as a subset of the dataset,
in order to indicate the relation between the two.
Star Pattern Fragments SHOULD be paged to avoid overly large responses. A page of a Star Pattern Fragment consists of the following three parts:
hydra:view
.
void:triples
predicate
that expresses the estimated total number of matches for the fragment's star pattern.
hydra:previous
.
The previous page SHOULD NOT be empty.
hydra:next
.
The next page SHOULD NOT be empty.
A page is considered empty if it does not contain any data triples (regardless of metadata and controls).
http://example.org/dbpedia{?s,triples,star,values}
http://example.org/dbpedia?triples=3&star=[p1,dbo:country;o1,dbr:norway;p2,dbo:award;p3,dbo:birthDate]
INSTALLATION
The SPF client is available in Node.js or Java 8, and the server is available in Java 8. We extended the TPF server to support SPF requests. You can view the sources on GitHub. Note, that it was implemented only as a prototype for testing the performance and availability of the server and client. The server can be run as a Jetty application. The client can be run as a standalone .js or .jar file or installed into your own projects as a library.Requirements
Java 8 or newer.Installation