PIQNIC

PPBFS

Decentralized Indexing over a Network of RDF Peers

Abstract

Despite the prospect of a vast Web of interlinked data, the Semantic Web today mostly fails to meet its potential. One of its main problems is rooted in its current architecture, which relies entirely on the availability of the servers providing access to the data. These servers are subject to failures, which often leave some data unavailable. Recent work has proposed decentralized peer-to-peer architectures to alleviate this problem. For query processing, however, these approaches mostly rely on flooding, a standard technique in peer-to-peer systems that can easily cause very high network traffic and hence high query response times. To enable efficient query processing in such networks, this paper proposes two indexing schemes that aim, in a decentralized fashion, to efficiently find the nodes holding relevant data for a given query: Locational Indexes and Prefix-Partitioned Bloom Filters. Our experiments show that these indexing schemes considerably reduce query processing times compared to existing approaches.

EXPERIMENTS

We ran our experiments on a server with four 16-core AMD Opteron 6376 processors at 2.3 GHz (64 cores in total), each with 768 KB L1 cache, 16 MB L2 cache and 16 MB L3 cache, and 516 GB RAM. We ran 200 clients on the same server.

We report the following metrics:

Execution Time (ET): Execution time of a query in seconds.
Completeness (COM): Number of results actually retrieved divided by the number of expected results.
Number of Messages (NM): Number of messages exchanged between nodes.
Number of Transferred Bytes (NTB): Number of transferred bytes between nodes.
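
The four metrics above can be captured per query run as in the following minimal sketch. The record type `QueryRun`, its field names, and the sample values are hypothetical and purely illustrative; they are not measurements from the experiments. The handling of an empty expected result set is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    """Measurements for one query execution (hypothetical field names)."""
    execution_time_s: float   # ET, in seconds
    retrieved_results: int    # results actually returned
    expected_results: int     # results in the reference answer
    messages: int             # NM, messages exchanged between nodes
    transferred_bytes: int    # NTB, bytes transferred between nodes


def completeness(run: QueryRun) -> float:
    """COM: retrieved results divided by expected results."""
    if run.expected_results == 0:
        # Assumption: with an empty reference answer, a run counts as
        # complete only if it also returned nothing.
        return 1.0 if run.retrieved_results == 0 else 0.0
    return run.retrieved_results / run.expected_results


# Made-up sample values, for illustration only.
run = QueryRun(execution_time_s=2.5, retrieved_results=90,
               expected_results=120, messages=340,
               transferred_bytes=1_048_576)
print(f"COM = {completeness(run):.0%}")  # prints "COM = 75%"
```

A query that times out before returning all results therefore shows up as a completeness below 100%.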

We use the data and queries of LargeRDFBench, covering its query groups S (simple), C (complex), L (large data), and CH (complex and high data sources).

Execution Time (ET) in Seconds for group S (log scale)

Execution Time (ET) in Seconds for group C (log scale)

Execution Time (ET) in Seconds for groups L and CH (log scale)

Completeness in Percentage for group S

Completeness in Percentage for group C

Completeness in Percentage for groups L and CH

The number of exchanged messages is not shown for groups L and CH, since PIQNIC times out for most of those queries.

Number of Exchanged Messages for group S (log scale)

Number of Exchanged Messages for group C (log scale)

The number of transferred bytes is not shown for groups C, L, and CH, since PIQNIC times out for most of those queries.

Number of Transferred Bytes for group S (log scale)

Execution Time (ET) in Seconds for group S (log scale)

Execution Time (ET) in Seconds for group C (log scale)

Execution Time (ET) in Seconds for groups L and CH (log scale)

We tested with replication of 0% (each fragment is located on only one node), 5%, and 10%.

Execution Time (ET) in Seconds for group S (log scale)

Execution Time (ET) in Seconds for group C (log scale)

Execution Time (ET) in Seconds for groups L and CH (log scale)

NM and NTB are the same, except for small fluctuations caused by the specific neighborhoods of the tested nodes, because the network structure is unchanged (the same number of nodes is contacted).

We tested with 1, 5 and 10 neighbors.

Execution Time (ET) in Seconds for group S (log scale)

Execution Time (ET) in Seconds for group C (log scale)

Execution Time (ET) in Seconds for groups L and CH (log scale)