Efficient In-memory Indexing for Metadata-augmented RDF


Metadata such as provenance, versioning, and temporal annotations is vital for the construction, curation, and maintenance of large RDF datasets. Despite the importance of metadata in the RDF ecosystem, support for metadata-augmented RDF remains limited. Some solutions focus on particular annotation types, but no approach so far implements arbitrary levels of metadata in an application-agnostic way. We take a step toward addressing this limitation and propose an in-memory tuple store architecture that can manage RDF data augmented with any type of metadata. Our approach, called TrieDF, builds upon the notion of tries to store the indexes and the dictionary of a metadata-augmented RDF dataset compactly. Our experimental evaluation on three use cases shows that TrieDF outperforms state-of-the-art in-memory solutions for RDF in terms of main memory usage and retrieval time, while remaining application-agnostic.
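
As a rough illustration of the general idea only, the following Python sketch indexes tuples of arbitrary length (a triple followed by zero or more metadata terms) in a trie, so that tuples sharing a prefix share storage and a prefix lookup returns every annotated variant. It is not TrieDF's actual implementation: the class name `TupleTrie`, its methods `insert` and `match`, and the example terms are hypothetical, and the sketch omits the dictionary encoding and multiple index orderings that a real store would need.

```python
from typing import Dict, Iterator, Tuple


class TupleTrie:
    """Toy trie over tuples of terms (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TupleTrie"] = {}
        self.terminal = False  # True if some stored tuple ends at this node

    def insert(self, terms: Tuple[str, ...]) -> None:
        """Store a tuple of any length, reusing nodes for shared prefixes."""
        node = self
        for term in terms:
            node = node.children.setdefault(term, TupleTrie())
        node.terminal = True

    def match(self, prefix: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        """Yield every stored tuple that starts with the given prefix."""
        node = self
        for term in prefix:
            node = node.children.get(term)
            if node is None:
                return  # no stored tuple has this prefix
        yield from node._walk(prefix)

    def _walk(self, path: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        if self.terminal:
            yield path
        for term, child in self.children.items():
            yield from child._walk(path + (term,))


# Assumed example data: a plain triple, the same triple with one level of
# metadata, and another triple with two levels of metadata.
index = TupleTrie()
index.insert(("ex:alice", "ex:knows", "ex:bob"))
index.insert(("ex:alice", "ex:knows", "ex:bob", "prov:source1"))
index.insert(("ex:alice", "ex:knows", "ex:carol", "prov:source2", "2020"))

results = list(index.match(("ex:alice", "ex:knows", "ex:bob")))
```

Here `results` contains both the plain triple and its provenance-annotated variant, because the two share the same three-term path in the trie. The sketch treats metadata as additional tuple positions, which is one simple way to support arbitrary annotation depth without fixing the annotation type in advance.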


The source code and datasets used for our experiments can be found below.

TrieDF source code and data (7 GiB download): triedf_src_experiments.tar.xz