Towards Fully-fledged Archiving for RDF Datasets


The dynamicity of RDF data has motivated the development of solutions for archiving, i.e., the task of storing and querying previous versions of an RDF dataset. Querying the history of a dataset finds applications in data maintenance and analytics. Notwithstanding the value of RDF archiving, the state of the art in this field is underdeveloped: (i) most existing systems are neither scalable nor easy to use, (ii) there is no standard way to query RDF archives, and (iii) existing solutions do not exploit the evolution patterns of real RDF data. On these grounds, this paper surveys the existing work on RDF archiving in order to characterize the gap between the state of the art and a fully-fledged solution. It also presents RDFev, a framework to study the dynamicity of RDF data. We use RDFev to study the evolution of YAGO, DBpedia, and Wikidata, three dynamic and prominent datasets on the Semantic Web. The insights from this study lay the groundwork for a sketch of a fully-fledged archiving solution for RDF data.


The datasets used for our experiments can be found below.

YAGO: yago.tar.xz

DBpedia: dbpedia.tar.xz

Wikidata: wikidata.tar.xz


The queries used in our query experiments with Ostrich can be found here: queries.zip


The source code for RDFev and instructions to run it can be found at https://gitlab.com/opelgrin/rdfev