Publishing and interlinking historical data on the Semantic Web

Project conception and management: Francesco Beretta
Platform deployment and administration: Djamel Ferhod

The project (Système modulaire de gestion de l'information historique) has created an open, modular platform for storing geo-historical information. The web-based platform allows researchers to share their data and texts in a collaborative environment. Students, scholars and research projects contribute to information collection and produce structured data in different domains, such as intellectual, economic, social, institutional or religious history. The datasets related to different projects are published on dedicated websites (Patrons de France; SIPROJURIS). A historical atlas of political territories is also under development.

The richness and heterogeneity of the shared information require a generic data model, which was designed with the Merise (ERD) modelling method (cf. Beretta & Vernus 2012) and implemented in a relational PostgreSQL database that users access through a user-friendly AJAX web application. In parallel, the project deployed an environment built on eXist-db technologies for analysing, sharing and publishing XML/TEI-encoded texts. Named entities and knowledge units are annotated semantically by linking the tags defined in line with the Text Encoding Initiative (TEI) to the resources created in the relational database. Some of the texts are available on the public platform.
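The linking of TEI tags to database resources described above can be illustrated with a minimal sketch. It assumes that the link is carried by the standard TEI `ref` attribute on elements such as `persName` and `placeName`; the fragment, the `example.org` URIs and the identifiers are invented for illustration and are not taken from the project's texts.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: the URIs and identifiers are invented for illustration.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName ref="http://example.org/resource/Actor123">Jean Dupont</persName> taught in
  <placeName ref="http://example.org/resource/Place45">Lyon</placeName>.
</p>"""


def extract_annotations(tei_xml):
    """Return (tag, text, ref) for every element that carries a ref attribute."""
    root = ET.fromstring(tei_xml)
    found = []
    for elem in root.iter():
        ref = elem.get("ref")
        if ref is not None:
            # Drop the "{namespace}" prefix that ElementTree adds to tag names.
            local_name = elem.tag.split("}", 1)[-1]
            found.append((local_name, (elem.text or "").strip(), ref))
    return found


annotations = extract_annotations(SAMPLE)
for tag, text, ref in annotations:
    print(tag, text, ref)
```

Each tuple pairs a piece of annotated text with the URI of the resource it refers to, which is the kind of text-to-database link the paragraph describes.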

The interlinking of objects, information and texts is carried out using URIs to identify the database objects and information. We therefore created a website (to which this page belongs) to deliver to participating scholars, and to the public, the authority files for dereferencing URIs, identifying objects and publishing a portion of the knowledge units collected about historical objects.

However, this only allows humans to read the data on a web page. The next step is to dereference objects and information as RDF data and, more widely, to connect the project's data with cultural heritage information, such as that published by museums, libraries and archives, or with data produced by other research projects. The generic nature of the data model allows easy transposition to an OWL DL ontology. We therefore translated the generic data model into RDF and created a SPARQL-endpoint for a first dataset (cf. Francesco Beretta, Djamel Ferhod, Séverine Gedzelman, Pierre Vernus 2014).
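Dereferencing a URI as RDF is commonly done through HTTP content negotiation. The sketch below only prepares such a request and does not send it. The resource URI is a placeholder, and whether the platform's server honours the `Accept` header is an assumption that the text above does not confirm.

```python
import urllib.request


def build_rdf_request(resource_uri, mime_type="application/rdf+xml"):
    """Prepare (without sending) a GET request that asks for RDF rather than HTML."""
    return urllib.request.Request(resource_uri, headers={"Accept": mime_type})


# Placeholder URI, invented for illustration.
request = build_rdf_request("http://example.org/resource/Actor123")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup. A browser sending `Accept: text/html` to the same URI would continue to receive the human-readable page.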

This part of the project is still under development. For now, avoid using the generic ontology to structure your own data, because it may still evolve. You can of course query and use the data, but be aware that they are made available under a licence, namely Creative Commons Attribution-NonCommercial-ShareAlike 4.

The schema of the generic ontology, in both RDF and PDF format, can be downloaded from this link.
If you want to visualize the generic ontology, unzip the downloaded folder, go to this website and upload the .rdf file using the "Ontology" button at the bottom of the page.

For an introduction to semantic web technologies and SPARQL see this wiki.
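As a starting point for exploring an endpoint, the following sketch builds a request URL according to the SPARQL 1.1 Protocol, in which the query is passed in the `query` parameter of a GET request. The endpoint address is a placeholder (the real address is not given here), and the query simply lists the named graphs that contain at least one triple.

```python
from urllib.parse import urlencode

# Placeholder endpoint address, invented for illustration.
ENDPOINT = "http://example.org/sparql"

# List up to 50 named graphs that contain at least one triple.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 50"


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, LIST_GRAPHS)
print(url)
```

Sending this URL with an `Accept: application/sparql-results+json` header would normally return the result set as JSON, provided the endpoint supports that format.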

The SPARQL-endpoint comprises different graphs:

