To motivate the need for some version of an OBDA system, we start with two perspectives: the (end-)user and the database administrator.
A Database Administrator’s Perspective
Organizations normally have multiple databases to store and manage their data; e.g., at a university, a PeopleSoft application for student, course, degree, and grade management, a database with employee data, a course management system such as Moodle or Sakai for the university’s course content, and so on. Or take a city’s public administration that wants to develop integrated service delivery: the separate electricity, sewerage, water, refuse collection, and cadastre databases of the individual services have to be integrated. Health information systems sometimes need to be kept separate for privacy reasons, yet at the same time some cross-database queries have to be executed. Hence, the databases have to be connected in some way, and doing all that manually is a tedious, time-consuming task in any case. Moreover, as a database administrator, you have to know how the data is stored in each database, write (very) large queries that may span pages, and there is no management for recurring queries.
Instead of knowing the structure of the database(s) by heart to construct such large queries, one can reduce the cognitive (over)load by focusing only on what is in the database, without having to care about whether, say, the class Student has a separate table or not, whether that table uses the full name student or an abbreviation, like stdnt, or whether the data about students is split over more than one table in more than one database. Provided the database was developed properly, there is a conceptual data model that represents exactly what kind of data is stored in the database. Traditionally, it is taken offline and shelved after the first implementation. However, this need not be the case, and OBDA can fill this gap.
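To make the idea concrete, the following minimal sketch shows the core mechanism behind an OBDA mapping layer: a conceptual-level class such as Student is declared once, together with where its instances physically live, and a query posed at the conceptual level is rewritten into SQL over the actual schema. All table, column, and key names here (stdnt, snm, sid, and so on) are illustrative assumptions, not taken from any actual system, and real OBDA systems use far richer mapping languages.

```python
# Sketch of an OBDA-style mapping layer (all names are illustrative).
# A conceptual class is mapped to the physical table(s) and columns that
# store its instances; a class may even be split over several tables.
MAPPINGS = {
    "Student": [
        ("stdnt", {"name": "snm", "id": "sid"}),
        ("student_extra", {"email": "eml", "id": "sid"}),
    ],
}

def rewrite(conceptual_class, attributes):
    """Rewrite 'give me these attributes of all instances of the class'
    into a single SQL query over the physical schema."""
    tables = MAPPINGS[conceptual_class]
    select, used = [], []
    for attr in attributes:
        for table, cols in tables:
            if attr in cols:
                select.append(f"{table}.{cols[attr]} AS {attr}")
                if table not in used:
                    used.append(table)
                break
    # Assumption for this sketch: tables share the identifier column 'sid'.
    from_clause = used[0]
    for table in used[1:]:
        from_clause += f" JOIN {table} ON {used[0]}.sid = {table}.sid"
    return f"SELECT {', '.join(select)} FROM {from_clause}"

print(rewrite("Student", ["name", "email"]))
```

The user (or a query interface) only ever mentions Student, name, and email; whether the data sits in stdnt, student_extra, or elsewhere is absorbed entirely by the mapping, which is exactly the separation of concerns that OBDA exploits.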
The Case From the Viewpoint of the User
Did you ever not want to bother knowing how the data is stored in a database, but simply want to know what kind of things are stored in it at, say, the conceptual layer? And did you ever not want to have to learn SQL in order to write queries (or, in the context of the Semantic Web, SPARQL), but instead have a graphical point-and-click interface with which you can compose a query using that ‘what layer’ of knowledge, or some natural language interface, such that the system automatically generates the SQL query for you, in the correct syntax? (And all that not with a downloaded desktop application, but in a Web browser?) Frustrated with rigid canned queries and pre-computed queries that limit your freedom to analyze the data? You don’t want to keep bothering the sysadmin for application-layer updates to meet your whims, and be dependent on whether she has time for your repeated requests?
Several domain experts in genetics, healthcare informatics, and oil processing, at least, wanted exactly that and felt constrained in what they could do with their data, ranging from pondering about, to outright despairing over, the so-called “write-only” databases. Especially in the biology and biomedical fields, there has been much ontology development as well as generation of much data, in parallel, which somehow have to be linked up again.
The notions of query by diagram or conceptual queries might fill this gap. These ideas are not new [CS94, BH96, BH97, SGJR+17], but the technologies now exist to realize them, even through a web interface and with reasoner-enabled querying. So one can now do a sophisticated analysis of one’s data and unlock new information from the database by using the OBDA approach. In one experiment, this resulted in the users, scientists conducting in silico experiments, coming up with new queries they had not thought of asking before [CKN+10].