This is the story of why I decided to use Semantic MediaWiki together with Wikibase and how I took the best of both worlds.

Why

Originally, my project was meant to be a two-wiki setup, with a “front-end” wiki taking its data from a companion “back-end” wiki based solely on the Wikibase extension: a scheme similar to the one established in Wikimedia projects, where Wikidata is the source of raw data for all the Wikipedia websites.

Wikibase allows a clear organization of pure data that can be imported into a linked wiki using built-in templates as well as the powerful Lua language through the Scribunto extension. Wikibase also allows queries in the SPARQL language, so it seemed the ideal solution for my purposes, at least at first sight. Unfortunately, some Wikibase shortcomings became apparent only after I started working extensively on my implementation of it.

In short, what Wikibase cannot do is make query results usable on pages, either directly or through Lua code. As an example, consider a big group of Items representing episodes of a TV show. These episodes belong to multiple seasons, each of which can easily be identified by one of the Item's Properties. In my mind, it should have been very easy to write some kind of query in the form of

SELECT items WHERE instance = 'episode' AND season = '1'

This is feasible in SPARQL, but the result cannot be embedded in a page. The problem with Wikibase is that you need to know which Item you are referring to from the very beginning. Once you do, you can read whatever data you want from it, but you cannot really query the whole dataset from pages.
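To make the pseudo-query above concrete, here is a minimal sketch of what the SPARQL version might look like. It is written as a small Python helper that only builds the query text. The IDs used in the example (P1 for "instance of", Q10 for "episode", P2 for "season") are hypothetical, and the sketch assumes the query service already defines the wd: and wdt: prefixes for the Wikibase instance and that the season is stored as a plain string.

```python
def build_episode_query(instance_prop, episode_item, season_prop, season):
    """Return SPARQL text selecting every item of one class in one season.

    All IDs are supplied by the caller; none of them come from the post.
    """
    return (
        "SELECT ?item WHERE {\n"
        # The item must be an instance of the episode class ...
        f"  ?item wdt:{instance_prop} wd:{episode_item} ;\n"
        # ... and carry the requested season value (assumed to be a string).
        f"        wdt:{season_prop} \"{season}\" .\n"
        "}"
    )


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(build_episode_query("P1", "Q10", "P2", "1"))
```

A query like this runs fine in the query service, but its result stays there: nothing in Wikibase hands that list back to a wiki page.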

A possible solution

A possible solution to the issue was to use an alternative tool that I had originally looked at when I started my project: Semantic MediaWiki. At the time, I decided not to go with it, mainly because the idea of strongly separating data from the wiki was very appealing to me and I liked the way Wikidata and Wikipedia are set up, so I wanted to go with a similar solution. When I reviewed SMW after learning about Wikibase's shortcomings, it was clear to me that, when it comes to querying, SMW was far more versatile than Wikibase.

A dilemma that wasn’t really there

When I realized Wikibase lacked a critical feature I needed, I was already pretty far into my development. Throwing everything away and starting from scratch with Semantic MediaWiki would have been a radical step, a waste of the time I had already invested and, last but not least, a leap of faith, as I knew SMW only superficially at that time. But then again, it seemed the only viable alternative.

So I spent days pondering how to proceed, making hypotheses and then plans. I also asked fellow IT colleagues, but there was nobody really able to give me an informed opinion or suggestions based on existing best practices. There were simply none.

(By the way, this is the main reason why I am writing this post and why I try to contribute to the community with my experience.)

After a couple of days, the solution struck me suddenly: there was no solution because there was no problem. What I finally realized was that Wikibase and SMW are not mutually exclusive alternatives. They are often used to serve the same purpose, but they are not an either-or choice. You can find places on the internet that list why one tool is better than the other, but the two were not really born as opposites. They can coexist, and pretty well, I might add.

Final setup

The winning setup, which overcomes the shortcomings of both tools, is to use them together. In short, I stuck to my original plan of storing all the raw data in a Wikibase-enabled MediaWiki installation, but I made a critical tweak to the consuming MediaWiki installation. The data there are not simply presented to the user as plain strings of text: data coming from Wikibase are processed and enriched with semantic content before being pushed into the wiki page. This way, the user only needs to maintain data in one central place in Wikibase, and a series of Lua modules automatically adds the proper SMW prefixes to them. Once the data are rendered with their new semantic meaning, they can be managed not only with the Wikibase tools but with all the SMW tools as well. All of this happens without manual intervention: the data are taken from Wikibase and processed by Lua code automatically.
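To illustrate the payoff: once the values on the episode pages carry SMW annotations, the season query from the beginning of this post can be embedded in a page with SMW's #ask parser function. The following is a minimal sketch that builds such a call as wikitext. It is written in Python for brevity rather than in Lua, and the property names ("Instance of", "Season", "Title") are hypothetical examples, not the ones used in my wiki.

```python
def build_ask_query(conditions, printouts=(), output_format="ul"):
    """Return SMW #ask wikitext for a list of (property, value) conditions."""
    # Each condition becomes [[Property::value]]; SMW treats adjacent
    # conditions as a conjunction (logical AND).
    selectors = " ".join(f"[[{prop}::{value}]]" for prop, value in conditions)
    parts = [f"{{{{#ask: {selectors}"]
    # Each printout adds one property to show for every matching page.
    parts += [f"?{name}" for name in printouts]
    parts.append(f"format={output_format}")
    return "\n |".join(parts) + "\n}}"


if __name__ == "__main__":
    # Hypothetical property names, for illustration only.
    print(build_ask_query(
        [("Instance of", "Episode"), ("Season", "1")],
        printouts=["Title"],
    ))
```

Placed on a page, the generated wikitext would list every page annotated as an episode of season 1, which is exactly the kind of result that Wikibase alone could not bring into a page.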

While it is impossible to describe all the details in a blog post such as this, there are a few important points worth highlighting here.

Properties

The type of the SMW property is not set manually in the consumer wiki: it is stored in a Statement of the corresponding Wikibase Property, so the definition needs to be made only once, in Wikibase. Afterward, a template in the wiki can define the SMW property automatically by reading that Statement from the Property and including it on the property's page. Unfortunately, since Properties, unlike Items, cannot be linked to pages, the Property needs to be supplied by hand, but this is just a small inconvenience in an otherwise fully automatic process.
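The idea can be sketched as follows. This is a simplified illustration in Python, not the actual template or Lua code: the property is represented by a plain dictionary rather than the real Wikibase entity structure, and the ID of the Statement holding the type (P99 here) is hypothetical. The only SMW-specific part is the special property "Has type", which declares a property's datatype on its page.

```python
def smw_type_declaration(property_entity, type_statement_pid):
    """Build the SMW type annotation for a property page.

    property_entity is a simplified stand-in for a Wikibase Property:
    {"id": ..., "claims": {statement_pid: [values, ...]}}.
    """
    values = property_entity.get("claims", {}).get(type_statement_pid, [])
    if not values:
        # No type stored in Wikibase: emit nothing rather than guess one.
        return ""
    # Use the first stored value as the SMW datatype name.
    return f"[[Has type::{values[0]}]]"


if __name__ == "__main__":
    # Hypothetical Property with its SMW type stored in Statement P99.
    season = {"id": "P2", "claims": {"P99": ["Number"]}}
    print(smw_type_declaration(season, "P99"))
```

Keeping the type next to the Property in Wikibase means the two definitions cannot drift apart: changing the Statement changes what the property page declares.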

Generic Lua functions

Links and texts in the consumer wiki are generated via a combination of templates and Lua modules. I equipped all the Lua functions with a parameter called addSemantic, which lets me control whether a function returns semantically enhanced links or plain ones. This way, the “standard”, “old” code can coexist with the newly written one.
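A minimal sketch of the pattern, again in Python rather than Lua and with hypothetical function and property names: the same function returns either a plain wiki link or an SMW-annotated one, depending on a flag that plays the role of addSemantic. With the flag off by default, existing callers keep getting the output they always got.

```python
def make_link(target, label=None, prop=None, add_semantic=False):
    """Return wikitext for a link, optionally annotated for SMW."""
    text = label or target
    if add_semantic and prop:
        # SMW inline annotation: [[Property::Value|Caption]]
        return f"[[{prop}::{target}|{text}]]"
    if text == target:
        return f"[[{target}]]"
    # Plain wiki link with a custom caption.
    return f"[[{target}|{text}]]"


if __name__ == "__main__":
    # "Has episode" is a hypothetical property name.
    print(make_link("Pilot", "Episode 1"))
    print(make_link("Pilot", "Episode 1", prop="Has episode", add_semantic=True))
```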

Conclusions

The work on integrating Wikibase and SMW is far from over. After all, this is an experiment more than a finished product: it is a way of working that, to my current knowledge, has not been implemented elsewhere, so there is no right way, no wrong way, and no best practices I can rely on. I can only build them for myself. I hope I can bring more users with me along the way and ultimately build something that works for me and is helpful for many others. Stay tuned.