Wikimedia seems to have accomplished its mission: by current count, 55 million articles are available in 300 languages. But the numbers are misleading. Measured against the vision “Imagine a world in which every single human being can freely share in the sum of all knowledge,” there is still Herculean work to be done. There is a great need for action in the exchange of knowledge and data between the different language versions. And when it comes to integrating the Wikimedia world with other wiki systems, things often don’t look much better.

The Wikimedia Foundation has taken important steps here with the launch of Wikimedia Commons, Wikidata, and the Content Translation Tool. With the new projects Abstract Wikipedia and Wikifunctions, further nodes in the Wikimedia universe are now emerging.

In his keynote at SMWCon 2020, Denny Vrandečić outlines the goals of the new projects and places them in the context of developments since 2001. Above all, he shows that the development of Semantic MediaWiki was a milestone and that it will continue to play an important role in the future.

Untapped knowledge and unwritten articles

Vrandečić’s topic is the multilingualism of the Wikimedia world. And he once again makes clear the problems that lie ahead.

Wikipedia has editions in 300 languages, but the individual language versions of the online encyclopedia differ enormously in terms of content, article quality and size. For example, the German Wikipedia has 2.5 million articles, while Wikipedia’s flagship, the English version, contains 6.2 million.
But by Vrandečić’s count, the two language versions have only 1.2 million topics in common. In other words, the German version alone has 1.3 million articles that have no English counterpart and are therefore unavailable to readers who rely on the English version.
Globally, the English flagship covers at best a third of all available topics: The 300 language versions offer more than 20 million topics, the English version only the 6.2 million just mentioned.

If you want to remedy this situation, you face a gigantic problem of sheer volume. Making 20 million topics available in 300 languages would mean having to write an unimaginable 6 billion articles in total.
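The scale can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that uses the rounded figures quoted in this section; the variable names are illustrative and not taken from the keynote.

```python
# Rounded figures as quoted in this article.
topics_total = 20_000_000      # distinct topics across all language versions
languages = 300                # number of Wikipedia language versions
english_articles = 6_200_000   # articles in the English Wikipedia
german_articles = 2_500_000    # articles in the German Wikipedia
shared_de_en = 1_200_000       # topics covered in both German and English

# Every topic in every language.
articles_needed = topics_total * languages

# Share of all topics that the English version covers.
english_coverage = english_articles / topics_total

# German articles without an English counterpart.
german_only = german_articles - shared_de_en

print(f"Articles needed: {articles_needed:,}")
print(f"English coverage: {english_coverage:.0%}")
print(f"German-only articles: {german_only:,}")
```

The result also shows why the English version covers "at best a third" of all topics: 6.2 million out of 20 million is just under one third.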

Semantic MediaWiki as the basis for “pay-as-you-go” application development

As an example of how to deal with such a large and complex task, Vrandečić points to the history of Semantic MediaWiki (SMW).

In 2005, at the first peak of the Wikipedia hype, there was an idea to make the data in Wikipedia easier to share and use. For example, the idea was to make it possible to search Wikipedia with triple queries. (A typical triple query is: “Give me a list of all the names of African capitals and the country where each capital is located.”)
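To make the idea of a triple query more concrete, here is a minimal sketch in Python. It does not show Semantic MediaWiki's actual query syntax or storage model; the tiny in-memory list of triples, the predicate names and the function `african_capitals` are assumptions chosen purely for illustration.

```python
# Each fact is a (subject, predicate, object) triple.
# The predicate names are made up for this sketch.
triples = [
    ("Nairobi", "capital of", "Kenya"),
    ("Dakar", "capital of", "Senegal"),
    ("Accra", "capital of", "Ghana"),
    ("Berlin", "capital of", "Germany"),
    ("Kenya", "located in", "Africa"),
    ("Senegal", "located in", "Africa"),
    ("Ghana", "located in", "Africa"),
    ("Germany", "located in", "Europe"),
]


def african_capitals(facts):
    """Return (capital, country) pairs for all countries located in Africa."""
    # Step 1: find every country that is located in Africa.
    african_countries = {
        subject
        for subject, predicate, obj in facts
        if predicate == "located in" and obj == "Africa"
    }
    # Step 2: find every capital whose country is in that set.
    return sorted(
        (subject, obj)
        for subject, predicate, obj in facts
        if predicate == "capital of" and obj in african_countries
    )


for capital, country in african_capitals(triples):
    print(f"{capital} is the capital of {country}")
```

The point of the sketch is that the query combines two kinds of facts that a plain full-text search cannot connect: which city is a capital of which country, and which country lies on which continent.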

Technically, it should be possible to enrich existing links with semantic annotations, which would then be machine-interpretable. With this idea in mind, the Semantic MediaWiki project was launched. Since then, the approach has been elaborated and developed further, and an innovative community has formed around it that remains active to this day.

The community soon discovered that the semantic functions could also be used to develop user-specific applications quickly and without in-depth programming knowledge. With manageable effort, amazing results could be achieved. Markus Glaser once called this feature of Semantic MediaWiki “application prototyping” in a talk years ago. Denny Vrandečić prefers the term “Pay-As-You-Go-Application-Development”: pay as you go in the sense that you can develop very efficient, very user-specific tools with SMW step by step.

Vrandečić illustrates this feature of Semantic MediaWiki with the example of his personal wiki: he uses it as a research notebook, writes paper reviews in it, stores notes, and keeps a birthday calendar and a simple address book there. He also reminds his audience of the almost infinite uses of Semantic MediaWiki on the web: from a database in genetic research like SNPedia to organizing conferences like SMWCon.

“Ultimate nerd snipes” and their limitations

Here Vrandečić sees some parallels with the emergence of other tools: Apple’s HyperCard, Emacs, or MIT Haystack. In these systems, too, it was possible to quickly store and search knowledge without special programming skills. And he recalls the development of spreadsheets, which were completely underestimated by developers but offered incredible possibilities.

All of these systems have created a new type of non-programmer programmer. And for Vrandečić, they all form a specific type of application: Semantic MediaWiki, Emacs or spreadsheets are “Ultimate nerd snipes,” as he puts it. At the heart of such tools is an idea so compelling that you drop everything else for it. And they allow you to solve complex and challenging tasks, as well as to think better on your own or in a team.

However, these “Ultimate nerd snipes” also have their limitations. You can set up an address book with Semantic MediaWiki, but you’ll probably end up using specialized software at some point, in this case an address book application, because it’s optimized for the use case and Semantic MediaWiki’s features can’t be extended any further.

On the other hand, by using specialized software, you again lose the integration in one tool. The data is torn apart and walled gardens are created: closed software systems that build on another technology and develop their own ecosystem.

And so “Ultimate nerd snipes” like Semantic MediaWiki remain caught in a contradiction: the community creates a wonderful system by combining semantic annotations with templates, categories, Lua and the page history. But this system is hard to maintain and hard to market because there is no single clear use case. And as soon as you promote only a specific part, for example its use as a project management tool, you lose the big advantage of the Semantic MediaWiki approach.

The future: interoperability of Semantic MediaWiki, Wikidata and Cargo

It was probably the special requirements of Wikipedia that prevented Semantic MediaWiki from being implemented in Wikipedia. Within the Wikimedia Foundation there was the concern that the performance of Wikipedia could suffer. And there was also the idea early on not to store metadata in the various language wikis, but to put it in a central repository that could be accessed by all language versions.

Nonetheless, Semantic MediaWiki continued to develop very dynamically outside the Wikipedia world and was widely adopted in, among other areas, research and development, (high) technology and business organization.

When the Wikimedia Foundation began working on a centralized metadata management solution for Wikipedia’s many language variants in the early 2010s, Semantic MediaWiki again did not make the cut. Rather, in 2012, the Wikimedia Foundation launched Wikidata, an entirely new project.

For Vrandečić, this was a clear case of the use case and requirements not fitting: in his view, Wikidata was not built on top of Semantic MediaWiki because it was easier to develop a new, specific product to better support multilingualism and to be able to map complex data models.

But now, in addition to Semantic MediaWiki, MediaWiki maintainers also had the option of connecting to Wikidata via Wikibase. Finally, another alternative for metadata management came into play in the form of Cargo, which is somewhat more lightweight and thus aims to address the different needs of companies.

Denny Vrandečić sees no problem in this development. For him, whether Semantic MediaWiki, Wikibase or Cargo is used simply depends on the use case. However, he sees the future in the interoperability of these systems. For example, Wikidata and Semantic MediaWiki should work better together in the future.

Semantic MediaWiki is better suited to holding data locally and to enriching, querying and visualizing it locally. With access to the large, public database Wikidata, one can now enrich this local data. As an example, Denny Vrandečić mentions the display of all ATMs that belong to the Interbank Banking Network. Here one can use OpenStreetMap to display results on a map. Wikidata does not know all the ATMs, but it does provide data on which banks belong to the network. Using Semantic MediaWiki, this can be aggregated into a single query and the result displayed.
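The aggregation idea behind the ATM example can be sketched in a few lines of Python. This is not how Semantic MediaWiki or Wikidata are actually queried: the bank names, coordinates, the `Atm` class and the function `atms_in_network` are all hypothetical stand-ins, and both data sets are hard-coded here instead of being fetched from a real source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atm:
    operator: str  # bank that runs the ATM
    lat: float     # latitude, e.g. for a map display
    lon: float     # longitude


# Hypothetical stand-in for facts from a public knowledge base:
# which banks belong to the network.
network_members = {"Bank A", "Bank B"}

# Hypothetical stand-in for locally held data: known ATM locations.
local_atms = [
    Atm("Bank A", 52.52, 13.40),
    Atm("Bank C", 48.14, 11.58),
    Atm("Bank B", 50.94, 6.96),
]


def atms_in_network(atms, members):
    """Keep only the ATMs whose operator is a member of the network."""
    return [atm for atm in atms if atm.operator in members]


for atm in atms_in_network(local_atms, network_members):
    print(f"{atm.operator}: ({atm.lat}, {atm.lon})")
```

Neither source can answer the question alone: the local data knows where the ATMs are, the public data knows which banks belong to the network, and only the join of the two yields the list that can be put on a map.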

For Vrandečić, Semantic MediaWiki has the important function of making different sources available in a local wiki, again allowing him to develop completely new, configurable solutions.

Abstract Wikipedia and Wikifunctions: the solution to the 6 billion article problem?

In the near future, new, helpful sources of data will emerge with the Abstract Wikipedia and Wikifunctions projects. Vrandečić, who initiated the two projects and now also leads them within the Wikimedia Foundation, expects a similar development here as with Semantic MediaWiki: an experimental step-by-step approach.
And he explains very clearly at the end of his keynote what these projects are all about.

Abstract Wikipedia is intended to help people share more knowledge in different languages. Within Wikidata, people should be able to create and maintain Wikipedia articles in a language-independent way. The result is an abstract that can be machine-translated into other languages.

His example is an article about the scientist Marie Curie. The article in the English Wikipedia is very detailed, whereas the article in the Amharic Wikipedia consists of only one sentence. Now, key statements about Curie such as “the only person to win a Nobel Prize in two different scientific fields” can be written in an abstract, and the statements can also be represented as functions. Such a representation is language-independent and machine-readable, and translation programs can very easily produce a translation from it. The hope is that this will enable the task of writing 6 billion articles, described at the beginning of this article, to be accomplished within existing resources and at a manageable cost.

An important component of the project is a new wiki called Wikifunctions. In the future, functions can be stored there and maintained collaboratively. Wikifunctions will be a catalog of functions that anyone can call, write, maintain and use, for example functions that calculate distances between cities. But Wikifunctions will also provide the code that translates the language-independent article from Abstract Wikipedia into the language of the target Wikipedia.
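The principle of rendering language-independent content with per-language functions can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the actual data format of Abstract Wikipedia or the API of Wikifunctions: the dictionary layout, the `lexicon` and the `render_*` functions are invented for this illustration.

```python
# Language-independent content: a structured statement instead of a sentence.
abstract_statement = {"subject": "Marie Curie", "role": "physicist"}

# Hypothetical per-language lexicon that maps the role identifier to a phrase.
lexicon = {
    "en": {"physicist": "a physicist"},
    "de": {"physicist": "eine Physikerin"},
}


def render_en(statement):
    """Render the statement as an English sentence."""
    return f"{statement['subject']} was {lexicon['en'][statement['role']]}."


def render_de(statement):
    """Render the statement as a German sentence."""
    return f"{statement['subject']} war {lexicon['de'][statement['role']]}."


# One rendering function per target language.
renderers = {"en": render_en, "de": render_de}


def render(statement, language):
    """Pick the renderer for the target language and apply it."""
    return renderers[language](statement)


print(render(abstract_statement, "en"))
print(render(abstract_statement, "de"))
```

In this model the content is written once, and supporting a further language means contributing one more rendering function and lexicon entry rather than rewriting every article.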

With Wikifunctions, code similar to semantic templates or functions in spreadsheets would be made available to everyone. At the same time, it should become possible to create new functions without programming knowledge. Wikifunctions is intended to democratize code and coding in this way.

As the development of Semantic MediaWiki has already shown, it is quite likely that new and unexpected areas of application will arise.

Image used on main page: Tobias Schumann, Denny Vrandecic Portrait, CC BY-SA 3.0.