YK: Chapter 16: Semantic MediaWiki
Date: Mon, 13 Jun 2016 17:44:29 +0200
To: mediawiki@jefsey.com
From: Jefsey <jefsey@jefsey.com>
Subject: CHAPTER 16
16 Semantic MediaWiki
Semantic MediaWiki is an extremely important extension to MediaWiki. It defines a framework for storing data in the wiki and then querying it, which has the effect of turning a wiki, often considered to be just a storage system for text and images, into something more like a database. SMW, as it’s abbreviated, is great by itself, but it’s when it is used together with spinoff extensions that it starts to become (dare I say) magical. SMW has over 50 spinoff extensions, which cover everything from data entry, to browsing and search, to visualization, to enhanced storage. Some extensions are more popular than others, and we’ll cover the most essential ones in this book.
In conjunction with its extensions, SMW can transform a regular wiki into a kind of collaborative database. It’s much more collaborative than a regular database because the version history stored by the wiki means that you can open up editing of all your data to any number of people, something that is very rarely possible with a standard database-backed application.
Why is it called “Semantic MediaWiki”? “Semantic” is a word that, in its most general form, indicates meaning: not the display or exact phrasing of text (i.e., the “syntax”), but rather its underlying meaning. In a modern context, the term “Semantic Web” has gained a lot of buzz since the early-to-mid-2000s. Ironically, the meaning of the phrase “Semantic Web” is itself ambiguous (see here), but the main idea behind it is to get at an underlying meaning of the text that readers see online, a meaning that can then be reused and processed by humans as well as machines. And that is the main idea behind Semantic MediaWiki as well.
Semantic MediaWiki may not be the single most important MediaWiki extension (ParserFunctions probably holds that title), but it is clearly the one that has taken on the greatest life of its own. Besides the 50+ extensions that make use of it, it also has a thriving community of users and developers, at least some of whom would consider themselves SMW users first and MediaWiki users second. As of 2014, SMW is in use on most likely over 1,000 active wikis. SMW has its own website (semantic-mediawiki.org), its own mailing lists and IRC channel, and its own conference (the twice-yearly SMWCon). There’s no other MediaWiki extension that has anything comparable.
Semantic MediaWiki was created in 2005 by Markus Krötzsch and Denny Vrandečić. It was originally conceived as functionality for Wikipedia: a way to store data in order to make the hundreds of thousands of manually-generated lists and categories on Wikipedia less necessary. Its use on regular MediaWiki was, at least at the beginning, only of secondary importance to its creators. That quickly changed, as regular wiki users started to discover its benefits and embrace the technology, while the Wikimedia Foundation adopted a "wait and see" attitude on SMW.
The original dream is now coming to fruition with the Wikidata project, which started in mid-2012 and was originally headed by Vrandečić. Wikidata is an extremely exciting project that aims to create a single data repository for all the different language Wikipedias, so that they can all populate their structured data, like infoboxes, automatically. One type of data, links to the same article in other languages, is already handled exclusively via Wikidata. Some of Semantic MediaWiki’s code to handle back-end storage was spun off into separate libraries, now used by both SMW and Wikidata (or, more specifically, Wikibase, the software powering Wikidata) to store their data.
The current structure of Wikidata is quite a bit different from how SMW was originally proposed for use on Wikipedia. Most notably, Wikidata is meant to support hundreds of languages at the same time. And the current syntax for storing and querying data via Wikidata is almost completely different from the standard SMW syntax. So it’s possible to make too much of the fact that Wikidata’s storage component will be code that originated in SMW. Still, if Wikidata is successful, it could end up raising Semantic MediaWiki’s profile considerably. This would be a nice side effect to Wikidata’s main goal, which is to create the largest structured database of general-knowledge information in the history of the world.
But enough about Wikidata: for the rest of this chapter, we’ll focus on Semantic MediaWiki as it is used in regular wikis, and explain the many benefits it can provide. If you got this book only to read about core MediaWiki, hopefully you’ll still read the following chapters, because in my opinion SMW can provide benefits to nearly every wiki.
How SMW works: an example
Let’s say you have a wiki about wines. Now, you want to be able to see a list of all the Chardonnay wines grown in the South of France. On a typical wiki, whether it’s Wikipedia or anything else (even, for the most part, non-MediaWiki wikis), there are essentially two options: you can compile such a list manually on some wiki page, or you can tag all such pages (assuming there’s a page about every wine) with a category like “Chardonnay wines from the South of France”.
Both types of actions are done on Wikipedia all the time, and on many other wikis as well. However, they both have problems: the first option, manually compiling a list, takes a lot of work, and requires modifying the list each time a new wine page is added that would go on that list, or when some error is discovered. In the second case, the list (on the category page) is generated automatically, but the information has to be added painstakingly to each page. And if you’re expecting users to do it, they need to be given precise instructions on how to add categories and what the categories should be named (should it be “the South of” or “Southern”?), and in general, what the ideal data structure should be. Should there be a “Chardonnay” category for each country covered in the wiki, even those with only one or two wines to their name? And, conversely, should countries, or regions, with many wines to their name be further split up, say by year? Or should the year be tagged with a separate category?
Semantic MediaWiki offers a solution to this problem. Instead of compiling lists, or having an overload of categories, you can define a single infobox template meant to be put on wine pages, that both displays the relevant information (region, variety, year, etc.) for each wine, and stores that information in a way that can be queried. So instead of having to manage a large and probably somewhat chaotic set of categories, you can keep the data structure simple, and move the complexity (such as there is) to the queries that display the data.
What about the infoboxes - isn’t it still difficult for users to learn how to add and populate those? For that, there’s the Semantic Forms extension, covered in the next chapter, that provides forms so that users don’t need to see the underlying wikitext syntax in order to create and change data.
Finally, SMW, in conjunction with some other SMW-based extensions, allows you to go one better than simply displaying information in lists or categories - for our example, you can show the information in tables, you can display wines on a map, you can aggregate wines by country, year, etc. to show their breakdown, and you can allow users to do faceted searches on all of those fields in order to find the wine(s) that interest them.
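To make the idea of a data-storing infobox a little more concrete, here is a rough sketch of what a template called “Wine” might contain. The property names (“Has region”, “Has variety”, “Has year”) and the parameter names are made up for illustration, and the double-colon annotation syntax it relies on is explained in the next section.

```wikitext
{| class="wikitable"
! Region
| [[Has region::{{{Region|}}}]]
|-
! Variety
| [[Has variety::{{{Variety|}}}]]
|-
! Year
| [[Has year::{{{Year|}}}]]
|}
```

A wine page would then call the template with something like {{Wine |Region=Languedoc |Variety=Chardonnay |Year=2012}}, and each value would be both displayed in the table and stored for querying.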
Semantic triples and properties
Semantic MediaWiki is built around data, and in Semantic MediaWiki every piece of data is represented as a “triple”, a semantic-web concept that indicates a three-part structure: a subject, a predicate and an object. An example of a triple would be:
Canada Has capital Ottawa
“Canada” is the subject; “Has capital” is the predicate, or relationship between the two concepts; and “Ottawa” is the object.
In Semantic MediaWiki, the predicate is known as the “property”. And the subject is always the page on which the value is stored.
The easiest way to encode this specific triple would be to go to the page on the wiki named “Canada”, click on “edit”, and write the following:
[[Has capital::Ottawa]]
This looks similar to the MediaWiki syntax for encoding links, and for storing category information. The difference here compared to both of those is that there are two colons instead of one. "Has capital" here represents the predicate, or middle value, or property, of the triple.
What gets displayed on the page? It depends on the “type” of the property. If the property is defined as being of type “Page”, i.e. a link, then “Ottawa” shows up as a link. If it’s defined as being of type “String” or “Text”, then it shows up on the page as simply a string. (There are other property types, but they wouldn’t make sense for this case.) We’ll get to property types in the next section.
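As with regular wiki links, an annotation can be given alternate display text after a pipe. The following is a small sketch of that: it should store “Ottawa” as the value of “Has capital”, while showing the phrase after the pipe to readers. The sentence around it is just illustrative.

```wikitext
The seat of government is [[Has capital::Ottawa|the city of Ottawa]].
```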
Semantic MediaWiki offers another way to store data, and that’s the #set parser function. Here’s how it would be called for this case:
{{#set:Has capital=Ottawa}}
In the case of #set, nothing is shown on the screen: #set works “silently”, and stores data without displaying anything. There are various cases in which you may want to store a value without displaying it, and #set is ideal for those cases.
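A single #set call can also hold more than one property, with the assignments separated by pipes. Here is a minimal sketch; “Has currency” is a hypothetical property added only for the sake of the example.

```wikitext
{{#set:Has capital=Ottawa
 |Has currency=Canadian dollar
}}
```

Placed on the “Canada” page, this would store both values for that page without displaying either one.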
Defining properties
Every property in Semantic MediaWiki has a type. The type of a property dictates how that property’s values are displayed on the page, how its values are displayed and handled elsewhere, and what kind of values are allowed for that property. By default, properties are of type Page, though it’s good to always define the type explicitly.
How is a property’s type defined? With yet more semantic annotation. Every property has its own page on the wiki, in the “Property” namespace (or whatever the corresponding name is in the wiki’s language). So the page for the “Has capital” property would be named “Property:Has capital”. In that page, you could add the following:
[[Has type::Page]]
This would define the “Has capital” property to be of type Page. “Has type” is what’s known as a “special property”: a property that’s pre-defined in SMW with special meaning.
You could also define the property to be a simple string of characters, by adding the following instead:
[[Has type::Text]]
There are various other standard types defined in SMW. The current full set is: Page, Text, Number, Boolean, Date, URL, Email, Telephone number, Code, Quantity and Temperature. Most of these are (hopefully) obvious from their name:
“Page”, as noted before, holds the name of a wiki page.
“Text” holds text values. Until version 1.9 of SMW, text values were stored using two different property types, “String” and “Text”: “String” could hold only up to 255 characters, while “Text” could hold an unlimited number, but could not be queried on (though they can both be displayed in queries). Since SMW 1.9, “Text” can hold an unlimited number of characters and can also be queried on; although for now it’s only the first 40 characters of the string that are searched by queries.
“Number” can hold any integer or decimal number.
“Boolean” can take in any of a number of values meaning “true” and “false”: “yes” and “no” are allowed, as well as values specific to the language of the wiki.
“Date”, “URL”, “Email” and “Telephone number” hold the information that you’d expect them to hold, and are displayed (and linked) appropriately.
“Code” is a minor type that’s basically the same as “Text”, but meant to be displayed in a pre-formatted way.
“Temperature” and “Quantity” are covered in an upcoming section, “Custom units”.
There’s another property type, “Record”, but it’s obsolete, and is best ignored.
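As a quick illustration of how the other types are assigned, here is a sketch of what three property pages might contain. The property names are hypothetical; the type names are taken from the list above, and the comments mark which page each line would go on.

```wikitext
<!-- Contents of the page "Property:Has population" -->
[[Has type::Number]]

<!-- Contents of the page "Property:Has founding date" -->
[[Has type::Date]]

<!-- Contents of the page "Property:Has website" -->
[[Has type::URL]]
```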
Setting allowed values
There are also properties for which you may want to predefine a set of allowed values; in programming terms, these are usually known as "enumerations". These can be defined in SMW as well, though not with a special type: rather, each allowed value is specified on the property’s page, using the "Allows value" special property. For instance, if we wanted to define a property called "Has day of week", we might add the following to its page:
The allowed values for this property are:
* [[Allows value::Sunday]]
* [[Allows value::Monday]]
* [[Allows value::Tuesday]]
* [[Allows value::Wednesday]]
* [[Allows value::Thursday]]
* [[Allows value::Friday]]
* [[Allows value::Saturday]]
Enumerations can be of any type, although in practice, they’re almost always either of type Page or Text (String before version 1.9).
Creating property pages
You can of course hand-create any property page. The easiest way to create properties, though, is via the extension Semantic Forms (see here). If you have Semantic Forms installed, then going to any uncreated property page should show a “create with form” tab near the “create” tab, which brings you to a form that just needs to be filled out and saved. You can also use either the Special:CreateProperty or the Special:CreateClass pages, both defined by Semantic Forms as well.
Custom units
You can also define properties that are stored in units for weight, distance, energy and so on. These can let you convert between different units: the special property “Corresponds to” lets you define the conversion between one and another.
For example, to define a property called “Has distance” that by default is displayed in miles, you would put the following in the page “Property:Has distance”:
[[Has type::Quantity]]
[[Corresponds to::1 mile, miles]]
[[Corresponds to::1.609 km, kilometers, kilometres]]
If a value is stored using the property “Has distance”, it will then always be displayed in miles when it’s queried, or exported via RDF (see here).
The “Temperature” type is the one type with units that’s pre-defined in Semantic MediaWiki; it’s the exception because conversion between temperature types can’t be done through simple multiplication.
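To show how such a property might be used in practice, here is a small sketch. It assumes that “Has distance” has been defined with both miles and km as units; the value itself is just an example. The first line stores a value on a page, and the query below it asks for the stored values to be shown in kilometers instead of the default unit, by adding the unit after a “#” in the printout.

```wikitext
<!-- On a page about a race: store the distance, given in miles -->
[[Has distance::26.2 miles]]

<!-- In a query: request the values converted to kilometers -->
{{#ask:[[Has distance::+]] |?Has distance#km}}
```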
Special properties
In addition to user-defined properties, there are also properties that can be defined by the code, i.e. by Semantic MediaWiki and other extensions. These are called “special properties”. We’ve already looked at two of them: “Has type” and “Allows value”. There are also special properties defined by other extensions, like “Has default form” defined by Semantic Forms; we’ll get to those later.
Special properties should never be used for a purpose other than their intended one. This occasionally happens with “Has type”, because it has such a generic name. But this will result in strange behavior; if you’re thinking of creating a property called “Has type”, you should use a name like “Is of type”, “Has car type”, etc. instead.
Wikis in languages other than English will have their own translations for such special properties, although the English-language version should always work as well.
There are also special properties that are not meant to be set by the user, but rather are stored automatically. Two such special properties are “Modification date” and “Has improper value for”. The first is stored, by default, for every page; it holds the date on which the page was last modified, and is useful for showing lists of recently-edited pages. The second, “Has improper value for”, shows, for every page, each of the properties it holds that has a value that’s not allowed.
The set of special properties stored for pages can be changed: in addition to showing the last modified date, you can also show the creation date, the username of the last editor, etc. The values that are stored can be set via the $smwgPageSpecialProperties setting, described here:
http://semantic-mediawiki.org/wiki/Help:$smwgPageSpecialProperties
Another extension, Semantic Extra Special Properties, lets you store additional metadata, such as the full set of users who edited a page; see here.
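As an example of putting an automatically-stored property to use, here is a sketch of a query for the ten most recently edited pages. It uses the #ask syntax and the “sort”, “order” and “limit” parameters that are covered in the “Queries” section; the “+” is a wildcard meaning “any value”.

```wikitext
{{#ask:[[Modification date::+]]
 |?Modification date=Last modified
 |sort=Modification date
 |order=descending
 |limit=10
}}
```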
Queries
What can we do with the data once it’s stored? Most obviously, we can query it. Queries in SMW are done using a parser function called #ask. Here is a simple call to get the list of countries in a wiki, and their capitals:
{{#ask:[[Category:Countries]] |?Has capital}}
Let’s go through the components of this query. The first, “[[Category:Countries]]”, is the “filter”: it defines which pages get queried; in this case, all pages in the “Countries” category. The second is the “printouts” section: it lists the property or properties (in this case, just “Has capital”) that are printed out in the generated output. Each property to be printed out is placed after a question mark. By default, this information would be shown in a table, which would look something like the following:
             Has capital
Afghanistan  Kabul
Albania      Tirana
And so on. In this example, the values are all links, because “Has capital” is a property of type Page, and thus every string in the table happens to have its own wiki page.
The top row is the header row. By default it’s rather cryptic-looking, displaying a property name for each printout column, and blank for the page name column. The following query would have a nicer display for the top row:
{{#ask:[[Category:Countries]] |?Has capital=Capital |mainlabel=Country}}
“mainlabel” is just one of the many parameters that #ask queries can take; we’ll get to the full set later.
By far the most common usage for queries is to display page sets: to list either an entire set of pages that match the general conditions, or the set of pages that match a particular "parent" page. For instance, if a particular wiki is a location directory that holds information about each specific location, there will probably be a separate page for any area, such as a city or country; so we could then add to each such page a query listing all its “children”, i.e. all the places of interest that are listed as being in that area. In a wiki that holds information about museums, for example, we could add to a page called "Santiago" a query like the following:
{{#ask:[[Has city::Santiago]]}}
This makes it easy for users to see aggregated data without having to create queries or otherwise run a search: by simply going to the page called "Santiago", they can see the full list of museums there, formatted in a way that makes sense for this particular wiki (in addition to any specific information presented about Santiago).
If there’s any more than one or two cities, countries etc. in the wiki, it probably makes sense to create templates called “Country”, “City”, etc. to be used in those pages; and then to have each such template hold an aggregating query (in addition to any data we might want specifically about those places). Here’s a query that could go into a template called “City”:
{{#ask:[[Has city::{{PAGENAME}}]]}}
Adding a call to the “City” template will then display, for any city, the set of museums in that city.
You can also perform what are called inverse queries, where you query on a property in its reverse direction (this can only be done for properties of type Page). Inverse queries are done by adding a “-” before the property name. They are usually not that useful, but in certain cases they can be. Some countries have more than one capital (South Africa has three, for instance), so you could do an inverse query on “Has capital” to list all of a country’s capitals, and information about each one. Here is a query to display all of the capitals of South Africa, and the total area of each of them:
{{#ask:[[-Has capital::South Africa]] |?Has total area}}
In addition to just querying on specific values, you can also do “greater than” or “less than” queries, using the operators “::>” and “::<” (though by default, these actually test for “greater than or equal” and “less than or equal”). Here’s how to get a list of countries, and their populations, for only countries with a population of greater than or equal to 10 million:
{{#ask:[[Has population::>10000000]] |?Has population=Population |mainlabel=Country}}
And for String and Page properties, you can also use the “::~” operator to find partial string matches. Here’s how to get the set of countries with “New” in the name of their capital:
{{#ask:[[Has capital::~*New*]] |?Has capital=Capital |mainlabel=Country}}
The “::~” check may or may not be case-sensitive, depending on the configuration of your database server.
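Conditions in the filter can be placed one after another, in which case a page has to match all of them. As a sketch, combining “::>” and “::<” on the same property gives a range; the query below should list countries whose population is between 10 million and 50 million, again assuming a “Countries” category and a “Has population” property of type Number.

```wikitext
{{#ask:[[Category:Countries]][[Has population::>10000000]][[Has population::<50000000]]
 |?Has population=Population
 |mainlabel=Country
}}
```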
There are various other standard parameters that you can add to #ask queries, to modify the set of results and their display:
format= - sets the display format for the result (this is covered in depth in the next main section, “Display formats”).
limit= - sets the number of pages to return (the default is usually 20).
sort= - the property name, or names, on which to sort.
order= - the order in which to sort values, if they’re sorted; should be ascending (which is the default), descending, or random.
headers= - whether the headers (by default, the property names) should be shown; the options are show (the default), hide, or plain, which shows headers but not as links.
mainlabel= - the header given to the page names themselves; if this is set to “-”, the page names are not displayed.
link= - sets what parts of results should be links; the options are all (the default), none, or subject, where only page names are links but printouts are not.
default= - the default text printed if there are no results.
intro= - introductory text printed, if there are results.
outro= - concluding text printed, if there are results.
searchlabel= - text for the “further results” link; by default, it’s “«… further results»”.
offset= - the result number at which to start displaying (this is used for pagination, and is rarely included explicitly in queries).
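Here is a sketch of a query that uses several of these parameters together. It should show the ten most populous countries, largest first, with a fallback message if nothing matches. The category and property names are the same illustrative ones used earlier in this chapter.

```wikitext
{{#ask:[[Category:Countries]]
 |?Has capital=Capital
 |?Has population=Population
 |mainlabel=Country
 |sort=Has population
 |order=descending
 |limit=10
 |default=No countries have been entered yet.
}}
```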
Displaying individual values
#ask is geared for showing lists, tables and other aggregated displays. But what if you want to show just a single value? For that, there’s the #show parser function, which has a similar syntax to #ask, but a simpler one. The following call would simply display the text “Brasilia”, for example:
{{#show:Brazil |?Has capital |link=none}}
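Because #show returns just the value, it can be dropped into the middle of a sentence. A small sketch, with the surrounding wording made up for illustration:

```wikitext
The capital of Brazil is {{#show:Brazil |?Has capital}}.
```

Without “link=none”, the value of a Page-type property like this one would be displayed as a link.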
Linked properties and subqueries
It’s usually a good idea to avoid data redundancy. For every museum, you could store both its city and country, but if every city page already has its country stored, then that’s unnecessary: you could just store the city, and query on the rest.
Let’s take a practical example: say your museum wiki only stores the city for each museum, with the property “Has city”; and the country for each city is then stored on pages for each city, with the property “Has country”. (We’ll ignore for now the problem of different cities with the same name in different countries, and assume that a museum tagged as being in, say, “Moscow” is always in the one in Russia, as opposed to in another city with the same name.) With such a structure, how can you find all the museums in a certain country? You could do it with a query like:
{{#ask:[[Has city.Has country::Nepal]]}}
Here, “Has city” and “Has country” are what’s known as linked properties; the period between the names defines the linking. The query looks for pages that have the property “Has city” pointing to a page that in turn has the property “Has country” with the value “Nepal”.
You can even do more complex queries using subqueries, which are queries contained within a <q> tag. Here’s one example:
{{#ask:[[Has city::<q>[[Has population::>100000]][[Has country::Argentina]]</q>]]}}
This query will find all museums in any city with 100,000 or more people in Argentina.
Unfortunately, both linked properties and subqueries work only on the “filter” part of the query, and not on the “display” part, so your query can’t contain a property printout like “?Has city.Has country”. That limitation is there for performance reasons, but it has definitely caused problems.
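The two approaches are closely related: a linked property can be thought of as shorthand for a subquery with a single condition. As a sketch, the two queries below should return the same set of museums, using the same hypothetical properties as above.

```wikitext
<!-- Linked-property form -->
{{#ask:[[Has city.Has country::Nepal]]}}

<!-- Equivalent subquery form -->
{{#ask:[[Has city::<q>[[Has country::Nepal]]</q>]]}}
```

The subquery form becomes necessary once the inner page has to satisfy more than one condition.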
Display formats
Display formats, also known as result formats and query formats, are extremely important: they’re a way to set the display of the data returned by queries, if you want to show it in a way more interesting than just lists or tables. To set the display format, you just need to add the parameter “format=...” to the #ask query.
There’s a whole extension devoted to just holding various display formats, Semantic Result Formats, and another one, Semantic Maps, which holds formats related to mapping; both are described in Chapter 18. But there are various basic formats that are defined within Semantic MediaWiki itself:
list - displays results as a simple list, separated by default by commas. This is the default display when only page names are queried.
ul - a bulleted list.
ol - a numbered list (“ul” and “ol” are both the names of the relevant HTML tags used; they stand for “unordered list” and “ordered list”, respectively).
table - a table of data. This is the default display when there are additional printouts in the query.
broadtable - a broad table. This is identical to the “table” format, except that the width of the table is 100% of the page.
category - displays results in the format that pages appear in on category pages, with a separate header for each new starting letter.
template - applies a template to set the display of each query result; see below.
csv, dsv, json, rss, rdf - machine-readable data formats; these are discussed here.
count - simply displays the number of pages that match the query criteria.
embedded - displays each page that matches the query criteria, in full, one after the other. (This format unfortunately causes each of those pages’ categories to be applied to the page that holds the query.)
debug - displays a printout of the database queries used for this query; useful only for debugging.
Some of these formats have their own custom parameters that can be used, in addition to all the standard parameters. The “category” format, for instance, allows for a “columns=” parameter, which sets the number of columns into which to split results. The best way to see the entire set of parameters for each format is to go to the page Special:Ask on the wiki, which shows the name and a brief description for each one. Special:Ask is described in the upcoming section, “Semantic search page”.
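As a sketch of how the “format” parameter is used, here are two queries on the same filter. The first should display only the number of matching pages, and the second should list them category-style, split into three columns. The “Countries” category is just the example used earlier.

```wikitext
<!-- Just the number of pages in the category -->
{{#ask:[[Category:Countries]] |format=count}}

<!-- Category-style listing in three columns -->
{{#ask:[[Category:Countries]] |format=category |columns=3}}
```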
Query templates
Using templates to display query results is a very versatile approach, which lets you apply custom formatting and text around the set of properties displayed for each query result. It’s of course available for the “template” format, but it’s also available for many other formats, including “list”, “ol”, “ul”, “category” and various formats defined in the extensions Semantic Result Formats and Semantic Maps, like “calendar” and “maps”. (We’ll get to these extensions and formats in Chapter 18.) To apply a template, you need to create a template on the wiki that takes in values and applies some formatting to them, then add “|template=template name” to the #ask query.
The template needs to have numbered parameters for each value, starting with 1, where the first parameter is passed in the page name. Here’s an example of the contents of a template that could be used to display information about music albums, if the additional query printouts are for the artist, year and genre:
''{{{1}}}'', {{{2}}} ({{{3}}}) - genre: {{{4}}}
Here’s what a query that called that template could look like, if that template were named “Album display”:
{{#ask:[[Category:Albums]] |format=ul |template=Album display |?Has artist |?Released in year |?Has genre}}
Notice that the format here is “ul”, not “template”; that’s so each row will appear as a nice bulleted item. The output would be a series of lines that looked like this:
Computer World, Kraftwerk (1981) - genre: Electronic
Crescent, John Coltrane (1964) - genre: Jazz
Here, you can see that even simple formatting can serve to make the display of data much more legible and reader-friendly.
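It can help to picture what the query does behind the scenes: for each result, it calls the template once, passing the page name and then each printout value in order. For the two albums above, the effect is roughly that of the following calls. This is a simplified sketch; in practice, values may be passed in as links unless “link=none” is set.

```wikitext
{{Album display|Computer World|Kraftwerk|1981|Electronic}}
{{Album display|Crescent|John Coltrane|1964|Jazz}}
```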
Concepts
Categories (see here) are the basic building block, within both MediaWiki and Semantic MediaWiki, for defining a collection of pages, but they’re not the only way to do it. Semantic MediaWiki also provides for "concepts", which are essentially the set of pages that correspond to a particular query: you can think of a concept as a query that can be referred to.
Let’s take a simple case: you have a category called “Cars”, and each page for a car has, among other fields, one indicating the car’s country of origin: Germany, England, etc. You could indicate the country using a category, e.g. “German cars” etc., but by now you know that using SMW’s semantic properties is the much better way to go. So you instead go with a property like "Has nationality". But what if you have a lot of queries on your wiki that refer to, say, Japanese cars, and you’re tired of typing “[[Has nationality::Japan]]” every time, and you long for the simplicity of “[[Category:Japanese cars]]”? In that case, concepts are the answer.
Concepts are defined within their own namespace, which in English is "Concept:", and they use the parser function #concept. So you could create a page called "Concept:Japanese cars", which just contains the following text:
{{#concept:[[Has nationality::Japan]]}}
A #concept function call looks just like a call to #ask, but with no display-related parameters: the call contains only the filtering for the set of pages. After it’s defined, you could add "Concept:Japanese cars" into any query, and it would work just like a category tag. For example, you could have the following query:
{{#ask:[[Concept:Japanese cars]][[Has layout::Front-wheel drive]] |?Has size |?Has manufacturer}}
The concept page itself will list all the pages it "contains", i.e. those that match its query, just like a category page does; and the display of a concept page mimics that of a category page (see here).
Concepts are also useful with at least two other extensions: Semantic Forms and Semantic Watchlist. With Semantic Forms, you can have either autocompletion or a dropdown in a form based on the pages in a concept, just as you could with the pages in a category. And with Semantic Watchlist, you can watch for changes in the set of pages contained within a concept. Both extensions are covered later in the book.
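A concept’s filter can combine categories, properties and comparators just as an #ask filter can, and #concept also accepts a short description as a second parameter. Here is a sketch of what a page called “Concept:Large countries” might contain, followed by a query that uses it; the concept name, category and properties are all hypothetical.

```wikitext
<!-- Contents of the page "Concept:Large countries" -->
{{#concept:[[Category:Countries]][[Has population::>10000000]]
 |Countries with a population of ten million or more
}}

<!-- A query elsewhere on the wiki that uses the concept -->
{{#ask:[[Concept:Large countries]] |?Has capital=Capital |mainlabel=Country}}
```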
Semantic search page
Semantic MediaWiki provides a special page, at Special:Ask, that has an interface for constructing queries. It has separate fields for the query filters, the query printouts, and all additional parameters, like the query format. Special:Ask has four purposes:
It can be used to query the data on the wiki.
It can be used as a “wizard”, a helpful interface to generate a query that is then placed somewhere in the wiki. Once a query is created, Special:Ask lets you view the corresponding #ask call, which you can then copy and paste into any page.
It is also used by regular queries: if a query has more than a certain number of results, only some will be displayed (usually 20), and then there will be a link below that says “View more results”. This link takes you to the Special:Ask page, where the first set of results is displayed. Special:Ask has pagination, so you can scroll through all the results, no matter how many there are.
Special:Ask is also used to display the export query formats like CSV and JSON; in this way, Special:Ask can function as an API for SMW data (though not the most recommended one; that would be the “smwask” API action; see here).
Storing compound data
Not all data can be stored using simple properties. Specifically, “two-dimensional” data (data that is usually displayed in a table) cannot be stored using regular Semantic MediaWiki properties.
Let’s take as an example the set of ingredients for a recipe, which you can think of as a table of data, with each ingredient corresponding to a row in the table. One of the recipe’s rows calls for 3 tomatoes. How would you store that information? You could add the tag “Has ingredient::tomatoes”, but a tag like “Has quantity::3” wouldn’t work: it wouldn’t be clear which ingredient that applies to. So you could create a separate page for each row; but this would lead to a large number of pages (10 or more for each recipe) that could easily become overwhelming. (To take one example, if you wanted to delete a recipe, you would have to delete all the ingredient pages for it.) Nearly as bad, there’s no obvious naming system to use for each ingredient page: you would have to go with something along the lines of “Lasagna recipe tomatoes row”, or, even more cryptically, “Recipe row 73411”. In any case, maintaining all these pages could become a nightmare.
Instead, the recommended solution is to store all this information (in this example, the entire recipe) in a single page. There are two approaches that allow this: the #subobject parser function, and the Semantic Internal Objects extension. The two have a slightly different syntax, but are otherwise essentially the same.
Subobjects
#subobject is a parser function defined by Semantic MediaWiki that lets you store compound data of this sort. A call to #subobject is defined as:
{{#subobject:subobject name |property 1=value 1 |property 2=value 2 |...}}
For the original example, in a page called “Greek salad”, you could have a call to #subobject that looks like:
{{#subobject:- |Has ingredient=tomatoes |Has quantity=3 |Is row in recipe={{PAGENAME}} }}
It’s recommended to always include a property pointing back to the main page (using {{PAGENAME}}), to make querying easier.
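Since a recipe has several rows, the page would hold one such call per ingredient. A minimal sketch follows; the cucumber and red onion rows and their quantities are invented for illustration, and {{PAGENAME}} (MediaWiki’s built-in variable for the current page’s title) is assumed as the way of pointing back to the page:

```wikitext
{{#subobject:-
|Has ingredient=tomatoes
|Has quantity=3
|Is row in recipe={{PAGENAME}}
}}
{{#subobject:-
|Has ingredient=cucumbers
|Has quantity=1
|Is row in recipe={{PAGENAME}}
}}
{{#subobject:-
|Has ingredient=red onions
|Has quantity=1
|Is row in recipe={{PAGENAME}}
}}
```

Each call stores one row, so “Has quantity” always stays attached to the right “Has ingredient” value.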
To display all the rows in this recipe from another page, you could run a query like this one:
{{#ask:[[Is row in recipe::Greek salad]] |?Has ingredient |?Has quantity |mainlabel=-}}
Why is “mainlabel=-” in there? Because here, as in many cases, there’s no reason to display the name of each row.
To show all the recipe pages that call for at least two tomatoes, you could run this query:
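{{#ask:[[-Is row in recipe::<q>[[Has ingredient::tomatoes]][[Has quantity::>2]]</q>]]}}
Note the inverse query: the “-” before “Is row in recipe” reverses the direction of the property, so that the query returns the recipe pages rather than the rows themselves.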
Why is the first value passed to #subobject a “-”? (It could also be blank.) It’s because that parameter can take in a subobject name, if you want to pass in a pre-set name like “Tomatoes row”. In practice, this is very rarely done.
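For completeness, here is a sketch of what a call with a pre-set name could look like, using the “Tomatoes row” name mentioned above ({{PAGENAME}} is again assumed as the way of pointing back to the page):

```wikitext
{{#subobject:Tomatoes row
|Has ingredient=tomatoes
|Has quantity=3
|Is row in recipe={{PAGENAME}}
}}
```

On the “Greek salad” page, this row should then be identifiable as “Greek salad#Tomatoes row”, instead of by an automatically generated identifier.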
Semantic Internal Objects
The Semantic Internal Objects extension provides a very similar approach to storing compound data. It defines a parser function, #set_internal, that holds what it calls an “internal object” within the page that has semantic properties of its own, as well as a property that links the object back to the page. Any number of internal objects can be defined for a single page. A call to #set_internal is defined as:
{{#set_internal: object-to-page property |property 1=value 1 |property 2=value 2 |...}}
With #set_internal, objects are never given a name by the user; a name is always automatically assigned to each one. And there’s a specific parameter for setting the property pointing from the object to the page.
Let’s see how this works in action. For the original example, in the page called “Greek salad”, you could have a call to #set_internal that looks like:
{{#set_internal:Is row in recipe |Has ingredient=tomatoes |Has quantity=3}}
The query to get all the rows in a single recipe is then identical to the last one we saw for #subobject:
{{#ask:[[Is row in recipe::Greek salad]] |?Has ingredient |?Has quantity |mainlabel=-}}
Why use SIO instead of subobjects? In most cases, the only reason to go with #set_internal is the slightly nicer syntax.
Recurring events
There’s one case within Semantic MediaWiki where you can store values via a formula, instead of just manually entering them, and that’s for recurring events. A recurring event is any event that happens on a regular basis: a birthday, a weekly meeting, a once-a-month deadline, etc. The standard way to store a recurring event is to use the #set_recurring_event parser function. Let’s take an example: let’s say you plan to have a weekly sales meeting every Monday, for a year and a half. To complicate things, let’s also say that, for scheduling reasons, on two of those weeks the event should instead be held on Tuesday. On a page called “Weekly sales meeting”, you could accomplish that with the following call:
{{#set_recurring_event:Is instance of
|property=Has date
|start=January 7, 2013
|end=June 9, 2014
|unit=week
|period=1
|include=March 19, 2013;March 26, 2013
|exclude=March 18, 2013;March 25, 2013
}}
This defines a weekly event that is composed of a group of subobjects (see previous section) in the “Weekly sales meeting” page, each of which points to its parent page using the property “Is instance of”. Each of these subobjects also has a property called “Has date”; that property is set by the “property” parameter, and it has to be of type “Date”. The parameters “unit” and “period” together define the frequency of the event. “unit” can be any of the values ’year’, ’month’, ’week’ and ’day’, while “period” is an integer. If this were an event that happened every two weeks, the unit would be ’week’, and the period value would be 2.
The “include” and “exclude” parameters let you manually change the set of date values, if necessary.
What if this event happened on, say, the 3rd Wednesday of every month? For that, there’s an additional parameter, “week number”. If you have “unit=month”, and add “week number=3” to the call, then, if the start date falls on a Wednesday, every automatically-generated date will fall on the 3rd Wednesday of the month.
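As a sketch of such a call, the following could go on a hypothetical page for a monthly meeting. The dates are chosen for illustration only: January 16, 2013 is the third Wednesday of that month, and the end date is arbitrary.

```wikitext
{{#set_recurring_event:Is instance of
|property=Has date
|start=January 16, 2013
|end=December 31, 2013
|unit=month
|period=1
|week number=3
}}
```

Because the start date is a Wednesday, each generated date should be the third Wednesday of its month.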
There is one notable limitation to #set_recurring_event: it doesn’t allow for defining a duration of the event, so that you can, for instance, specify that your weekly meeting always runs one hour, between 12 and 1 PM. This is a weakness that will hopefully get addressed in upcoming versions.
To display a list of the four next weekly sales meetings, you could have a query like the following:
{{#ask:[[Is instance of::Weekly sales meeting]]
[[Has date::>{{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}}]]
|mainlabel=-
|?Has date
|sort=Has date
|order=asc
|limit=4
|format=ul
}}
(“CURRENTYEAR”, “CURRENTMONTH” and “CURRENTDAY” are all pre-defined variables within MediaWiki. There are various ways to encode the current date within queries, but this is the most standard one.)
Refreshing data
On rare occasions, it can be helpful to refresh all of Semantic MediaWiki’s data. Usually, the data stored accurately reflects the contents of the wiki’s pages; and when a template is re-saved, all the pages that call that template automatically get their semantic data refreshed, so changes to the data structure don’t require any additional action. However, there are times when a mass refresh is useful. One case is when some of the semantic values are calculated, instead of being retrieved directly from the page, like if values are themselves the results of queries. Another case is if something went wrong during the initial storage.
There are two ways to do a mass refresh of a wiki’s SMW data. The first is to press the button “Start updating data” in the page Special:SMWAdmin.
The second is to call the script “SMW_refreshData.php”, located in SMW’s /maintenance directory. There are various parameters for this script; you can see the full list of options here:
https://semantic-mediawiki.org/wiki/Help:Repairing_SMW%27s_data
Tooltips
You may want to have “tooltips”, i.e. little icons that a user can click on to see a popup with additional information. These are useful in both regular pages and in forms (Chapter 17). For no strong reason, the best way to display these is defined within Semantic MediaWiki: the #info parser function. To display a tooltip, just place a call like the following anywhere on a page, template or form:
{{#info:Here is some additional information!}}
This will produce an icon where the #info tag was placed, which, when clicked on (and, for more recent versions of SMW, hovered over), brings up a tooltip balloon.
Here is an example screenshot of the result of #info, from a hypothetical chemistry wiki:
[Screenshot: a tooltip balloon produced by #info]
RDF and SPARQL
Although the extension is named Semantic MediaWiki, this chapter has for the most part not covered anything related to the so-called Semantic Web.
What is the Semantic Web? The term has been used, arguably to the point of breaking, to refer to at least four mostly-unrelated things: technologies like RDF and SPARQL; publishing data online in any sort of structured way; computers trying to read and understand text on the web (which can include analysis of both facts and opinion); and computers trying to understand and answer natural-language questions. We’ll just look at the first one, because it’s the one that Semantic MediaWiki can in fact make use of.
RDF, which stands for Resource Description Framework, is a framework for storing data in the form of triples (of the kind that SMW itself uses). In some cases, what are called “triples” are actually “quads”, with the fourth element holding information about the context of the triple. There are RDF triplestores (sometimes they’re quadstores), which are essentially databases that are geared specifically to hold and query on data in RDF form. Just like MySQL and Oracle are examples of relational database systems, there are a variety of RDF triplestore systems: some of the best-known ones are Virtuoso, Jena and 4store.
SPARQL, which stands for SPARQL Protocol and RDF Query Language (it’s a recursive acronym), is a query language specifically for querying and modifying RDF data. It works like SQL (a language that does the same thing for relational databases), and its syntax is somewhat similar.
You can set up SMW to store its data in a triplestore, and then to use that triplestore when running #ask queries. If you do that configuration, SMW will still store its data in the regular relational database as well, since there are a few cases, like special properties, where SMW queries the relational database even if a triplestore exists. Still, it can be helpful to store data in an RDF triplestore. The main advantage is that it lets outside systems query that data directly, using SPARQL. Then, in theory, data from the wiki can be queried at the same time as RDF data from other systems and websites. That’s because one very nice thing about SPARQL, which makes it different from SQL, is that you can construct queries that access any number of RDF sources at the same time. There are also some other potential advantages: in theory, the performance should be faster, since RDF triplestores are optimized for the querying of triples; though no comparative study has been done for SMW. Another advantage is that querying of the semantic data is now on a different system than the normal operation of MediaWiki, so if one of the two becomes bogged down, the other should still work fine.
Setting up SMW to work with an RDF triplestore isn’t that hard, once you have the actual triplestore set up. You can read more about the process and configuration options here:
https://semantic-mediawiki.org/wiki/Help:Using_SPARQL_and_RDF_stores
Additional resources
There are several resources available if you need help with Semantic MediaWiki, or any of its related extensions. Two mailing lists exist: semediawiki-user and semediawiki-devel; the first for users, and the second for developers. And if you’re on IRC, the SMW IRC channel can also be a helpful resource. You can find links and instructions for these at this page:
https://semantic-mediawiki.org/wiki/Help:Getting_support
Additionally, Semantic MediaWiki has a two-page quick reference, or “cheat sheet”, available here:
https://semantic-mediawiki.org/wiki/File:SMW_quick_reference.pdf
It covers not just SMW, but related extensions like Semantic Result Formats, Semantic Forms and Semantic Drilldown, all of which we’ll get to in the following chapters. If you’re planning to do any substantial work with these extensions, it’s worth printing out.