15 MediaWiki administration
Administering a MediaWiki wiki is generally not that hard, once you’ve done the initial setup. It involves both actions done via the web interface and actions done on the back end, like editing LocalSettings.php and installing extensions. Usually only one person, or a handful of people, has access to the back end, and the same or a slightly larger group has administrative access on the wiki itself.
This entire book is geared in large part toward MediaWiki administrators, so in a sense most of it could fit under the topic of “MediaWiki administration”. But this chapter is meant to hold some of the tools and actions that are relevant only to administrators and that didn’t fit in elsewhere.
Configuration settings
There are many settings for core MediaWiki that can be modified in LocalSettings.php: essentially, all the variables that start with “$wg”. Some are covered in this book, though they’re a very small percentage of the total set. You can see the full listing, grouped by functionality type, here:
https://www.mediawiki.org/wiki/Manual:Configuration_settings
Here are some of the more useful ones, that aren’t mentioned elsewhere in the book:
$wgCategoryPagingLimit sets the maximum number of pages listed on each category page; the default is 200
$wgReadOnly sets the entire wiki to be read-only, with the specified string given as the reason; this is useful during temporary site maintenance
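As an illustration, both settings are ordinary variable assignments in LocalSettings.php. The values below are only examples, not recommendations:

```php
// In LocalSettings.php (example values only)

// List up to 500 pages on each category page, instead of the default 200
$wgCategoryPagingLimit = 500;

// Make the whole wiki read-only; the string is shown to users as the reason.
// Comment this line out again once the maintenance is finished.
$wgReadOnly = 'Down for scheduled maintenance; back shortly.';
```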
Debugging
MediaWiki is software, and software unfortunately can go wrong. The issue may be file directory permissions, database user permissions, missing files, missing database tables, bad settings in LocalSettings.php, incompatible versions, or even (perish the thought) bugs in the code. (Which, by the way, are much more likely to happen in extensions than in core MediaWiki.)
Generally, the most confusing situation in MediaWiki is when users see a blank page in the browser, at any point while on the wiki. This happens when there’s an error and PHP is configured not to display error messages. It’s almost always better to see the error on the screen, so if you get a blank page, the best solution is to add the following line to LocalSettings.php (or, equivalently, to set “display_errors = On” in PHP’s own php.ini file):
ini_set( 'display_errors', 1 );
If it’s being added to LocalSettings.php, it should be near the top of the file, right under the “<?php” line.
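To make the placement concrete, here is a sketch of how the top of LocalSettings.php might look. The error_reporting() call is an addition not mentioned above; it is a standard PHP function that makes PHP report every kind of error, warning and notice, so that nothing is silently suppressed. Both lines are meant to be temporary:

```php
<?php
// Temporary debugging aid, placed directly under the opening tag so that
// errors triggered later in this file are displayed as well.
error_reporting( -1 );          // report all PHP errors, warnings and notices
ini_set( 'display_errors', 1 ); // print them on the page instead of a blank screen

// ... the rest of LocalSettings.php follows ...
```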
By far the best tool for any kind of debugging is the MediaWiki debug toolbar. It puts all the necessary information (SQL calls, warnings, debug displays) in one easily-accessible place at the bottom of the browser. For those of us used to doing MediaWiki debugging the old-fashioned way, it’s a remarkably useful tool. You can enable it by adding the following to LocalSettings.php:
$wgDebugToolbar = true;
However, you may not want everyone to see the debug toolbar while it’s enabled (and if you enable it, everyone will see it). Or it may not be available, if you’re using a version of MediaWiki before 1.19. In either case, there are other options. If you see an error message that contains the text "(SQL query hidden)", and you want to see the SQL that was called, add the following to LocalSettings.php:
$wgShowSQLErrors = true;
And if the error seems complex, you can turn on MediaWiki’s own debug logging, and then examine the contents of the log file. To turn it on, add the following to LocalSettings.php:
$wgDebugLogFile = "/full/path/to/your/debug/log/file";
This file needs to be writable by your web server.
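Put together, a temporary debugging block in LocalSettings.php might look like the following sketch. The log path is hypothetical; any location that the web server user can write to should work:

```php
// Temporary debugging settings; remove them once the problem is solved.
$wgDebugToolbar = true;   // MediaWiki 1.19 and later; visible to every visitor
$wgShowSQLErrors = true;  // show the SQL instead of "(SQL query hidden)"

// Hypothetical path: must be writable by the web server
$wgDebugLogFile = "/var/log/mediawiki/debug.log";
```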
Often, the easiest solution, as with a lot of software, is just to do a web search on the text of the error message: it could well be that others have come across, and maybe diagnosed, this problem. If you believe that the problem is coming from a specific extension, it’s a good idea to check that extension’s main page, or its talk page, to see if there’s any mention of it.
Improving MediaWiki performance
This is not a web performance book, but if you feel your wiki is too slow, or you’re worried about the results of increased traffic in the future, here are some helpful tips:
Make sure your web server and PHP have enough memory assigned to them.
There are a variety of caching tools that can be used in conjunction with MediaWiki (and with each other), like Squid, Varnish and memcached. Of all the available tools, the most useful is probably APC, a PHP caching utility that often dramatically improves MediaWiki’s performance. You can see all the options for caching here:
https://www.mediawiki.org/wiki/Manual:Cache
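As a rough sketch of what the configuration side can look like: once APC is installed, its opcode caching speeds up PHP on its own, and the first setting below additionally tells MediaWiki to use it for its object cache. The memcached alternative assumes a memcached server running locally on its default port, 11211; adjust the address for your setup:

```php
// Use APC (or another PHP accelerator) as MediaWiki's main object cache
$wgMainCacheType = CACHE_ACCEL;

// Alternatively, use memcached (assumes a local server on port 11211):
// $wgMainCacheType = CACHE_MEMCACHED;
// $wgMemCachedServers = array( '127.0.0.1:11211' );
```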
There’s an effort in place to get MediaWiki to work with HipHop, a PHP compiler developed by Facebook engineers, that is supposed to have even more dramatic performance benefits. This is still an ongoing project (as is HipHop itself). You can see the current status of this effort here:
https://www.mediawiki.org/wiki/HipHop
If you’re using Semantic MediaWiki, there are various ways to guard against queries slowing down the server; these are covered in some detail here:
https://semantic-mediawiki.org/wiki/Speeding_up_Semantic_MediaWiki
The MediaWiki cache
MediaWiki does extensive caching of pages: when you go to a wiki page, chances are that it wasn’t generated on the spot, but rather is a cached version that was created sometime in the previous day or so. (This doesn’t apply to pages in the “Special” namespace, which are generated anew every time.)
Users can always see a “live” version of any page by adding “action=purge” to the URL.
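The “day or so” figure corresponds to the lifetime of MediaWiki’s parser cache, which holds rendered page content; as far as we know it is controlled by $wgParserCacheExpireTime, whose default is 86400 seconds (one day). A sketch of shortening it, with an arbitrary example value (other caches may also be involved in what users see):

```php
// Keep cached page renderings for one hour instead of one day
// (example value; the default is 86400 seconds)
$wgParserCacheExpireTime = 3600;
```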
The MagicNoCache extension lets you mark some pages as never to be cached, via the “__NOCACHE__” behavior switch. See here:
https://www.mediawiki.org/wiki/Extension:MagicNoCache
Caching becomes an issue when Semantic MediaWiki is installed, because pages that are cached don’t automatically show the latest set of query results; this can confuse users who add some data and then don’t see it appear in query results elsewhere. The best workaround for this problem is to install the MagicNoCache extension and use it on every page that contains a query. (The ‘calendar’ query format already disables caching on pages where it’s displayed, in order for the layout to display correctly.)
Another option is to use the Approved Revs extension (see here): although it’s not intentional, pages that have an approved revision don’t get cached. This may change in the future, but at the moment it’s a side effect that one should be aware of.
SMW also provides a new tab/dropdown, seen only by administrators, called “Refresh”, which points to the “action=purge” URL, saving admins from having to type it in manually.
The job queue
There are certain tasks that MediaWiki has to run over an extended period of time, in the background. The most common case comes when a template is modified. Let’s say that someone adds a category tag to a template: that means that every page that includes that template needs to be added to that category. This process can’t be done all at once, because it would slow down the server considerably, or even temporarily crash it. Instead, the process is broken down into “jobs”, which are placed in a “job queue” and then run in an orderly way.
Behind the scenes, the job queue is really just a database table called “job”, which holds one row for each job. These jobs are run in sequential order, and once a job is run its row is deleted.
Jobs are run every time the wiki gets a page hit. By default, one job is run on every hit, but this number can be modified to make the running of jobs slower or faster, by changing the value of $wgJobRunRate. To make the running of jobs ten times faster, for instance, you would add the following to LocalSettings.php:
$wgJobRunRate = 10;
Conversely, to make it ten times slower, you would set the value to 0.1. (You can’t actually run a fraction of a job; instead, a fractional value sets the probability that a job will be run on any given page hit.)
You can also cause jobs to be run in a more automated way, instead of just waiting for them to be run (or hitting “reload” in the browser repeatedly to speed up the running). This is done by calling the script runJobs.php, in the MediaWiki /maintenance directory. You can even create a cron job to run runJobs.php on a regular basis: say, once a day.
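If cron takes care of running jobs, you may prefer that they not run on page hits at all; setting $wgJobRunRate to 0 does that. A sketch, in which the crontab schedule and the path are placeholders:

```php
// LocalSettings.php: don't run any jobs during page requests, because a
// cron job calls maintenance/runJobs.php instead. A daily crontab entry
// might look like this (hypothetical path, runs at 3:00 a.m.):
//   0 3 * * * php /path/to/wiki/maintenance/runJobs.php
$wgJobRunRate = 0;
```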
There are various parameters that runJobs.php can take, such as setting the maximum number of jobs to be run, or, maybe more importantly, the type of job to be run. To enable the latter, each job type has its own identifier name, which can be found in the database, if nowhere else. You can read about all the parameters for runJobs.php here:
https://www.mediawiki.org/wiki/Manual:RunJobs.php
In addition to core MediaWiki, extensions can create their own jobs as well. Some extensions that do are Data Transfer, DeleteBatch, Nuke and Replace Text.
Admin Links
One feature common in web-based applications, which MediaWiki has always lacked, is a “dashboard” area, that lets the administrator(s) view statistics and perform administrative tasks from one place.
To a limited extent, the page Special:SpecialPages does that already; it simply lists most of the available special pages, grouped by categories. It’s certainly better than nothing, but not all the pages listed in Special:SpecialPages are specifically useful to administrators, and conversely, not all administrative tasks are done via special pages (editing the sidebar, for instance, is not).
The extension Admin Links provides something closer to a real administrator dashboard. It defines a single page, Special:AdminLinks, which holds links that are useful to administrators, separated by functionality type. Other extensions can add their own links to the Admin Links page, if they choose to, via hooks, and a handful do. Figure 15.1 shows what the page looks like when several of the extensions described in this book, like Semantic MediaWiki, Semantic Forms, and Nuke, are installed.
Figure 15.1 Admin Links page
The other nice feature of Admin Links is that it provides a link to the “Admin links” page within the user links at the top of every page, so that the dashboard is always just a click away. Here is how the top of the page looks in the Vector skin, with Admin Links installed:
Replace Text
MediaWiki lacks an innate way to do global search-and-replace of text, a feature that would come in handy when, for instance, the name of a certain template parameter changes, and many pages that call that template have to be modified. On Wikipedia and some other large-scale wikis, bots are used for that purpose, but having a way to do it from within the wiki is a lot more convenient. Thankfully, the Replace Text extension makes it possible to do site-wide text replacements. Replace Text can handle both the contents of pages and their names; if content in a page title is replaced, it means that the page gets "moved". Every change made by Replace Text shows up in page histories, with the user who initiated the replacement appearing as the author of that edit.
To run a replacement, go to Special:ReplaceText. This action is governed by the ‘replacetext’ permission, which by default is given to administrators.
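Like any MediaWiki permission, ‘replacetext’ can be reassigned in LocalSettings.php through $wgGroupPermissions. The policy below, which limits Replace Text to bureaucrats, is purely a hypothetical example:

```php
// Hypothetical policy: take Replace Text away from ordinary administrators
// and give it only to bureaucrats.
$wgGroupPermissions['sysop']['replacetext'] = false;
$wgGroupPermissions['bureaucrat']['replacetext'] = true;
```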
Figure 15.2 Top of Special:ReplaceText
You can see the top of the Special:ReplaceText page in Figure 15.2. What follows below that is a list of namespaces that the user can select from; then below that are some additional options for the replacement, which are shown in Figure 15.3.
Figure 15.3 Bottom of Special:ReplaceText
Hitting the “Continue” button brings the user to a second page, listing the exact matches for the search string, so that the user can manually select which pages will have their contents and/or titles modified.
For more complex transformations, you’ll probably have to rely on bots and the MediaWiki API, which we’ll get to next.
Getting user IP information
In rare cases, it can be useful to get IP address information about users who are logged in: for example, if a user’s password is stolen and someone else starts editing the wiki as them; if you suspect that a single user is vandalizing the wiki from multiple accounts; or if you suspect that a single user is creating multiple accounts to try to give the illusion of widespread consensus on some issue (this is known as “sockpuppeting”). An IP address is actually stored for each change that happens in the wiki, though it’s not visible anywhere in the wiki’s interface. If you have access to the database, you can view this information in the “rc_ip” column of the “recentchanges” table.
If you don’t want this information stored, for privacy reasons, you can disable storage by adding the following to LocalSettings.php:
$wgPutIPinRC = false;
Conversely, the CheckUser extension lets administrators view this information from the wiki itself, for easier access:
https://www.mediawiki.org/wiki/Extension:CheckUser
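Our understanding is that CheckUser grants its ‘checkuser’ and ‘checkuser-log’ rights by default only to a dedicated “checkuser” group; if you want regular administrators to have them, a sketch along these lines should work, though it’s worth confirming the right names on the extension page above for your version:

```php
// Assumed right names, per the CheckUser extension: let administrators
// look up user IPs and view the log of past CheckUser lookups.
$wgGroupPermissions['sysop']['checkuser'] = true;
$wgGroupPermissions['sysop']['checkuser-log'] = true;
```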
Search engine optimization
Search engine optimization, or SEO, is the practice of attempting to get the pages of one’s web site to show up as high as possible in search-engine results, most notably on Google. It’s a controversial field: to its proponents, it’s an indispensable way to get web traffic, while to its detractors, it’s at best tacky, and at worst the domain of hucksters, spammers and scammers. Nevertheless, for people who run public wikis, showing up high in search results can be important.
First of all, MediaWiki is already well-geared for doing well in search results in a number of ways. Wikipedia, which is of course MediaWiki-based, is the best-performing site for search results, by any metric: it’s usually in the top three, and often #1, for a search on any topic it covers. That’s mostly just because it gets linked to so often from other sites about those specific topics, but it’s also in part due to MediaWiki’s own design.
In MediaWiki, the subject of every page is also: the page’s name, a part of its URL, the text in the top-level header, and the text that shows up in internal links to that page. That sort of consistency is extremely important for search engines in associating that word or phrase with that specific URL. Tied in with that, there’s usually only one top-level header per page: the name of the page is contained within the only