Chapter 15: MediaWiki administration

Administering a MediaWiki wiki is generally not that hard once you’ve done the initial setup. It involves both actions done via the web interface and actions done on the back end, like editing LocalSettings.php and installing extensions. Usually just one or a handful of people have access to the back end, and the same or a slightly larger group of people have administrative access on the wiki itself.

This entire book is geared in large part toward MediaWiki administrators, so in a sense most of this book could fit under the topic of “MediaWiki administration”. But this chapter is meant to hold some of the tools and actions that are relevant only to administrators and that didn’t fit in elsewhere.

Configuration settings

There are many settings for core MediaWiki that can be modified in LocalSettings.php: essentially, all the variables that start with “$wg”. Some are covered in this book, though they are a very small percentage of the total set. You can see the full listing, grouped by functionality type, here:

https://www.mediawiki.org/wiki/Manual:Configuration_settings

Here are some of the more useful ones that aren’t mentioned elsewhere in the book:

$wgCategoryPagingLimit - sets the maximum number of pages listed on each category page; the default is 200

$wgReadOnly - makes the entire wiki read-only, with the specified string displayed as the reason; useful for temporary site maintenance
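For instance, to lock the wiki during an upgrade, you might add something like the following to LocalSettings.php (the message text here is just an illustration):

$wgReadOnly = 'Site maintenance in progress; editing will be re-enabled shortly.';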

Debugging

MediaWiki is software, and software unfortunately can go wrong. The issue may be file or directory permissions, database user permissions, missing files, missing database tables, bad settings in LocalSettings.php, incompatible versions, or even (perish the thought) bugs in the code. (Bugs, by the way, are much more likely to show up in extensions than in core MediaWiki.)

Generally, the biggest source of confusion in MediaWiki comes when users see a blank page in the browser at some point while using the wiki. This happens if there’s an error, and PHP is configured to show a blank page instead of the error message when that happens. It’s almost always better to see the error on the screen; so if this happens to you, the best solution is to add the following line, either to LocalSettings.php or to PHP’s own php.ini file:

ini_set( 'display_errors', 1 );

If it’s being added to LocalSettings.php, it should be near the top of the file, right under the “<?php” line.
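As a minimal sketch, the top of LocalSettings.php might then look like this (adding a call to error_reporting() as well, so that PHP reports notices and warnings in addition to fatal errors; you’ll probably want to remove both lines on a production wiki):

<?php
# Show all errors, warnings and notices on screen while debugging:
error_reporting( E_ALL );
ini_set( 'display_errors', 1 );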

By far the best tool for any kind of debugging is the MediaWiki debug toolbar. It puts all the necessary information (SQL calls, warnings, debug displays) in one easily-accessible place at the bottom of the browser window. For those of us used to doing MediaWiki debugging the old-fashioned way, it’s a remarkably useful tool. You can enable it by adding the following to LocalSettings.php:

$wgDebugToolbar = true;

However, the debug toolbar is shown to everyone while it’s enabled, which you may not want. And it’s not available at all if you’re using a version of MediaWiki before 1.19. In either case, there are other options. If you see an error message that contains the text "(SQL query hidden)" and you want to see the SQL that was called, you can do so by adding the following to LocalSettings.php:

$wgShowSQLErrors = true;

And if the error seems to be complex, you can turn on MediaWiki’s own debug logging and then examine the contents of the log file. To turn it on, add the following to LocalSettings.php:

$wgDebugLogFile = "/full/path/to/your/debug/log/file";

This file needs to be writable by your web server.
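Once logging is enabled, one convenient way to watch errors appear as they happen, assuming you have shell access to the server, is to leave a standard tail command running on the log file:

tail -f /full/path/to/your/debug/log/file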

Often, the easiest solution, as with a lot of software, is just to do a web search on the text of the error message: it could well be that others have come across, and maybe diagnosed, the same problem. If you believe that the problem is coming from a specific extension, it’s a good idea to check that extension’s main page, or its talk page, to see if there’s any mention of it.

Improving MediaWiki performance

This is not a web performance book, but if you feel that your wiki is too slow, or you’re worried about the effects of increased traffic in the future, here are some helpful tips:

Make sure your web server and PHP have enough memory assigned to them.

There are a variety of caching tools that can be used in conjunction with MediaWiki (and with each other), like Squid, Varnish and memcached. Of all the available tools, the most useful is probably APC, a PHP caching utility that often dramatically improves MediaWiki’s performance. You can see all the options for caching here:

https://www.mediawiki.org/wiki/Manual:Cache
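If APC is installed in your PHP setup, the usual way to have MediaWiki take advantage of it as its main object cache is a one-line setting; the CACHE_ACCEL constant tells MediaWiki to use whatever PHP accelerator is available. (This is a minimal sketch; the manual page above covers the full set of cache-related variables.)

$wgMainCacheType = CACHE_ACCEL;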

There’s an effort underway to get MediaWiki to work with HipHop, a PHP compiler developed by Facebook engineers that is supposed to have even more dramatic performance benefits. This is still an ongoing project (as is HipHop itself). You can see the current status of this effort here:

https://www.mediawiki.org/wiki/HipHop

If you’re using Semantic MediaWiki, there are various ways to guard against queries slowing down the server; these are covered in some detail here:

https://semantic-mediawiki.org/wiki/Speeding_up_Semantic_MediaWiki

The MediaWiki cache

MediaWiki does extensive caching of pages: when you go to a wiki page, chances are that it wasn’t generated on the spot, but rather is a cached version that was created sometime in the previous day or so. (This doesn’t apply to pages in the “Special” namespace, which are generated anew every time.)

Users can always see a “live” version of any page by adding “action=purge” to the URL.
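For example, on a hypothetical wiki, such a URL might look like this:

https://mywiki.example.com/index.php?title=Main_Page&action=purge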

The MagicNoCache extension lets you mark some pages as never to be cached, via the “__NOCACHE__” behavior switch. See here:

https://www.mediawiki.org/wiki/Extension:MagicNoCache

Caching becomes an issue when Semantic MediaWiki is installed, because cached pages don’t automatically show the latest set of query results; this can confuse users who add some data and then don’t see it appear in query results elsewhere. The best workaround for this problem is to install the MagicNoCache extension and use it on every page that contains a query. (The ’calendar’ query format already disables caching on pages where it’s displayed, in order for the layout to display correctly.)
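As a sketch, a page holding a query might then look like this in wikitext (the category and property names here are invented for the example):

__NOCACHE__
{{#ask: [[Category:Events]] | ?Has date | format=ul }}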

Another option is to use the Approved Revs extension (see here): although it’s not intentional, pages that have an approved revision don’t get cached. This may change in the future, but at the moment it’s a side effect that one should be aware of.

SMW also provides a new tab/dropdown option, visible only to administrators, called “Refresh”, which points to the “action=purge” URL, saving admins from having to type it in manually.

The job queue

There are certain tasks that MediaWiki has to run over an extended period of time, in the background. The most common case comes when a template is modified. Let’s say that someone adds a category tag to a template: that means that every one of the pages that includes that template needs to be added to that category. This process can’t be done all at once, because it would slow down the server considerably, or even temporarily crash it. Instead, the process is broken down into “jobs”, which are placed in a “job queue”, and those jobs are then run in an orderly way.

Behind the scenes, the job queue is really just a database table called “job”, which holds one row for each job. Jobs are run in sequential order, and once a job is run, its row is deleted.

Jobs are run every time the wiki gets a page hit. By default, one job is run on every hit, but this number can be modified, to make jobs run more slowly or quickly, by changing the value of $wgJobRunRate. To make jobs run ten times faster, for instance, you would add the following to LocalSettings.php:

$wgJobRunRate = 10;

Conversely, to make it ten times slower, you would set the value to 0.1. (You can’t actually run a fraction of a job; instead, a fractional value sets the probability that a job will be run on any given hit.)
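In other words, a minimal sketch:

# Each page hit now has a 10% chance of running one job:
$wgJobRunRate = 0.1;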

You can also cause jobs to be run in a more automated way, instead of just waiting for them to be run (or hitting “reload” in the browser repeatedly to speed things along). This is done by calling the script runJobs.php, in the MediaWiki /maintenance directory. You can even create a cron job to run runJobs.php on a regular basis, say, once a day.
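For example, a crontab entry like the following (the installation path here is hypothetical) would run all pending jobs every night at 3 AM:

0 3 * * * php /path/to/mediawiki/maintenance/runJobs.php > /dev/null 2>&1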

runJobs.php can take various parameters, such as the maximum number of jobs to be run or, perhaps more importantly, the type of job to be run. For the latter, each job type has its own identifier name, which can be found in the database, if nowhere else. You can read about all the parameters for runJobs.php here:

https://www.mediawiki.org/wiki/Manual:RunJobs.php
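As a sketch, here are two invocations from the command line, using the --maxjobs and --type parameters; “refreshLinks” is the job type used for the template-change link updates described above:

php maintenance/runJobs.php --maxjobs 1000
php maintenance/runJobs.php --type refreshLinks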

In addition to core MediaWiki, extensions can create their own jobs as well. Some extensions that do are Data Transfer, DeleteBatch, Nuke and Replace Text.

Admin Links

One feature common in web-based applications, which MediaWiki has always lacked, is a “dashboard” area that lets the administrator(s) view statistics and perform administrative tasks from one place.

To a limited extent, the page Special:SpecialPages does that already; it simply lists most of the available special pages, grouped by categories. It’s certainly better than nothing, but not all the pages listed in Special:SpecialPages are specifically useful to administrators, and conversely, not all administrative tasks are done via special pages (editing the sidebar, for instance, is not).

The Admin Links extension provides something closer to a real administrator dashboard. It defines a single page, Special:AdminLinks, which holds links that are useful to administrators, separated by functionality type. Other extensions can add their own links to the Admin Links page via hooks, if they choose to, and a handful do. Figure 15.1 shows what the page looks like when various extensions described in this book, like Semantic MediaWiki, Semantic Forms and Nuke, are installed.


Figure 15.1 Admin Links page

The other nice feature of Admin Links is that it adds a link to the “Admin links” page within the user links at the top of every page, so that the dashboard is always just a click away. Here is how the top of the page looks in the Vector skin, with Admin Links installed:


Replace Text

MediaWiki lacks a built-in way to do global search-and-replace of text, a feature that comes in handy when, for instance, the name of a certain template parameter changes, and many pages that call that template have to be modified. On Wikipedia and some other large-scale wikis, bots are used for that purpose, but having a way to do it from within the wiki is a lot more convenient. Thankfully, the Replace Text extension makes it possible to do site-wide text replacements. Replace Text can handle both the contents of pages and their names; if text in a page title is replaced, the page gets "moved". Every change made by Replace Text shows up in page histories, with the user who initiated the replacement appearing as the author of that edit.

To run a replacement, go to the page Special:ReplaceText. This action is governed by the 'replacetext' permission, which by default is given to administrators.


Figure 15.2 Top of Special:ReplaceText

You can see the top of the Special:ReplaceText page in Figure 15.2. What follows below that is a list of namespaces that the user can select from; then below that are some additional options for the replacement, which are shown in Figure 15.3.


Figure 15.3 Bottom of Special:ReplaceText

Hitting the “Continue” button brings the user to a second page, listing the exact matches for the search string, so that the user can manually select which pages will have their contents and/or titles modified.

For more complex transformations, you’ll probably have to rely on bots and the MediaWiki API, which we’ll get to next.

Getting user IP information

In rare cases, it can be useful to get IP address information about users who are logged in: for example, if a user’s password is stolen and someone else starts editing the wiki as them; if you suspect that a single user is vandalizing the wiki from multiple accounts; or if you suspect that a single user is creating multiple accounts to give the illusion of widespread consensus on some issue (this is known as “sockpuppeting”). An IP address is actually stored for each change that happens on the wiki, though it’s not visible anywhere in the wiki itself. If you have access to the database, you can view this information in the “rc_ip” column of the “recentchanges” table.
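For example, a SQL query along these lines (the username here is invented) would show the IP addresses behind a given user’s recent edits:

SELECT rc_user_text, rc_ip, rc_timestamp FROM recentchanges
WHERE rc_user_text = 'SuspiciousUser' ORDER BY rc_timestamp DESC;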

If you don’t want this information stored, for privacy reasons, you can disable storage by adding the following to LocalSettings.php:

$wgPutIPinRC = false;

Conversely, the CheckUser extension lets administrators view this information from the wiki itself, for easier access:

https://www.mediawiki.org/wiki/Extension:CheckUser

Search engine optimization

Search engine optimization, or SEO, is the practice of attempting to get the pages of one’s web site to show up as high as possible in search-engine results, most notably on Google. It’s a controversial field: to its proponents, it’s an indispensable way to get web traffic, while to its detractors, it’s at best tacky, and at worst the domain of hucksters, spammers and scammers. Nevertheless, for people who run public wikis, showing up high in search results can be important.

First of all, MediaWiki is already well-geared for doing well in search results, in a number of ways. Wikipedia, which is of course MediaWiki-based, is by just about any metric the best-performing site on the web in search results: it’s usually in the top three, and often #1, for a search on any topic it covers. That’s mostly because it gets linked to so often from other sites about those specific topics, but it’s also in part due to MediaWiki’s own design.

In MediaWiki, the subject of every page is also the page’s name, a part of its URL, the text in the top-level header, and the text that shows up in internal links to that page. That sort of consistency is extremely important for search engines in associating that word or phrase with that specific URL. Tied in with that, there’s usually only one top-level header per page: the name of the page is contained within the only <h1> tag on the page, which is another thing that helps to establish the page’s subject for search engines.

There is at least one active MediaWiki extension that can potentially help further with search-engine rankings: the WikiSEO extension, which adds to the <meta> and <title> tags of a wiki page’s HTML source code. It defines a parser function, appropriately named “#seo”, which can be added anywhere on the page, and which is called in the following way:

{{#seo: title=... | titlemode=... | keywords=... | description=... }}

The “title=” parameter replaces, is appended to, or is prepended to the contents of the HTML <title> tag, depending on the value of the “titlemode=” parameter, which can be “replace”, “append” or “prepend”. The “keywords=” and “description=” parameters each get turned into their own HTML <meta> tag, with the parameter name and its value becoming the tag’s “name” and “content” attributes, respectively. If you don’t know how best to set all of these values, it’s a good idea to look up their meaning, and how they are best used for SEO. You can find more information about WikiSEO here:

https://www.mediawiki.org/wiki/Extension:WikiSEO

If you’re using infobox-style templates on most pages, a good strategy is to place the tag within the templates, so that you don’t have to add it manually to each page, and then populate it with specific parameters from the infobox; a sketch of this follows below.
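As a sketch, a #seo call within a hypothetical infobox template might look like this (the template parameter names are invented for the example):

{{#seo: title={{PAGENAME}} | titlemode=append | keywords={{{type|}}}, {{{location|}}} | description={{{summary|}}} }}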
Running a wiki farm

It’s not uncommon for organizations and corporations to want to run more than one wiki; sometimes many more. A company that runs public wikis on different topics, for advertising revenue or any other reason, may end up running a large number of them. Internally, companies may want to host more than one wiki as well. Access control to data is one reason, as noted here: the most secure way to keep a set of wiki data restricted to a defined group of users is to keep it in a separate wiki. And different departments within an organization could each want their own wiki, either to keep their data restricted or just because they have little need for sharing data with other groups. In a very large company or other organization, the number of such independent subdivisions that want their own wiki could number even in the hundreds.

Of course, each group that wanted its own wiki could simply set one up itself; if they all use MediaWiki, installation is free and generally not too difficult. (That, in fact, is how wikis have historically been introduced into organizations: small groups setting them up themselves, in what are known as "skunkworks" projects.) But that kind of setup can quickly become unwieldy: if a different person needs to become a wiki expert for each wiki that gets created and maintained, that’s too much work being expended. And even if all the wikis are managed centrally by a single IT person or department, keeping them all current can become a tedious amount of work when it’s time to upgrade the software.

In such a situation, what you should be using is what’s known as a “wiki farm”, or sometimes “wiki family”: a group of wikis that are managed from a single place, and to which it’s easy to add additional wikis.

There are a variety of ways to create a wiki farm in MediaWiki. The best reference for reading about the different approaches, and how to set up each one of them, is here:

https://www.mediawiki.org/wiki/Manual:Wiki_family

There are many approaches listed on that page: single vs. multiple code bases, single vs. multiple databases, single vs. multiple instances of LocalSettings.php, etc. However, there’s only one approach we really recommend, which is to use a single code base, multiple databases and multiple settings files. This essentially corresponds to the “Drupal-style sites” approach described on that page. We won’t get into the full technical details here, but the basic idea is this: you have a separate database for each wiki, as well as a separate settings file. Each per-wiki settings file gets included from within LocalSettings.php. The individual settings files set the database name for each wiki, and let you customize each wiki’s settings, including standard features like the wiki’s name, logo, skin and permissions, in addition to allowing for extensions that are only included on some wikis. The “Wiki family” manual page includes a simple combination of a PHP script and a shell script for this approach, which together let you create and update the database for each wiki.

You also need to decide on a URL structure for the different wikis: the two standard approaches are to use subdomains, like “wiki1.mycompany.com”, or subdirectories, like “mycompany.com/wiki1”. This structure has to be handled by a combination of LocalSettings.php (which has to figure out which settings file to use, based on the URL) and the server configuration, which, if Apache is being used, is usually the file httpd.conf. The specific settings for both are covered in the “Wiki family” manual page.

If you know ahead of time that you’ll have multiple wikis, it may be helpful to have shared user accounts across all of them, so that users don’t have to create a new account on every wiki that they want to edit. Wikipedia does this in a complex way, using the “CentralAuth” extension, but for other wikis this can be done much more simply, by having the various databases share a single set of user-information tables. You just have to decide which database will hold the information, and then add the following to LocalSettings.php:

$wgSharedDB = "main-database-name";

Though “shared DB” sounds like a big deal, by default only the tables that hold user information are shared.
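To make the recommended setup a little more concrete, here is a minimal sketch of what the per-wiki inclusion logic in LocalSettings.php might look like, assuming subdomain-style URLs and one settings file per wiki (the directory layout and variable names here are invented for the example):

<?php
# ... settings shared by all the wikis go here ...

# Pick a per-wiki settings file based on the subdomain: e.g.,
# "wiki1.mycompany.com" loads settings/wiki1.php. (A real setup
# should validate this value against a list of known wikis.)
$parts = explode( '.', $_SERVER['HTTP_HOST'] );
$wikiName = $parts[0];
require_once "$IP/settings/$wikiName.php";

A per-wiki file like settings/wiki1.php would then hold just the values that differ between wikis, for example:

<?php
$wgDBname = 'wikidb_wiki1';
$wgSitename = 'Wiki One';
$wgLogo = "$wgScriptPath/images/wiki1-logo.png";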
Multi-language wikis

Of all the things that wiki administrators typically want to do, possibly the most conceptually tricky is to have their wiki support multiple languages. That’s because there’s a tradeoff in place: you want the text each person reads in their language to be as precise as possible, but at the same time you want to avoid redundancy, because redundancy means more work to try to ensure that the contents in different languages all match each other.

First, some good news: the text of the interface itself, like the text in the “Edit” and “View history” tabs, or the text in special pages, is usually not an issue, because if a user sets their own language under “User preferences”, chances are good that all of that text has been translated into their language, thanks to MediaWiki’s top-notch translation setup. That just leaves the contents of the wiki. For those, the right approach depends mostly on whether the content is meant to be created only by users who speak one language, but read in multiple languages, or whether content is meant to be generated by users speaking multiple languages. There are essentially three approaches. In order from most difficult to least difficult, they are:

Separate wiki for each language. This is the Wikipedia approach. You can have a different wiki for each language, ideally identified by each one’s two-letter language code. Pages can then refer to their other-language counterparts via interwiki links (see here), as is done on Wikipedia. This approach is ideal when the content is truly independent for each language, or when you want to ensure that every user experiences the wiki entirely in their own language. See “Running a wiki farm”, above, for how to set up such a thing.

Multiple translations for each page. You can have one wiki, where each page has multiple translations. The standard approach is to have a main language (such as English), and then let users create translation pages for each one, while linking each page to all of its translations via a navigation template at the top or bottom. The standard way to name such pages is via language codes; so, for instance, for a page called “Equipment”, its Amharic translation would be at a page called “Equipment/am”. This approach offers a pragmatic compromise, and it’s fairly popular. The Translate extension makes it easy to go with this approach, by providing a framework for creating and displaying all the translations. Once you have Translate set up, it’s mostly just a matter of adding the right tags to each page that’s meant to be translated: <translate> around blocks of text that should be translated, and <languages />, usually at the top or bottom of the page, to display a bar linking to all the versions of that page in other languages. You can read more about it here:

https://www.mediawiki.org/wiki/Extension:Translate


Figure 15.4 A bar with links to different translations of a page, provided by the Translate extension

Machine translation of content. With this approach, you keep all content in one language, and then just provide a mechanism for people to translate the contents via a machine-translation service. The Live Translate extension is the recommended approach here: it provides an easy-to-use interface, some nice additional features, and support for both the Google and Microsoft translation services. This is by far the easiest approach to multiple languages. You can read about it here:

https://www.mediawiki.org/wiki/Extension:Live_Translate