YK: Chapter 23: MediaWiki development: a guide for the accidental developer


23 MediaWiki development: a guide for the accidental developer


This book will not contain any direct guide to MediaWiki development. A really comprehensive guide unfortunately doesn’t exist yet, though this page has links to a lot of good documentation:

https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker


This section will describe, in general terms, how we would recommend getting started with MediaWiki programming. Because it’s general, a lot of this information holds true for dealing with any open-source software.

Most people who have done MediaWiki programming, that is, people who have modified MediaWiki code with the intent of changing or adding to its behavior, did not set out to become MediaWiki programmers, and probably still don’t think of themselves as such. Rather, they just had certain requirements for their wiki that didn’t seem possible with MediaWiki, or with the various extensions they looked at. So, armed with whatever PHP knowledge they had or could look up, they began to modify the core MediaWiki code, or modify some extension, or create a new extension, or modify some skin, or create a new skin. And then the resulting changes hopefully accomplished the task. In some cases, such people became hooked and kept writing MediaWiki code, while in most cases, such people moved on to other work, staying with the development only long enough to maintain the code and add some features over the years. Whichever group you fall into, we would like to offer a few pointers to try to maximize the chance that your MediaWiki programming project will be successful.

Let’s start by giving a hypothetical case of a task that might require custom development. Your boss, freshly energized from attending an overpriced conference, and full of ideas about disrupting and pivoting, decides that what your internal wiki needs is “gamification”; and to that end, every 50th time someone makes an edit, the wiki should show an animated image of fireworks on that lucky user’s screen after they hit the “Save” button.

So, you have a task at hand, and thankfully, unlike in many situations, it’s at least well-defined. What should you do? The first step, and a very critical one, is to make sure that this functionality hasn’t already been implemented in either core MediaWiki or in one of its extensions. There are a lot of interesting MediaWiki extensions out there, and a lot of functionality within core MediaWiki; this book covers only a subset of them. And it’s almost always easier to reuse code (assuming it does what you need) than to create new code. Finding extensions was covered here. If you search through there and can’t find anything, another good step is to ask on the MediaWiki mailing list, IRC channel or users forum (see here).

Let’s say you tried those routes and it turned out that there was no such functionality in either MediaWiki or any of its extensions (and in this case, most likely there isn’t). If you want this feature, your only option now is custom development, by either yourself or someone else. In theory, there are various approaches you could take: you could modify MediaWiki itself, putting the new code in one or more existing PHP files; you could modify the MediaWiki skin you’re using (which is really just a subset of modifying MediaWiki); you could create a new skin, which holds the new functionality; you could modify an existing MediaWiki extension; or you could create a new extension.

In practice, though, only the last two of those options (modifying or creating an extension) are really appropriate solutions. Modifying MediaWiki itself should be avoided whenever possible. It may be tempting (just another 20 lines in some file, and you’re done), but the problems come up as soon as you want to upgrade MediaWiki. If you get the MediaWiki code via download (i.e., not via Git), you need to document all of your changes to core, to make sure that they don’t get lost when you upgrade the software. If you’re getting MediaWiki from Git, it’s easier, but you still need to deal with merge conflicts when you upgrade. In both cases, if the part of the code that you happen to have modified changes (and it often does), you need to figure out how to re-modify the code accordingly.

Lest you think this is a minor issue, there are many cases of MediaWiki-based wikis that are stuck for years at a time on a certain version, because, as the patches have piled up, upgrading MediaWiki becomes too costly to do on a regular basis. This happens for large wikis as well as small ones: Wikia and wikiHow are both major wikis that have been significantly behind in their MediaWiki versions at various times as a result of their many code customizations.

What about creating a new skin, containing the new functionality? That’s not desirable either, for two reasons. First, it means that your users are forced to use one skin, which is inconvenient, and it’s also confusing if any of your users do switch to a different skin. Second, maintaining a skin can be as much work as maintaining MediaWiki patches, since the required structure for skins still changes quite a bit between MediaWiki versions, which is part of why, sadly, there are so few working MediaWiki skins available for download, outside the ones that come bundled with MediaWiki. (Hopefully, in the future, skins will become more standardized and will change less from version to version.)

So that leaves working with extensions. If you’re modifying an existing extension, it’s good to talk to that extension’s author(s), via email or talk page; hopefully they will be willing to accept your changes or additions in some form, since otherwise you’ll have to deal with the same issues of maintaining a patched set of code that you would with changes to MediaWiki itself.

Finally, you can create a new extension, and for our hypothetical case, that’s probably what you’d have to do. There is an extension, called “PostEdit”, that displays a pop-up message after the user saves the page (since version 1.22 of MediaWiki, its code has been merged into core MediaWiki), but there doesn’t seem to be an extension that displays messages only some of the time.

So let’s say you decide to create an extension. An important thing to note, when planning out extensions, is that most extensions are based around hooks. Hooks are lines in the core MediaWiki code (or in other extensions) where the code makes a call to allow other pieces of code to perform their own actions at that moment, or to modify some or all of the local variables. It’s a fairly simple concept, once you understand the basic idea, but a powerful one, since it lets you modify a lot of MediaWiki’s behavior without the need to touch any of the original code.

MediaWiki has hundreds of hooks scattered throughout its code, and there are many, probably hundreds, of hooks in various MediaWiki extensions as well. Ideally, any extension you create can accomplish what it needs to without the need to modify MediaWiki, or any other extension, possibly through the use of hooks.
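
The hook idea can be illustrated with a minimal, language-neutral sketch. The Python below is not MediaWiki’s actual PHP API; the names `HOOKS`, `register_hook`, `run_hooks`, `save_page` and the hook name `PageSaved` are all hypothetical, chosen only to show how “core” code can call out to “extension” code at a fixed point, and how the extension can modify the data it is handed.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry mapping a hook name to the handlers attached to it.
HOOKS: Dict[str, List[Callable[..., bool]]] = defaultdict(list)


def register_hook(name: str, handler: Callable[..., bool]) -> None:
    """Attach a handler (extension code) to a named hook."""
    HOOKS[name].append(handler)


def run_hooks(name: str, *args) -> bool:
    """Call every handler for a hook, in registration order.

    A handler returning False stops the chain; that convention is an
    assumption of this sketch, not a description of any MediaWiki version.
    """
    for handler in HOOKS[name]:
        if handler(*args) is False:
            return False
    return True


def save_page(page: dict, new_text: str) -> None:
    """Stand-in for core code: saves the page, then runs a hook."""
    page["text"] = new_text
    # The single line that makes this spot extensible:
    run_hooks("PageSaved", page)


def add_footer(page: dict) -> bool:
    """Stand-in for extension code: modifies the data passed by the hook."""
    page["text"] += "\n(saved via hook)"
    return True


register_hook("PageSaved", add_footer)

page = {"title": "Main Page", "text": ""}
save_page(page, "Hello")
print(page["text"])
```

Note that `save_page` never needs to know which handlers exist: new behavior is added by registering another handler, not by editing the core function.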

You can find MediaWiki hooks via this very helpful page:

https://www.mediawiki.org/wiki/Manual:Hooks


In our hypothetical case, it seems like the hook “ArticleSaveComplete” would do the trick: it’s called after the user hits the “Save” button and the article has finished saving. Each hook has its own documentation page, linked from the “Manual:Hooks” page, detailing its usage. Sometimes finding the right hook can be a matter of trial-and-error -- and sometimes there’s actually more than one hook that can be used, and it’s a matter of finding the best fit.
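
The logic that a handler for such a hook would need is small. As a rough sketch, written in Python rather than MediaWiki’s PHP and using hypothetical names throughout (`EDIT_COUNTS`, `on_save_complete`, `CELEBRATION_INTERVAL`), the handler counts saves per user and signals a celebration on every 50th one. A real extension would presumably persist the count, or read an edit count that the wiki already stores, rather than hold it in memory as is done here.

```python
from collections import Counter

CELEBRATION_INTERVAL = 50  # every 50th edit, per the hypothetical requirement

# Hypothetical in-memory store of saves per user; for illustration only.
EDIT_COUNTS: Counter = Counter()


def on_save_complete(username: str) -> bool:
    """Record one save for the user; return True if fireworks should be shown."""
    EDIT_COUNTS[username] += 1
    return EDIT_COUNTS[username] % CELEBRATION_INTERVAL == 0


# Simulate 100 saves by one user and note which ones trigger the animation.
lucky_edits = [n for n in range(1, 101) if on_save_complete("Alice")]
print(lucky_edits)  # prints [50, 100]
```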

What if there’s no hook that fits, though? Then it becomes acceptable to modify the MediaWiki code, but ideally only to insert a line that calls the new, needed hook. And even more ideally, once you create this new hook and get your extension working with it, you can send this small set of code as a patch to the MediaWiki developers, so that they can add it to the main code base. (You can, of course, send a patch of any size to the MediaWiki developers, but a single-line hook seems the most likely to get accepted and integrated, especially if you’re just starting out as a MediaWiki developer, or if the feature you want to add is quite specific, like a celebratory animation.)

Now it’s time to create the extension. Actual PHP development won’t be covered in this book. We’ll just say that it’s good to copy from an existing MediaWiki extension that seems to do something relatively similar, rather than starting from scratch. Beyond that, there’s useful documentation for developers on mediawiki.org, including at the “How to become a MediaWiki hacker” page; and, as you might imagine, it’s good to have familiarity with PHP, JavaScript and the other relevant technologies.

Skip ahead a few days, or weeks or months, and now you’ve successfully created your extension, and learned something about MediaWiki development along the way. What’s more, editing of your company’s wiki has gone up 70% due to the users’ delight at occasionally seeing an image of fireworks when they make an edit. Your boss is already working on his triumphant presentation for the next conference. Are you done? Hopefully not, because there’s still one important step that you should ideally take: releasing the extension as open-source software.

There can be obstacles to releasing software as open-source. Inertia is a big one: after all the time spent creating this software and getting it to work, why bother putting any more time and effort into it? Tied in with that is a low view of the importance of releasing the software: why would anyone else want to use my silly little extension? In some organizations, releasing software as open-source may also run counter to the organization’s principles: after we paid to develop this software, why should others get to use it for free? The organization may even have rules against doing such a thing. And finally, if the wiki is itself part of (or all of) the business (in which case it’s most likely a public wiki), there’s the practical argument: why should we release software that could then easily be used by any competitors, present or future?

Despite all of these objections, we still think releasing one’s MediaWiki extension as open source is almost always the right thing to do. There are a variety of good reasons to release one’s software:

It leads to better code. Having many people, including experienced developers, look at, and use, the code means that bugs will be found (including security leaks), fixes will be suggested, and new features will potentially be added.

The code will be easier to maintain. MediaWiki changes often, and chances are good that, left unattended, your extension’s code will become incompatible with MediaWiki within a year or two, for any number of reasons. Having the extension be open source lets outside developers suggest, or directly add, fixes as new MediaWiki versions come out. If the code remains proprietary, you’re left with two choices: be forced to debug your code any time there’s a MediaWiki update on your wiki and an incompatibility arises; or keep your wiki on an old MediaWiki version forever for fear of breaking anything.

Instant translations to hundreds of languages. If the extension’s code is added to the MediaWiki Git repository, its translatable messages (the ones found in the extension’s "i18n" file) will very quickly start getting translated into dozens or even hundreds of languages by MediaWiki’s superb team of volunteer translators around the world. This is probably not a big deal if your extension is being used on just your local, single-language wiki; but if your wiki has users who speak even one other language, then the translation is worth it (for an examination of dealing with multiple languages on a wiki, see here).

Giving back. This is the most intangible of the reasons, and depending on your (or your organization’s) philosophy, it may be more or less compelling than the others. There definitely is a philanthropic argument, though. Unless you’re a sociopath, you would probably agree that, all things being equal, it’s nice to help others. And if you’re using MediaWiki, you’re already benefiting from the altruism of hundreds of people, not to mention the tens of thousands of developers of PHP, and the database system and operating system you’re using, if those too are open source. This is not to argue that using open source software brings with it any responsibility to contribute back, but there definitely is a nice symmetry to benefiting from others’ programming work and then giving back in return, even in a small way.

What about the case where the wiki you run is your business, and you don’t want to give code away to any potential competitors? This is not a book on business strategy, but my experience is that fear of competitors, at least in the wiki world, is generally overblown. Wikis gain prominence and usefulness as a result of their user community and their content, and that’s not something that’s easy to duplicate. Chances are that, by the time anybody figures that imitating your wiki is a good idea, it will already have a significant built-in user base, and a lot of content. If you have an open license for your content, someone could potentially "fork" your contents and try to recreate your entire wiki, but I’ve never heard of that being attempted. (Wikipedia is the one exception: a few attempts have been made to recreate all of some language of Wikipedia on a separate wiki, though those have been done due to some dissatisfaction with how Wikipedia is run, as opposed to a money-making attempt; and they have never worked. Anyway, Wikipedia is a special case.) Meanwhile, releasing the software as open source gives you all the benefits described earlier.

So let’s say you’ve decided to create a new extension, and release it as open source. Now what? There are only two steps you need to do: putting up the code somewhere online, and creating a documentation page.

You can put the code anywhere public (including directly on the extension’s wiki page, a popular solution for smaller extensions, though not necessarily a good one). The best place to put the code, though, is on the MediaWiki Git repository. For that you need developer access, which is pretty easy to get; see here for the details:

https://www.mediawiki.org/wiki/Developer_access


Then, you just need to create the documentation page for your extension. It should be on mediawiki.org, in the “Extension:” namespace; so if your extension is called “Page Save Surprise”, it would be at the page “Extension:Page Save Surprise”. (One note about extension naming: you have a choice about whether to put spaces in the name. The more common approach is to leave them out, in “CamelCase” style, but I prefer to include them, because it looks nicer. Either way is fine.)

Feel free to create your extension’s page by copying the text from the page for some other extension (and then modifying it, of course); that’s what most people do. Just make sure to set the new extension’s status to “experimental”, since that’s most likely what it is at this point.

Now you’re done. You can announce your new extension to the relevant mailing lists if you want, such as mediawiki-l (see here). If the extension is useful, people may start to ask questions, report bugs and contribute fixes, on the extension’s talk page, via email and on Bugzilla. And, if you find that the process appeals to you, you can start improving and maintaining other extensions as well; there certainly are parts of the code that could benefit from the help. You’re not obligated to do any of these things, but they’re all appreciated, in some cases by many people.


Afterword: On semantic wikis


I hope that you’ve enjoyed reading this book, and that it will serve as a useful guide for MediaWiki, which in my opinion is, all things considered, the best wiki software at the moment. I hope that the book adequately explains core MediaWiki, outside of any extensions. And I also hope that, for those of you who aren’t using Semantic MediaWiki and its related extensions, this book made a strong case for their use.

There’s an interesting property of semantic wikis, which is that, once you’ve used them a few times, systems that aren’t semantic wikis start to get annoying. This has happened to me, and I’ve heard of the effect from others. The limitations of other systems start to become apparent: for regular data systems, the most important is the lack of a version history, while for non-semantic wikis, it’s the lack of any way to summarize or aggregate all the data contained in the wiki’s pages, or to impose a structure on pages.

Regular data systems’ lack of a version history means, most importantly, that access has to be tightly controlled: either only a few people can edit any of the data, or each specific set of data is restricted to some small group, with the entire set of permissions settable through some monstrous administrative interface. But it also means that the provenance of data tends to be unknown: if you see a field on the screen, you usually don’t know who put it there, or when it was set, which can cause problems.

Of course, there are workarounds that can be done: you can have a “Notes” field, where everyone who edits the data is meant to summarize their changes, and maybe put in the date; but this is a hack, and any such protocol might not be followed, and even if it is, it’s never quite as informative as actually seeing all the changes.

But there’s also the issue of the flexibility of data structures. In a semantic wiki, if you want to add a new field to a page, or remove a field, it’s just a matter of adding or removing a few lines from some wiki pages, and possibly creating a new semantic property. The total time spent could be as little as five minutes. In a non-wiki system, it depends on the setup: creating or removing a field can possibly be done easily, or it might take a significant amount of work, requiring a programmer to re-code some part of the program, a database administrator to apply the necessary changes in the database, and then some QA team to make sure that the changes didn’t break anything. Or, if you’re using an off-the-shelf system, such a change might not be possible at all.

Non-semantic wikis have all that flexibility as well, but again, they don’t have the data reuse, and they don’t have the forms.

It is my belief that, in the future, all data systems will function like semantic wikis, in their flexibility, editability (in most cases, everyone who can read the content will also be able to edit it), and version history. That is not to say that all systems will be semantic wikis, and certainly not that all of them will use Semantic MediaWiki; Google Docs is a good example of a system that has these attributes to a large extent, but is not a semantic wiki. Still, semantic wikis are the easiest way to get all these features, and Semantic MediaWiki is currently the best and most advanced semantic wiki software. And of course it’s free software, in both senses of the word. So I think SMW is positioned to become an important part of the software ecosystem in the future.

This doesn’t even get to the issue of reuse of data between systems, both in the general sense and in the specific sense of the Semantic Web. Semantic MediaWiki is sometimes grouped in as a Semantic Web technology, though in most cases its users make no use of the Semantic Web, in the sense of importing or exporting content in RDF or a comparable format. But it certainly can be used to do both, and with extensions like External Data, it can make use of data in more standard formats as well. The Semantic Web, as a technology and framework, is growing in importance, and it certainly has had no shortage of buzz. If and when it ever achieves mainstream use, semantic wikis, and SMW in particular, will be a natural choice for publishing Semantic Web content.

Within this context, it’s worth mentioning Wikidata again. This project will, if successful, create a massive, queryable database comprising all of the structured information that one expects to find on Wikipedia. It will be something new in the history of the world: a source of structured information that can, in theory, be used by computers to answer nearly any general-knowledge question. The fact that Wikidata and Semantic MediaWiki will most likely use a joint code base for storing their data is mostly secondary: Wikidata, if successful, will significantly raise both usage and awareness of the Semantic Web, in its various meanings. And at that point, SMW will hopefully stand as an obvious answer when people, in greater numbers, start to ask, “how do we add our own data to this thing?”

In short, semantic wikis solve a lot of problems, some of which people don’t even view as problems before they’ve used one. And in the case of MediaWiki, the benefits of a semantic wiki are available just by downloading and installing some more extensions. It seems like a no-lose proposition. So, happy adventures in the world of collaborative data.