YK: Chapter 5: Content organization

5 Content organization

Categories

Categories are MediaWiki’s basic method of organizing information. On wikis that don’t use Semantic MediaWiki, categories are really the only way to tag information about pages. Categories, for that reason, have been used in a large variety of ways; you only have to look at the explosion of categories on Wikipedia to see that. On the English-language Wikipedia, there are at least six ways in which categories get used:

to establish the basic type of the page’s subject, e.g. “Space Shuttles”

to define further characteristics of a page’s subject, e.g. “Italian generals”

to note a larger topic to which this page’s subject relates, e.g. “Theosophy”

to tag temporary information about the page itself, e.g. “Proposed deletion as of September 20, 2014”

to serve as a container super-category for other categories, e.g. “Symphonies by composer”

to tag pages other than regular pages or categories, e.g. “Animal templates”

There are better ways of tagging much of this information, which take less work and lead to less redundancy. In fact, Semantic MediaWiki, which we’ll get to in later chapters, was first thought up in part in order to remove the need for the profusion of categories on Wikipedia. Still, even with Semantic MediaWiki, categories play an important role.

Let’s look at how categories are defined and used. Any content page in MediaWiki can be added to a category, including images and files, as well as category pages themselves. In all cases, adding a page to a category consists of just adding the following text anywhere in the page:

[[Category:Category name]]

Nothing will be displayed where that text was added. Instead, at the bottom of the page, text that looks like the following will be displayed, showing all the page’s categories, in the order they were defined on the page:

Categories: 1899 births | 1957 deaths | 20th-century actors | Actors from New York City | American chess players | American Episcopalians | American film actors ...

Category tags can go anywhere in a page, but to keep things readable, the usual convention is to place them at the bottom of the page, with one tag per line. So the wikitext that would generate the previous set of displayed categories might look like this:

[[Category:1899 births]]
[[Category:1957 deaths]]
[[Category:20th-century actors]]

...and so on.

There’s one other important place in which categories are declared: within templates. On Wikipedia, the “Proposed deletion...” categories are one example of categories set via templates. (When using Semantic MediaWiki, it is in fact recommended that all category declarations be made via templates; you can see a full explanation of that here.)

Every category has a page in MediaWiki associated with it, which is just the name of that category, preceded by the name of the category namespace, which in English is just "Category:". So the page for a category called “Cars” would be at “Category:Cars”.

Let’s go through the structure of a category, using a real-life example. Here is the top of the “Category:Hydraulic engineering” page on the English-language Wikipedia:

[Image: top of the “Category:Hydraulic engineering” page]

The top part of the category page consists of whatever text has been manually placed there. Below that is the set of subcategories, if any, for this category; i.e., categories tagged as belonging to this category. Here is that list, again for the Hydraulic engineering category page:

[Image: list of subcategories on the “Category:Hydraulic engineering” page]

The categories are displayed in alphabetical order, in three columns, with headers for each initial letter. There is also an arrow next to each subcategory name, which can be used to drill down through the hierarchical list of subcategories for each of these subcategories. This arrow functionality is not standard, and comes from the CategoryTree extension (see here).

Finally, there’s the heart of the category page: the listing of all the pages it contains. Here is just the top of that list, again for the same Wikipedia category:

[Image: top of the list of pages on the “Category:Hydraulic engineering” page]

Pages, like subcategories, are displayed in three columns, with headers for the first letter of the name.

If a category contained any images, below the listing of pages would be a display of the thumbnails of all the images, in gallery format (this is not shown here).

If you click on a category name, and the page for that category hasn’t been created yet, you’ll see a message saying that the category doesn’t exist yet; but it will still list all its pages and/or subcategories. Every category that’s used should ideally have a page created for it. You have to add some content to a page in order to be able to save it: a simple sentence explaining the category is usually what’s done, although even just a "Category" tag, to establish a parent category for this category, would do the trick.

By default, a category page lists its member pages in alphabetical order. This doesn’t always make sense, though: you might want to list people by their last name, you might want to list names that start with “The” under their second word, and so on. You can do that for a particular member page by just adding the indexing string after a pipe in the category tag. For example, to have the page “The Archies” sorted as if it were called “Archies, The”, you would place the following in the page “The Archies”:

[[Category:Category name|Archies, The]]

Or, if the page belongs to a lot of categories, and you want to index it the same way in all those categories, you can use the “DEFAULTSORT” behavior switch on the member page, like this:

{{DEFAULTSORT:Archies, The}}
[[Category:First category name]]
[[Category:Second category name]]

...etc.

“DEFAULTSORT” can be placed anywhere on the page, though it’s usually placed right above the set of categories, which in turn are usually at the end of the page. (That’s unless you’re using the Semantic MediaWiki system, though, where categories, and default sorting, are usually set via the template.)

Since the "Category" tag looks exactly like a standard wiki-link, how do you actually link to a category page, instead of making a category declaration? You do that by adding a colon at the beginning of the tag, so the text would look like:

Also check out [[:Category:Butterflies]].

...or, if you want custom text in the link, add a "|" and the text after that, just as you would with a regular link:

Also check out [[:Category:Butterflies|the Butterflies category]].

Namespaces

Namespaces are how different types of content in the wiki are distinguished. A page is defined as being within a certain namespace if the name of the page begins with the name of that namespace, followed by a colon. So, for instance, the page "Talk:UNIX administration" is in the "Talk" namespace.

It’s very important to note that a namespace is more than just a prefix: it’s a true separate container of content. So the page "Talk:UNIX administration" is actually considered by the system to be the page "UNIX administration" within the "Talk" namespace.

Some namespaces can be represented by more than one string. For instance, in non-English-language wikis, the English-language name for a namespace will always work. So, for example, going to the page “Talk:Paavo Nurmi” on a Finnish-language wiki will redirect you to “Keskustelu:Paavo Nurmi”. (“Keskustelu” is Finnish for “discussion”.) This can also happen within one language. In most wikis, the project namespace can be accessed with either the name of the wiki, the string “Project:”, or, for non-English wikis, the corresponding word for “project” in that language.

When settings are changed for individual namespaces within LocalSettings.php, the namespaces aren’t referred to by their language aliases, but rather by values like NS_TALK, NS_USER, etc. So if you want to enable subpages for just “User:” pages, for example (we’ll get to subpages later in this chapter), you would add the following line to LocalSettings.php:

$wgNamespacesWithSubpages[NS_USER] = true;

NS_TALK, NS_USER etc. are actually PHP constants that in turn simply represent numbers (NS_TALK is 1, for example). Each namespace’s number is unique.

Pages whose names don’t contain a colon are still part of a namespace: that’s the "main" or "blank" namespace, denoted by NS_MAIN in LocalSettings.php (which has the numeric value 0). Similarly, pages whose names do contain a colon, but whose string before the colon doesn’t correspond to an active namespace, are also in the main namespace: a page named "Happy:Times" is just the page "Happy:Times" in the main namespace, not a page called "Times" in the "Happy" namespace, unless a "Happy" namespace has been defined on this wiki.

Namespaces are used for many types of content; below is the listing of each namespace that comes by default in MediaWiki, along with its PHP constant, its actual numeric value, its English-language alias, and its purpose:

| Namespace constant | # | English-language alias(es) | Purpose |
|---|---|---|---|
| NS_MEDIA | -2 | "Media" | Used for directly linking to uploaded files |
| NS_SPECIAL | -1 | "Special" | Used for special pages defined by the software |
| NS_MAIN | 0 | (no text) | Most user-created content |
| NS_TALK | 1 | "Talk" | Discussions about pages in the main namespace |
| NS_USER | 2 | "User" | Information about the wiki’s users |
| NS_USER_TALK | 3 | "User talk" | Discussions with individual users |
| NS_PROJECT | 4 | "Project" and a specific name for that wiki, usually the name of the wiki | Pages about the wiki itself |
| NS_PROJECT_TALK | 5 | "Project talk" and a specific name + " talk" | Discussions about project pages |
| NS_FILE | 6 | "File" and "Image" | Uploaded files |
| NS_FILE_TALK | 7 | "File talk" and "Image talk" | Discussions about uploaded files |
| NS_MEDIAWIKI | 8 | "MediaWiki" | System messages and wiki-wide CSS and JS content |
| NS_MEDIAWIKI_TALK | 9 | "MediaWiki talk" | Discussions about those system messages |
| NS_TEMPLATE | 10 | "Template" | Holds templates |
| NS_TEMPLATE_TALK | 11 | "Template talk" | Discussions about templates |
| NS_HELP | 12 | "Help" | Pages meant to help the wiki’s users |
| NS_HELP_TALK | 13 | "Help talk" | Discussions about help pages |
| NS_CATEGORY | 14 | "Category" | Holds categories |
| NS_CATEGORY_TALK | 15 | "Category talk" | Discussions about categories |

Most of these namespaces will be discussed in greater detail in later chapters.
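
As a rough sketch of how the extra alias strings mentioned earlier can be set up by hand: MediaWiki has a $wgNamespaceAliases setting in LocalSettings.php that maps an additional string to an existing namespace constant. The shortcuts "WP" and "WT" below are purely hypothetical, chosen for illustration; they are not defined by default on any wiki.

```php
// LocalSettings.php fragment: extra aliases for the project namespace.
// "WP" and "WT" are hypothetical shortcuts picked for this example.
$wgNamespaceAliases['WP'] = NS_PROJECT;
$wgNamespaceAliases['WT'] = NS_PROJECT_TALK;
```

With lines like these in place, a link to "WP:Style guide" should resolve to the page "Style guide" in the project namespace, in the same way that "Project:Style guide" does.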

Extensions to MediaWiki can define additional namespaces of their own: some of the extensions that will be covered later in this book, like LiquidThreads, Semantic MediaWiki, and Widgets, do that.

Administrators can also add their own namespaces to a wiki. When this is done, it’s usually in order to separate out types of content. The Wikimedia website Wikisource, for instance, has, in its English-language version, the namespace "Author", which holds all author pages, so that the URL for the poet Robert Frost is:

http://en.wikisource.org/wiki/Author:Robert_Frost

Here, the "Author" namespace seems like it’s intended to provide disambiguation, so that a page about, say, a book called "Robert Frost" would automatically have a separate name.

Personally, I tend to argue against using namespaces for disambiguation alone: it adds more complexity (one more rule for the wiki’s users to remember), it’s a different usage of namespaces than what they were originally intended for, and it’s not usually needed as a disambiguation tool, since for all but the largest wikis, there’s usually not much need to distinguish between two entities with the same name.

There’s at least one compelling reason, though, to create additional namespaces for regular content, which is setting access control. A number of extensions that provide controls on both viewing and editing of content use namespaces to define which pages get which level of security (see here). Usually, such extensions allow for defining security levels for categories as well, and sometimes also for individual pages, but namespaces are the most secure approach, since a page is attached to a namespace in a very direct way. For pages that have their security settings defined via categories, there’s always the chance that the category declaration will get accidentally removed in one way or another from a page, thus removing its security settings, at least temporarily. To be sure, for pages with namespace-based protection there’s the risk that someone will accidentally move the page into another namespace; but this is a smaller risk.
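
Core MediaWiki also has a simpler, edit-only form of namespace-based restriction, separate from the extensions discussed above: the $wgNamespaceProtection setting, which ties editing in a namespace to a user right. A minimal sketch follows. The "Policy" namespace, its ID of 3000, and the "editpolicy" right are all assumptions made up for this example, and this setting does nothing to restrict viewing. (Defining a namespace is explained in more detail below.)

```php
// LocalSettings.php fragment (illustrative only).
// NS_POLICY, the ID 3000 and the 'editpolicy' right are hypothetical.
define( "NS_POLICY", 3000 );
$wgExtraNamespaces[NS_POLICY] = "Policy";

// Only users holding the 'editpolicy' right may edit pages in this namespace.
$wgNamespaceProtection[NS_POLICY] = array( 'editpolicy' );

// Grant that right to administrators.
$wgGroupPermissions['sysop']['editpolicy'] = true;
```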

Regardless, for whatever reason, you may want to create one or more additional namespaces for your wiki. In that case, the first step is to choose a number for your namespace; or rather a pair of consecutive numbers, since namespaces almost always come in twos: a namespace for the main content, which has an even number, and the one for its equivalent talk pages, which has the odd number that’s one higher. You of course should choose a pair of numbers that haven’t already been taken: that includes the IDs of all the default MediaWiki namespaces, as well as namespaces taken by any MediaWiki extensions you use, or may use in the future. For security’s sake, you may as well not use namespace numbers taken by any extensions. You can see the complete list of namespace IDs currently used by MediaWiki extensions here:

https://www.mediawiki.org/wiki/Extension_namespace_registration

Once you’ve decided on a number, or numbers, for your namespaces, you can register them within LocalSettings.php. For every namespace, you should add two lines, which look like the following:

define( "NS_BOOK", 500 );

$wgExtraNamespaces[NS_BOOK] = "Book";
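
Since namespaces almost always come in pairs, you would normally register the talk namespace at the same time. Here is a sketch of what the full set of lines might look like. The name NS_BOOK_TALK, the ID 501 and the "Book_talk" string follow the usual convention but are assumptions, not something defined above. These lines would take the place of the two lines above, rather than being added alongside them.

```php
// LocalSettings.php fragment (a sketch; the IDs 500 and 501 are assumed
// to be unused on this wiki).
define( "NS_BOOK", 500 );       // content namespace: even ID
define( "NS_BOOK_TALK", 501 );  // matching talk namespace: the next odd ID

$wgExtraNamespaces[NS_BOOK] = "Book";
// Multi-word namespace names use underscores in place of spaces.
$wgExtraNamespaces[NS_BOOK_TALK] = "Book_talk";
```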

The way namespaces are structured can lead to one potential awkwardness. Let’s say you create a page called “City:Brussels”, and then remember that you haven’t yet created a “City” namespace. You do that, and then you discover that the page “City:Brussels” is blank again! That’s because the page you created was “City:Brussels” in the main namespace, whereas the page you’re going to now is “Brussels” in the “City” namespace. How can you recover the page that was created before? There are three ways:

The first, and probably easiest, way is to temporarily unset the “City” namespace, then move the page “City:Brussels” to any name that doesn’t start with “City”, then reinstate the “City” namespace, and move the page back to the name “City:Brussels”.

The second way is to call the script “namespaceDupes.php”, available in MediaWiki’s /maintenance directory.

The third way involves going into the database, finding the entry for the page in the “page” table, and changing both the namespace and the page name via SQL or some other tool. It requires knowledge of database manipulation, and isn’t recommended unless you really know what you’re doing.

A similar problem can come about if the underlying number gets changed for any namespace. Again, those three previous solutions can also be used in this case.

Redirects

Redirects are another useful basic feature of MediaWiki. Redirects let you point one page toward another, so that if a user goes to the URL for page A, what they’ll be shown instead is page B, with a note at the top saying "(Redirected from...".

[Image: top of a page reached via a redirect, showing the "(Redirected from..." note]

Redirects are generally done for one of three reasons:

to link a common typo in a page name to its correct spelling

to link one or more synonyms to a single page (e.g., redirecting “USA” and “United States of America” to “United States”)

to link a topic that’s considered not meaningful enough to have its own page, to a general one that’s meant to cover that specific topic to some extent (e.g., redirecting “Fax” to “Company communications policy” in an internal company wiki)

A redirect is defined by placing the following text in the page that will be redirecting:

#REDIRECT [[target page name]]

The "REDIRECT" can actually be written with any casing, though by convention it’s usually written as all capital letters.

In most cases that’s the only thing that appears on a redirect page, though in theory any other text can appear as well; users just won’t see it. The one piece of text that it can be useful to add is category declarations: if you add one or more category declarations to a redirect page, that page will in fact show up as a member in all those category pages, though the name will be in italics. Category declarations are generally only done for the third type of redirect: specific subjects redirecting to more general topics.

Semantic MediaWiki (chapter 16) has its own behavior when dealing with redirect pages: properties that point to a redirect page are treated as if they’re pointing to the ultimate destination page. This is useful for the first two kinds of redirects (for typos and synonyms), though not always for the third (subtopics to larger topics).

Subpages and super-pages

Subpages are a handy way to break up a single page into multiple pages, if it gets too big or unwieldy. A subpage is simply a page whose name takes the form "main page name/additional text", where "main page name" already exists. So if you have a page about the company Ace Motors, and it contains a long section about the company’s history, you could spin off that section into its own page, named "Ace Motors/History", and link to it from the "Ace Motors" page.

Of course, you could also call the page "History of Ace Motors", which is how it would be done on Wikipedia (Wikipedia doesn’t use subpages in its main namespace, though it does use them in other namespaces, like “Wikipedia:” and “Template:”). So are subpages just another naming convention? To some extent, yes, although MediaWiki does offer one important feature that makes subpages feel more like they "belong" to their main page: if you turn on the use of subpages, any page with a slash in its name will include a small "breadcrumb" link at the top, pointing back to the "main" page, i.e., the section before the slash, provided that the main page exists. This small feature goes a long way toward making subpages feel legitimate.

Sub-subpages, and pages further down the hierarchy, are also possible, provided that each page further up in the hierarchy already exists. The "breadcrumb" link at the top will link to each sub-section of the page name in turn. So you could have a page like "Ace Motors/History/Europe/1900-1950", and, if subpages are enabled for the main namespace, the top of the page will look like:

[Image: top of the page "Ace Motors/History/Europe/1900-1950", showing the breadcrumb links]

Enabling subpages

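Subpages are enabled through the global variable $wgNamespacesWithSubpages, an array keyed by namespace ID. By default, subpages are enabled for only some namespaces (such as the talk and user namespaces), and not for the main namespace. If you want to have subpages in, say, the main namespace and the template namespace, you could add the following to LocalSettings.php:

$wgNamespacesWithSubpages[NS_MAIN] = true;
$wgNamespacesWithSubpages[NS_TEMPLATE] = true;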

Alternatively, if you wanted every namespace to have subpages, you would be best off calling array_fill(), like the following:

$wgNamespacesWithSubpages = array_fill( 0, 200, true );

(In this case, 200 is an arbitrarily high number: the call covers namespace IDs 0 through 199, on the assumption that no namespace on this wiki has an ID of 200 or higher.)
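
Note that array_fill( 0, 200, true ) only sets the keys 0 through 199, so a custom namespace with a higher ID (like the 500 used for NS_BOOK earlier) would not be covered. If you would rather be explicit, one alternative is to loop over just the namespaces you care about. This is only a sketch: $namespacesNeedingSubpages is a hypothetical variable name, and the choice of namespaces is arbitrary.

```php
// LocalSettings.php fragment: enable subpages for an explicit list of
// namespaces, leaving the settings for all other namespaces untouched.
// $namespacesNeedingSubpages is a hypothetical name used for this example.
$namespacesNeedingSubpages = array( NS_MAIN, NS_TEMPLATE, NS_HELP );

foreach ( $namespacesNeedingSubpages as $namespaceId ) {
    // The array is keyed by namespace ID; a true value enables subpages.
    $wgNamespacesWithSubpages[$namespaceId] = true;
}
```

Any custom namespace constants can be added to the list in the same way, as long as they are defined earlier in the file.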

Special pages

There are pages in MediaWiki that do not contain editable content, but rather interface elements, like lists and helper forms. These are called "special pages". They are contained in the namespace "Special:", and unlike other pages, they can’t be edited, they have no page history, and they don’t have an associated talk page. Instead, the content of these pages is defined by the PHP code. MediaWiki defines a wide variety of special pages, as do many MediaWiki extensions. Among the special pages defined in core MediaWiki are:

Special:RecentChanges - shows the list of recent edits in the wiki (see here)

Special:Contributions - shows the set of edits made by any one user (when the page is called in the form “Special:Contributions/username”)

Special:Watchlist - shows the most recent change to any page that the current user is “watching” (see here)

Special:Version - shows the current version of MediaWiki, as well as of any extensions that are installed

Special:SpecialPages - shows the set of all special pages on the wiki; this is a useful starting point (see also here for the Admin Links extension)

Special:AllPages - shows all non-special pages in the wiki, subdivided by namespace

Special:RandomPage - brings the user to a random page in the wiki.

There are many more special pages defined by both MediaWiki and its extensions, some of which are intended only for administrators. We’ll get to many of these over the course of the book, but the list above is a good one for starting out with MediaWiki.