YK: Chapter 9: Browsing and searching

From Blik

9 Browsing and searching

Viewing the set of all pages

MediaWiki provides a standard way of seeing the entire set of pages in the wiki: the special page Special:AllPages. It lets you view an alphabetical list of the pages in any namespace other than “Special:”. That covers every other namespace, so you can also use it to list categories, templates, files, etc. Figure 9.1 shows the top of the page Special:AllPages for mediawiki.org.

Figure 9.1 Special:AllPages on mediawiki.org

The list of page ranges in Figure 9.1 continues in the same way for another 10 lines. Clicking on any of those ranges displays a list of all the pages within that alphabetical range, laid out in three columns.

You can also use the form at the top of the page to list the pages within any alphabetical range you choose, in the namespace you select.

For pages in the "File:" namespace, i.e. pages for uploaded files, there’s an alternate way to see them listed: the page Special:ListFiles. This special page has an advantage over Special:AllPages in that it also shows a thumbnail image for each file.

Figure 9.2 shows how the page Special:ListFiles looks on mediawiki.org.

Figure 9.2 Special:ListFiles on mediawiki.org

For each file, the following information is shown: the date it was uploaded, or last uploaded if more than one version has been uploaded; a thumbnail of the file if it’s an image; its size; the user who last uploaded it; and a description of the file, if one was submitted during the upload. The name of each file is a link to that file’s page in the “File:” namespace, while the subsequent “(file)” link is a link directly to that file.

Searching

MediaWiki’s search functionality is available via its search bar, which in the Vector skin appears at the top of the page (Figure 9.3).

Figure 9.3 Search bar in the Vector skin

When a user does a search, they are sent to the page Special:Search, which handles all the actual search functionality. This page lets the user modify their search term, as well as change the set of namespaces that are being searched.

Within Special:Search, assuming you’re using the Vector skin, if the user runs a search and then clicks the “Advanced” link, they will see an interface like the one shown in Figure 9.4.

Figure 9.4 Special:Search page

Searches initially cover only the namespaces defined as “content namespaces”; out of the box, that is just the main (unnamed) namespace. As an administrator, you can change that by adding to $wgContentNamespaces. For instance, to also have pages in the “Help:” namespace searched, you would add the following to LocalSettings.php:

$wgContentNamespaces = array( NS_MAIN, NS_HELP );
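The line above replaces the whole list. A minimal sketch of the “adding to” approach appends to whatever list is already set instead. The second setting, $wgNamespacesToBeSearchedDefault, is a separate core variable that, depending on your MediaWiki version, may be what actually controls which namespaces are pre-selected for a search; treat that as an assumption to check against your version’s documentation.

```php
// Append "Help:" to the existing content namespaces instead of
// overwriting the list (same result as the line above while the
// list still holds only its default, NS_MAIN).
$wgContentNamespaces[] = NS_HELP;

// Assumption: on some versions the namespaces searched by default are
// governed by this separate setting (namespace ID => true/false).
$wgNamespacesToBeSearchedDefault[NS_HELP] = true;
```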

Each of the links shown in Figure 9.4 changes the set of namespaces being searched. Note that MediaWiki’s search is case-insensitive.

If a user enters text in the search box that exactly matches the name of a page, they will be sent directly to that page. That is usually the right behavior, but what if the user instead wants to search for that text? This could be more obvious in the interface, but the way to do it is to type in the search string, wait for the autocompletion dropdown to appear, and select its last item, “containing... search-string”.

There’s another nice feature related to search: autocompletion within the search input, based on page names in the wiki. You can see this behavior on Wikipedia when you start typing in the search box. Starting with MediaWiki 1.20, it is enabled by default; for earlier versions, you can enable it by adding the following to LocalSettings.php:

$wgEnableMWSuggest = true;
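If one LocalSettings.php file has to serve wikis on both sides of the 1.20 boundary, a guard like the following only applies the setting where it is still needed. This is a sketch, assuming the version string in $wgVersion, which core sets before LocalSettings.php is loaded in versions of that era.

```php
// Only versions before 1.20 need this; 1.20 and later enable
// search-box autocompletion by default.
if ( version_compare( $wgVersion, '1.20', '<' ) ) {
    $wgEnableMWSuggest = true;
}
```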

Installing another search engine

MediaWiki’s default search simply uses SQL querying. It does the job, though it is sometimes criticized as primitive. Notably, it cannot find alternate spellings of words. It also ignores most advanced search-engine syntax, though it does support the most common piece of syntax: putting a phrase in quotation marks to indicate that it should be matched exactly as written.

There are three search engine applications that can be substituted into MediaWiki in place of the default one: Lucene, Sphinx and Elasticsearch. All three have features that the regular MediaWiki search lacks, like checking for misspellings, and all three are open source. Lucene is the best-known one, and it’s the one used by default on Wikipedia. Sphinx is less powerful, but easier to install. Elasticsearch is the newest (to MediaWiki, that is), and in terms of its integration with MediaWiki, it may be the best. It has some advantages in speed and in support for non-Latin characters, but its biggest advantage over the other solutions is that it searches the text of pages with all templates fully expanded, not the raw wikitext. It’s currently available as an optional “beta feature” on Wikipedia.
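Whichever engine you choose, the switch in LocalSettings.php typically comes down to the core setting $wgSearchType. Its default value is shown here for reference; the engine-specific extensions described below generally replace it with the name of their own search class.

```php
// Default: null tells MediaWiki to use its built-in, database-backed
// search class for the database type in use (e.g. MySQL or SQLite).
$wgSearchType = null;
```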

If you want to use Lucene, there are two ways to add it to MediaWiki. The standard way is to install the extensions Lucene-search and MWSearch. Lucene-search is not really an extension, though it’s billed as one: it’s just the code for a Lucene-based search server, hosted in Wikimedia’s code repository. Lucene is written in Java, so you’ll need Java installed as well. MWSearch is the actual MediaWiki extension that connects the search input to the Lucene back end.
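As a rough sketch of the MWSearch side, assuming the Lucene-search server is already running, the LocalSettings.php additions look something like the following. The host and port values are placeholders, and the setting names should be checked against the MWSearch page listed below, since they have varied between versions.

```php
// Sketch only: verify setting names on the MWSearch extension page.
require_once "$IP/extensions/MWSearch/MWSearch.php";
// Use the Lucene-backed search class that MWSearch provides.
$wgSearchType = 'LuceneSearch';
// Placeholders: host and port of the machine running Lucene-search.
$wgLuceneHost = '127.0.0.1';
$wgLucenePort = 8123;
```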

The other way is to install the SolrStore extension, which works with Apache Solr (a search platform that includes Lucene), as well as with Semantic MediaWiki. SolrStore allows for both free-text searches and SMW property-based searches. Unfortunately, the extension doesn’t currently support searching for misspellings and synonyms, as Lucene/MWSearch does.

You can see the three extensions here:

https://www.mediawiki.org/wiki/Extension:Lucene-search

https://www.mediawiki.org/wiki/Extension:MWSearch

https://www.mediawiki.org/wiki/Extension:SolrStore

To add Sphinx instead, you need to install Sphinx, as well as the SphinxSearch MediaWiki extension. You can view that one here:

https://www.mediawiki.org/wiki/Extension:SphinxSearch
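A comparable sketch for SphinxSearch follows. Treat the class and setting names as assumptions drawn from the extension’s documentation of that era and confirm them on the page above; the host and port are placeholders that must match your Sphinx configuration.

```php
// Assumed names; confirm on the SphinxSearch extension page.
require_once "$IP/extensions/SphinxSearch/SphinxSearch.php";
// Send searches to the Sphinx-backed search class (assumed name).
$wgSearchType = 'SphinxMWSearch';
// Placeholders: where the Sphinx search daemon (searchd) listens.
$wgSphinxSearch_host = '127.0.0.1';
$wgSphinxSearch_port = 9312;
```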

Finally, there’s the Elasticsearch engine, which is available via the MediaWiki extension called CirrusSearch:

https://www.mediawiki.org/wiki/Extension:CirrusSearch
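CirrusSearch depends on the Elastica extension, which provides the Elasticsearch client library. Here is a minimal sketch of the LocalSettings.php lines, in the include style of that period, assuming a single Elasticsearch node on the same machine. The extension’s own installation notes also describe building the initial search index, which is not shown here.

```php
// CirrusSearch uses Elastica to talk to Elasticsearch, so load it first.
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
// Placeholder: the Elasticsearch server(s) to connect to.
$wgCirrusSearchServers = array( 'localhost' );
// Route the wiki's searches through CirrusSearch.
$wgSearchType = 'CirrusSearch';
```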

CirrusSearch was only released in June 2013, but given its advantages, it may well become the standard MediaWiki search extension before too long.

One common request is the ability to search through uploaded files, such as Word documents and PDFs. This functionality is available, although it’s currently tricky to set up and costs money. You have to install the SearchBlox application (not open source, and not free), and then use the SearchBlox extension:

https://www.mediawiki.org/wiki/Extension:SearchBlox

Both the Lucene and Elasticsearch engines natively provide support for searching through such documents, though, so perhaps it’s just a matter of time until such functionality is easily available through one or more of the extensions above.

Using an outside search engine

The other possibility for search is to use an outside search engine, and the overwhelming favorite for this option is Google. Google provides an easy-to-install “custom search” feature, which is documented at:

http://www.google.com/cse/

The usual approach to installing it in MediaWiki is to use the extension “Google Custom Search Engine”, which lets you place the input for this custom search anywhere on the page, including in place of the existing search input:

https://www.mediawiki.org/wiki/Extension:Google_Custom_Search_Engine

There are a few advantages to using Google’s search in place of an internal one: Google’s search interface is top-notch, and it’s also familiar to most people. And instead of searching through the wikitext, as most internal MediaWiki search options do, a Google search looks at what’s actually displayed on the page, which in some cases could be quite different, especially if extensions like Widgets or External Data are used.

There are also some downsides, though: this setup only works for public wikis, unless you’re willing to pay to install a local Google search engine on your site. And it will take some time for Google (or any other outside engine) to see changes to your wiki, so recent edits will most likely not show up in search results.