BlikBack

From Blik
Revision as of 06:33, 30 November 2020 by Blik



It is important to make regular backups of the data in your wiki. This page provides an overview of the backup process for a typical MediaWiki wiki; you will probably want to devise your own backup scripts or schedule to suit the size of your wiki and your individual needs.

Overview

MediaWiki stores important data in two places:

Database 
Pages and their contents, users and their preferences, metadata, search index, etc.
File system 
Software configuration files, custom skins, extensions, images (including deleted images), etc.

Consider making the wiki read-only before creating the backup - see Manual:$wgReadOnly. This makes sure all parts of your backup are consistent (although some of your installed extensions may write data nonetheless).

File transfer

You will have to choose a method for transferring the files off the server:

  • You can simply publish non-private data on archive.org and/or in a dumps/ directory of your webserver.
  • SCP (or WinSCP), SFTP/FTP or any other transfer protocol you choose.
  • The hosting company might provide a file manager interface via a web browser; check with your provider.
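Whichever transfer method you pick, it is worth checksumming the archive so you can verify it arrived intact. A minimal sketch (all paths and the remote host name are invented for illustration):

```shell
# Create a stand-in backup to package (placeholder paths).
mkdir -p /tmp/wiki_backup_demo
echo "demo dump" > /tmp/wiki_backup_demo/dump.sql

# Bundle the backup and record a checksum before sending it anywhere.
tar czf /tmp/wiki_backup.tgz -C /tmp wiki_backup_demo
sha256sum /tmp/wiki_backup.tgz > /tmp/wiki_backup.tgz.sha256

# Transfer (commented out; backup.example.org is a placeholder host):
# scp /tmp/wiki_backup.tgz /tmp/wiki_backup.tgz.sha256 user@backup.example.org:backups/

# On the receiving side, the same check confirms the file is intact:
sha256sum -c /tmp/wiki_backup.tgz.sha256
```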

SQLite Database

Most of the critical data in the wiki is stored in the database.

If your wiki is currently offline, its database can be backed up by simply copying the database file. Otherwise, you should use a maintenance script:

php maintenance/sqlite.php --backup-to <backup file name>

which ensures that the operation is atomic and leaves no inconsistencies. If your database is not very large and the server is not under heavy load, users editing the wiki will notice nothing but a short lag. Users who are just reading will not notice anything at all.
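As an illustration only (the installation and backup paths are assumptions, not part of the manual), a crontab entry could run this script nightly with a dated file name:

```shell
# Hypothetical crontab fragment: run the atomic SQLite backup nightly at 03:00.
# /var/www/wiki and /var/backups/wiki are placeholder paths; note that "%"
# must be escaped as "\%" inside a crontab entry.
0 3 * * * php /var/www/wiki/maintenance/sqlite.php --backup-to /var/backups/wiki/wiki-$(date +\%F).sqlite
```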

sqlite.php

sqlite.php is a maintenance script for tasks specific to the SQLite backend.

Currently, these options are supported:

--vacuum
Executes the VACUUM command, which compacts the database and improves its performance.

Example:

$ php sqlite.php --vacuum
VACUUM: Database size was 46995456 bytes, now 37796864 (19.6% reduction).
--integrity
Performs an integrity check of the database. If no error is detected, a single "ok" will be displayed; otherwise the script will show up to 100 errors.

Example:

$ php sqlite.php --integrity
Performing database integrity checks:
ok
--backup-to <file name>
Backs up the database to the given file. (Available from MediaWiki 1.17.)

--check-syntax <one or more file names>
Checks SQL files for compatibility with SQLite syntax. This option is intended for developer use.

All these options can be used at the same time.

File system

MediaWiki stores other components of the wiki in the file system where this is more appropriate than insertion into the database, for example site configuration files (LocalSettings.php, and AdminSettings.php, which was finally removed in 1.23), image files (including deleted images, thumbnails, and rendered math and SVG images, if applicable), skin customisations, extension files, etc.

The best method to back these up is to place them into an archive file, such as a .tar file, which can then be compressed if desired. On Windows, applications such as WinZip or 7-zip can be used if preferred.

For Linux variants, assuming the wiki is stored in /srv/www/htdocs/wiki:

  tar zcvhf wikidata.tgz /srv/www/htdocs/wiki

It should be possible to back up the entire "wiki" folder in "htdocs" if you are using XAMPP.
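An archive is only useful if it can be read back, so it is worth listing its contents after creation. A throwaway demo of the same tar round trip (the /tmp paths are invented for the demo):

```shell
# Build a stand-in wiki tree and archive it the same way as above.
mkdir -p /tmp/htdocs/wiki/images
echo '<?php' > /tmp/htdocs/wiki/LocalSettings.php
tar zcvhf /tmp/wikidata.tgz -C /tmp htdocs/wiki

# Listing the archive confirms the configuration file made it in.
tar tzf /tmp/wikidata.tgz | grep LocalSettings.php
```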

Backup the content of the wiki (XML dump)

It is also a good idea to create an XML dump in addition to the database dump. XML dumps contain the content of the wiki (wiki pages with all their revisions), without the site-related data (they do not contain user accounts, image metadata, logs, etc).[1]

XML dumps are less likely to cause problems with character encoding, are a good means of transferring large amounts of content quickly, and can easily be used by third-party tools, which makes them a good fallback should your main database dump become unusable.

To create an XML dump, use the command-line tool dumpBackup.php, located in the maintenance directory of your MediaWiki installation. See Manual:DumpBackup.php for more details.

You can also create an XML dump for a specific set of pages online, using Special:Export, although attempting to dump large quantities of pages through this interface will usually time out.

To import an XML dump into a wiki, use the command-line tool importDump.php. For a small set of pages, you can also use the Special:Import page via your browser (by default, this is restricted to the sysop group). As an alternative to dumpBackup.php and importDump.php, you can use MWDumper, which is faster, but requires a Java runtime environment.
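Putting the two scripts together, a dump/restore round trip might look like the following sketch (paths are assumptions; run each command from the respective wiki's installation directory):

```shell
# On the source wiki: dump every page with its complete history.
php maintenance/dumpBackup.php --full > /tmp/pages.xml

# On the destination wiki: import the dump, then refresh recent changes
# so Special:Recentchanges reflects the imported revisions.
php maintenance/importDump.php < /tmp/pages.xml
php maintenance/rebuildrecentchanges.php
```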

See Manual:Importing XML dumps for more information.



Without shell access to the server

If you have no shell access, then use the WikiTeam Python script dumpgenerator.py from a DOS, Unix or Linux command line. It requires Python v2 (v3 doesn't work yet).

This produces an XML dump with edit histories, plus a dump of all images with their descriptions; it does not include extensions or the LocalSettings.php configuration.

python dumpgenerator.py --api=http://www.sdiy.info/w/api.php --xml --images

Full instructions are at the WikiTeam tutorial: https://github.com/WikiTeam/wikiteam/wiki/Tutorial#I_have_no_shell_access_to_server

See also Meta:Data dumps.

Scripts


  • Unofficial backup script by User:Duesentrieb (see Manual:Backing up a wiki/Duesentrieb's backup script).
  • Unofficial backup script by Flominator; creates a backup of all files and the database, with optional backup rotation.
  • User:Darizotas/MediaWiki Backup Script for Windows - a script for backing up a Windows MediaWiki install. Note: has no restore feature.
  • Unofficial web-based backup script, mw_tools, by Wanglong (allwiki.com); you can use it to back up your database, or use the backup files to recover the database; the operation is very easy.
  • WikiTeam tools (https://github.com/WikiTeam/wikiteam) - if you do not have server access (e.g. your wiki is in a free wikifarm), you can generate an XML dump and an image dump using WikiTeam tools (see some saved wikis at https://github.com/WikiTeam/wikiteam/wiki/Available-Backups).
  • Another backup script (https://github.com/samwilson/MediaWiki_Backup) that dumps the DB, files (just pictures by default, with an option to include all files in the installation) and XML; puts the site into read-only mode; timestamps backups; and reads the charset from LocalSettings.php. The script does not need to be modified for each site to be backed up. It does not (yet) rotate old backups. It also provides a restore script.
  • Another unofficial MediaWiki backup script for Windows by Lanthanis (see Manual:Backing up a wiki/Lanthanis backup CMD) that exports the pages of specified namespaces as an XML file, dumps specified database tables, and adds further specified folders and files to a ZIP backup file. Can be used with the Windows task scheduler.
  • mw_backup (https://github.com/nischayn22/mw_backup), a script for periodical backups that makes daily, weekly and monthly backups of your database and images directory when run as a daily cron job.


References

  1. XML dumps are independent of the database structure, and can be imported into future (and even past) versions of MediaWiki.

Wiki pages can be exported in a special XML format to import into another MediaWiki installation (if this function is enabled on the destination wiki, and the user is a sysop there), or used for other purposes, for instance analysing the content. See also m:Syndication feeds for exporting information other than pages, and Help:Import on importing pages.

How to export

There are at least four ways to export pages:

  • The backup script dumpBackup.php dumps all the wiki pages into an XML file. dumpBackup.php only works on MediaWiki 1.5 or newer. You need to have direct access to the server to run this script. Dumps of Wikimedia projects are regularly made available at https://dumps.wikimedia.org/.

    • Note: you might need to configure AdminSettings.php in order to run dumpBackup.php successfully. See m:MediaWiki for more information.

  • There is an OAI-PMH interface to regularly fetch pages that have been modified since a specific time. For Wikimedia projects this interface is not publicly available. OAI-PMH contains a wrapper format around the actual exported articles.

By default only the current version of a page is included. Optionally you can get all versions with date, time, user name and edit summary. Optionally the latest version of all templates called directly or indirectly is also exported.

Additionally you can copy the SQL database. This is how dumps of the database were made available before MediaWiki 1.5; it won't be explained here further.

Using 'Special:Export'

To export all pages of a namespace, for example:

1. Get the names of pages to export

An example may make this clearer than the description alone.

  1. Go to Special:Allpages and choose the desired article/file.

  2. Copy the list of page names to a text editor.

  3. Put all page names on separate lines.

     1. You can achieve that relatively quickly if you copy the part of the rendered page with the desired names and paste it into, say, MS Word - use Paste Special as unformatted text - then open the Replace function (CTRL+H), enter ^t in "Find what" and ^p in "Replace with", and hit the Replace All button. (This relies on tabs between the page names; these are typically the result of the page names being inside td tags in the HTML source.)

     2. The text editor Vim also allows for a quick way to fix line breaks: after pasting the whole list, run the command :1,$s/\t/\r/g to replace all tabs by carriage returns, and then :1,$s/^\n//g to remove every line containing only a newline character.

     3. Another approach is to copy the formatted text into any editor exposing the HTML. Remove all <tr> and </tr> tags, replace all <td> tags with <tr><td> and all </td> tags with </td></tr>, and the HTML will then be parsed into the needed format.

     4. If you have shell and MySQL access to your server, you can use this script:

mysql -umike -pmikespassword -hlocalhost wikidbname <<EOF
select page_title from wiki_page where page_namespace=0;
EOF

Note: replace mike and mikespassword with your own credentials. This example also assumes tables with the prefix wiki_.
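As a plain-shell alternative to the editor tricks above (the sample page names below are invented), tr can do the same tab-to-newline conversion:

```shell
# Stand-in for the tab-separated names copied from Special:Allpages.
printf 'Page one\tPage two\tPage three\n' > /tmp/names.txt

# Convert tabs to newlines and drop any empty lines that result.
tr '\t' '\n' < /tmp/names.txt | sed '/^$/d' > /tmp/names_oneperline.txt
cat /tmp/names_oneperline.txt
```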

  4. Prefix the namespace to the page names (e.g. 'Help:Contents'), unless the selected namespace is the main namespace.

  5. Repeat the steps above for other namespaces (e.g. Category:, Template:, etc.)

A similar script for PostgreSQL databases looks like this:

$ psql -At -U wikiuser -h localhost wikidb -c "select page_title from mediawiki.page"

Note: replace wikiuser with your own user name; the database will prompt you for a password. This example shows tables without the wiki_ prefix and with the namespace specified as part of the table name.

Alternatively, a quick approach for those with access to a machine with Python installed:

  1. Go to Special:Allpages and choose the desired namespace.

  2. Save the entire webpage as index.php.htm. Some wikis may have more pages than will fit on one screen of Allpages; you will need to save each of those pages.

  3. Run export_all_helper.py in the same directory as the saved file. You may wish to pipe the output to a file; e.g. python export_all_helper.py > main to send it to a file named "main".

  4. Save the page names output by the script.

2. Perform the export

  • Go to Special:Export and paste all your page names into the textbox, making sure there are no empty lines.

  • Click 'Submit query'.

  • Save the resulting XML to a file using your browser's save facility.

and finally...

  • Open the XML file in a text editor. Scroll to the bottom to check for error messages.

Now you can use this XML file to perform an import (see Help:Import).

Exporting the full history

A checkbox in the Special:Export interface selects whether to export the full history (all versions of an article) or only the most recent version. A maximum of 100 revisions are returned; other revisions can be requested as detailed in Manual:Parameters to Special:Export.

Export format

The format of the XML file you receive is the same in all cases. It is codified in XML Schema at https://www.mediawiki.org/xml/export-0.10.xsd. This format is not intended for viewing in a web browser, though some browsers show you pretty-printed XML with "+" and "-" links to view or hide selected parts. Alternatively the XML source can be viewed using the "view source" feature of the browser, or, after saving the XML file locally, with a program of your choice. If you directly read the XML source it won't be difficult to find the actual wikitext. If you don't use a special XML editor, "<" and ">" appear as &lt; and &gt;, to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as &amp;.

In the current version the export format does not contain an XML replacement of wiki markup (see Wikipedia DTD for an older proposal). You only get the wikitext, as when editing the article.

Example

  <mediawiki xml:lang="en">
    <page>
      <title>Page title</title>
      <restrictions>edit=sysop:move=sysop</restrictions>
      <revision>
        <timestamp>2001-01-15T13:15:00Z</timestamp>
        <contributor><username>Foobar</username></contributor>
        <comment>I have just one thing to say!</comment>
        <text>A bunch of [[Special:MyLanguage/text|text]] here.</text>
        <minor />
      </revision>
      <revision>
        <timestamp>2001-01-15T13:10:27Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>new!</comment>
        <text>An earlier [[Special:MyLanguage/revision|revision]].</text>
      </revision>
    </page>
 
    <page>
      <title>Talk:Page title</title>
      <revision>
        <timestamp>2001-01-15T14:03:00Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>hey</comment>
        <text>WHYD YOU LOCK PAGE??!!! i was editing that jerk</text>
      </revision>
    </page>
  </mediawiki>

DTD

Here is an unofficial, short Document Type Definition version of the format. If you don't know what a DTD is, just ignore it.

<!ELEMENT mediawiki (siteinfo,page*)>
<!-- version contains the version number of the format (currently 0.3) -->
<!ATTLIST mediawiki
  version  CDATA  #REQUIRED 
  xmlns CDATA #FIXED "https://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation CDATA #FIXED
    "https://www.mediawiki.org/xml/export-0.3/ https://www.mediawiki.org/xml/export-0.3.xsd"
  xml:lang  CDATA #IMPLIED
>
<!ELEMENT siteinfo (sitename,base,generator,case,namespaces)>
<!ELEMENT sitename (#PCDATA)>      <!-- name of the wiki -->
<!ELEMENT base (#PCDATA)>          <!-- url of the main page -->
<!ELEMENT generator (#PCDATA)>     <!-- MediaWiki version string -->
<!ELEMENT case (#PCDATA)>          <!-- how cases in page names are handled -->
   <!-- possible values: 'first-letter' | 'case-sensitive'
                         'case-insensitive' option is reserved for future -->
<!ELEMENT namespaces (namespace+)> <!-- list of namespaces and prefixes -->
  <!ELEMENT namespace (#PCDATA)>     <!-- contains namespace prefix -->
  <!ATTLIST namespace key CDATA #REQUIRED> <!-- internal namespace number -->
<!ELEMENT page (title,id?,restrictions?,(revision|upload)*)>
  <!ELEMENT title (#PCDATA)>         <!-- Title with namespace prefix -->
  <!ELEMENT id (#PCDATA)> 
  <!ELEMENT restrictions (#PCDATA)>  <!-- optional page restrictions -->
<!ELEMENT revision (id?,timestamp,contributor,minor?,comment?,text)>
  <!ELEMENT timestamp (#PCDATA)>     <!-- according to ISO8601 -->
  <!ELEMENT minor EMPTY>             <!-- minor flag -->
  <!ELEMENT comment (#PCDATA)> 
  <!ELEMENT text (#PCDATA)>          <!-- Wikisyntax -->
  <!ATTLIST text xml:space CDATA  #FIXED "preserve">
<!ELEMENT contributor ((username,id) | ip)>
  <!ELEMENT username (#PCDATA)>
  <!ELEMENT ip (#PCDATA)>
<!ELEMENT upload (timestamp,contributor,comment?,filename,src,size)>
  <!ELEMENT filename (#PCDATA)>
  <!ELEMENT src (#PCDATA)>
  <!ELEMENT size (#PCDATA)>

Processing XML export

Many tools can process the exported XML. If you process a large number of pages (for instance a whole dump) you probably won't be able to fit the document in main memory, so you will need a parser based on SAX or other event-driven methods.

You can also use regular expressions to directly process parts of the XML code. This may be faster than other methods, but it is not recommended because it is difficult to maintain.
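For instance, Python's xml.etree.ElementTree.iterparse offers such an event-driven pass. This self-contained sketch (the two-page sample dump is invented) streams page titles without holding the whole document in memory:

```shell
# A stand-in two-page dump so the sketch is self-contained.
cat > /tmp/mini_dump.xml <<'EOF'
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Main Page</title></page>
  <page><title>Help:Contents</title></page>
</mediawiki>
EOF

# Stream titles with iterparse; memory use stays flat because each
# element is cleared as soon as it has been handled.
python3 - <<'EOF' > /tmp/titles.txt
import xml.etree.ElementTree as ET

NS = '{http://www.mediawiki.org/xml/export-0.10/}'
for event, elem in ET.iterparse('/tmp/mini_dump.xml'):
    if elem.tag == NS + 'title':
        print(elem.text)
    elem.clear()
EOF

cat /tmp/titles.txt
```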

Please list methods and tools for processing XML export here:

  • m:Processing MediaWiki XML with STX - stream-based XML transformation

Details and practical advice

  • To determine the namespace of a page you have to match its title to the prefixes defined in /mediawiki/siteinfo/namespaces/namespace.
  • Possible restrictions are:
    • sysop - protected pages

Why to export

Why not just use a dynamic database download?

Suppose you are building a piece of software that at certain points displays information that came from Wikipedia. If you want your program to display the information in a different way than can be seen in the live version, you'll probably need the wikicode that is used to enter it, instead of the finished HTML.

Also, if you want to get all of the data, you'll probably want to transfer it in the most efficient way possible. The Wikimedia servers need to do quite a bit of work to convert the wikicode into HTML. That's time-consuming both for you and for the Wikimedia servers, so simply spidering all pages is not the way to go.

To access any article in XML, one at a time, link to:

Special:Export/Title_of_the_article




This page describes methods to import XML dumps. XML dumps contain the content of a wiki (wiki pages with all their revisions), without the site-related data. An XML dump does not create a full backup of the wiki database; the dump does not contain user accounts, images, edit logs, etc.

The Special:Export page of any MediaWiki site, including any Wikimedia site and Wikipedia, creates an XML file (content dump). See meta:Data dumps and Manual:DumpBackup.php. XML files are explained more on meta:Help:Export.

How to import?

There are several methods for importing these XML dumps.

Using Special:Import

Special:Import can be used by wiki users with import permission (by default this is users in the sysop group) to import a small number of pages (about 100 should be safe). Trying to import large dumps this way may result in timeouts or connection failures. See meta:Help:Import for a detailed description.[1]

You are asked to give an interwiki prefix. For instance, if you exported from the English Wikipedia, you have to type 'en'.

Changing permissions

See Manual:User_rights

To allow all registered editors to import (not recommended), the lines added to "LocalSettings.php" would be:

$wgGroupPermissions['user']['import'] = true;
$wgGroupPermissions['user']['importupload'] = true;

Possible problems

To use Transwiki import, PHP safe_mode must be off and open_basedir must be empty (both are variables in php.ini); otherwise the import fails.

If you get an error like this:

Warning: XMLReader::open(): Unable to open source data in /.../wiki/includes/Import.php on line 53
Warning: XMLReader::read(): Load Data before trying to read in /.../wiki/includes/Import.php on line 399

And Special:Import shows: "Import failed: Expected <mediawiki> tag, got ", this may be caused by a fatal error on a previous import, which leaves libxml in a wrong state across the entire server, or by another PHP script on the same server having disabled the entity loader (a PHP bug). This happens on MediaWiki versions prior to 1.26, and the solution is to restart the webserver service (Apache, etc.), or to write and execute a script that calls libxml_disable_entity_loader(false);.

Using importDump.php, if you have shell access

This is the recommended method for general use, but it is slow for very big data sets. For very large amounts of data, such as a dump of a big Wikipedia, use mwdumper and import the links tables as separate SQL dumps.

importDump.php is a command-line script located in the maintenance folder of your MediaWiki installation. If you have shell access, you can call importDump.php from within the maintenance folder like this (add paths as necessary):

php importDump.php --conf ../LocalSettings.php /path_to/dumpfile.xml.gz --username-prefix=""

or this:

php importDump.php < dumpfile.xml

where dumpfile.xml is the name of the XML dump file. If the file is compressed and has a .gz or .bz2 file extension (but not .tar.gz or .tar.bz2), it is decompressed automatically.

Afterwards, use importImages.php to import the images:

php importImages.php ../path_to/images

Note: If you have other digital media file types uploaded to your wiki, e.g. .zip, .nxc, .cpp, .py, or .pdf, then you must also back up/export the wiki_prefix_imagelinks table and insert it into the corresponding table of your new MediaWiki database. Otherwise, all links referencing these file types will turn up as broken in your wiki pages.

Note: If you are using a WAMP installation, you can have problems with the import due to InnoDB settings (by default this engine is disabled in my.ini; to avoid problems, use the MyISAM engine).

Note: Running importDump.php can take quite a long time. For a large Wikipedia dump with millions of pages, it may take days, even on a fast server. Add --no-updates for a faster import. Also note that the information in meta:Help:Import about merging histories, etc. also applies.

Note: Optimizing the database after the import is recommended; it can reduce the database size by a factor of two or three.

After running importDump.php, you may want to run rebuildrecentchanges.php in order to update the content of your Special:Recentchanges page.

FAQ

How do I set up debug mode?
Use the command-line option --debug.
How do I make a dry run (no data added to the database)?
Use the command-line option --dry-run.

Error messages

Typed
roots@hello:~# php importImages.php /maps gif bmp PNG JPG GIF BMP
Error
 
> PHP Deprecated:  Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/mcrypt.ini on line 1 in Unknown on line 0
> Could not open input file: importImages.php
Cause

Before running importImages.php you first need to change directories to the maintenance folder, which contains the importImages.php maintenance script.


Error while running MAMP
DB connection error: No such file or directory (localhost)
Solution

Using specific database credentials

$wgDBserver         = "localhost:/Applications/MAMP/tmp/mysql/mysql.sock";
$wgDBadminuser      = "XXXX";
$wgDBadminpassword  = "XXXX";

Using importTextFiles.php Maintenance Script

If you have a lot of content converted from another source (several word processor files, content from another wiki, etc.), you may have several files that you would like to import into your wiki. In MediaWiki 1.27 and later, you can use the importTextFiles.php maintenance script.

You can also use the edit.php maintenance script for this purpose.

Using mwdumper

Apparently, it can't be used to import into MediaWiki 1.31 or later.

mwdumper is a Java application that can be used to read, write and convert MediaWiki XML dumps. It can be used to generate an SQL dump from the XML file (for later use with mysql or phpMyAdmin) as well as for importing into the database directly. It is a lot faster than importDump.php; however, it only imports the revisions (page contents) and does not update the internal link tables accordingly, which means that category pages and many special pages will show incomplete or incorrect information unless you update those tables.

If available, you can fill the link tables by importing separate SQL dumps of these tables using the mysql command line client directly. For Wikimedia wikis, this data is available along with the XML dumps.

Otherwise, you can run rebuildall.php, which will take a long time, because it has to parse all pages. This is not recommended for large data sets.

Using pywikibot, pagefromfile.py and Nokogiri

pywikibot is a collection of tools written in Python that automate work on Wikipedia or other MediaWiki sites. Once it is installed on your computer, you can use the tool pagefromfile.py, which lets you upload a wiki file to Wikipedia or other MediaWiki sites. The XML file created by dumpBackup.php can be transformed into a wiki file suitable for processing by pagefromfile.py using a simple Ruby program similar to the following (here the program will transform all XML files in the current directory, which is needed if your MediaWiki site is part of a family):

# -*- coding: utf-8 -*-
# dumpxml2wiki.rb
 
require 'rubygems'
require 'nokogiri'
 
# This program dumpxml2wiki reads MediaWiki xml files dumped by dumpBackup.php
# on the current directory and transforms them into wiki files which can then 
# be modified and uploaded again by pywikipediabot using pagefromfile.py on a MediaWiki family site.
# The text of each page is searched with xpath and its title is added on the first line as
# an html comment: this is required by pagefromfile.py.
# 
Dir.glob("*.xml").each do |filename|
  input = Nokogiri::XML(File.new(filename), nil, 'UTF-8')
 
  puts filename.to_s  # prints the name of each .xml file
 
  File.open("out_" + filename + ".wiki", 'w') {|f| 
    input.xpath("//xmlns:text").each {|n|
      pptitle = n.parent.parent.at_css "title" # searching for the title
      title = pptitle.content
      f.puts "\n{{-start-}}<!--'''" << title.to_s << "'''-->" << n.content  << "\n{{-stop-}}"
    }
  }
end

For example, here is an excerpt of a wiki file output by the command 'ruby dumpxml2wiki.rb' (two pages can then be uploaded by pagefromfile.py: a template, and a second page which is a redirect):

{{-start-}}<!--'''Template:Lang_translation_-pl'''--><includeonly>Tłumaczenie</includeonly>
{{-stop-}}
 
{{-start-}}#REDIRECT[[badania demograficzne]]<!--'''ilościowa demografia'''-->
<noinclude>
[[Category:Termin wielojęzycznego słownika demograficznego (pierwsze wydanie)|ilościowa demografia]]
[[Category:Termin wielojęzycznego słownika demograficznego (pierwsze wydanie) (redirect)]]
[[Category:10]]</noinclude>
{{-stop-}}

The program reads each XML file, extracts the texts within the <text> </text> markup of each page, searches for the corresponding title as a parent, and encloses it in the paired {{-start-}}<!--'''Title of the page'''--> {{-stop-}} commands used by 'pagefromfile' to create or update a page. The name of the page is in an HTML comment and separated by three quotes on the same first start line. Note that the name of the page can be written in Unicode. Sometimes it is important that the page starts directly with the command, as for a #REDIRECT; thus the comment giving the name of the page must come after the command, but still on the first line.

Note that the XML dump files produced by dumpBackup.php are prefixed by a namespace:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">

In order to access the text nodes using Nokogiri, you need to prefix your path with 'xmlns':

input.xpath("//xmlns:text")

Nokogiri is an HTML, XML, SAX and Reader parser with the ability to search documents via XPath or CSS3 selectors, belonging to the latest generation of XML parsers for Ruby.

Example of the use of 'pagefromfile' to upload the output wiki text file:

python pagefromfile.py -file:out_filename.wiki -summary:"Reason for changes" -lang:pl -putthrottle:01

How to import logs?

Exporting and importing logs with the standard MediaWiki scripts often proves very hard; an alternative for import is the script pages_logging.py in the WikiDAT tool, as suggested by Felipe Ortega.

Troubleshooting

Merging histories, revision conflict, edit summaries, and other complications


Interwikis

If you get the message

Page "meta:Blah blah" is not imported because its name is reserved for external linking (interwiki).

the problem is that some pages to be imported have a prefix that is used for interwiki linking. For example, ones with a prefix of 'Meta:' would conflict with the interwiki prefix meta: which by default links to https://meta.wikimedia.org.

You can do any of the following.

  • Remove the prefix from the interwiki table. This will preserve page titles, but prevent interwiki linking through that prefix.
    Example: you will preserve page titles 'Meta:Blah blah' but will not be able to use the prefix 'meta:' to link to meta.wikimedia.org (although it will be possible through a different prefix).
    How to do it: before importing the dump, run the query DELETE FROM interwiki WHERE iw_prefix='prefix' (note: do not include the colon in the prefix). Alternatively, if you have enabled editing the interwiki table, you can simply go to Special:Interwiki and click the 'Delete' link on the right side of the row belonging to that prefix.
  • Replace the unwanted prefix in the XML file with "Project:" before importing. This will preserve the functionality of the prefix as an interwiki link, but will replace the prefix in the page titles with the name of the wiki they are imported into, and might be quite a pain to do on large dumps.
    Example: replace all 'Meta:' with 'Project:' in the XML file. MediaWiki will then replace 'Project:' with the name of your wiki during importing.
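The second option can be sketched with sed (the one-line excerpt below is invented; on a real dump you would run the same substitution over the whole XML file before importing):

```shell
# A small stand-in dump to illustrate the rewrite (titles are invented).
printf '<title>Meta:Blah blah</title>\n<title>Main Page</title>\n' > /tmp/dump_excerpt.xml

# Rewrite the conflicting 'Meta:' title prefix to 'Project:'; titles
# without the prefix pass through unchanged.
sed 's/<title>Meta:/<title>Project:/g' /tmp/dump_excerpt.xml > /tmp/dump_rewritten.xml
cat /tmp/dump_rewritten.xml
```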

References

  1. See Manual:XML Import file manipulation in CSharp for a C# code sample that manipulates an XML import file.