
WikiXRay


When I first started working with Wikipedia data, using the database dumps published on the Wikimedia Download Center, I found it very difficult to extract the available information. The dumps are very large XML text files that must be parsed efficiently and loaded into a local database before you can even take a look at the data. At that time, the existing tools for parsing these files were not flexible enough for the ambitious purposes of my research.
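To give an idea of what parsing such a dump efficiently involves, here is a minimal sketch of streaming through a MediaWiki-style XML export with Python's standard `xml.etree.ElementTree.iterparse`, so that the whole file never has to fit in memory. It is not taken from WikiXRay itself: the function names, the tiny `SAMPLE` document and its namespace version are assumptions made for illustration, and a real dump carries many more fields per revision.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions_per_page(source):
    """Stream an XML export and count revisions per page title.

    `source` is a filename or a binary file-like object.
    Assumes <title> appears before the <revision> elements of its page.
    """
    counts = Counter()
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            counts[title] += 1
            elem.clear()  # drop the revision's children once counted
        elif name == "page":
            elem.clear()  # drop the finished page's children
            title = None
    return counts


# Hypothetical miniature dump, only for demonstration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>1</id><timestamp>2008-01-01T00:00:00Z</timestamp></revision>
    <revision><id>2</id><timestamp>2008-01-02T00:00:00Z</timestamp></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><id>3</id><timestamp>2008-01-03T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>"""

revision_counts = count_revisions_per_page(io.BytesIO(SAMPLE))
```

Clearing each element after it has been processed is what keeps memory use roughly flat. In practice, the counts (or the full revision records) would be written to a local database instead of being kept in a `Counter`.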

Thus, I decided that this was the perfect excuse to write a better, libre software alternative for recovering information from Wikipedia databases and automating the quantitative analysis of that information. The result was WikiXRay, a coordinated collection of Python and GNU R scripts designed to analyze any language version of Wikipedia (and, in fact, any wiki running on MediaWiki).

You can visit the dedicated WikiXRay page on meta.wikimedia.org for a quick impression of its current capabilities. Some of them are:

  • General statistics (active editors, active articles, talk pages, content distribution...).
  • Distribution of effort among editors (inequality analysis, model for editors' activity...).
  • Survival analysis of editors (joining and quitting editors, mean and median lifetime of editors in the project, survival curves...).
  • Evolution of key descriptive parameters (focusing on sustainability conditions and common activity trends).
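
As an illustration of the kind of inequality analysis listed above, the sketch below computes the Gini coefficient of a list of per-editor contribution counts. The Gini coefficient is a common inequality measure, but this page does not say which measures WikiXRay actually implements, so treat the choice, the function name `gini` and the sample numbers as assumptions.

```python
def gini(contributions):
    """Gini coefficient of non-negative contribution counts.

    Returns 0.0 for perfect equality; values closer to 1 mean that
    a few editors account for most of the contributions.
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted values x_1 <= ... <= x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical edit counts for four editors.
equal_effort = gini([5, 5, 5, 5])
one_editor_does_everything = gini([0, 0, 0, 10])
```

With equal edit counts the coefficient is zero, while a single editor doing all the work drives it to its maximum for that group size, (n - 1) / n.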

If you have other analyses in mind, please consider joining the project and writing your own script or library for WikiXRay. Together, we can make this the reference tool for Wikipedia analysis.