
Power laws


One of the critical points in my PhD dissertation was finding robust methods to fit statistical models to the empirical data obtained from Wikipedia database dumps, in particular power laws and Pareto distributions. In this regard, I thank my colleague Dr. Israel Herráiz, who pointed me to the best paper on fitting power-law distributions to empirical data. Aaron Clauset also maintains a web page that explains the rationale behind this process, discusses best practices in the field, and provides all the source code needed to implement the best available methods.
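As a rough illustration of the core idea behind that approach (the maximum-likelihood plus Kolmogorov-Smirnov method popularized by Clauset, Shalizi and Newman), here is a minimal Python sketch. It assumes continuous data: for a tail x >= xmin, the exponent is estimated as alpha = 1 + n / sum(ln(x / xmin)), and xmin is the candidate value that minimizes the KS distance between the empirical and fitted tail CDFs. The function names and the `min_tail` parameter are hypothetical, and the sketch omits both the discrete-data estimator (relevant for integer counts such as revisions) and the bootstrap goodness-of-fit test, so the published code should be preferred for real analyses.

```python
import math
import random


def mle_alpha(tail, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin))."""
    s = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / s


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of a sorted tail and the
    fitted CDF P(x) = 1 - (x / xmin) ** (1 - alpha)."""
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # compare against the empirical CDF just below and at x
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


def fit_power_law(data, min_tail=50):
    """Try each distinct observed value as xmin (hypothetical helper),
    fit alpha by MLE on the tail x >= xmin, and keep the xmin with the
    smallest KS distance. Returns (xmin, alpha, ks), or None if no
    candidate leaves at least min_tail points in the tail."""
    xs = sorted(data)
    n = len(xs)
    best = None
    for k in range(n - min_tail + 1):
        xmin = xs[k]
        if k > 0 and xmin == xs[k - 1]:
            continue  # each distinct value is tried only once
        tail = xs[k:]
        if tail[-1] == xmin:
            break  # all remaining values equal xmin: alpha is undefined
        alpha = mle_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best
```

For example, `fit_power_law(sample, min_tail=300)` on synthetic data drawn by inverse transform from a power law with alpha = 2.5 should return an exponent close to 2.5. Scanning every distinct value as xmin is quadratic in the sample size; it keeps the sketch short but would need a smarter search for full Wikipedia dumps.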

I also found this paper by Aban et al. very useful for fitting a less common distribution, the upper-truncated Pareto. This is the distribution followed by the effort that human authors spend in all language editions of Wikipedia, measured both as the number of revisions they perform and as the total number of distinct articles they edit. In GNU R, this fitting process is supported by the VGAM package.
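To show what such a fit involves, here is a minimal Python sketch of the maximum-likelihood estimator, assuming the whole sample follows an upper-truncated Pareto with density f(x) = a * g^a * x^(-a-1) / (1 - (g/v)^a) on g <= x <= v. Under that assumption the bounds are estimated by the sample minimum and maximum, and the shape a solves the score equation n/a + n * r^a * ln(r) / (1 - r^a) - sum(ln(x/g)) = 0 with r = g/v. The function name, the bisection bracket [1e-3, 50] and the tolerance are illustrative choices of this sketch; it is not the VGAM implementation.

```python
import math
import random


def truncated_pareto_mle(data, tol=1e-10):
    """MLE for an upper-truncated Pareto (hypothetical helper), assuming
    the whole sample follows f(x) = a g^a x^(-a-1) / (1 - (g/v)^a).
    Returns (g, v, a) with g = min(data), v = max(data)."""
    g, v = min(data), max(data)
    if g <= 0 or g == v:
        raise ValueError("need positive data with min < max")
    n = len(data)
    r = g / v
    log_r = math.log(r)
    s = sum(math.log(x / g) for x in data)

    def score(a):
        # derivative of the log-likelihood with respect to the shape a
        ra = r ** a
        return n / a + n * ra * log_r / (1.0 - ra) - s

    lo, hi = 1e-3, 50.0
    if score(lo) < 0 or score(hi) > 0:
        raise ValueError("shape MLE not bracketed in [1e-3, 50]")
    # the score is decreasing in a, so bisection finds the unique root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return g, v, 0.5 * (lo + hi)
```

Synthetic test data can be drawn by inverting the CDF, x = g * (1 - u * (1 - (g/v)^a))^(-1/a) with u uniform on [0, 1). Bisection is used instead of Newton's method because the score is monotone in a, so a valid bracket guarantees convergence without tuning a starting point.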