Is Geometric Mean or Median the better way to find the center in a range of ever-increasing "ranked numbers" (details in text)?

I like to make a lot of ranking lists and use the ranked spots to calculate which things are superior. For example, let's say I wanted to rank the levels in a series of games that had four levels each. After ranking the levels in the first two games, I end up with this hypothetical ranking.

1.Game 2

2.Game 2

3.Game 2

4.Game 1

5.Game 1

6.Game 1

7.Game 1

8.Game 2

Now, if I were to take the mean of the ranked spots for each game, Game 1 would have a value of 5.5, and Game 2 would have a value of 3.5. In other words, Game 2 has the better levels on average, as it has the lower mean rank. The issue with using the mean, though, is that as we play more games and rank more levels, the distribution has the potential to change. For example, let's say that after playing 200 of these games, and therefore ranking 800 levels in total, the rankings for Game 1 and Game 2 now look like this.

1.Game 2

20.Game 2

30.Game 2

31.Game 1

32.Game 1

33.Game 1

34.Game 1

800.Game 2

So while the levels for Game 1 are still all bunched together, we see a huge variance in the levels for Game 2. After all, Game 2 now has both the best and the worst level in this series of hundreds of games. The issue is that, due to the outlier at spot 800, the mean now puts Game 1 ahead of Game 2. Game 1 now has a mean value of 32.5, and Game 2 has a mean value of 212.75.

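Now if we use the median or the geometric mean instead, the ordering is restored. Game 2 has a median of 25 and a geometric mean of about 26.3, while Game 1 has a median of 32.5 and a geometric mean of about 32.5, so Game 2 is back to having the better levels overall.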
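For anyone who wants to check the comparison above, here is a minimal Python sketch that computes all three measures for both hypothetical lists. The dictionary layout and the `summarize` helper are just illustrative names I made up, and `statistics.geometric_mean` needs Python 3.8 or later.

```python
from statistics import mean, median, geometric_mean

# Ranked spots copied from the two hypothetical lists in the question.
rankings = {
    "8 levels ranked": {"Game 1": [4, 5, 6, 7], "Game 2": [1, 2, 3, 8]},
    "800 levels ranked": {"Game 1": [31, 32, 33, 34], "Game 2": [1, 20, 30, 800]},
}

def summarize(ranks):
    """Return the three candidate centers for a list of ranked spots."""
    return {
        "mean": mean(ranks),
        "median": median(ranks),
        "geometric_mean": geometric_mean(ranks),
    }

for label, games in rankings.items():
    print(label)
    for game, ranks in games.items():
        s = summarize(ranks)
        print(
            f"  {game}: mean={s['mean']:.2f}, "
            f"median={s['median']:.2f}, "
            f"geometric mean={s['geometric_mean']:.2f}"
        )
```

A lower value means a better average spot, so whichever game has the smaller number "wins" under that measure. With the 800-level list, the mean should favor Game 1 while the median and geometric mean favor Game 2.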

And so my question is: when we're talking about an ever-growing list with the potential for outliers, is it better to use the median or the geometric mean to compare the centers of the data? The key is the potential for outliers, as you can't really tell whether outliers will happen until data added in the future causes them to happen.

1 Answer

Your question is about selecting the most appropriate measure of central tendency when dealing with an ever-growing list that has the potential for outliers. The choice between mean, median, and geometric mean (and indeed other measures of central tendency) depends on the characteristics of your data, including its distribution, the presence of outliers, and the nature of the data itself.

In your example, you're dealing with rankings, which are ordinal data. For ordinal data, the mean may not be the best measure because it assumes an equal interval between all points, which may not be true for ranks. The median might be a better choice because it is largely unaffected by outliers and still gives the central point of the data.

However, if you have outliers and they are legitimate observations (i.e., not errors), it might be inappropriate to disregard them by using the median. In these cases, the geometric mean could be useful. The geometric mean is often used for data that are highly skewed, as it dampens the effect of very large values. Keep in mind that it works on ratios, so it is also pulled down noticeably by values close to 1, such as a single first-place rank.
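To make the dampening concrete, here is a rough Python sketch that computes the geometric mean as the exponential of the average log rank, then compares it with the mean and median when one spot in Game 2's list changes. The lists `worse_tail` and `weaker_best` are hypothetical variations I invented for illustration; only `game_2` comes from the question.

```python
import math
from statistics import mean, median

def geometric_mean_via_logs(ranks):
    """Geometric mean computed as exp(mean(log(rank))); ranks must be positive."""
    return math.exp(mean([math.log(r) for r in ranks]))

def centers(ranks):
    """Return (mean, median, geometric mean) for a list of ranked spots."""
    return (mean(ranks), median(ranks), geometric_mean_via_logs(ranks))

game_2 = [1, 20, 30, 800]        # from the question
worse_tail = [1, 20, 30, 8000]   # hypothetical: worst level slides to spot 8000
weaker_best = [10, 20, 30, 800]  # hypothetical: best level sits at spot 10, not 1

for label, ranks in [
    ("original", game_2),
    ("worse tail", worse_tail),
    ("weaker best", weaker_best),
]:
    m, med, gm = centers(ranks)
    print(f"{label:12s} mean={m:9.2f}  median={med:6.2f}  geometric mean={gm:6.2f}")
```

Because the geometric mean averages logarithms, a tenfold change at either end of the list shifts it by the same factor, whereas the mean reacts almost entirely to the large end and the median here does not move at all. That is one way to see it as a middle ground between the two.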

In your case, if the ranking of the games is highly variable with the potential for extreme values, then the geometric mean might be a better choice. On the other hand, if the ranking is relatively stable and you want to find the central value, then the median might be more appropriate.

Remember, no single measure of central tendency is inherently "better" than another. The choice depends on the characteristics of the data and what you want to understand about it.
 
 
 
 
