This is an interesting analysis to me, as I've done some similar work to assist my "Fantasy Sumo League" play. I used a slightly smaller dataset (from 2000 to the present, which was enough to cover the full careers of all the men currently in the top division) and the Glicko-2 algorithm (which is inspired by Elo's ratings, but is a bit fancier, since it doesn't need to be simple enough to calculate by hand on paper the way Elo ratings were when they were first introduced for chess).
One thing Glicko-2 does better than Elo is take absences from play into account, rather than assuming that your base skill level stays constant forever while you're not competing. It does that by keeping an "uncertainty" value alongside each player's score, which grows the longer you go without competing. A conservative rating then subtracts (some multiple of) the uncertainty from the base score before comparing players. This means that somebody like Asanoyama or Abi, who fell way down the divisions during a suspension, gets a slightly lower conservative rating after the absence, and the confidence in their rating only stabilizes after they've done a fair amount of climbing back up the ranks.
I'll see if I can get a larger set of sumo data, run my algorithm over the whole modern six-basho era, and get lifetime best scores to compare to yours. I suspect they'll be very similar! I know I verified that Hakuho had the best-ever score in my dataset around the time he retired. That was, as you said, a smell check to make sure my code's output made some degree of sense. I'll need to write a bit of extra reporting code to extract a similar list of the top-scoring wrestlers of all time, but it shouldn't be very hard once I have gathered the data.
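The reporting step could look roughly like the sketch below: keep each wrestler's single highest rating across their whole history, then sort. The `RatingPoint` record, its field names, and the placeholder wrestlers and numbers are all hypothetical, made up for the example rather than taken from my dataset.

```python
from typing import Iterable, List, NamedTuple


class RatingPoint(NamedTuple):
    """One wrestler's rating after one basho (hypothetical record layout)."""
    wrestler: str
    basho: str  # e.g. "2021.07"
    rating: float


def peak_ratings(history: Iterable[RatingPoint], top_n: int = 10) -> List[RatingPoint]:
    """Return each wrestler's career-best rating point, highest first."""
    best = {}
    for point in history:
        current = best.get(point.wrestler)
        if current is None or point.rating > current.rating:
            best[point.wrestler] = point
    return sorted(best.values(), key=lambda p: p.rating, reverse=True)[:top_n]


# Placeholder data, not real results.
history = [
    RatingPoint("wrestler_a", "2010.01", 2100.0),
    RatingPoint("wrestler_a", "2010.03", 2250.0),
    RatingPoint("wrestler_b", "2010.01", 1900.0),
    RatingPoint("wrestler_b", "2010.03", 1850.0),
]
for point in peak_ratings(history):
    print(point.wrestler, point.basho, point.rating)
```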
That's really cool, and thanks for reading!
Yeah, that sounds like a better way to get at the same question (who is the best sumo wrestler at various given times).
I honestly think this post is the last Elo analysis I'll do for a while. It was cool, but I think there's a better way to answer this, and the next work I do will be on quantifying peaks.