Sumo wrestling is a centuries-old Japanese sport and tradition. Here at Ozeki Analytics, we dig into the numbers throughout that history to see what conclusions and predictions we can draw from them.
This piece is meant to serve as a living encyclopedia of research done here and elsewhere on sumo. It also covers some basics of the sport to help folks along their sumo journey if they would like to learn more.
I divided it into sections, so you can simply scroll through the bolded headers to find the research on whichever topics interest you.
Ozeki Analytics is a free Substack and will remain that way. If you would like to stay up to date with the latest research here, then please subscribe.
Sumo Basics
In the modern era, since 1958, sumo tournaments, or Basho, have been held six times a year, every other month. Each tournament consists of 15 days of matches. The top division is the Makuuchi, and the second highest division is the Juryo. The simplest ways to watch sumo are the following:
https://www3.nhk.or.jp/nhkworld/en/tv/sumo/
https://www.youtube.com/@NHKWORLDJAPAN/videos
Please note that NHK World Japan’s YouTube channel does not consistently post the daily matches in a timely manner.
If you are ever looking for historic sumo data, SumoDB is my go-to and where I source all the data used on this blog.
https://sumodb.sumogames.de/Default.aspx
Please note that unfortunately the site is often down.
Who Are the Top Wrestlers
A Redditor, Raileyx, maintains a terrific Elo ranking for sumo wrestling.
https://public.tableau.com/app/profile/raileyx/viz/2020-2024Makuuchi-Sumo/Overview
Elo rankings are named for Arpad Elo, a chess player, who designed the system to rank chess players and to predict who would win any individual match. The system isn’t perfect (surely no measure ever will be), but it’s still used in chess to this day. I think it’s useful and I’m incredibly grateful to Raileyx for maintaining this valuable tool.
I also tried implementing my own Elo ranking that goes back through the 1950s. It had an issue where, after a certain point, the point system became uncapped at the high end, but if you want a flawed but directionally coherent ranking of the top 100 Elo-ranked sumo wrestlers, then by all means, it’s here.
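For readers curious how an Elo ranking turns match results into numbers, here is a minimal Python sketch of the standard Elo formulas. The K-factor of 32 and the 400-point scale are common chess defaults and are assumptions here; they are not necessarily the settings behind any of the sumo rankings linked in this section, and the function names are purely illustrative.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that wrestler A beats wrestler B under the Elo model.

    The 400-point scale is the conventional chess value (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after a single bout.

    k controls how fast ratings move; 32 is a common default, not a sumo-specific value.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total number of points stays constant.
    return rating_a + delta, rating_b - delta


# Two evenly rated wrestlers: each is given a 50% chance, and the winner gains 16 points.
print(expected_score(1500, 1500))
print(update_ratings(1500, 1500, a_won=True))
```

An upset moves the ratings more than an expected result does, because the gap between the actual and the expected score is larger.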
Yokozuna
If you want an easy answer to who the top wrestler is at any given time, then you’d be correct to point to the Yokozuna (the word can be singular or plural). It is an honorific title one gains by winning, and winning a lot against other top wrestlers.
If you want to know how you reach Yokozuna, then here is a piece on what the Yokozuna had in common. Generally speaking, it takes 26+ wins over two tournaments as an Ozeki and at least one Yusho (championship).
Finally, if you would like to learn about the Yokozuna’s paths to the top, here is a piece covering when they hit some key developmental milestones, such as debuting in the top division or reaching Ozeki.
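The Yokozuna rule of thumb mentioned above can be written as a quick check. This is only a sketch of the general pattern, not an official criterion, since promotion is ultimately a judgment call. It assumes both tournaments were fought at Ozeki and that the Yusho falls within those same two tournaments; the function name and input format are made up for illustration.

```python
def meets_yokozuna_rule_of_thumb(last_two, min_wins=26):
    """Check the rough pattern: 26+ wins over two tournaments plus at least one Yusho.

    last_two: list of two (wins, won_yusho) tuples, both assumed to be fought as Ozeki.
    Assumption: the Yusho must come from one of these two tournaments.
    """
    if len(last_two) != 2:
        raise ValueError("expected exactly two tournaments")
    total_wins = sum(wins for wins, _ in last_two)
    any_yusho = any(won_yusho for _, won_yusho in last_two)
    return total_wins >= min_wins and any_yusho


# 13 wins without a title, then 13 wins with the Yusho: fits the pattern.
print(meets_yokozuna_rule_of_thumb([(13, False), (13, True)]))
```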
Ozeki
Ozeki is the second-highest rank in sumo wrestling, and as with Yokozuna, there are no specific criteria for reaching it.
Here is a piece on what Ozeki had in common. Generally speaking, you need 33 wins over 3 tournaments, and the final tournament needs to be fought from the Sekiwake position.
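As a rough illustration, the Ozeki rule of thumb (33 wins over 3 tournaments, with the last one at Sekiwake) can be sketched in Python. It describes what past promotions tended to have in common rather than a binding rule, and the function name and input format are hypothetical.

```python
def meets_ozeki_rule_of_thumb(results, min_wins=33):
    """Check the rough pattern: 33+ wins over three tournaments, the last at Sekiwake.

    results: list of three (rank, wins) tuples, oldest tournament first.
    """
    if len(results) != 3:
        raise ValueError("expected exactly three tournaments")
    total_wins = sum(wins for _, wins in results)
    final_rank = results[-1][0]
    return total_wins >= min_wins and final_rank == "Sekiwake"


# 10 + 11 + 12 = 33 wins, finishing at Sekiwake: fits the pattern.
print(meets_ozeki_rule_of_thumb([("Komusubi", 10), ("Sekiwake", 11), ("Sekiwake", 12)]))
```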
Determining the Banzuke
For everyone else, rank is decided by how many wins the wrestler had in the prior tournament, and at what rank they earned them. The sheet containing the rankings of all the wrestlers is called the Banzuke. Here is a picture of it. More wins than losses and you’ll be going up toward the top divisions. More losses than wins and you’ll be going down toward the bottom divisions.
Here I take on the lower divisions too (Makushita, Sandanme, Jonidan, and Jonokuchi, from highest to lowest division).
Finally, in this post I discuss further limitations of the current model.
Determining Daily Matchups - More Research to Come
Here is my initial look at determining daily matches.
One shortcoming is that I used the relative ranking (the Yokozuna 1 East being ranked 1st, followed by the 2nd ranked, and so on) as opposed to looking at ranks by name. If that means nothing to you, then you don’t have to worry. If you understand why that’s a shortcoming, you might have already informed me of that, and I promise that’s a future post.
An additional shortcoming is that it doesn’t go beyond the top two divisions (Makuuchi and Juryo). This too is something I plan to remedy with future research.
When Do Wrestlers Peak?
Here’s a fun piece looking at when wrestlers reach their highest rank. The key insight is that relatively few wrestlers reach the top two divisions (Makuuchi and Juryo), so to make the data more meaningful, I grouped wrestlers by peak rank. So if you’d like to see what age Yokozuna peak at vs. men who peak at Jonokuchi, for instance, this is exactly that.
When Do Wrestlers Debut?
We found that, starting in the 1950s, wrestlers debuted younger and younger until they hit their youngest in the 1980s (although a 12-year-old debuted in the 1970s). The other story is that wrestlers are now increasingly coming from college.
If you’re curious about Makuuchi debuts, then I have just the piece for you too! I looked at that in the ‘modern era’ (1989-present).
Here’s the piece that has the breakdown by decade.
When Will Wrestlers Retire?
As in the section above, more insight comes from dividing between men who reach the top two divisions and the rest of the wrestlers. This also gives us a formula that anyone can apply to calculate any individual wrestler’s retirement odds.
Sekiwake
Sekiwake is the highest position you can reach by just going Kachikoshi (more wins than losses) continuously. Reaching Ozeki and Yokozuna requires achievements above and beyond simply 8 wins in a tournament.
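To see where the 8 in "8 wins" comes from, here is a tiny Python sketch of the arithmetic behind Kachikoshi in a 15-bout tournament. The function names are illustrative only, and makekoshi is the usual term for the opposite outcome (more losses than wins).

```python
def min_wins_for_kachikoshi(bouts: int = 15) -> int:
    """Smallest number of wins that guarantees more wins than losses."""
    return bouts // 2 + 1


def is_kachikoshi(wins: int, losses: int) -> bool:
    """True when the record has more wins than losses."""
    return wins > losses


print(min_wins_for_kachikoshi(15))  # 8 of 15 bouts
print(is_kachikoshi(8, 7))          # a winning record
print(is_kachikoshi(7, 8))          # a losing record (makekoshi)
```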
Check out the piece here, where we see how they determine when there ought to be more than just 2 Sekiwake in any individual tournament. Generally speaking, it’s just two Sekiwake per tournament.
Komusubi
More or less everything from the Sekiwake section applies here too. The big difference is that they open up additional Komusubi slots less often than they do for Sekiwake. So it’ll almost always be 2 Komusubi.
Talent Research and Other Odds and Ends
This is a bit of a catch-all for my inchoate research into better quantifying talent in sumo. We currently have ways to do that. The Banzuke is essentially just that: given the prior tournament and the wrestlers’ records, where should they be ranked? But there’s potential for categories that are less granular than “Exampleyama is ranked 1st as Yokozuna, or 13th as Maegashira X East” and so on, yet more useful than “Exampleyama is a Makuuchi or Juryo wrestler.” This is my attempt to provide that by looking at who wrestlers might reasonably face. It allows us to at least introduce the category of “Mid-Maegashira” and so on beyond the Joi in a precise way. There is likely further refinement to be done by myself or others, but I see this as a useful stepping stone for more analysis.
Classifying beyond the Joi to the bottom of the Makushita
Individual Wrestler Analysis
I have done several pieces on individual wrestlers and their histories.
Now Kotozakura then Kotonowaka
Interesting Guys in Sumo
I only have one edition so far, but I intend to do more. This is meant to cover guys who aren’t currently in the running for Yokozuna or Ozeki but are instead in the lower divisions. Ideally we capture a few top prospects that might one day reach the top.
Statistics Posts
I consider this a sumo and statistics blog. As such I have written on statistics beyond sumo wrestling.
Here is part 1 of Demystifying Elements of Statistical Learning, a series on what is widely considered one of the premier textbooks for statistics and machine learning. Like all the blog posts here, the book is freely available online! The series is meant to serve as companion notes for someone tackling that book. The link to it is in the post. It takes some work, but better understanding statistics is quite valuable, in my opinion.
Here is a post concerning feature selection and OLS. It’s easier than it sounds!
Statistics Posts Continued (Others)
In 2024, in the run-up to the election in America, I ran a couple of pieces on elections and the numbers that surround them. I think if you read them, you’ll be able not only to join the conversation about how to interpret election-related numbers, but perhaps even to sound smart and well informed (I certainly hope!). Please note that, like all content here, these are written from a non-partisan perspective. Well, non-partisan except that I’m highly partial towards the reader learning a little more statistical reasoning.
Election Polling and Aggregation Has Problems - Looking at Election Predictions to Learn Statistics
The Problem With Prediction Markets
Please feel free to reach out to me across the platforms if you have any questions or concerns.