Ozeki Analytics - One Year Down, Many More to Come! 2024 Review & 2025 Preview
Thanks for reading and happy 2025! This piece looks back on all that we accomplished over 2024 and previews some of the goals we have for 2025. I know these pieces can be a bit boring for readers, so I’ll try to draw a throughline from what was done last year to what the dream 2025 for Ozeki Analytics looks like. I’ll also try to explain some of the underlying philosophy of the blog. I think that’s helpful.
But first I want to say another thank you! The reception I’ve received has been almost universally positive, and I think that even some of the detractors might be coming from a place of frustration that what we’re presenting here isn’t in its final form yet. Worry not, we aim to continue to advance the frontier of analytics and data on sumo wrestling.
Ozeki Analytics’ Long Term Goal
I can distill this project to a single goal, and that would be: “to be able to indefinitely simulate sumo tournaments into the future with a relatively high degree of plausibility for all the underlying assumptions required for that.”
That might be a bit technical sounding so I’ll phrase it a bit less concisely while also highlighting some important considerations I keep in mind. This blog has a statistics focus. By looking at past results and data related to sumo, we can begin to understand the general rules, exceptions, and unknowns in our sport.
Sometimes we have fairly explicit rules - promotion to Yokozuna, the apex of the sport, requires two Yusho (tournament win) equivalents. However, those rules aren’t always as ironclad as they seem, or can be massaged if you prefer it that way. If there aren’t many Yokozuna, and a wrestler has hit 26 wins over 2 tournaments (hard win requirement - so a rule), then that Yusho equivalent can be a bit squishier. That might be a good example of an exception or perhaps even an unknown. Quick note: don’t get too hung up on exceptions/unknowns - it’s just a way to express the inherent uncertainty we’ll have.
In fact, a better example of exceptions and unknowns would be the Banzuke (the ranking of every wrestler) making process. If you have spent any amount of time working on Banzuke - Guess the Banzuke for instance - you’ll know that while they follow general rules and guidelines in putting the Banzuke together, what will determine who guessed the Banzuke best each time involves some luck. They also have been known to make a couple quirky decisions each tournament. In other words, there are exceptions and unknowns.
It’s similar to trying to figure out who will win each match. Using Elo (a rating system that originated in chess) can provide a good start. Furthermore, our intuition is often good; Hakuho is gonna win more matches than he loses. But we also can’t know every match; Hakuho did in fact lose matches. By the way, better match prediction is actually a major goal for 2025.
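For readers who haven’t met Elo before, here is a minimal sketch of the textbook version: each wrestler carries a rating, the rating gap implies a win probability, and ratings shift after each bout. The scale of 400 and the K-factor of 32 are the conventional chess defaults and are assumptions here, not necessarily the values used in my own models, and the function names are just for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that wrestler A beats wrestler B under the standard Elo formula.

    scale=400 is the conventional chess value (an assumption for sumo).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, a_won, k=32.0):
    """Return both wrestlers' new ratings after one bout.

    k=32 is a common default K-factor (an assumption, not a tuned value).
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta
```

Two equally rated wrestlers each get a 50% chance, and an upset win moves ratings more than an expected one, which is the intuition about Hakuho above expressed as arithmetic.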
So we look to be able to have a framework in place to predict everything related to sumo from who will win matches, to what the matches will actually be, to how that will affect the following tournament.
Furthermore, this framework should be based on the existing data we have about sumo and we should be able to test how well our predictions compare to the reality. This will allow us to find areas for improvement and further research.
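To make “test how well our predictions compare to the reality” concrete, here is a small sketch using the Brier score, one common way to grade probability forecasts. It is an illustrative choice on my part rather than a description of the metric the project has settled on, and the function name and inputs are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted win probabilities and actual results.

    predicted_probs: probability (0 to 1) that the first wrestler wins each bout.
    outcomes: 1 if the first wrestler actually won, 0 otherwise.
    Lower is better; always guessing 50% scores 0.25.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predicted_probs:
        raise ValueError("need at least one bout to score")
    total = sum((p - o) ** 2 for p, o in zip(predicted_probs, outcomes))
    return total / len(predicted_probs)
```

A score like this gives a single number to compare two versions of a model on the same past tournaments, which is what makes a framework improvable in a way that guesswork is not.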
I thought putting it this way might help. A testable framework is preferable to non-structured guesses. Frameworks can be improved upon and weaknesses identified. Semi-structured (at best) guesswork cannot (so easily).
The above is for me. It’s not a hard and fast rule. But also, when you’re talking about repeatable processes - which is required for indefinite simulation - then you do need this kind of process. There was a minor controversy over some Banzuke predictions I posted last year. I know the algorithm requires further refinement, but its very existence is a goal in and of itself. Without the algorithm, the kind of improvement we’re looking for isn’t possible!
So if 2025 is a success, by the end of the year, we’ll be able to simulate until Atamifuji (or whoever is the last current wrestler) retires in the 2030s, and ideally the results will look realistic from the matchups, to the number of new rikishi, to who is winning matches and so on.
It’s ambitious I know, but that’s what we aim for!
2024 Achievements - Where We Got to in 2024
Looking at the above, honestly we could probably pull that off.
If you want the minutiae of everything covered, the Sumo Knowledge Encyclopedia has been updated to have everything from 2024 in it. Even better, it’s organized by topic too!
But to go a bit deeper, we began by looking at what is required, promotion-wise, to be a top dog - Yokozuna or Ozeki. We also looked at how many Sekiwake and Komusubi there should be each tournament because that’s also not pre-determined/allotted. In fact, we did a decent amount of work looking at what determines the Banzuke.
We also took a look at how wrestlers’ careers begin and end. We have an algorithm with some decent results that can predict retirements, albeit after each tournament is done. We even looked at when wrestlers peak - about age 27, similar to other sports, but mathematically confirmed now.
Now to begin the transition: we also looked at determining matchups, which will require further work. It actually only covers the top division and I need to do further testing and verification. In fact, I also have a few improvements for the Banzuke prediction algorithm in mind too. Finally, determining who will win matches also requires further work.
2025 Roadmap - Where We Hope to Reach in 2025
As I said above, “Frameworks can be improved upon and weaknesses identified.” We have frameworks in place for almost everything that we want to learn. But the weaknesses still need to be identified and the improvements implemented.
This is less sexy, but I do need to work this year on improving my data and code pipeline. I have all the programs and data, but running them in practice is far from smooth. Fixing this shouldn’t lead to a large improvement in the accuracy of estimates, at least in the short term (although it might, by preventing manual errors caused by the messy process). However, getting my environments more “productional” and “professional”, for lack of better terms, will lead to improved results because testing different hypotheses, models, factors and so on will be smoother and quicker. This one is more for me than the readers.
As mentioned above, there are some identified Banzuke prediction algorithm improvements. Additionally, the above-mentioned data pipeline improvements will make further refinement quicker and easier.
Torikumi or the art of matchmaking requires further refinement on my part. This will also benefit greatly from the data pipeline 2.0.
Last year I began a framework to determine things like “what is mid-Maegashira” and so on. A piece you can expect to see in the near future will finish that for the lower divisions as it previously left off at the bottom of Makushita (3rd highest division).
This leads into the final major topic for 2025: determining who will win matches. I have done some work with Elo (originally a chess ranking system, but it’s widely applied across sports), but I seek to combine that with the above framework and perhaps other features (or independent variables, or predictors if you prefer). I probably won’t use it, but you could imagine using height as a predictor - i.e. the taller wrestler is likelier to win.
We also have some fun pieces planned. I mentioned a potential “Dynasty Score”, a way to weigh who has had the best sumo run of all time (well, I mean besides Hakuho) and how those runs measured up. I also might look at what a “Typical Banzuke” would look like by taking the guys who have occupied each position on the Banzuke the most since 1989. Presumably Hakuho will be No. 1, but who was most commonly second in the Banzuke? I genuinely don’t know but it interests me, so I’m hoping that’s reflective of others too.
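As a rough idea of how the “Typical Banzuke” could be computed, the sketch below counts who held each position most often across a list of past Banzuke. The data layout (one dictionary per tournament, mapping position to wrestler) and the function name are assumptions for illustration only, and ties are broken by whoever appeared first in the history.

```python
from collections import Counter, defaultdict


def typical_banzuke(banzuke_history):
    """Return the most frequent occupant of each position.

    banzuke_history: list of dicts, one per tournament, mapping a position
    label (e.g. "Y1e") to the wrestler who held it. This layout is a
    hypothetical one chosen for the example.
    """
    counts = defaultdict(Counter)
    for banzuke in banzuke_history:
        for position, wrestler in banzuke.items():
            counts[position][wrestler] += 1
    # most_common(1) picks the top count; ties go to the first-seen wrestler.
    return {
        position: counter.most_common(1)[0][0]
        for position, counter in counts.items()
    }


# Usage with placeholder names rather than real results.
history = [
    {"Y1e": "Wrestler A", "Y1w": "Wrestler B"},
    {"Y1e": "Wrestler A", "Y1w": "Wrestler C"},
    {"Y1e": "Wrestler B", "Y1w": "Wrestler C"},
]
print(typical_banzuke(history))
```

One design question this leaves open is whether the same wrestler should be allowed to “win” more than one position, which a real version of the piece would need to decide.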
Finally, I do speak some Japanese and read it much better. I might eventually look to translate some important passages from books on sumo or important newspaper articles and such.
Personal Bits (Skippable)
I’m incredibly grateful for how well this first year went. I actually set a goal to write an article a week for the year. If you check the archives I was actually just about doing that through July or August. Unfortunately, I do tend to trail off from a productive standpoint after summer. If I wanted to cut myself some slack I have a decent excuse that a lot of the work I do tends to be more deeply researched pieces. Those often require tons of prep work just to get the data ready, much less running the models, interpreting the results, and writing them up. In fact, I have probably 3-5 articles in my drafts in varying states of readiness. Some require more research, others rewriting, and others I just haven’t had the time to finish.[1]
It’s kind of hard to believe but we hit 100 subscribers last year starting from a total of 0. Who knew 100 people wanted to read in-depth pieces on the numbers behind sumo wrestling? Well I certainly didn’t, but I have had a great time getting to know some of my readers better, and also just good interactions with fans. Related to that, I will try to be more active on socials this year.
I’ll also say that I’m an ideas guy. Believe me, I have tons of balls in the air of which this is just one. That said, this is one that I have relatively large amounts of follow-through on. As I review what I still think needs to be refined and further researched, I’m actually incredibly heartened by the progress we made through 2024. While the remaining urgent areas are some of the toughest and most important, if I wanted to, I could work with the existing programs and data and simulate indefinitely albeit with no new wrestlers to replace the current ones. I actually don’t know how I’ll tackle that yet, but that’s a problem for future me and one I’d be happy to have as my most pressing.
Hopefully my other projects go well enough that I have time to cross promote them here where relevant.
Other than that, I just wanted to thank everyone for reading this far. I had a few folks that took some of my work in what I see as an uncharitable light and denigrated it as such. They even had some good points on improvements to be made that I agreed with, although I found the dismissiveness less agreeable. That is not at all the spirit in which the majority of folks I’ve interacted with have engaged. Almost everyone I’ve had the pleasure of meeting (online) throughout this project has been incredible. We’ve exchanged ideas and improved each other. Beyond that, we’ve had a good time. I said it before, but I’m thankful for that and thankful for the readers. I hope we can all have a great 2025 and beyond! Thank you so much again.
1. I can’t lie though. Mid-year, when I saw 100 subscribers was possible, I set that as a goal, and hitting 100 before December probably didn’t help in terms of prioritizing new articles over a couple of other pressing things on my plate.