The Banzuke. It’s a large list of ordered names showing who is at the top, and who is at the bottom of the sumo world. Luckily, we have those Banzuke going far back through history, and at Ozeki Analytics, when we have data over time, we like to take a look at them and see what insights we can tease out.
This is the first in a series of posts where I’ll examine how the Banzuke is constructed and see what we can learn about predicting it. Today’s post looks at the first of a couple of high-level ways to tease out what a Banzuke might look like if you know the prior tournament’s wins, losses, and ranks. In a future post I’ll look at how promotion and demotion between Makuuchi and Juryo are determined. Perhaps even further down the line, if it feels warranted or gets requested, I might look at whether promotion practices are similar across divisions, and whether it’s easier to rocket up or tumble down one division vs. another.
So what is the dataset and why did I choose it? Great question! I’ve noted this throughout many of my historic posts, but the controversy over Futahaguro left a very deep impression on the sumo world. As such, I decided to start with 198905 (May 1989), the first Banzuke made with him out of the sport, and run through November 2023. It’ll come up later, but I didn’t include January 2024 so that I’d have out-of-sample data - I’ll discuss that in due course.
How do we determine ranks using the previous Banzuke? This is actually a couple of problems. First, who will the Yokozuna be (we have some ideas here). Second, who will the Ozeki be (we also have some ideas on this). Then you can begin the business of ordering the rest of the wrestlers. And to be fair, there are some further questions over who receives the Sekiwake and Komusubi titles, but those are merely names. Everyone below the Yokozuna and Ozeki can be ranked relative to each other on a separate scale, and that’s how I coded it up. So Sekiwake 1 East is the top-ranked non-Yokozuna, non-Ozeki wrestler and has a rank of 1. Sekiwake 1 West is 2, and so on and so forth. So I have the relative rankings and win totals for each wrestler from 198905 to the present, and I tried two different methods to generate a new Banzuke.
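As a minimal sketch of that encoding (with made-up wrestler names and slots, not the real dataset), numbering every non-Yokozuna, non-Ozeki slot in Banzuke order looks something like this:

```python
# A toy sketch of the relative-rank encoding described above, using
# made-up wrestlers (not the real dataset). Every non-Yokozuna,
# non-Ozeki slot is numbered 1, 2, 3, ... in Banzuke order.
banzuke = [
    ("Wrestler A", "S1E"),  # Sekiwake 1 East -> relative rank 1
    ("Wrestler B", "S1W"),  # Sekiwake 1 West -> relative rank 2
    ("Wrestler C", "K1E"),  # Komusubi 1 East -> relative rank 3
    ("Wrestler D", "K1W"),  # Komusubi 1 West -> relative rank 4
    ("Wrestler E", "M1E"),  # Maegashira 1 East -> relative rank 5
]

relative_rank = {name: i + 1 for i, (name, _) in enumerate(banzuke)}
print(relative_rank["Wrestler A"])  # 1
print(relative_rank["Wrestler E"])  # 5
```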
Starting with my first one, I did a basic linear regression. The idea is, I fit a model with the independent variables based on relative position and number of wins and losses. The dependent variable - what we’re trying to predict - is the position in the next tournament. Essentially, what’s the best equation I can have for:
Rank Next Tournament = coefficient 1*(Rank in Current Tournament) + coefficient 2*(Current Tournament Wins) + coefficient 3*(Current Tournament Losses)
The set-up I went with isn’t quite the same as the above, so if you want to get into the nitty-gritty and learn some things about modelling, then check out the Appendix (ctrl-F should take you there). In fact, the form of the equation I ended up using is:
Change in Tournament Rank = coefficient 1*(wins - losses)
So pretty similar. Now (wins - losses) is our x, and we’re trying to find Change in Tournament Rank, our y. I’m running an Ordinary Least Squares (OLS) linear regression, which basically calculates the coefficient 1 (from the above equation) that minimizes the errors.
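To make that concrete, here’s a toy no-intercept OLS fit with made-up numbers (not the real 1989-2023 data). With a single regressor and no intercept, the least-squares coefficient has a simple closed form:

```python
import numpy as np

# Toy illustration of the single-coefficient, no-intercept OLS above
# (made-up numbers, not the real 1989-2023 dataset).
x = np.array([3.0, 7.0, -3.0, 1.0, -5.0])    # wins - losses
y = np.array([-5.0, -12.0, 5.0, -2.0, 9.0])  # change in relative rank

# With one regressor and no intercept, minimizing sum((y - c*x)^2)
# gives the closed form c = (x . y) / (x . x)
c = (x @ y) / (x @ x)
print(round(c, 4))  # -1.7312 on these toy numbers
```

On real data you’d typically reach for a library like statsmodels, which is also what produces diagnostics like the R-squared and Cond. No. discussed later.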
What do the actual results for that look like?
No joke, when I was showing the results to a finance friend, she asked where she could bet on sumo, given how high the R-squared is (R-squared is basically a measure of how good your model is, and the closer to 1 the better; .83 is really good).1 So our equation for what the change in rank will be is:
ΔRank = -1.7171 * (Wins - Losses)
And then, we add that ΔRank to their existing rank within the non-Ozeki, non-Yokozuna Banzuke which gives us what I’ll term a ‘Rank Score’.
Rank Score = ΔRank + Existing Rank
The Δ here is delta, and is often used in equations to indicate a change.
You’ll recall that Sekiwake 1 East, the highest rank in the population we’re looking at and testing for, is 1. That explains why the coefficient for x (wins-losses) is negative; in this instance it’s like golf where a lower score is better. Let’s apply this to November 2023.
Daieisho Rank 2023-11 = 1 as Sekiwake 1 East
Daieisho Rank Score = -1.7171*(9 Wins - 6 Losses) + 1
= -1.7171*3 + 1 = -5.1513 + 1 = -4.1513
Whereas Kotonowaka at S2E’s Rank Score = -1.7171*(11-4) + 3 = -9.0197
Again, it’s golf scores so per the model he should be in front of/ranked higher than Daieisho and in fact in January he would be.
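Putting the Sekiwake example together, a short sketch that computes the Rank Scores and sorts them (golf rules: lower score = higher on the predicted Banzuke) might look like:

```python
# Applying the fitted coefficient to the November 2023 Sekiwake
# example from the post, then sorting by Rank Score
# (golf rules: lower score = higher on the predicted Banzuke).
COEF = -1.7171

def rank_score(current_rank, wins, losses):
    return COEF * (wins - losses) + current_rank

scores = {
    "Daieisho":   rank_score(1, 9, 6),   # -4.1513
    "Kotonowaka": rank_score(3, 11, 4),  # -9.0197
}
predicted_order = sorted(scores, key=scores.get)
print(predicted_order)  # ['Kotonowaka', 'Daieisho']
```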
So let’s take a look at the Hatsu ‘24 Makuuchi Banzuke using this method.
Would it win the Guess the Banzuke contest? No, but this turned out fairly solid actually. The biggest miss is on Kotoshoho; as yet, the regression doesn’t take into account the hard luck that Juryo rikishi get in their Banzuke placement following promotion. The only other rank that missed by more than 2 with this method was Churanoumi - that one I don’t have such a clean explanation for. Sorry the model output didn’t work out for you - luckily the actual world was kinder to you, Churanoumi.
If you add up the absolute value of the misses by the OLS it comes out to exactly 38, or off by 1 rank on average (and remember, that’s including East and West too, so not too shabby). One thing that surprised me was that it actually did fairly well in the region around the borderline between Makuuchi and Juryo. I thought that’d be a problematic area, but in fact there are quite a few perfect matches.
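The accuracy metric itself is simple; here’s a sketch with toy predicted and actual slots (not the real Hatsu ’24 numbers):

```python
# Sketch of the accuracy metric: total and average absolute rank miss
# between predicted and actual slots (toy numbers, not the real Banzuke).
predicted = [1, 2, 4, 3, 7]
actual = [1, 3, 4, 2, 5]

total_miss = sum(abs(p - a) for p, a in zip(predicted, actual))
avg_miss = total_miss / len(actual)
print(total_miss, avg_miss)  # 4 0.8
```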
I think this bears further investigation, so I might re-work some code and operations so it’s easier to set up Banzuke predictions and comparisons. Like I said, it looks like the area around the Makuuchi/Juryo border is fairly accurate, but I’d like to actually test that and see if it holds on average across 10 or more tournaments, for instance.
With this post getting a bit long, I decided to split it up. I’ll go over a different method in another week, discuss how it differs, and explain why this model works better.
Thanks for reading, and I’ll leave you with the:
Appendix
So I actually did run the OLS initially like the first equation I provided.
Rank Next Tournament = coefficient 1*(Rank in Current Tournament) + coefficient 2*(Current Tournament Wins) + coefficient 3*(Current Tournament Losses)
Y is the rank next tournament, and Non_Special_Rank is the current tournament rank (on the non-Yokozuna, non-Ozeki scale).
This is actually a good example of things to watch out for or think of if you’re doing similar modelling, or looking at model results.
The R-squared is higher, but there are some other things that should give us pause.
Check out the Cond. No. of 12.4 here vs. the Cond. No. of 1 in the final model higher up.
Cond. No. is the Condition Number; generally speaking, the higher it is, the more sensitive the model is to small changes in the input data, which can lead to larger errors.
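If you want to see the condition number for yourself, NumPy can compute it for a design matrix. This toy sketch contrasts a single-column design (condition number 1, like the final (wins - losses) model) with two nearly redundant columns:

```python
import numpy as np

# Toy look at the Cond. No. diagnostic. A single-column design matrix
# always has condition number 1 (only one singular value), like the
# final (wins - losses) model; near-duplicate columns inflate it.
x = np.array([3.0, 7.0, -3.0, 1.0, -5.0])

X1 = x.reshape(-1, 1)                # single regressor
print(np.linalg.cond(X1))            # 1.0

X2 = np.column_stack([x, x + 0.01])  # two nearly redundant columns
print(np.linalg.cond(X2))            # much larger than 1
```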
Another thing that made me stop and ask my more mathematically inclined friends some questions is that the rank has a coefficient of .9588. I was thinking the coefficient ought to be 1, but wasn’t sure how to get that, or why. That’s when one of them suggested that I take the difference in ranks and use that as the y I’m modelling for. I think there are reasons to believe having the current rank in the model could help, but it also makes the model more complex. Again, I’m not a statistician, and context is very important, but generally the simpler, the better, at least when you’re starting off. It’s usually easier to add more variables later to see if you can improve the fit than to start off with a more complex model where you can’t be certain how all the features (a fancy word for independent variables or inputs) are interacting with each other.
There is a similar story with the wins and losses. Is a loss worse for your rank than a win is good for it? That’s what the (absolute value of the) win coefficient being lower than the (absolute value of the) loss coefficient potentially communicates. That is a proposition we could test one day, but it’s not the proposition for today.
In fact, there’s another problem with the wins and losses as independent variables: they’re not independent. There are only 15 matches and you can only win or lose, so an extra win means one fewer loss, and vice versa. This is a situation where the variables are collinear. Long story short, you want independent variables because, again, if the variables affect each other, your model is liable to overfit. Overfitting is something we worry about because it means the model is likely fit to the data you have, but might not necessarily fit data outside your sample (the data you input). This is not uncommon in finance. There are legion stories of stock-picking models that worked quite well in the backtest (applying the model to historic data) because they’re based on historic data, but once actually in use, the present day is different from the past, and the model underperforms vs. what the portfolio manager expected.
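You can see the collinearity directly: over a 15-match tournament, losses = 15 - wins, so a design matrix with an intercept plus both columns is rank-deficient. A toy demonstration with made-up win totals:

```python
import numpy as np

# Toy demonstration that wins and losses are collinear over a
# 15-match tournament: losses = 15 - wins, so with an intercept the
# design matrix is rank-deficient (made-up win totals).
wins = np.array([9.0, 11.0, 6.0, 8.0])
losses = 15.0 - wins

X = np.column_stack([np.ones(len(wins)), wins, losses])
print(np.linalg.matrix_rank(X))  # 2, not 3: one column is redundant
```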
So to continue simplifying, I switched to using Wins - Losses. Again, simpler is usually better. So let’s check out what the results look like with this method vs. the final OLS and the actual ranks.
I’ll save you the math: this OLS, where we talked about all the potential methodological issues, actually only had 36 total ranks of misses, aka 2 fewer than the “good” OLS (hence the cheeky “Bad” label above). I think that’s a fitting final lesson: sometimes the model that’s messier or less simplified and “shouldn’t” work better can in fact do so. Still, as someone in finance, I can promise there are tons of very smart people who have lost millions and billions because they used models without fully understanding them, or by improperly specifying or overfitting them (among the legion of other statistical illusions you can fall prey to). As such, I’ll stick to my simpler model. After all, this is just a more basic look.
Please note that throughout this post I play quite fast and loose with explaining statistics. It’s less that I’m saying anything wrong per se and more that I’m not hedging statements as much as I should, in the interest of readers’ ease. For instance, there are plenty of reasons a model could have a high R-squared while not actually being a good model. Still, here I think it is indicative of a good fit, and that is shown in the results.