Banzuke Prediction - Lower Divisions
From the lowest Jonokuchi all the way to Makushita - how does your record affect your place in the following tournament?
Hi all! We previously looked at the ranks of the Sekitori (the paid wrestlers in the top two divisions),[1] but there are also lower divisions. In fact, part of the appeal of sumo is its meritocratic nature. Everyone starts in these lower divisions, and unless you win more than you lose, you’ll stay stuck there. So today I took a look at the four lower divisions: Jonokuchi, Jonidan, Sandanme, and Makushita. We’ll compare and contrast the divisions, and I’ll also run a basic test of what our results would predict for the Haru ‘24 Banzuke using the Hatsu ‘24 results, to show how well (or poorly) the model does. Finally, I’ll try to include some model discussion, so if you’re here for the statistics you can learn something too. There’s even a bonus section at the bottom where you can think about some actual problems I ran into while coding this up, which could be nice practice for thinking about these kinds of problems.
So let’s start by laying down some background. If you’re a young Japanese boy and you go straight into sumo wrestling rather than going to college, then you’ll debut in Jonokuchi.[2] As stated above, if you win more than you lose in a tournament, you’ll make your way up the divisions: from Jonokuchi to Jonidan, Jonidan to Sandanme, and Sandanme to Makushita. In all of these divisions, wrestlers only fight 7 times per tournament, so with 4 wins or more you’ll make your way up the Banzuke.
Now that we know the basics, we can formulate our question. When you’re modelling, it’s very important to know what question you’re asking, or what problem you’re trying to solve. It sounds a bit obvious, but articulating the question always helps because it gives you something concrete to work towards. In this case, our question is “In the lower sumo divisions, how do wins and losses affect your position in the following tournament?”
With our question prepared, we can now think about how we’ll test this. In this case, I grabbed all the Banzukes starting in 2001 and ending in 2023, fully inclusive. So we have 82 tournaments (recall there was one cancelled in March 2011, and then one for Covid). With those 82 tournaments, I next rearranged the data so that I had not only each individual wrestler’s rank at a given date (for instance, 200101 for Hatsu in January 2001), but also their rank in the following tournament.[3] Additionally, I have the number of wins and losses they had in that tournament. So we now have all the data to answer our question above.
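As a rough sketch of that rearrangement (not the actual scraping code), here is one way to pair each wrestler’s rank in a tournament with his rank in the next one. The record layout, the wrestler names, and the function name `pair_with_next` are all made up for illustration; tournament IDs follow the 200101 style used above.

```python
from collections import defaultdict

def pair_with_next(records):
    """records: list of (basho_id, wrestler, overall_rank) tuples.

    Returns (basho_id, wrestler, rank_now, rank_next) rows, where rank_next
    is None if the wrestler does not appear in the following basho.
    """
    # Use only the tournaments present in the data, so a cancelled basho
    # is skipped automatically instead of being assumed to exist.
    bashos = sorted({basho for basho, _, _ in records})
    following = dict(zip(bashos, bashos[1:]))

    rank_at = defaultdict(dict)  # basho_id -> {wrestler: rank}
    for basho, wrestler, rank in records:
        rank_at[basho][wrestler] = rank

    rows = []
    for basho, wrestler, rank in records:
        nxt = following.get(basho)
        if nxt is None:
            continue  # the last basho in the sample has no "next" one
        rows.append((basho, wrestler, rank, rank_at[nxt].get(wrestler)))
    return rows

# Toy data: invented wrestlers and ranks, purely illustrative.
toy = [
    (200101, "A", 75), (200101, "B", 120),
    (200103, "A", 64), (200103, "B", 131),
    (200105, "A", 60),
]
pairs = pair_with_next(toy)
```

Wrestler "B" has no entry in 200105, so his 200103 row comes back with a next rank of `None`, which matters for the retirement question in the bonus section.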
With the data and question prepared, we’ll model it with a basic linear regression. We’ll have input data (number of wins and losses) and we’ll use that to predict the rank in the following tournament. Actually, it’ll be slightly different. To be precise, we’ll use net wins (so 4 wins and 3 losses is 1 net win) as the input, and net change in rank as the output. So if you were the 75th ranked wrestler in the current tournament and the 64th in the following tournament, that would be a net rank change of 11. (There are 42 wrestlers in Makuuchi and 28 in Juryo, so 75th is upper Makushita and 64th is lower Juryo.) If you’re curious why we use net wins and rank change, as opposed to raw wins and losses plus the current and next rank, I’d recommend going over the piece linked at the top, where it’s discussed in detail. Long story short, it’s about answering our question more directly and making our model cleaner.
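To make the two derived quantities concrete, here is a minimal sketch. The function names are mine, and the Makushita helper assumes exactly the 42 Makuuchi and 28 Juryo slots quoted above.

```python
MAKUUCHI_SLOTS = 42  # figure quoted in the text
JURYO_SLOTS = 28     # figure quoted in the text

def net_wins(wins, losses):
    """4 wins and 3 losses -> 1 net win."""
    return wins - losses

def net_rank_change(rank_now, rank_next):
    """Positive means the wrestler moved up, since rank 1 is the top."""
    return rank_now - rank_next

def makushita_overall_rank(number, side):
    """Overall rank of Makushita <number> East ("E") or West ("W"),
    assuming 42 Makuuchi and 28 Juryo slots sit above the division."""
    offset = MAKUUCHI_SLOTS + JURYO_SLOTS
    return offset + 2 * (number - 1) + (1 if side == "E" else 2)

example_net_wins = net_wins(4, 3)          # 1
example_change = net_rank_change(75, 64)   # 11
ms3_east = makushita_overall_rank(3, "E")  # 75
```

With those slot counts, the 75th ranked wrestler in the example is Makushita 3 East.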
Finally, we can update our question to what we’re actually testing: “From 2001 to 2023, using net wins as our X, or input data, how well does that explain/predict net rank change, our Y, or output data, for each of the lower sumo divisions?” It’s perhaps not the simplest formulation, but it does have the advantage of describing the actual tests that we’re running. So at the end we’ll have the following equation and coefficient for each division:
Predicted Rank Change (Y-hat)[4] = Division’s Net Win Coefficient × Net Wins (X)
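For a one-variable model with no intercept, like the equation above, the least-squares coefficient has a closed form: the sum of X times Y divided by the sum of X squared. Here is a minimal sketch of that, assuming the fit really goes through the origin (the equation suggests so, but the numbers in this post may well have come from a stats library with different settings). The data is invented.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for y = coef * x (no intercept term)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("need at least one non-zero x to fit a slope")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / sum_xx

# Invented toy data: net wins and the rank change that followed.
toy_net_wins = [-3, -1, 1, 3, 5]
toy_rank_change = [-28, -12, 9, 33, 49]

coef = fit_through_origin(toy_net_wins, toy_rank_change)  # roughly 9.98
```

One slope is fitted per division, so in practice this would be run four times, once on each division’s rows.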
And after running all of the numbers, here are the coefficients and the R-squared, which is a measure of how well our equation does vs. the actual real-world results.
So if you went 4-3 in Jonidan, that would be 1 net win, and I would expect that wrestler to go up about 20 ranks (20.4, technically). Please note that each position in the Banzuke is a rank, so Makuuchi 1 East is one rank away from Makuuchi 1 West. That means 20 ranks is about 10 numbered positions, so per our equation above the wrestler would go from Jonidan ~34 to Jonidan ~24.
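Here is a small sketch of applying a coefficient and turning the result back into a Banzuke position. It uses the 20.4 Jonidan figure quoted above; the function names are mine, and the helper deliberately ignores promotion or demotion across a division boundary. Because each East/West slot counts as one rank, 20 ranks works out to 10 numbered positions.

```python
JONIDAN_COEF = 20.4  # Jonidan coefficient quoted in the text

def predicted_rank_change(net_wins, coef):
    """Y-hat = coefficient * net wins."""
    return coef * net_wins

def shift_slot(number, side, ranks_up):
    """Move up (or down, if negative) by whole ranks inside one division.

    Each East/West slot is one rank. Simplification: the result is capped
    at the top of the division rather than promoted out of it.
    """
    slot = 2 * (number - 1) + (0 if side == "E" else 1)  # 0 means 1 East
    new_slot = max(slot - ranks_up, 0)
    return new_slot // 2 + 1, ("E" if new_slot % 2 == 0 else "W")

change = predicted_rank_change(1, JONIDAN_COEF)         # 20.4 ranks for a 4-3
new_number, new_side = shift_slot(34, "E", round(change))
```

Starting from Jonidan 34 East, a 4-3 record lands the wrestler at Jonidan 24 East under these assumptions.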
So I think that, excluding Jonokuchi, our results look fairly sensible. The lower the division, the faster you’re able to go up (or down!). Furthermore, the R-squared is fairly high for all of them. It won’t perfectly explain all the movements, but it’ll explain a whole lot. See the footnote for an R-squared aside if you want more caveats; otherwise this works for a stylized analysis.[5]
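For reference, here is a minimal sketch of the usual R-squared calculation: one minus the ratio of the squared prediction errors to the squared deviations from the mean. This is the textbook centered version; some stats packages use a different total for models fitted without an intercept, so treat it as an illustration rather than a reproduction of the figures in the table. The numbers are invented.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Invented toy data: actual rank changes vs. predictions from a slope of 10.
toy_actual = [-28, -12, 9, 33, 49]
toy_predicted = [10 * x for x in (-3, -1, 1, 3, 5)]

r2 = r_squared(toy_actual, toy_predicted)
```

A value near 1 means the predictions track the actual rank changes closely; a value near 0 means they do little better than always guessing the average.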
Now Jonokuchi? What the heck is going on there? I promise I was asking myself that with greater urgency than you probably are right now. After some digging, I do think I have the answer. Some background: as it’s the lowest division, there’s a lower bound. In other words, you can go up quite a bit, but you can only go down so far (as there’s no division below it). Additionally, I found this on Wikipedia: “Jonokuchi is the only division in which wrestlers are semi-regularly promoted even with a losing record; promotions to the next highest jonidan division with a losing record are especially common for the May tournament when there is the large influx of new recruits.” Together, those two effects mean that a net losing record won’t send you down the Banzuke as much as it would in other divisions.
So we have some logical explanations, and I can supply some numbers to back them up.
Same table as above, but now with the standard error included. As we can see, the standard error, which is a type of standard deviation, is much higher in Jonokuchi than in the other lower divisions. Standard deviation is just the square root of variance (often written with sigma: σ, with variance being sigma squared).[6] Variance is what it sounds like: a measure of how much the data, as a whole, differs from its average. So the fact that our data varies more doesn’t by itself explain the odd result, but it’s a start.
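As a quick illustration of the variance and standard deviation relationship (not of the standard error reported in the table, which comes out of the regression itself), here is a sketch using Python’s built-in `statistics` module on made-up rank changes.

```python
import statistics

# Invented rank changes for two hypothetical groups of wrestlers.
tight_changes = [-10, -4, 0, 4, 10]
wide_changes = [-40, -5, 0, 15, 80]

# Population variance: the average squared distance from the mean.
tight_var = statistics.pvariance(tight_changes)
wide_var = statistics.pvariance(wide_changes)

# Standard deviation (sigma) is the square root of the variance.
tight_sd = statistics.pstdev(tight_changes)
wide_sd = statistics.pstdev(wide_changes)
```

The second group swings much more widely around its average, so both its variance and its standard deviation come out larger.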
I think the real explanation in the data is in the image below the table, the one with the blue column headers. We can see that the rank changes from Jonokuchi are just way more extreme than in the other lower divisions. That fits with what we saw above: there’s a lower bound on how far your rank can fall, whereas in theory there isn’t an upper limit on how much it can rise. Looking through the data at a high level, I’m fairly satisfied with the explanation, but I’m open to alternate theories.
Finally, let’s see how this actually works in practice. Last time around I used the top 50 wrestlers to test how well the OLS worked. This time it’s a bit harder, given the number of wrestlers covered is 530, or 10.6x as many. I’m still working on the infrastructure to quickly test out-of-sample, but in a pinch, and wanting to get this out, what I did for now was rank all wrestlers based on their division and the derived coefficients. I used September ‘23 to predict November ‘23, then took the predicted order and compared it to the actual order in November ‘23. It’s not super rigorous, but I think the average error and median error paint a pretty good picture of how well it worked, and it’s alright for our inaugural effort at predicting the lower divisions.
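Here is a rough sketch of that kind of check, under my own assumptions about the data layout. Only the Jonidan coefficient (20.4) comes from this post; the Sandanme value, the wrestlers, and the "actual" order are all invented. One thing worth knowing: if everyone is ranked in both lists, signed errors across the whole group always average to zero, so the sketch summarizes absolute errors overall and keeps the signed ones for per-division bias.

```python
import statistics

# 20.4 is quoted in the text; the Sandanme figure is a made-up placeholder.
COEFS = {"Jonidan": 20.4, "Sandanme": 12.0}

def predicted_order(wrestlers, coefs):
    """Sort by predicted next rank (current rank minus predicted rise)."""
    def predicted_rank(w):
        return w["rank"] - coefs[w["division"]] * w["net_wins"]
    ranked = sorted(wrestlers, key=predicted_rank)
    return {w["name"]: pos for pos, w in enumerate(ranked, start=1)}

def order_errors(predicted, actual):
    """Positive error: the model placed the wrestler lower than reality."""
    return {name: predicted[name] - actual[name] for name in predicted}

toy_wrestlers = [
    {"name": "S", "division": "Sandanme", "rank": 60, "net_wins": -1},
    {"name": "A", "division": "Jonidan", "rank": 100, "net_wins": 1},
    {"name": "B", "division": "Jonidan", "rank": 90, "net_wins": -1},
    {"name": "C", "division": "Jonidan", "rank": 95, "net_wins": 3},
]
toy_actual_order = {"C": 1, "A": 2, "S": 3, "B": 4}  # invented outcome

pred = predicted_order(toy_wrestlers, COEFS)
errors = order_errors(pred, toy_actual_order)
abs_errors = [abs(e) for e in errors.values()]
mean_abs_error = statistics.mean(abs_errors)
median_abs_error = statistics.median(abs_errors)
```

Grouping the signed values in `errors` by division would show whether a given division is being predicted too high or too low on average.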
So the predictions for the higher divisions are actually slightly biased downwards (i.e. they predict a worse rank than the actual rank), whereas Jonokuchi fails in the opposite way by underestimating how high wrestlers can rise out of the division. I also included the median error because I think it shows that the typical miss isn’t too far off. And keep in mind that on this scale, Makushita 1 East is one rank from Makushita 1 West. So a median error equivalent to predicting Makushita 1 West for a guy who actually ended up at Makushita 1 East is, in my mind, a solid start for a basic OLS with only one independent variable (net wins). OLS can be quite powerful indeed!
Hopefully this was interesting and a good start. I’ll have more in the near future including a full Banzuke prediction from Makuuchi to Jonokuchi!
Bonus Practical Questions and Answers
Q1: SumoDB, where I scraped the data from, actually doesn’t list just wins and losses. It also includes a category for when a wrestler sat out that day. Say a wrestler competed the first two days and won, got injured and lost on day 3, and then sat out the remaining days, which count as losses. Their record would be displayed as 2-1-4 (2 wins, 1 loss, 4 losses by default due to injury non-participation). Would it cause an issue if I used just the wins and losses without accounting for the injury losses (aka the 4 in the example above)?
Q2: Why would wrestlers who retire cause issues if I don’t exclude them?
Scroll down for answers
A1: Yes! Think about it: when determining the Banzuke, all losses are treated equally, or at least they are at a high level (there’s another research question!). Take our 2-1-4 wrestler: if I didn’t make any adjustments, he would go into the model with 1 net win. However, he would’ve been judged on 2 wins and 5 losses when the Banzuke was set, and would not be going up. If we didn’t exclude or adjust this data, it would throw off our calculations. I believe when I forgot to do this, the R-squared for the lower divisions was .006. Compare that to my adjusted data above (where I added injury losses to ensure that everyone had 7 matches between wins and losses), where the R-squared is 10+ times that. These are important things to consider, because it’s very easy to end up with a poor model if you don’t account for quirks like this. In this case I was lucky the R-squared made it obvious the model was wrong, but it’s just as easy to miss something like that and end up with flawed results that are more subtle.
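The adjustment described in A1 can be sketched in a couple of lines. The function names are mine; the logic is simply that absences are added to the loss column before net wins are computed.

```python
def adjusted_record(wins, losses, absences):
    """Fold absences into losses, e.g. 2-1-4 becomes 2 wins and 5 losses."""
    return wins, losses + absences

def net_wins_with_absences(wins, losses, absences):
    adj_wins, adj_losses = adjusted_record(wins, losses, absences)
    return adj_wins - adj_losses

naive_net = 2 - 1                                   # ignores absences: +1
adjusted_net = net_wins_with_absences(2, 1, 4)      # counts them: -3
```

The unadjusted figure says the wrestler should rise, while the adjusted one says he should fall, which is why skipping this step wrecks the fit.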
A2: This one is a bit open-ended because it depends on how you code it up. In my case, if a wrestler retired, then they went into the database with a next tournament rank of 0. If they were included, that would be bad data that would throw the model off. I’m hoping it’s relatively simple, but if you take 5 minutes to think about it and aren’t sure why, feel free to reach out and I can update this piece with it. If they instead just had blanks for the next tournament, that could potentially cause an error from having null data.
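Here is a small sketch of the filtering step, assuming rows shaped like the made-up dictionaries below. It also shows why a next rank of 0 is harmful: under the "current rank minus next rank" definition, a retiring wrestler ranked 150th would look like he shot up 150 ranks.

```python
def drop_retired(rows):
    """Keep only rows with a real next-tournament rank (not 0 and not blank)."""
    return [row for row in rows if row["rank_next"] not in (0, None)]

# Invented rows: "X" retired and was stored with a next rank of 0,
# "Y" has a blank next rank, and "Z" is a normal row.
toy_rows = [
    {"name": "X", "rank_now": 150, "rank_next": 0},
    {"name": "Y", "rank_now": 200, "rank_next": None},
    {"name": "Z", "rank_now": 120, "rank_next": 110},
]

bogus_change = toy_rows[0]["rank_now"] - toy_rows[0]["rank_next"]  # 150
clean_rows = drop_retired(toy_rows)
```

Only "Z" survives the filter, and the blank row is removed before any arithmetic can fail on it.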
[1] I might take another run at this in the future. In this exercise I look at each division separately, but for the prior post I treated Makuuchi and Juryo the same. I’ll revisit it, and if the difference is large, or if I feel I have something to add to the prior modelling, then I’ll probably write that up.
[2] Due to American success in sumo, sumo instituted rules to reduce the number of foreigners. They eventually settled on a rule that only one foreigner is allowed per sumo stable. That’s why I specified Japanese boy. As for going straight into it: I believe they have changed the rules lately, but at least historically, amateur success would allow you to begin higher than Jonokuchi.
[3] In more statistical parlance, it would be common to call the tournament in question (continuing with Hatsu, January 2001) T, and the next tournament (Haru, March 2001) T+1, because it’s one period in the future relative to T. You could carry this further, with T+2 being May 2001. You can go the other way too, with T-1 referring to November 2000.
[4] If you want to learn more about Y-hat and statistics in general, I’m working on a series going through a classic textbook of statistics.
[5] R-squared is a measure of how well our derived equation and coefficients work in sample. That’s important because the model could still do poorly out of sample, so that’s something to watch out for. Furthermore, there are lots of situations where a high R-squared can be misleading, though I don’t think that applies in this case. Still, I do like to include these caveats. Having things like this top of mind when modelling, or when looking at others’ statistical results, is helpful.
I'm in awe of your ability to digest and interpret all this data. Very interesting stuff!
I think the example of 20 ranks going from Jonidan 34 to Jonidan 14 is wrong. Since the count is half-ranks, a 20 rank increase should be to Jonidan 24 (or maybe 23w, if you started at 34e and round 0.4 fractional part to 0.5).
I wonder if excluding the Makushita joi, which has special rules for getting promoted to Juryo, might improve the predictions for Makushita as a whole. If you started the sample from the 81st ranked rikishi (Ms6e) instead of the 71st (Ms1e), the numbers might line up better.