A common theme of Ozeki Analytics is taking the sumo we love seeing on TV and trying to uncover the logic behind what is occurring. How do they decide who will be Yokozuna? My favorite guy got his Kachikoshi, so how much higher can I expect him to rise next tournament? Those questions are the bread and butter here. Today I will be trying to see if I can replicate the hard work of the tournament organizers in matching the wrestlers up against each other. For now, I just have a relatively basic version that covers Makuuchi and Juryo, but if you have been following us for long, you probably know that we are always looking to build on what we have and improve. I’ll also cover some expected future updates which will make this even more useful and accurate. Hopefully this can also serve as a nice intro to the basics of matchmaking.

It’s divided into sections to make it easier, and no worries if you want to skip over the methodology section; it’s probably the drier part. After that come the results of my algorithm compared to the actual Natsu Basho in May ’24. Finally there’s a prediction of what Day 1 of the Nagoya July ’24 Basho will look like using our algorithmically generated Banzuke prediction. I think it’s pretty cool that at this point we’re able to do it 3 weeks out from Day 1, with the official Banzuke not even released yet, and that we at least have some numbers and method behind our predictions. As far as I’m aware, nobody else in English is taking this systematic approach, so if that’s correct you’re getting analysis that literally can’t be found anywhere else. I think that’s neat.
Determining Matchup Basics
So let’s cover some basics: at the top of the Banzuke we have the Joi. The Kanji are: 上位. I actually speak a touch of Japanese, so to go into it: the first character means top (literally and metaphorically) and the second character means place/position. So the Joi is akin to top rank or upper ranks, which is how it’s usually translated. I did some light research on this term’s usage in English and it seems there isn’t actually a consensus, so I’ll define it here in a way that I think most people would be fine (enough) with: the Joi are the wrestlers who would be reasonably expected to face a Yokozuna. The Yokozuna (if there’s only one) faces the top 15 guys, so with some basic assumptions the Joi would be the top 16 wrestlers. We’ve covered before how there can be a variable number of Ozeki, Sekiwake, and Komusubi (and Yokozuna for that matter) and how that’s determined, but generally speaking if you’re at Maegashira 1 or 2, you’ll be in the Joi. It can include lower Maegashira too, but those are some basics.
There is a slight exception to this: during the latter stages of the tournament, say Day 10 or later, they begin making matchups based more on that tournament’s record. It’s common to see a guy from Maegashira ~12 or lower unexpectedly be at 8-2 or better and begin facing Ozeki instead of guys at a similar ranking to him at the bottom of the Makuuchi. That’s the basic idea, and for now I’ll leave the research on this till later. I’ll talk about methodology below, but let me geek out on that future research for a moment. I think a good question to try to answer is: can we figure out which day they start using record over “similarly ranked guys,” so to speak? This also further explains why I’m holding off on the lower divisions for now. It’s one thing to deal with matching up 42 guys in 21 matches (potentially fewer with injuries) and then 28 guys in 14 matches in Juryo; Makushita alone is more matches than those two combined. But we do look forward to this challenge in the future at Ozeki Analytics.
I believe wrestlers of the same stable [1] are not supposed to face each other, which should be the final exception. I further believe the protocol is that they will not face each other unless there is a tournament on the line (i.e. there are playoffs to decide the Yusho). I’ll eventually look to add that depending on how easy it is to keep track of, but for now I will not bake that into any predictions.
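If the same-stable rule does get added later, it could be layered in as a simple filter on candidate opponents. This is only a sketch of that idea, not part of the current algorithm: the `allowed` helper, the `stable_of` lookup, and the placeholder wrestler and stable names are all hypothetical, and the playoff exception reflects my understanding of the protocol rather than a confirmed rule.

```python
def allowed(a, b, stable_of, playoff=False):
    """Return True if wrestlers a and b may be paired.

    Assumption: stablemates only meet when a playoff decides the Yusho.
    """
    if playoff:
        return True
    return stable_of[a] != stable_of[b]


# Placeholder data, not real wrestlers or stables.
stable_of = {"A": "Heya X", "B": "Heya X", "C": "Heya Y"}

candidates = ["B", "C"]
valid = [c for c in candidates if allowed("A", c, stable_of)]
print(valid)  # ['C']
```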
Methodology and Limitations
If you’re familiar with how we predict the Banzuke here, it’s the same principle: instead of thinking of a guy as Yokozuna 1 East, we’ll think of him as the 1st ranked wrestler. Then Yokozuna 1 West (if there is one) will be 2nd. Going forward I will refer to this as the Absolute Rank, as it runs down the whole Banzuke. So there are 42 wrestlers in the Makuuchi, and then Juryo 1 East will be Absolute Rank 43. This is helpful as there can be crossover between divisions. For instance, it’s not uncommon to see a higher ranked Juryo wrestler face a Makuuchi wrestler due to injuries.
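As a minimal sketch of the Absolute Rank idea: number every slot on the Banzuke from the top down, ignoring division boundaries. The `absolute_ranks` function and the placeholder slot labels are my own illustration, and it assumes a full 42-wrestler Makuuchi.

```python
def absolute_ranks(banzuke):
    """Map each Banzuke slot to a 1-based Absolute Rank, top to bottom."""
    return {slot: i for i, slot in enumerate(banzuke, start=1)}


# Placeholder labels: 42 Makuuchi slots followed by the top of Juryo.
makuuchi = [f"Makuuchi slot {i}" for i in range(1, 43)]
juryo = ["J1E", "J1W"]

ranks = absolute_ranks(makuuchi + juryo)
print(ranks["J1E"])  # 43
```

Because the numbering never resets between divisions, a Juryo wrestler visiting Makuuchi needs no special handling.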
I used data from 1988-2023 all inclusive. If I queried the data I was using correctly, with just the Sekitori (Makuuchi + Juryo) that was 100k+ matches alone.
Each day, I start at the top of the Banzuke (Terunofuji for now), take his Absolute Rank of 1, and look up the median Absolute Rank that the first-ranked wrestler faced on Day 1 of all previous tournaments. So in this case he’s “predicted” to face the 10th-ranked wrestler, M1E Atamifuji (using Natsu ’24). Atamifuji is simultaneously assigned Terunofuji as his opponent. However, what if Atamifuji had pulled out, so there wasn’t a wrestler available at Absolute Rank 10? Then I go by the most common opponent rank, then the second most common, and so on. That’s it. Then I do the same for the wrestler ranked 2, and so on. I will say that even with just the top two divisions, I ended up having guys left matchless as I was experimenting with later days. In those cases I just matched the remaining guys up till everyone had an opponent.
So it’s:
1. Median opponent that was faced by wrestlers at the same rank in the past on that day of the tournament
2. (If wrestlers remain) Most common opponent by rank that wrestlers at the same rank in the past faced
3. Continue to second most common opponent and so on
4. If there are still wrestlers without opponents, then pair them up with the two highest facing each other and so on
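The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code behind the results: the `history` structure (a dict keyed by day and Absolute Rank) is hypothetical, I use `median_low` so the median is always a rank that actually occurred, and I apply the fallbacks wrestler by wrestler in a single top-down pass.

```python
from collections import Counter
from statistics import median_low


def predict_day(day, active, history):
    """Pair up the active Absolute Ranks for one day of a tournament.

    history[(day, rank)] lists the opponent Absolute Ranks that wrestlers
    at `rank` faced on `day` in past tournaments (hypothetical structure).
    """
    unmatched = sorted(active)
    pairs, leftovers = [], []
    while unmatched:
        rank = unmatched.pop(0)  # highest-ranked wrestler still unmatched
        past = history.get((day, rank), [])
        candidates = []
        if past:
            # Step 1: median past opponent.
            candidates.append(median_low(past))
            # Steps 2-3: most common opponent, then the next most common.
            candidates += [r for r, _ in Counter(past).most_common()]
        opponent = next((c for c in candidates if c in unmatched), None)
        if opponent is None:
            leftovers.append(rank)
        else:
            unmatched.remove(opponent)
            pairs.append((rank, opponent))
    # Step 4: pair whoever is left, two highest first.
    for i in range(0, len(leftovers) - 1, 2):
        pairs.append((leftovers[i], leftovers[i + 1]))
    return pairs


# Toy history for Day 1, not real data.
history = {(1, 1): [3, 3, 4], (1, 2): [4, 4, 3]}
print(predict_day(1, {1, 2, 3, 4}, history))  # [(1, 3), (2, 4)]
print(predict_day(1, {1, 2, 4, 5}, history))  # [(1, 4), (2, 5)]
```

In the second call, rank 3 has withdrawn, so rank 1 falls through to his next most common opponent and ranks 2 and 5 are paired as leftovers. If an odd number of wrestlers is left over, the last one stays unmatched in this sketch.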
I wish I could’ve done something mathier than just median and most common. I will say I did some basic research into doing tournament systems like this, and the writing seemed thin based on what I was looking for. If you know of any papers that would have applicability here, I’d greatly appreciate you reaching out. I’m still not quite sure how I’ll model out using records to match wrestlers. Also, given that there are two different matchup regimes (using Absolute Rank the first week or so then record later) I’ll need to determine when the record becomes the deciding factor. That’s probably a post of its own.
A potential limitation is that the number of wrestlers in each division changes over time. Accounting for that might be a worthwhile feature to add. It also might not. I definitely have a lot more planned for this. I will look to put together a quicker way to test how well the predictions hold up when I change the data subset that the median (or whatever function we end up with) uses. But as is, there are only so many hours in the day.
Results
I did make an assumption here that I will know who has withdrawn each day; I think that is fair. [2] Other than that, I just look at how many of the matchups the algorithm correctly predicted. This is just supposed to be a sampling, but I’ll look to increase the number of tests I can eventually do.
Please note that I did this across some notebooks and various CSV and Excel files. I spot checked the results and they seemed good, but I’ll update if any of this turns out to be wrong. I just wanted to say that because I was having trouble cross checking whether two different pairs match and how to order them.
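One way to sidestep the ordering problem when cross checking pairs is to treat each matchup as an unordered set, so that (1, 10) and (10, 1) count as the same bout. This is a sketch of that idea rather than what the notebooks actually do; the function names and the toy numbers are made up for illustration.

```python
def normalize(pairs):
    """Turn (a, b) matchups into a set of order-insensitive pairs."""
    return {frozenset(p) for p in pairs}


def matchup_accuracy(predicted, actual):
    """Share of actual matchups that also appear in the prediction."""
    actual_set = normalize(actual)
    if not actual_set:
        return 0.0
    return len(normalize(predicted) & actual_set) / len(actual_set)


# Toy Absolute Rank pairs, not real results.
predicted = [(1, 10), (2, 9), (3, 11)]
actual = [(10, 1), (9, 2), (3, 12)]

score = matchup_accuracy(predicted, actual)
print(f"{score:.2f}")  # 2 of the 3 matchups agree
```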
First off, as I only used wrestlers in the Makuuchi and Juryo, it’s interesting they weren’t aggressive in bringing guys up from Makushita till the final day. I’ll have to look into that further to see if it’s just a Natsu ’24 thing or not.
I need to figure out how to properly score it better than this. Still, I think it’s a pretty good start and seemingly the first day it knocks it out of the park.
I’d also like to note that I did look at Day 3 and a few of the lower scoring days, and while the algorithm didn’t get the matches perfect, they were in the ballpark. So I’ll need to come up with a better method to look at accuracy; I’m thinking of measuring the rank difference between the predicted opponent and the actual opponent. Again, there are only so many hours in the day and this was actually one of the more labor intensive projects I’ve undertaken here, but you can rest assured we’ll have improvements upon this.
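A rough sketch of what that rank-difference score might look like: for every wrestler who appears in both the prediction and the actual schedule, take the absolute gap between his predicted and actual opponent’s Absolute Rank, then average. The function names and toy pairs are hypothetical, and averaging is just one choice; a median or a distribution plot could work equally well.

```python
def opponent_map(pairs):
    """Map each wrestler's Absolute Rank to his opponent's Absolute Rank."""
    mapping = {}
    for a, b in pairs:
        mapping[a] = b
        mapping[b] = a
    return mapping


def mean_rank_error(predicted, actual):
    """Average |predicted opponent - actual opponent| over shared wrestlers."""
    pred, act = opponent_map(predicted), opponent_map(actual)
    shared = [r for r in act if r in pred]
    if not shared:
        return None
    return sum(abs(pred[r] - act[r]) for r in shared) / len(shared)


# Toy Absolute Rank pairs: both bouts are wrong, but only by one rank.
predicted = [(1, 10), (2, 9)]
actual = [(1, 9), (2, 10)]

error = mean_rank_error(predicted, actual)
print(error)  # 1.0
```

Exact-match accuracy would score this toy day at zero, while the rank error shows the prediction was close.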
Prediction for Nagoya Basho Day 1 Using This Method + Banzuke Prediction
I’ll take a look at how well this pans out. If there’s an injury I’ll re-run it and I’ll probably also put this out once we have the confirmed Banzuke. If there’s a substantial update to the algo/method I’ll definitely check how it did against this and will update you all.
Hopefully you enjoy this. I could be mistaken, but I think it’s pretty cool how far we’ve come that we can have a reasonable prediction [3] of what Day 1 of the Nagoya Basho will look like matchup-wise when it’s still close to a month away. We don’t even have the official Banzuke. If you want to see their predicted Banzuke rank (i.e. Ozeki instead of just Rank 2), that’s linked here, but just getting it looking neat like this was a major lift on Substack, so I’m not pressing my luck. Cheers!
| Rank | Incumbent Name | Rank | Challenger Name |
|---|---|---|---|
| 1 | 73rd Yokozuna Terunofuji Haruo | 10 | Meisei Chikara |
| 2 | Kotozakura Masahiro | 9 | Hiradoumi Yuki |
| 3 | Hoshoryu Tomokatsu | 11 | Atamifuji Sakutaro |
| 4 | Takakeisho Takanobu | 12 | Takayasu Akira |
| 5 | Kirishima Tetsuo | 13 | Wakamotoharu Minato |
| 6 | Onosato Daiki | 14 | Gonoyama Toki |
| 7 | Abi Masatora | 15 | Ura Kazuki |
| 8 | Daieisho Hayato | 16 | Onosho Fumiya |
| 17 | Tobizaru Masaya | 18 | Oho Konosuke |
| 19 | Mitakeumi Hisashi | 20 | Takanosho Nobuaki |
| 21 | Shonannoumi Momotaro | 22 | Kotoshoho Yoshinari |
| 23 | Sadanoumi Takashi | 24 | Kinbozan Haruki |
| 25 | Tamawashi Ichiro | 26 | Ryuden Goshi |
| 27 | Shodai Naoya | 28 | Midorifuji Kazunari |
| 29 | Oshoma Degi | 30 | Nishikigi Tetsuya |
| 31 | Ichiyamamoto Daiki | 32 | Asanoyama Hiroki |
| 33 | Hokutofuji Daiki | 34 | Churanoumi Yoshihisa |
| 35 | Takarafuji Daisuke | 36 | Kagayaki Taishi |
| 37 | Chiyoshoma Fujio | 38 | Endo Shota |
| 39 | Wakatakakage Atsushi | 40 | Bushozan Kotaro |
| 41 | Roga Tokiyoshi | 42 | Nishikifuji Ryusei |
If you know of some way I could share an Excel or CSV file, I could potentially provide all 15 days of this if people are interested, be it applying the algorithm to Natsu or using the prediction for Nagoya. In fact, I’ll probably look at how the algorithm does against the actual sumo every day and post about it at least once.
[1] Sumo stables are called Heya. Each is run by an Oyakata, a retired sumo wrestler (often a prominent one), and is where a group of wrestlers live and train.
[2] Although this does mean I might eventually look into modelling injuries. Maybe if we’re lucky we can even get some useful data, like recovery rates and injury rates.
[3] We know it did well for Day 1 in Natsu, and there’s an explicit logic behind all our decisions.
Great stuff, as always. If we get a Day 1 match-up of Terunofuji vs. Meisei, I think that will be very telling of what Terunofuji's tournament will be like. If he's in fighting shape, Teru should win that. If Meisei (who does pretty well against him) gets the win, that's a sign that our yokozuna probably won't last the tournament.