Classifying Wrestlers Into Archetypes via Kimarite - 50th Post
Today’s post is one I’m incredibly excited about, and anxious about, on a few levels. I’ve been fairly open about how I conceive of this blog: I think a lot of the data-driven analysis that has become commonplace in other sports is missing from sumo. I try to check whether anyone has already covered the topics I’m writing about, and often the answer is no, or at least not in a data-driven way. Today we will once again be breaking new ground, which pumps me up, but on the flip side I’m concerned that I don’t have much to check my work against or reference to ensure my thinking and models are sound. I wanted to get that full disclosure out there up front, because I know there’s far more work to be done on this topic.
We’re using statistics to create archetypes of sumo wrestlers! Every sumo match has a kimarite, or winning technique. If you watch enough you’ll get familiar with the more common ones such as Oshidashi and Yorikiri (frontal push out, and frontal force out involving the belt, respectively), but in total there are over 80 different techniques. Using K-means clustering (explained in the next section), we’ll look at roughly 2,000 wrestlers and their kimarite to try to create archetypes of sumo wrestlers.
As I alluded to before, there are some general archetypes out there (pusher-thruster, belt grappler, etc.), but they’re based on experience and intuition rather than repeatable processes and hard data. To be clear, I think those are quite valuable too, but as the author of a sumo statistics blog I think there’s merit in using data to build on them.
As usual I’ll begin by explaining the data used, the overall process, and any considerations or decisions I had to make along the way. If you just want the data on the archetypes and how they differ in the kimarite they win by, that can be found at the bottom. I also include a table showing the 42 Makuuchi wrestlers for January ‘26 and how they’re classified. Finally, there’s a table of the Yokozuna in the dataset and which archetypes they fall under.
Data Used and Techniques Explained
As usual I am using wrestlers from 1988 to present. 1988 is right after Yokozuna Futahaguro stepped down and provides a clean, convenient marker for the modern era. Within that subset, I kept every wrestler who had at least 100 matches with a defined kimarite, which left 1,940 wrestlers. The kimarite data also shows which techniques wrestlers lost to. For instance, Examplenofuji might have won 20 matches via Yorikiri and lost 10 via Yorikiri. Let me show you what the data looks like:
As a note, I ended up looking only at wins. I tried going through this process with both wins and losses, but it performed worse than using wins alone. I do think there could be some value in knowing how wrestlers lose, but at least for this exercise and implementation I’m unsure how to use that information to improve the results.
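For anyone who wants to reproduce the filtering step, here’s a minimal Python sketch. It assumes match records arrive as hypothetical (winner, loser, kimarite) tuples, with kimarite set to None when it isn’t defined; the record layout and the function name are my own illustration, and only the 100-match threshold comes from this post. It tallies losses too, since the data includes them, even though only wins were used for the clustering in the end.

```python
from collections import Counter, defaultdict

MIN_MATCHES = 100  # threshold from the post: at least 100 matches with a defined kimarite


def build_kimarite_counts(matches, min_matches=MIN_MATCHES):
    """Tally wins and losses by kimarite for every wrestler who clears the threshold.

    `matches` is an iterable of hypothetical (winner, loser, kimarite) tuples.
    Returns {wrestler: {"wins": Counter, "losses": Counter}}.
    """
    wins = defaultdict(Counter)    # wins[name][kimarite]: wins by that technique
    losses = defaultdict(Counter)  # losses[name][kimarite]: losses to that technique
    for winner, loser, kimarite in matches:
        if kimarite is None:  # skip matches without a defined kimarite
            continue
        wins[winner][kimarite] += 1
        losses[loser][kimarite] += 1

    table = {}
    # Iterate over a snapshot of names so defaultdict lookups can't change the loop.
    for name in set(wins) | set(losses):
        total = sum(wins[name].values()) + sum(losses[name].values())
        if total >= min_matches:
            table[name] = {"wins": wins[name], "losses": losses[name]}
    return table
```

Each match counts toward both wrestlers’ totals, so a wrestler clears the bar on wins plus losses combined.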
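Next, I converted these counts to percentages: for each wrestler, the share of their wins that came by each kimarite. Comparing percentages rather than raw counts matters because career lengths vary enormously; a wrestler with 1,000 wins will have more Yorikiri wins than one with 100 even if both rely on it equally. Percentages make the comparison across wrestlers apples to apples.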
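Here’s a minimal sketch of that conversion, assuming one wrestler’s win counts are stored as a dict mapping kimarite to wins (the function names are hypothetical). The second helper lines every wrestler up against the same fixed list of kimarite, so a technique a wrestler never won with counts as 0%.

```python
def win_percentages(win_counts):
    """Map {kimarite: wins} to {kimarite: % of this wrestler's wins}."""
    total = sum(win_counts.values())
    if total == 0:  # no recorded wins: every share is 0%
        return {kimarite: 0.0 for kimarite in win_counts}
    return {kimarite: 100.0 * n / total for kimarite, n in win_counts.items()}


def to_vector(percentages, kimarite_order):
    """One row of the feature table: a fixed column order, 0% for unused kimarite."""
    return [percentages.get(kimarite, 0.0) for kimarite in kimarite_order]


# A long career and a short one that rely on Yorikiri equally look identical.
veteran = win_percentages({"yorikiri": 800, "oshidashi": 200})
newcomer = win_percentages({"yorikiri": 80, "oshidashi": 20})
print(veteran == newcomer)  # True
print(to_vector(veteran, ["oshidashi", "yorikiri", "hatakikomi"]))  # [20.0, 80.0, 0.0]
```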
Next we apply K-means clustering to the data. I could try to go into the mathematical explanation, but I think it’s easier to illustrate by analogy. A Voronoi diagram takes a set of given points and splits the space into regions, one per point, where each region contains everything closer to that point than to any other. This illustration from Wikipedia ought to help.
K-means clustering is so named because we choose the number of clusters, k, and each cluster is built around a mean (average) of its members. I tested k = 3, 4, 5, and 6, but 3 was best¹. So when I run the algorithm with k = 3, it calculates 3 averages, assigns every wrestler to whichever of those 3 it’s closest to, recalculates each average from its newly assigned wrestlers, and repeats until the assignments stop changing. The same idea applies when I set k = 4 and so on.
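To make those steps concrete, here’s a from-scratch Python sketch of the standard k-means loop (Lloyd’s algorithm). It’s an illustration under my own assumptions, not the exact code behind this post; in practice a library implementation such as scikit-learn’s `KMeans` would typically be used. Each point stands in for one wrestler’s row of kimarite percentages, and the function name and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (sequences of numbers) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    # Start with k distinct wrestlers as the initial "averages".
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each wrestler joins the nearest center
        # (i.e. falls into that center's Voronoi region).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # nothing moved, so we've converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

The cluster numbers themselves are arbitrary (which group ends up as 0 depends on the starting points); only the grouping matters, which is why archetype names get attached afterwards by looking at each cluster’s kimarite mix.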
Results
Here’s the fun part! As stated before, there are 80+ kimarite, but for visualization purposes those 80+ dimensions (each representing a different kimarite) are collapsed down to two². The clustering itself was done in the full 80+-dimensional space; the 2D plot uses Principal Component Analysis (PCA, which as noted in the footnote is beyond the scope of this post) purely for visualization. Using 3 groups, it looks like this:
As you can see, there isn’t a sharp dividing line between these classifications. The archetype names and the graph make clear that it’s more of a spectrum, with Pusher-Thrusters and Belt Grapplers at opposite ends and Balanced in the middle. I also have a kimarite comparison table to help elucidate the differences between the archetypes.
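For the curious, here’s a dependency-free Python sketch of how 80+ kimarite columns can be squashed into two plotted axes: center the data, build the covariance matrix, and pull out the top two principal components by power iteration with deflation. This is my own illustration, not necessarily how the plot here was made (a library routine such as scikit-learn’s `PCA` is the usual choice), and the function name is hypothetical. Each component’s sign is arbitrary, so a plot can come out mirrored without anything being wrong.

```python
import math
import random


def pca_2d(rows, n_iter=500, seed=0):
    """Project each row onto the top two principal components of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d): how each pair of kimarite co-varies.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Random positive unit vector as the starting guess.
        v = [rng.random() + 0.1 for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        # Power iteration: repeatedly multiplying by cov converges to the
        # direction of largest remaining variance.
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove this direction so the next pass finds the second one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    # The 2D coordinates are dot products with the two components.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centered]
```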
Next are the top exemplars of each archetype. In other words, the wrestler ranked number 1 below is the one who best fits the Pusher-Thruster classification, and likewise for the Belt Grappler and Balanced archetypes.
Next up is our Makuuchi: I applied the framework to the 42 top-division wrestlers in January ‘26, and these are the results. Typicality_Score ranges from 0 to 100, with 100 being the most typical of that archetype.
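This post doesn’t spell out how Typicality_Score is computed, so treat the following as a hypothetical sketch of one reasonable way to build such a score, not the method used here. It assumes typicality comes from a wrestler’s distance to their archetype’s center, min-max scaled within each archetype so the closest member scores 100 and the farthest scores 0; the same scores can then rank the top exemplars of each archetype. The function names and the placeholder wrestler names are my own.

```python
import math


def typicality_scores(points, labels, centers):
    """HYPOTHETICAL 0-100 score: 100 = closest to its archetype's center,
    0 = farthest member of that archetype (min-max scaled per cluster)."""
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    scores = [0.0] * len(points)
    for cluster in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cluster]
        nearest = min(dists[i] for i in idx)
        farthest = max(dists[i] for i in idx)
        for i in idx:
            if farthest == nearest:  # e.g. a one-wrestler cluster
                scores[i] = 100.0
            else:
                scores[i] = 100.0 * (farthest - dists[i]) / (farthest - nearest)
    return scores


def top_exemplars(names, labels, scores, cluster, n=3):
    """Names of the n highest-scoring wrestlers in one archetype."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    members.sort(key=lambda i: scores[i], reverse=True)
    return [names[i] for i in members[:n]]


names = ["A", "B", "C", "D", "E"]  # placeholder wrestlers
points = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
labels = [0, 0, 0, 1, 1]
centers = [(1, 0), (10, 0)]
scores = typicality_scores(points, labels, centers)
print(scores)  # [50.0, 100.0, 0.0, 100.0, 0.0]
print(top_exemplars(names, labels, scores, cluster=0, n=2))  # ['B', 'A']
```

Scaling within each archetype means a score of 100 always exists in every group; a distance-based score without that rescaling would let you compare typicality across archetypes instead.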
Finally, here are the Yokozuna from the period covered and their styles. Interestingly, none of them are classified as Pusher-Thrusters.
That concludes the article, though you can expect further work on this in the future. Cheers!
Appendix
I mentioned earlier that I also ran the code with 4, 5, and 6 clusters, versus the k = 3 used above. From my perspective the extra clusters don’t add much value: the dots are all in the same places, and since they form one large mass, the boundaries feel even more arbitrary. That said, if someone who knows more about kimarite has ideas on how to classify wrestlers by them, please reach out; I’m all ears.
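The footnote on choosing k mentions the silhouette score. As a rough illustration (my own sketch, not the code used for this post), here’s a dependency-free Python version: for each wrestler, a is the mean distance to the rest of their own archetype and b is the smallest mean distance to any other archetype, giving s = (b - a) / max(a, b). Values near 1 mean tight, well-separated clusters; values below 0 mean a wrestler sits closer to another archetype than to their own. Comparing k values would mean clustering once per k and keeping the k with the highest mean score. scikit-learn’s `silhouette_score` computes the same quantity.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    clusters = set(labels)
    sizes = {c: labels.count(c) for c in clusters}
    total = 0.0
    for i, p in enumerate(points):
        own = labels[i]
        if sizes[own] == 1:
            continue  # a lone wrestler in a cluster scores 0 by convention
        # Sum of distances from p to every cluster (p to itself adds 0).
        sums = {c: 0.0 for c in clusters}
        for q, lab in zip(points, labels):
            sums[lab] += math.dist(p, q)
        a = sums[own] / (sizes[own] - 1)  # mean distance within own archetype
        b = min(sums[c] / sizes[c] for c in clusters if c != own)  # nearest other archetype
        total += (b - a) / max(a, b)
    return total / len(points)
```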
¹ Beyond the reasons given in the Appendix, k = 3 also has a better silhouette score than the other values of k, but that’s beyond the scope of this post.
² The analysis was done with each kimarite as an individual dimension, but going into PCA is also beyond the scope of this post and risks making it even more intimidating.










