Ozeki Analytics - Reflections on the First Half Year
If you’re reading this, it’s probably because you’re an email subscriber or super-fan trawling through the archives. This is meant to be a bit more of a personal piece going over writing the blog, my research, and potential future plans. If that sounds interesting, then by all means, please keep reading. If you just want sumo (or stats) posts, then I won’t begrudge you not reading this.
My first post on Ozeki Analytics was January 10th this year, meaning we’ve had about half a year together exploring sumo wrestling using a statistical lens. I first actually started working on this idea in October of last year. That was when I began accumulating data and thinking about how I could apply statistical analysis to deepen our understanding of sumo. The new year provided a nice opportunity to set a resolution and I resolved to post weekly. I haven’t quite kept up with that, but I have been fairly consistent in averaging close to a post a week. Furthermore, most of the pieces are providing new research. I know people enjoy the Banzuke and Yokozuna + Ozeki previews, but personally I grade myself more on continuing to advance in new research as opposed to applying that knowledge or setting up infrastructure for it. It will catch up though so I will get to that eventually.
As for what we have accomplished so far, I think it’s quite a lot already with just a half year. I won’t go into all the details when the Sumo Knowledge Encyclopedia exists, but I think it can’t be overstated that in just 6 months, we’ve been able to create the tools to generate Bashos from top to bottom (all 6 divisions), determine promotions, explore development of rikishi, and even match up the top two divisions’ daily matches, among others. I’ll be the first to tell you[1] that there is room for improvement in a lot of these different areas, but I think in all of our research areas we have established solid baselines of knowledge. Furthermore, having those bases gives us room to further improve our understanding, along with a framework to confirm that our models and understanding have actually improved. Being able to contextualize how good (or bad) our current models are is very important for improvement. If, heaven forbid, this was my last post, someone in the future would be able to build off of what I have done in a way that I wasn’t able to when starting out.
I recognize a shortcoming is that this is essentially a hobby of mine. I have a full-time job; that takes up time. I have a girlfriend and family. I’m very fortunate in those regards. However, it is unfortunate for my readers and commenters. There have been times I have been unable to comment and engage with suggestions from the community as much as I would ideally like. I can’t promise that it will improve, but I can promise I will always try my best to get back to folks and to take feedback into consideration. If I had to choose a favorite part of this project, it would be the fact that there are folks reading what I’m putting out there who find it insightful and useful, and that we’re all engaging. Direct improvements to the model - and confirming other avenues wouldn’t improve models, which is equally valuable - have come from conversations had on here and other socials. I’ve had fun chatting with folks on here, on Reddit, Twitter, and over email. So long as that’s the case, I have little doubt in the good of this project. Seriously, if you email me I’m way better at responding.
Next Steps
There is still a tremendous amount of research to be done. Generally speaking, I think that every topic I’ve explored has yet to be exhausted. But beyond revisiting and expanding on previous research, I think my focus through the rest of ‘24 will be covering areas I have yet to cover. Two of those topics at the forefront of my mind are exploring who will win each individual match and identifying/predicting injuries. If you’re curious about more potential future topics, then I actually wrote about some of those recently.
A goal for the rest of the year is to get more plugged into the actual Japanese sumo community. I speak a little Japanese myself and have lately been redoubling my efforts, so ideally I would figure out where and how folks are discussing it in Japan, and then work on getting this material in Japanese.
I also would like to hit 100 subscribers by year end. I’m probably on track for that, but might as well speak it into existence for accountability.
I might eventually look to go multimedia. I don’t envision getting into the podcast game myself, but I might look to see if I can somehow get on some of the already existing ones. I have some limited video editing skills too, so potentially porting this to YouTube in a useful way might be a goal down the road. Finally on the multimedia front, I have one other prominent thing in mind that I won’t be spoiling till it’s closer to realization.
Selfishly, if you have any ideas about the preceding paragraphs, please feel free to reach out. Grand Sumo Breakdown in particular helped me get more into sumo and if you don’t know how to get me in touch with them, I’d recommend their podcast. Always a fun listen, and I have to credit a lot of their discussion for influencing this blog, whether it be a baseline knowledge beyond what you get from broadcast, or helping me frame various questions I’ve attempted to answer here.
I might look to expand the offerings here at Ozeki Analytics. For one, part two of Demystifying Elements of Statistical Learning should be out soon. If you want to learn about logistic regressions, then this should cover that (I think). I also have some ideas for articles that go beyond sumo and statistics, but those I’m less sure on. For instance, I have long thought about how Judd Apatow, in my opinion, in a very roundabout and unintentional way killed the comedy movie. Or at the least, had such an influence that it changed the genre for a decade plus. I’m not sure if this is the proper venue for that but just wanted to throw that out there. I know that’s one of my hotter takes, but I do think I’ve honed my perspective and have interesting opinions beyond sumo analytics. Then again, perhaps I’m just blowing wind and should stick to sports.
Finally, just wanted to stress again that there are zero plans to have any posts locked behind a paywall. I am doing this research for the good of the larger sumo community, and locking that knowledge up is antithetical to those goals in my mind. I actually did have a subscriber pledge to this publication, but monetization is turned off and I’m still generating zero revenue from this. To be honest, the only way I see that changing is if there were enough pledges that I could use them to purchase some subscriptions related to this blog, like Twitter for boosting visibility, Anaconda Cloud which I use for a lot of programming, Adobe After Effects for video editing, etc. On the other hand, it’d probably be a pain tax-wise, and I’m fortunate that I could pay for those now but don’t out of stubbornness. Long story short, you’ll keep receiving all posts free.
Thank you all for reading! I definitely wouldn’t have an equal motivation to do this without all the lovely people I’ve gotten to talk with over this past year. I hope everyone stays healthy and safe and we can keep enjoying sumo together for a very long time.
[1] I will acknowledge some of my commenters love being second to tell me.