This is great, and I hope people enjoy it. I tried to do something similar commercially years ago, so I'll ask: please make sure you keep it as a passion project. Turning this into a money-spinner could lead to disappointment.
I was on the winning team for what I believe was the EPL's first and only hackathon at Manchester City back in 2016 or thereabouts. It was a while back.
That hackathon used a couple of data sources that professional football (never "soccer" on this island) has available: Opta data, which is now the industry standard, and a 25fps video system that produced player, official and ball tracking data from 6+ cameras installed at every ground in the league. Teams had access to all games they participated in, but not all their competitors' games. Hackathon brief: what can you do with these two data sources?
That video tracking at 25fps was not great. Yes, producing heat maps, cool. Yes, automated pass tracking, cool. Automated distance travelled, pace/minutes played, possession, yada, yada, all cool. Combining it with Opta data meant you could start to get pass completion and "forced error"-like stats, cool. So you could start to think about how to get outputs that could turn into coaching inputs, and that's kinda cool but also, y'know, coaches like Pep aren't going to listen, no matter how often you make them read/watch Moneyball.
We won because we quantified the effect on pass completion of opposing players in a 30 degree and a 45 degree forward arc - intuitively any fan knows that defending players lower pass completion, but we were apparently the first in the World to put a hard number on it because of this data. (12% reduction in pass completion for every player in a 30 degree arc ahead of a player, if you're interested.)
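For anyone wanting to play with the idea, here is a minimal sketch of the counting step only, not the hackathon code. It assumes the arc is centred on the direction the passer is facing, that "30 degrees" is the full width of the arc (15 degrees either side), and that positions are pitch coordinates in metres. The function name and parameters are made up for illustration.

```python
import math

def opponents_in_arc(passer, facing_deg, opponents, arc_deg=30.0, max_range=None):
    """Count opponents inside a forward arc ahead of the passer.

    passer     -- (x, y) position in metres (assumed coordinate system)
    facing_deg -- direction the passer faces, degrees anticlockwise from +x
    opponents  -- iterable of (x, y) positions
    arc_deg    -- full width of the arc (assumption: 30 means +/- 15 degrees)
    max_range  -- optional distance cut-off in metres (hypothetical parameter)
    """
    half_width = arc_deg / 2.0
    count = 0
    for ox, oy in opponents:
        dx, dy = ox - passer[0], oy - passer[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # same spot as the passer, no meaningful bearing
        if max_range is not None and dist > max_range:
            continue
        bearing = math.degrees(math.atan2(dy, dx))
        # Smallest signed angle between bearing and facing, in [-180, 180)
        diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= half_width:
            count += 1
    return count

# Passer at the origin facing along +x, four made-up opponent positions
opponents = [(10.0, 1.0), (10.0, 3.0), (10.0, 5.0), (-10.0, 0.0)]
narrow = opponents_in_arc((0.0, 0.0), 0.0, opponents, arc_deg=30.0)
wide = opponents_in_arc((0.0, 0.0), 0.0, opponents, arc_deg=45.0)
print(narrow, wide)
```

A count like this, computed per pass from the tracking data and joined to the pass outcome in the event data, is the kind of feature you could then relate to completion rate.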
However, we had suspicions about video-based analysis for accurate tracking. Because of - when you think about it, kinda obvious - timing issues that can mean at 25fps you have 40ms latency/slippage frame to frame, the data suggested the ball was kicked in one game, by Yaya Toure into the back of the opposing team's net at just over 1000m/s - a little over Mach 3. I remembered the goal from that game and quipped "it was good, but not that good".
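A rough sketch of the kind of sanity check that would catch a value like that: compute frame-to-frame ball speed and flag anything above a plausibility ceiling. The 60 m/s ceiling and the sample positions are assumptions picked for illustration, not values from the tracking system.

```python
import math

FPS = 25                   # frame rate mentioned above
FRAME_DT = 1.0 / FPS       # 0.04 s between frames
MAX_PLAUSIBLE_MPS = 60.0   # assumed ceiling for a kicked ball, deliberately generous

def flag_implausible_speeds(positions, dt=FRAME_DT, limit=MAX_PLAUSIBLE_MPS):
    """Return (frame_index, speed_mps) for every step faster than `limit`.

    positions -- list of (x, y) ball positions in metres, one per frame
    """
    flagged = []
    for i in range(1, len(positions)):
        x0, y0 = positions[i - 1]
        x1, y1 = positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > limit:
            flagged.append((i, speed))
    return flagged

# Made-up track: the ball "jumps" 40 m between frames 1 and 2
track = [(0.0, 0.0), (1.0, 0.0), (41.0, 0.0), (42.0, 0.0)]
for frame, speed in flag_implausible_speeds(track):
    print(f"frame {frame}: {speed:.0f} m/s looks wrong")
```

This only detects the symptom. It says nothing about whether the cause is timing slippage, a tracking ID swap, or something else.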
After doing that work, CFG started to explore next steps. That's a long story, but I started to dive into some ideas. I pointed out that there was so much archive video from TV coverage that could be used for analysis; throwing it all through some pose estimation could yield results. Even just set pieces, even just corners, could mean you could produce a huge resource for analysis.
The problem is, there's not much budget for this.
It seems odd, but most EPL clubs - and EPL is the richest league in the World - have annual turnovers that would be dwarfed by some Series D startups. All the money goes on the field in player salaries. If you've read Soccernomics, you'll agree it should. That meant data teams often have small budgets, measured in the high 5-figure/low 6-figure region for kit, and not much more for analyst salaries.
Digging around, I discovered that there are high school gridiron football teams in Texas whose non-salary OPEX budgets for data analysis aren't that much lower than most EPL teams'.
That may have changed. I think you have a chance of making a dent in this space and getting interest from lots of clubs (both professional and amateur), but you might struggle to make decent returns from it. It's why I had the exact same idea years ago and abandoned it - it seemed like a lot of work for very little return and I needed to focus on returns at that time.
TLDR: this is awesome, I hope it thrives, keep going, and Godspeed. Just keep the day job for now. ;-)
Thank you so much for your insightful comment. I genuinely appreciate the time you took to share your knowledge and experiences.
I'll take seriously your advice about keeping this as a passion project rather than chasing profits. I'll certainly need to face several issues, such as budget constraints and data and process problems like the one in your anecdote about Yaya Toure's supersonic shot.
Everything you mentioned will help a lot to make it thrive, especially the opportunities you found in high school football.
I'm grateful for the guidance you've provided to me. Thanks again!
I've also done some fun video analysis for sport (bouldering) and thought some about how to make money from it. Surprised EPL budgets are so small, but there you go...
For those interested, there are currently a bunch of funded PhD projects in Australia around similar topics (funded under the 2032 Brisbane Olympics program).
> Because of - when you think about it, kinda obvious - timing issues that can mean at 25fps you have 40ms latency/slippage frame to frame, the data suggested the ball was kicked in one game, by Yaya Toure into the back of the opposing team's net at just over 1000m/s - a little over Mach 3.
This error seems too big to have come from the inherent latency of 25 fps.
1000 m/s over the shortest period, a single frame, would be 40 metres in 0.04 seconds. If there was one frame's worth of error on each end, then the lowest the real speed could be is 40 metres in 0.12 seconds, which is still a completely impossible 330 metres per second.
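Spelling that arithmetic out as a quick sketch (the function name and the "one frame of slack on each end" model are just my framing of the argument above):

```python
FPS = 25
FRAME_DT = 1.0 / FPS       # 0.04 s per frame
REPORTED_SPEED = 1000.0    # m/s, the figure from the quoted comment

# Distance the ball would have to cover in one frame to read as 1000 m/s
implied_distance = REPORTED_SPEED * FRAME_DT

def lower_bound_speed(distance_m, frames_measured=1, slack_frames_each_end=1, dt=FRAME_DT):
    """Slowest true speed consistent with the measured distance, if the real
    interval was longer than measured by `slack_frames_each_end` frames at
    both the start and the end."""
    true_interval = (frames_measured + 2 * slack_frames_each_end) * dt
    return distance_m / true_interval

print(f"implied distance: {implied_distance:.1f} m")
print(f"lower bound with one frame of slack each end: {lower_bound_speed(implied_distance):.0f} m/s")
```

Even with the slack stretched to three frames total, the bound stays far above anything a kicked ball can do, which is why frame latency alone doesn't look like enough to explain the number.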
Am I missing something? 25 fps seems like enough to capture the ball location to within a metre for all but the hardest shots.