
The Generative AI Playground — Part 1: Automated Soccer Match Report

Daphné Vermeiren

The rapid advances in generative AI have truly amazed us over the past year. Even as AI experts, we found it a challenge just to keep up with the new technologies and iterations released almost daily, let alone to build something with each of them. Of course, we wouldn’t be real experts if we didn’t try!

To experiment with these new technologies, we established a Generative AI Playground at Raccoons. This playground lets our developers explore their ideas and build playful yet highly valuable applications on top of cutting-edge (generative) AI technology. It enables us to experiment and learn, ultimately with the aim of integrating newfound insights into our client projects. By starting small, we build a solid foundation of understanding for future use.

In the past months, we've built impressive demos illustrating how to use generative AI in a broader business setting. To inspire you, we’re launching a new insights series: The Generative AI Playground. This first part will explore how we turned incomprehensible structured data into a captivating, human-readable report. Meet our Automated Soccer Match Report, our first experiment.

Incomprehensible but structured data

Our experiment started with a straightforward idea: what if we could automatically generate match reports from the data of women’s soccer matches, a segment often neglected by traditional journalism? If we do say so ourselves, it was a great idea for two reasons: first, it may bring attention to the lack of women’s sports coverage in the news, and second, it gave us an enormous playground to operate in.

You may or may not know that soccer games generate a ton of data. Every player, and even the ball, is covered in sensors, each producing a stream of data points. Ultimately, you get a file listing every action of the game in chronological order: individual actions, passes, fouls, cards, corners, penalties, shots on target, and more. While this data is crucial for analysis, it’s far from user-friendly.

The result is an endless scroll of data with no clear distinction between what’s essential and what’s not: the reality of working with datasets that average 70,000 lines. These files are created by computers, for computers, making them nearly indecipherable for humans. So we wondered: could computers, specifically large language models (LLMs), help us decipher these enormous data files?
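The exact schema depends on the data provider, but to give you an idea, a single event in such a file might look roughly like this (a hypothetical, simplified example; real feeds carry far more fields):

    {"event_id": 1843, "type_id": 10, "period": 1, "min": 2, "sec": 38,
     "player_id": 4521, "team_id": 12, "outcome": 0, "x": 34.2, "y": 61.7}

Multiply that by tens of thousands of rows and you get the endless scroll described above.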


Data preprocessing

Before involving artificial intelligence, we needed to preprocess our data: we did not need every action that happens in a game, only the essential ones. Think of a journalist who notes only the big actions and weaves them into an exciting narrative. So, we got to work and simplified the data, separating the signal from the noise and making it AI-ready. Data engineering is an essential step in any AI project, as it enables us to use data far more efficiently.
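To make the filtering concrete, here is a minimal sketch of that preprocessing step in Python. The type ids and field names are hypothetical; every data provider uses its own schema:

    # Keep only the event types a journalist would actually write about.
    # These ids are illustrative, not a real provider's mapping.
    ESSENTIAL_TYPES = {10, 13, 14, 16, 17}  # tackle, shots, goal, card

    def preprocess(events: list[dict]) -> list[dict]:
        """Separate the signal from the noise: drop routine events."""
        return [e for e in events if e["type_id"] in ESSENTIAL_TYPES]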

Data transformation

What we’re now left with, however, is still plain data. That is already an improvement, but a critical insight is that LLMs such as GPT-4 excel at understanding and generating natural language: they are far more proficient with textual data than with structured data. Even if we were to provide the LLM with a description of each piece of structured information, it would struggle to derive meaningful insights. By translating the structured data into a format the LLM readily understands, we ensure it performs much better.

Here’s how it works:

  • Each data point, e.g., “type id = 10” (representing a tackle), is transformed into a natural language sentence, e.g., “2:38 - C. De Norre of OHL made a tackle but did not win the ball”.
  • Additionally, we include timestamps to provide context: a chronological account of when each event occurred during the match (a sketch of this step follows below).
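A minimal sketch of that translation step, continuing with the same hypothetical schema and ids as above:

    # Map event type ids to natural language templates (illustrative ids;
    # outcome handling is omitted for brevity).
    TEMPLATES = {
        10: "{player} of {team} made a tackle but did not win the ball",
        16: "{player} of {team} scored a goal",
    }

    def to_sentence(event: dict, players: dict, teams: dict) -> str:
        """Turn one structured event into a timestamped sentence."""
        text = TEMPLATES[event["type_id"]].format(
            player=players[event["player_id"]],
            team=teams[event["team_id"]],
        )
        return f"{event['min']}:{event['sec']:02d} - {text}"

Feeding these sentences to the model, one per line, gives it a chronological, human-readable account to work from.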

Time for AI!

Generating the report

Now, it’s time for some AI magic. We send our preprocessed data, now in natural language form, to the LLM, along with some carefully engineered prompts, and the result is nothing short of remarkable. What was once an indecipherable file is transformed into a vivid match report that lists events and interprets them in the context of the match, much like a sports journalist would.
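Stripped of the prompt engineering itself, the core of this step is a single API call. Here is a minimal sketch using the OpenAI Python client; the prompt wording is a simplified stand-in, not our production prompt:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def generate_report(sentences: list[str]) -> str:
        """Ask the LLM to turn timestamped event sentences into a report."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "You are a sports journalist. Write a vivid, "
                            "chronological match report from the events below."},
                {"role": "user", "content": "\n".join(sentences)},
            ],
        )
        return response.choices[0].message.content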

For example, if the model notices a cluster of “shots on target” within a short period, it might interpret this as “the team is ramping up the pressure”. The LLM does more than give an objective summary of the game; it also provides insights and a narrative. It’s mind-blowing how the model can give creative insights while staying grounded in the actual events. Another example: if the team commits several errors, the system might narratively conclude that “the team was under a lot of stress, leading to a more aggressive play style”. This isn’t just a bland statement of facts; it’s an insightful, story-like interpretation.

"Rond de 28ste minuut brak chaos uit in het doel van ZW toen Schrijvers van OHL het eerste doelpunt van de wedstrijd maakte na een fout van doelman S. Bossut van ZW. De doelpunten bleven komen aan de zijde van OHL met doelpunten van Þorsteinsson in de 40ste minuut en De Norre in de 44ste minuut, waardoor de eerste helft eindigde met een comfortabele 3-0 voorsprong voor OHL.Halverwege de tweede helft werkte ZW zich eindelijk op het scorebord. Vossen van ZW transformeerde een penalty in een doelschot in de 72ste minuut, waardoor de ploeg nieuwe hoop kreeg. De spanning bleef echter aan de kant van OHL. Þorsteinsson kreeg een penalty en scoorde zijn tweede doelpunt van de wedstrijd in de 75ste minuut. Zulte Waregem behield zijn weerstand en scoorde een laatste doelpunt door Zinho Gano in de 81ste minuut. Desalniettemin werd de wedstrijd afgesloten met een 4-2 overwinning voor OHL."

Voice-messaging our favorite sports journalist

While we successfully converted this data into readable match reports, we asked ourselves: how could we make the experience more interactive and engaging for the user? When you think of (Belgian) soccer, you think of Jan Dewijngaert, a well-known sports journalist. What if you could ask him what he thought of the game and have him analyze it? We decided to build a Jan Dewijngaert bot that could answer factual questions precisely, thanks to the data, but also provide subjective insights. We prompted the chatbot to match Jan’s tone of voice, making the interaction engaging and genuine.
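The persona itself is mostly prompt work. A simplified, illustrative sketch of such a system prompt (our actual prompt was considerably more elaborate):

    PERSONA_PROMPT = (
        "You are Jan Dewijngaert, a Belgian sports journalist. "
        "Answer factual questions precisely, using only the match data "
        "provided. For opinion questions, give the kind of sharp, "
        "subjective analysis Jan is known for."
    )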

Of course, Jan has a distinct voice, so we added a cherry on top: another technology entered the playground, voice cloning! This way, users could voice-message Jan Dewijngaert with any question and get an (almost scarily) accurate spoken answer. On a more technical level, we did two things to keep answers fast:

  • Incremental processing: Similar to how GPT generates responses token by token, we designed our system to send each completed sentence to our voice cloning tool. This builds the response progressively while maintaining a natural flow.
  • Parallel sentence processing: Our voice cloning tool can process multiple sentences simultaneously, which was crucial for a smooth conversation. While the first sentence might take a moment to generate, subsequent sentences are processed in parallel, leading to a smooth, uninterrupted audio response (see the sketch after this list).
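Here is a simplified sketch of both ideas combined. The clone_voice function is a hypothetical stand-in for the voice cloning tool's API, and the stream is a standard token stream from the OpenAI client:

    import re
    from concurrent.futures import ThreadPoolExecutor

    def clone_voice(sentence: str) -> bytes:
        """Hypothetical stand-in for the voice cloning tool's API."""
        ...

    def answer_as_audio(stream) -> list[bytes]:
        """Stream LLM tokens, cut on sentence boundaries, clone in parallel."""
        # stream = client.chat.completions.create(..., stream=True)
        pool = ThreadPoolExecutor()
        futures, buffer = [], ""
        for chunk in stream:  # token-by-token chunks from the LLM
            buffer += chunk.choices[0].delta.content or ""
            # As soon as a sentence is complete, hand it off immediately.
            while (match := re.search(r"[.!?]\s", buffer)):
                sentence, buffer = buffer[:match.end()], buffer[match.end():]
                futures.append(pool.submit(clone_voice, sentence))
        if buffer.strip():  # flush whatever remains after the stream ends
            futures.append(pool.submit(clone_voice, buffer))
        # Collect clips in order; later sentences are generated in parallel
        # while earlier ones are already playing.
        return [f.result() for f in futures]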

The result is a smooth conversation with “the one and only” Jan Dewijngaert!

Beyond soccer: endless possibilities

Of course, this playground experiment goes beyond soccer or chatbots. For us, it highlights the power of generative AI to transform dry, complex data into attractive reports in any domain, be it sports, healthcare, finance, or any other field overwhelmed with data yet starved for meaning. To show the possibilities, we translated it into five different use cases, a non-exhaustive list of what this technology could do.

  1. Healthcare: Imagine transforming electronic health records into concise, readable narratives for healthcare professionals, such as doctors or nurses. Doctors could receive summarized patient histories, making it easier to understand complex medical data at a glance. And yes, this can be done with high patient confidentiality and security!
  2. Finance: In the financial sector, AI can help analyze intricate market data, summarizing reports for stakeholders or interpreting complex financial trends in language investors can understand.
  3. Media and journalism: Media firms could use this technology to automatically generate news articles from large data sets, such as election results or large-scale surveys, ensuring quick and comprehensive coverage.
  4. Legal: In the legal field, AI could assist in sifting through case files and legal documents, summarizing critical points for lawyers and legal professionals, and aiding in research and case preparation.
  5. Education: In education, you could use this technology to create summary notes from extensive research materials or to turn content into more digestible formats for students of all ages. And guess what — we already did this at Acco.

Interesting use cases, but of course the possibilities are endless and can be mapped onto any type of data or industry. Not sure whether your data is fit for this kind of project? We can take a look together, no strings attached.

Takeaways

By doing this experiment, we have gathered a few takeaways that we would love to share with you.

  1. Embracing the LLM’s creativity: While many companies shy away from AI-generated creativity for fear of fabricated information, we see its value in storytelling. For instance, if the data shows numerous tackles in a match, the AI might narrate “The audience disagreed and started to protest”. This might not be a factual account, as we have no data on audience reactions, but it’s a plausible and engaging addition to the narrative.
  2. Controlled creativity: While we allowed for certain liberties in this project, our approach can be tailored. In other scenarios, we might prioritize strict adherence to facts. The key lies in effective prompt engineering, which gives us control over the AI’s creative boundaries.
  3. Beyond chatbots: A critical takeaway from this project is that LLM use cases aren’t limited to chatbots, a common misconception among our clients. Our experiment demonstrates that an LLM’s capabilities extend far beyond “just a chatbot”: it can interpret data and convert it into comprehensive reports.

So, three key takeaways and fun conversations with Jan Dewijngaert. We couldn’t be happier with our first experiment in our generative AI playground. And we promise, there’s more where this one came from! Next up: function calling in a concrete use case.
