December 10, 2024
At the Sports Department in Kin, we provide fully dynamic, customizable consulting services to sports organizations. This gives our clients access to a sophisticated data science team without the cost of building one in-house. Deliverables are handed over through convenient channels such as online dashboards or phone and web applications, with close-to-zero latency between receiving the data and delivering the analysis. Still, there is always room to improve our clients' experience.
For instance, suppose one of the deliverables contains a ranking of the best players in the competition with respect to a set of KPIs tailored to measure the organization's needs, so that one piece of this information looks like this:
This way, the organization's analyst has access to the ranking and can evaluate their players' performance against the top-ranked players of the competition. Suppose the analyst needs to see not only the overall top performers but also the top-ranked among those who played over 900 minutes in the competition. Naturally, this view can be adjusted and delivered however they want.
But consider a more specific requirement. Imagine the analyst needs to evaluate the competition's top performers who belong to teams A, B, and C, and who played over 900 minutes if they are defenders or 700 minutes if they are midfielders. There are many ways to turn this requirement into a deliverable. However, packing an extensive set of filters and options into a front-end application or a direct deliverable may not justify the cost for a one-time request, even an important one, considering that the analyst's requirement might be completely different the next day. This is where we decided to board the generative AI train. To provide a solution flexible enough yet highly precise, we built an assistant using OpenAI's Assistants API that looks like this:
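To make the requirement concrete, here is the kind of code the code interpreter can end up generating for it. This is an illustrative sketch only: the field names (`team`, `position`, `minutes`, `rating`) and the sample records are made up, not our actual schema.

```python
# Hypothetical player records; field names and values are illustrative.
players = [
    {"name": "P1", "team": "A", "position": "defender",   "minutes": 950,  "rating": 8.1},
    {"name": "P2", "team": "B", "position": "midfielder", "minutes": 720,  "rating": 7.9},
    {"name": "P3", "team": "C", "position": "midfielder", "minutes": 650,  "rating": 8.4},
    {"name": "P4", "team": "D", "position": "defender",   "minutes": 1000, "rating": 8.8},
]

# Per-position minutes thresholds from the analyst's request.
MIN_MINUTES = {"defender": 900, "midfielder": 700}

def top_performers(players, teams):
    """Keep players from the given teams that meet their position's minutes threshold."""
    eligible = [
        p for p in players
        if p["team"] in teams and p["minutes"] >= MIN_MINUTES.get(p["position"], 0)
    ]
    return sorted(eligible, key=lambda p: p["rating"], reverse=True)

print([p["name"] for p in top_performers(players, {"A", "B", "C"})])  # → ['P1', 'P2']
```

The point is not this particular snippet but that the assistant can write a fresh version of it for whatever filter the analyst dreams up tomorrow.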
Here, I intend to explain how we transformed our high-dimensional data into a structure that is understandable, accessible, and efficient for the Assistants tool. First, a couple of reasons why we chose the Assistants tool over the alternatives:
Therefore, we chose the Assistants tool with the code interpreter setting on. The next question is how to share our data with the assistant. For this, it is key to understand what data we need, which can essentially be summarized in three components coming from non-relational databases:
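The wiring itself is a small amount of code. Below is a minimal sketch assuming the official `openai` Python client (`client = OpenAI()`); the model name and file name are placeholders, not necessarily what we run in production. We pass the client in as a parameter so the session code owns it:

```python
def create_kpi_assistant(client, file_path, instructions):
    """Upload one KPI data file and create a code-interpreter assistant that can read it."""
    with open(file_path, "rb") as f:
        # "assistants" is the upload purpose required for assistant-attached files.
        kpi_file = client.files.create(file=f, purpose="assistants")
    assistant = client.beta.assistants.create(
        model="gpt-4o",  # model choice is an assumption for this sketch
        name="KPI analyst",
        instructions=instructions,
        tools=[{"type": "code_interpreter"}],
        tool_resources={"code_interpreter": {"file_ids": [kpi_file.id]}},
    )
    return assistant.id, kpi_file.id
```

Attaching the file under `tool_resources["code_interpreter"]` is what lets the interpreter open it from inside its sandbox.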
How this is shared with the assistant is not a trivial question, given the large range of available file types that can be used (see documentation). We also face some complexity inherent to this type of data that we need to make the assistant aware of. To give a couple of examples:
To tackle this, we will walk through how we handled two chunks of data: the KPI data files and the instructions.
Data Files
First, we want the assistant to be versatile but also very precise. That is why we decided to let the user choose whether they want a conversation about players or teams (see the radio buttons in the upper-left corner of the image above). We found that this significantly reduced the number of errors the assistant made when building a response. For this part, that means two different data files: one for players' KPIs and one for teams'.
Now, remember we chose to work with the code interpreter within the Assistants tool, which means it will run code to get answers when needed. Therefore, the choice is obvious: we need a file format that makes this task easy. We found that JSON files often performed better than CSV files, and JSON is also more convenient code-wise (as we try to avoid pandas' computational overhead as much as possible in our code). After a complex data selection process, we build files that look like this:
where players do not necessarily share the same metrics (it doesn't make sense to measure dribble accuracy for goalkeepers, right?). The teams file is similar.
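For readers without the screenshot, here is a sketch of the shape in that spirit. The player names, positions, and KPI keys below are invented for illustration; the real files come out of our data selection pipeline:

```python
import json

# One record per player; each player only carries the KPIs that make
# sense for their position, so the per-record key sets differ.
players_kpis = [
    {"player": "Goalkeeper X", "position": "goalkeeper",
     "kpis": {"saves_per_90": 3.1, "pass_accuracy": 0.78}},
    {"player": "Winger Y", "position": "midfielder",
     "kpis": {"dribble_accuracy": 0.64, "pass_accuracy": 0.85, "xA_per_90": 0.21}},
]

with open("players_kpis.json", "w") as f:
    json.dump(players_kpis, f, indent=2)
```

A ragged structure like this is awkward to force into a rectangular CSV but is natural in JSON, which is part of why JSON worked better for us.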
Instructions
Instructions are straightforward: tell the assistant what you want it to do, be specific about the situations where you expect graphs versus text, and everything will be fine. One important note, though: we noticed that the assistant is far more effective when the instructions contain the available KPIs and the list of teams. First, KPI names can be complex and may be unheard of (or uncommon) for the assistant, so providing them avoids answers like “this type of metric is not available” when it is. Second, some team names are not well identified as team names in questions like “Show me the most accurate players of SomeRandomTeamName”, so incorporating this information explicitly into the instructions helped a lot.
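In practice, that means templating the instructions from the data itself. A minimal sketch (the wording and the sample KPI/team names are illustrative, not our production prompt):

```python
def build_instructions(kpi_names, team_names, mode="players"):
    """Embed the available KPIs and team names directly in the assistant's instructions."""
    return (
        f"You are a sports data analyst answering questions about {mode}. "
        "Use the attached JSON file with the code interpreter to compute answers. "
        "Produce a chart when the user asks for a comparison or a trend; "
        "otherwise answer in plain text.\n"
        f"Available KPIs: {', '.join(sorted(kpi_names))}.\n"
        f"Available teams: {', '.join(sorted(team_names))}."
    )

print(build_instructions({"xA_per_90", "pass_accuracy"}, {"Team A", "Team B"}))
```

Because the KPI and team lists are generated from the same files we upload, the instructions never drift out of sync with the data.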
Some key insights about the process
When implementing this type of assistant in your application, keep in mind that the API client is not the important part; the IDs are (IDs for the assistant, files, thread, etc.).
Then, when a session starts, we only need to create one assistant and upload the files once, and everything runs smoothly. When the user switches the conversation from “teams” to “players”, we only change the file ID attached to the same assistant and modify the instructions slightly, which happens seamlessly.
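The toggle can be sketched as follows, assuming the official `openai` client's `assistants.update` endpoint; `session` is our own hypothetical bookkeeping dict holding the IDs created at session start:

```python
def switch_conversation(client, session, mode):
    """Re-point the one existing assistant at the other data file and instructions."""
    client.beta.assistants.update(
        session["assistant_id"],
        instructions=session["instructions"][mode],   # "players" or "teams" prompt
        tool_resources={"code_interpreter": {"file_ids": [session["file_ids"][mode]]}},
    )
    session["mode"] = mode
    return session
```

Nothing is recreated on a toggle: same assistant, same thread, just different resources.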
At the end of each session, remember to delete the created assistant and files from your OpenAI account so you don’t end up with thousands of assistants and files by the end of the week.
Finally, the assistant exposes its thinking process, which is of little to no interest to your client, so make sure to handle this on your end and show only the relevant information.
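One way to do that is to surface only the assistant's final text once a run completes. The sketch below works on simplified message dicts (newest first, as the messages list is returned); the real API objects nest the text one level deeper, so treat this as the shape of the idea rather than a drop-in:

```python
def final_answer(messages):
    """Return only the assistant's latest text reply, skipping intermediate steps."""
    for msg in messages:  # newest first
        if msg["role"] != "assistant":
            continue
        # Keep only text parts; drop code, logs, and other intermediate content.
        texts = [part["text"] for part in msg["content"] if part["type"] == "text"]
        if texts:
            return "\n".join(texts)
    return ""
```

Image outputs (charts) need their own handling, but the principle is the same: your app decides what the client sees, not the assistant.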
And here it is, product-ready: