December 10, 2024
At the Sports Department in Kin, we provide fully dynamic, customizable consulting services to sports organizations. This gives our clients access to a sophisticated data science team without the cost of building one in-house. Deliverables are handed over through convenient channels such as online dashboards or phone and web applications, with close-to-zero latency between receiving the data and delivering the analysis. Still, there is always room to improve our clients' experience.
For instance, suppose one of the deliverables contains a ranking of the best players in the competition with respect to a set of KPIs tailored to measure the organization's needs, so that one piece of this information looks like this:
This way, the organization's analyst has access to the ranking and can evaluate their players' performance against the top-ranked players of the competition. Suppose the analyst needs to see not only the overall top performers but also the top-ranked among those who played over 900 minutes in the competition. Naturally, this view can be adjusted and delivered however they want.
But consider a more specific requirement. Imagine the analyst needs to evaluate the competition's top performers who belong to teams A, B, and C, and who played over 900 minutes if they are defenders or 700 minutes if they are midfielders. There are many ways to turn this requirement into a deliverable. However, packing an extensive set of filters and options into a front-end application or a direct deliverable may not justify the cost for a one-time request, even an important one, considering that the analyst's requirement might be completely different the next day. This is where we decided to board the generative AI train. To provide a solution flexible enough yet highly precise, we built an assistant using OpenAI's Assistants API that looks like this:
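To make the requirement concrete, here is the kind of code the code interpreter can end up generating for it. This is an illustrative sketch only: the field names (`team`, `position`, `minutes`, `rating`) and the sample records are made up, not our actual schema.

```python
# Hypothetical player records; field names and values are illustrative.
players = [
    {"name": "P1", "team": "A", "position": "defender",   "minutes": 950,  "rating": 8.1},
    {"name": "P2", "team": "B", "position": "midfielder", "minutes": 720,  "rating": 7.9},
    {"name": "P3", "team": "C", "position": "midfielder", "minutes": 650,  "rating": 8.4},
    {"name": "P4", "team": "D", "position": "defender",   "minutes": 1000, "rating": 8.8},
]

# Per-position minutes thresholds from the analyst's request.
MIN_MINUTES = {"defender": 900, "midfielder": 700}

def top_performers(players, teams):
    """Keep players from the given teams that meet their position's minutes threshold."""
    eligible = [
        p for p in players
        if p["team"] in teams and p["minutes"] >= MIN_MINUTES.get(p["position"], 0)
    ]
    return sorted(eligible, key=lambda p: p["rating"], reverse=True)

print([p["name"] for p in top_performers(players, {"A", "B", "C"})])  # → ['P1', 'P2']
```

The point is not this particular snippet but that the assistant can write a fresh version of it for whatever filter the analyst dreams up tomorrow.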
Here, I intend to explain how we transformed our high-dimensional data into a structure that is understandable, accessible, and efficient for the Assistants tool. First, a couple of reasons why we chose the Assistants tool over the alternatives:
Therefore, we chose the Assistants tool with the code interpreter setting on. The next question is how to share our data with the assistant. For this, it is key to understand what data we need, which can essentially be summarized in three components coming from non-relational databases:
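The wiring itself is a small amount of code. Below is a minimal sketch assuming the official `openai` Python client (`client = OpenAI()`); the model name and file name are placeholders, not necessarily what we run in production. We pass the client in as a parameter so the session code owns it:

```python
def create_kpi_assistant(client, file_path, instructions):
    """Upload one KPI data file and create a code-interpreter assistant that can read it."""
    with open(file_path, "rb") as f:
        # "assistants" is the upload purpose required for assistant-attached files.
        kpi_file = client.files.create(file=f, purpose="assistants")
    assistant = client.beta.assistants.create(
        model="gpt-4o",  # model choice is an assumption for this sketch
        name="KPI analyst",
        instructions=instructions,
        tools=[{"type": "code_interpreter"}],
        tool_resources={"code_interpreter": {"file_ids": [kpi_file.id]}},
    )
    return assistant.id, kpi_file.id
```

Attaching the file under `tool_resources["code_interpreter"]` is what lets the interpreter open it from inside its sandbox.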
How this is shared with the assistant is not a trivial question, given the large range of available file types that can be used (see documentation). We also face some complexity inherent to this type of data that we need to make the assistant aware of. To give a couple of examples:
To tackle this, we will walk through how we handled two chunks of data: the KPI data files and the instructions.
Data Files
First, we want the assistant to be versatile but also very precise. That is why we decided to let the user choose whether they want a conversation about players or teams (see the radio buttons in the upper-left corner of the image above). We found that this significantly reduced the number of errors the assistant made when building a response. For this part, that means two different data files: one for players' KPIs and one for teams'.
Now, remember we chose to work with the code interpreter within the Assistants tool, which means it will run code to get answers when needed. Therefore, the choice is obvious: we need a file format that makes this task easy. We found that JSON files often performed better than CSV files, and JSON is also more convenient code-wise (as we try to avoid pandas' computational overhead as much as possible in our code). After a complex data selection process, we build files that look like this:
where players do not necessarily share the same metrics (it doesn't make sense to measure dribble accuracy for goalkeepers, right?). The teams file is similar.
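For readers without the screenshot, here is a sketch of the shape in that spirit. The player names, positions, and KPI keys below are invented for illustration; the real files come out of our data selection pipeline:

```python
import json

# One record per player; each player only carries the KPIs that make
# sense for their position, so the per-record key sets differ.
players_kpis = [
    {"player": "Goalkeeper X", "position": "goalkeeper",
     "kpis": {"saves_per_90": 3.1, "pass_accuracy": 0.78}},
    {"player": "Winger Y", "position": "midfielder",
     "kpis": {"dribble_accuracy": 0.64, "pass_accuracy": 0.85, "xA_per_90": 0.21}},
]

with open("players_kpis.json", "w") as f:
    json.dump(players_kpis, f, indent=2)
```

A ragged structure like this is awkward to force into a rectangular CSV but is natural in JSON, which is part of why JSON worked better for us.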
Instructions
Instructions are straightforward: tell the assistant what you want it to do, be specific about the situations where you expect graphs versus text, and everything will be fine. One important note, though: we noticed that the assistant is far more effective when the instructions contain the available KPIs and the list of teams. First, KPI names can be complex and may be unheard of (or uncommon) for the assistant, so providing them avoids answers like “this type of metric is not available” when it is. Second, some team names are not well identified as team names in questions like “Show me the most accurate players of SomeRandomTeamName”, so incorporating this information explicitly into the instructions helped a lot.
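In practice, that means templating the instructions from the data itself. A minimal sketch (the wording and the sample KPI/team names are illustrative, not our production prompt):

```python
def build_instructions(kpi_names, team_names, mode="players"):
    """Embed the available KPIs and team names directly in the assistant's instructions."""
    return (
        f"You are a sports data analyst answering questions about {mode}. "
        "Use the attached JSON file with the code interpreter to compute answers. "
        "Produce a chart when the user asks for a comparison or a trend; "
        "otherwise answer in plain text.\n"
        f"Available KPIs: {', '.join(sorted(kpi_names))}.\n"
        f"Available teams: {', '.join(sorted(team_names))}."
    )

print(build_instructions({"xA_per_90", "pass_accuracy"}, {"Team A", "Team B"}))
```

Because the KPI and team lists are generated from the same files we upload, the instructions never drift out of sync with the data.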
Some key insights about the process
When implementing this type of assistant in your application, keep in mind that the API client is not the important part; the IDs are (IDs for the assistant, files, thread, etc.).
Then, when a session starts, we only need to create one assistant and upload the files once, and everything runs smoothly. When the user switches the conversation from “teams” to “players”, we only change the file ID attached to the same assistant and modify the instructions slightly, which happens seamlessly.
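The toggle can be sketched as follows, assuming the official `openai` client's `assistants.update` endpoint; `session` is our own hypothetical bookkeeping dict holding the IDs created at session start:

```python
def switch_conversation(client, session, mode):
    """Re-point the one existing assistant at the other data file and instructions."""
    client.beta.assistants.update(
        session["assistant_id"],
        instructions=session["instructions"][mode],   # "players" or "teams" prompt
        tool_resources={"code_interpreter": {"file_ids": [session["file_ids"][mode]]}},
    )
    session["mode"] = mode
    return session
```

Nothing is recreated on a toggle: same assistant, same thread, just different resources.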
At the end of each session, remember to delete the created assistant and files from your OpenAI account so you don’t end up with thousands of assistants and files by the end of the week.
Finally, the assistant exposes its thinking process, which is of little to no interest to your client, so make sure to handle this on your end and show only the relevant information.
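One way to do that is to surface only the assistant's final text once a run completes. The sketch below works on simplified message dicts (newest first, as the messages list is returned); the real API objects nest the text one level deeper, so treat this as the shape of the idea rather than a drop-in:

```python
def final_answer(messages):
    """Return only the assistant's latest text reply, skipping intermediate steps."""
    for msg in messages:  # newest first
        if msg["role"] != "assistant":
            continue
        # Keep only text parts; drop code, logs, and other intermediate content.
        texts = [part["text"] for part in msg["content"] if part["type"] == "text"]
        if texts:
            return "\n".join(texts)
    return ""
```

Image outputs (charts) need their own handling, but the principle is the same: your app decides what the client sees, not the assistant.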
And here it is, product-ready: