The Next Step is Responsible AI. How Do We Get There?
In the last decades, many AI projects have focused on model efficiency and performance. Results are documented in scientific articles, and the best-performing models are deployed in organizations. Now it is time to add another important ingredient to our AI systems: responsibility. The algorithms are here to stay and are nowadays accessible to everyone through tools like ChatGPT, Copilot, and prompt engineering. Now comes the more challenging part, which includes moral consultation, ensuring careful commissioning, and informing the stakeholders. Together, these practices contribute to a responsible and ethical AI landscape. In this blog post, I will describe what responsibility means in AI projects and how to include it using six practical parts.
A Brief Introduction to Responsible AI.
Before I dive into responsible AI (rAI), let me first outline some of the important steps that have been taken in the field of Data Science. In a previous blog, I wrote about what to learn in Data Science [1], and that data science products can increase revenue, optimize processes, and lower (production) costs. Currently, many of the deployed models are optimized in terms of performance and efficiency. In other words, models should have high accuracy in their predictions and low computational costs. But higher model performance usually comes with the side effect that model complexity gradually increases too. Some models turn into so-called "black box models". Examples can be found in the fields of image recognition and text mining, where neural networks with hundreds of millions of parameters are trained using specific model architectures. It has become difficult, or even impossible, to understand why such models make particular decisions. Another example is in finance, where many core processes already run on algorithms and decisions are made on a daily basis by machines. It is crucial that such machine-made decisions can be fact-checked and re-evaluated by human hands when required.
To open up the black box models, a new data science domain has emerged, called eXplainable AI, or xAI for short. The xAI algorithms shed light on model decisions and help researchers and data scientists argue why models make particular decisions. A minimal sketch of what this looks like in practice is shown below. After that, let's see why being responsible matters!
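A minimal xAI sketch with the shap library is shown below. The data set and model are placeholders of my own choosing, not from a specific project; any trained tree-based model on tabular data would do.

```python
# A minimal xAI sketch: explain a tree-based model with SHAP values.
# The data set and model are placeholder assumptions for illustration.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

# TreeExplainer computes per-feature contributions for each prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot: which features drive the model decisions overall.
shap.summary_plot(shap_values, X)
```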
New Technology Makes AI More Accessible.
Although machine learning solutions have been around for many years, we are still at the very beginning of a new era in which algorithms are utilized in all kinds of processes and in a variety of domains, from finance to medicine. Until now, it was the few Data Scientists who created, utilized, and deployed the models. Those who studied mathematics, statistics, and other exact sciences know how to train models and write programming code. This will change dramatically because we are at a point where basically everyone with an internet connection has access to all kinds of algorithms. Technologies such as ChatGPT, Copilot, and prompt engineering, together with the developments in generative AI, have started to open up the complex world of Data Science. This means that individuals without statistical know-how or coding skills can now create models without understanding exactly why or how they work. We therefore need to set some rules, because creating and deploying machine learning models is not something we should take lightly [5].
Creating and deploying machine learning models is not something we should take lightly!
If we look back at the trajectory of algorithm development, there were first the mathematicians and statisticians who created the core fundamental algorithms (#1). Then there were the scientists at academic institutions who optimized, improved, and/or extended the core fundamentals (#2). They were followed by scientific programmers who translated scientific articles with mathematical equations into applications (#3). It was only a decade ago that the so-called "data scientists" emerged, basically a mix of the previous three groups, whom we can now call the fundamental data scientists (#4a). Later on, a second group of data scientists arose: those who focused on the applications, also named the Applied Data Scientists (#4b). Because the algorithms and methods needed to be deployed and maintained, new groups of developers, such as (data) engineers, also slowly moved into the field (#5). Up to this point, the group working with complex machine learning models consists of specialized scientists, developers, and programmers.
Soon there will be a new group in the data science field: those without statistical or programming experience but with enough technical know-how to create and deploy AI products by relying on Copilot and ChatGPT-like technology. Over time, an even larger group of non-technical users may also create machine learning models because, basically, all you need is an internet connection and the latest technology.
We are at the point where an internet connection is all you need to create AI models.
The new large language models (LLMs) are a revolutionary technological step, but we need to think about the possible risks and consequences too. If we look back at the trajectory of developers, it can be clearly seen that at every step, a larger group can use complex AI methods. There is a trade-off, though: at every step, the group of individuals becomes less aware of how exactly the underlying algorithms and/or programming code work.
The use of data and algorithms will likely grow exponentially in the coming years. We are at a point where we need to set out rules and best practices on how to responsibly use the newest and latest technology.
The Six Parts of Responsible AI.
When we talk about responsible AI (rAI), it is not just a single task we need to do or look at. Being responsible when working with AI means that we carefully need to consider the decisions we make during all the (non-)technical steps of the project. As data scientists, we can easily make tens to hundreds of decisions, from data collection to model training and the visualization of model results. However, our responsibility as data scientists starts even before the data collection, because we may want to ask the questions: Why do we need to use the selected technology? What is the purpose and what results is it intended to produce?
The responsibility of data scientists starts even before the data collection.
If the answers to these questions confirm the necessity of AI, then we need to protect the basic rights, such as privacy, set up governance, and take care of the data input quality, model quality, and output quality. Responsible AI can be summarized in six parts: privacy, governance, input quality, model quality, output quality, and ethics, with ethics entangled throughout all the other parts. In the next sections, I will describe these six topics in more detail.

1. Privacy.
The very first step, before even starting a project, is looking at the privacy aspects. When we talk about privacy, it can be personal data such as age, gender, etc., but also other confidential information such as political and religious preferences. Then there can be indirect information that can be linked to individuals, such as car license plates. Such data must be protected and safeguarded throughout the entire project and beyond.
However, this does not necessarily mean that we cannot use this kind of data. We can use such information in projects, but it requires carefully examining its impact and ensuring that it can only be used for the officially validated task.
The use of personal information requires carefully examining its impact, and ensuring that it can only be used for the validated task.
We should include the following points:
- Does the General Data Protection Regulation (GDPR) apply?
- Are officials involved, e.g., the Data Protection Officer, the Privacy Consultant, the Information Security Officer, and the Chief Information Officer?
- How are you taking account of potential unwanted bias in the input, model, and output of the results? Bias can introduce unwarranted assumptions regarding objects, people, or groups. Misinterpretation can lead to unjustified outcomes and should be prevented at all costs.
There is also a variety of libraries that can help in protecting privacy, such as Faker [2], which allows changing real names, addresses, and other meta information into fake ones. A minimal sketch is shown below.
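The sketch assumes the Python port of Faker (the reference above points to the Ruby implementation; the idea is the same), and the record is a toy example:

```python
# A minimal anonymization sketch with the Python Faker package.
from faker import Faker

fake = Faker()

# Replace real personal fields with generated fake ones; the record is a toy example.
record = {"name": "John Doe", "address": "10 Main Street, Springfield"}
anonymized = {"name": fake.name(), "address": fake.address()}
print(anonymized)
```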
2. Governance.
If we talk about governance, we need to know who is the point of contact, maintainer, or even the person responsible for the data set. This is important because this person can likely tell us more about the ownership, usability, quality, integrity, and security of the data set. Besides data governance, there is also model governance, where a similar question needs to be addressed: who is the point of contact, maintainer, or responsible person for the developed model? We also need to ensure the quality, integrity, and security of the model. Think of this: we can create a model that is compliant with all rules and may have great performance, but if someone uses the model with "incorrect" input data, then the output will not be trustworthy. This is also the case when, e.g., missing values are accidentally present in the data set, or when the ordering of the features is changed. There are many scenarios that can lead to unreliable results.
The takeaway is to have processes in place for testing the compliance of both the input data set and the model (output), as in the sketch below. Different organizations will have different processes; just make sure it is settled prior to the implementation of the AI system.
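As an illustration, here is a minimal compliance sketch; the column names are hypothetical and the checks should be geared to your own process:

```python
# A minimal input-compliance sketch; the column names are hypothetical assumptions.
import pandas as pd

EXPECTED_COLUMNS = ["age", "income", "region"]  # assumed training schema

def validate_input(df: pd.DataFrame) -> pd.DataFrame:
    # Require the exact column set and ordering, since a changed order can
    # silently corrupt the model output.
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"Schema mismatch: expected {EXPECTED_COLUMNS}, got {list(df.columns)}")
    # Fail fast on missing values instead of producing untrustworthy output.
    if df.isna().any().any():
        raise ValueError("Input contains missing values; clean the data before scoring.")
    return df
```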
3. Data Input Quality.
Good quality of the input data set is essential to create reliable models. This may sound straightforward, but it needs more attention than it gets today in data science projects. In the best-case scenario, the data set is free of bias, inaccuracies, errors, and other kinds of mistakes. In practice, we need to cope with the data set that is available and deal with all the challenges in the best possible manner. This means that we need to take responsibility and carefully analyze the quality of the data set by making sanity checks and plots, and by discussing the results with the data owner or the domain expert (see also governance).
The quality of the data set can be examined stepwise, starting with broad questions to the data owner or domain expert and moving towards deeper technical analysis.
- How has the input (data) been collected and combined?
- Was there any change in data collection over time? Think of (new) sensors, a change of environment, temperature, different storage, etc.
- How is the data labeled?
- What factors impact the quality of the data?
Examples of technical questions are listed below, although these may differ between use cases and data sets (see the sketch after this list):
- The distribution of missing values
- Inspection of sample duplicates
- Inspect the categorical and continuous features
- Univariate analysis (per feature)
- Multivariate analyses to examine possible correlations
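A minimal pre-inspection sketch of these checks with pandas; the DataFrame df stands for your raw input data set:

```python
# A minimal pre-inspection sketch with pandas; 'df' is the raw input data set.
import pandas as pd

def pre_inspect(df: pd.DataFrame) -> None:
    # Distribution of missing values per feature.
    print(df.isna().mean().sort_values(ascending=False))
    # Number of duplicated samples.
    print(f"duplicates: {df.duplicated().sum()}")
    # Categorical vs. continuous features.
    print(df.dtypes.value_counts())
    # Univariate summary per feature.
    print(df.describe(include="all"))
    # Multivariate: pairwise correlations between the numeric features.
    print(df.corr(numeric_only=True))
```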
After the pre-inspection, we need to decide whether the task ahead is possible given the number of samples, the availability of features, and their quality. Note that this quality step comes even before the preprocessing step, because once we start with the preprocessing, we trust the data set and have a plan to get it into the best possible shape. A library that can help in the pre-inspection step is HNET [4]. It shows the relationships between variables and lets you explore any multicollinearity. More details can be found here:
Explore and understand your data with a network of significant associations.
4. Output Quality.
Ensuring a reliable output is another important task that needs our attention. The type of output depends on the use case and can, for example, be in the form of a dashboard, advice report, or presentation. It is important to note that our input data set can be of high quality and our model can be trained in a reliable manner, but the output can still be unpredictable or even unusable.
Let me explain this with an example. Suppose we create a model that processes satellite images and performs object detection. The accuracy can be great on the train/test and validation sets, but after deployment, the output can degrade over time or even result in poor/unusable predictions. The reason is that during winter, clouds can block the view, thus resulting in poor or no predictions.
The question is not only "if" the model works but also "when" the model works.
In my example, the advice could be to only use the trained model from April to October to retain the most reliable prediction results. For the remaining period? You need to think of something else. What is important is that the model output is periodically monitored for its correctness, as in the sketch below.
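A minimal monitoring sketch; the logging format, column names, and the confidence threshold are assumptions and will differ per use case:

```python
# A minimal output-monitoring sketch; threshold and column names are assumptions.
import pandas as pd

def flag_degraded_months(log: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """'log' holds one row per prediction with columns 'date' and 'confidence'."""
    monthly = log.set_index("date").resample("M")["confidence"].mean()
    return monthly[monthly < threshold]  # months where output quality degraded

# Example usage with a toy prediction log: confidence drops outside the summer period.
log = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=365, freq="D"),
    "confidence": [0.9 if 90 < d < 270 else 0.5 for d in range(365)],
})
print(flag_degraded_months(log))
```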
5. Model Quality.
Model quality can be addressed by its accuracy, reliability, reproducibility, and explainability. Let's go through each of them.
Accuracy
The terminology "accuracy" is often intertwined with "performance", which is used to quantify the model results. However, with accuracy, I mean more than just a score from a technical metric. Metrics such as the F1 score are fine to use, but they only become meaningful when we set up the acceptance criteria for both the data and the model. Without acceptance criteria, any score and any model would be a valid one, which doesn't make sense.
For every new project, we need to determine the acceptance criteria first.
Notably, the acceptance criteria are more than just a desired score. They can also describe the approach. As an example, it can be desired to have explainable results where the output needs to be binary. Or it can be desired that the final model is the best out of three tested models. Or maybe a minimum F1 score is required, because only then does the model become useful compared to the traditional approach.
What is important is that the acceptance criteria are properly geared to the data and the intended purpose of the use case. A minimal sketch of such a check is shown below.
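As a minimal sketch, an acceptance check could look as follows; the threshold of 0.8 is an assumed, use-case-specific value:

```python
# A minimal acceptance-criteria sketch; the minimum F1 score is an assumed value.
from sklearn.metrics import f1_score

MIN_F1 = 0.8  # acceptance criterion agreed upon with the stakeholders beforehand

def meets_acceptance(y_true, y_pred) -> bool:
    score = f1_score(y_true, y_pred)
    print(f"F1={score:.3f} (required >= {MIN_F1})")
    return score >= MIN_F1
```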
Reliability and Reproducibility
The terms reliability and reproducibility go hand-in-hand because a reliable model is one that can produce consistent results in similar cases.
For a model to become reliable, we need to know exactly which samples and features were used in the training process and whether parts of the data were excluded from the learning process. If so, then we need to limit our interpretation and conclusions to that particular set of samples and/or features. In addition, to build a reliable model, we need to separate the train/test and validation sets to prevent data leakage. In case (hyper)parameter tuning is used, it may even require a double-loop (nested) cross-validation, as sketched below. The HGboost library uses such interventions to prevent data leakage and to avoid accidentally selecting overtrained models. More details can be found here:
A Guide to Find the Best Boosting Model using Bayesian Hyperparameter Tuning but without…
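For illustration, a minimal double-loop (nested) cross-validation sketch with scikit-learn; the model and parameter grid are placeholders, not the HGboost internals:

```python
# A minimal nested cross-validation sketch: the inner loop tunes hyperparameters,
# the outer loop scores on folds the tuning never saw, preventing data leakage.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}  # placeholder grid

inner = GridSearchCV(GradientBoostingClassifier(random_state=42), param_grid, cv=5)
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer loop
print(f"Nested CV score: {outer_scores.mean():.3f}")
```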
One of the checks we can do to determine whether our project is reproducible is reconstructing the model on the basis of the input data, the given hyperparameters, and a fixed seed. In addition, we can check whether the output results can be reproduced, as in the sketch below.
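A minimal reproducibility sketch, assuming a scikit-learn model; the same data, hyperparameters, and seed should yield identical predictions:

```python
# A minimal reproducibility check: retrain with identical data, hyperparameters,
# and a fixed seed, then verify that the predictions match exactly.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

model_a = RandomForestClassifier(random_state=42).fit(X, y)
model_b = RandomForestClassifier(random_state=42).fit(X, y)

assert np.array_equal(model_a.predict(X), model_b.predict(X)), "Not reproducible!"
```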
Explainability
Explainability is about whether the model and results are sufficiently explainable and interpretable to the developers. In addition, the design choices must be clear and logically connected to the underlying model. Besides the technical parts, it is also desirable to have sufficient documentation that explains the way in which the model operates.
6. Ethics.
Ethics should be taken care of across the entire project. The key is awareness and integrity. In addition, it is always important to remain transparent with the involved parties and users regarding the limitations, and to use common sense when developing and reviewing the model.
Being Responsible does not cost you more time!
Setting up a project with a high level of responsibility may need (at the start) some additional organization because you may need to set up some checkpoints and discuss who is doing what. Nevertheless, once all checkpoints are in place, it can easily become part of the project pipeline without taking much more time than regular projects. In addition, I expect that (more experienced) Data Scientists already handle a number of the rAI points described in the six parts. Many of the described parts, such as input, model, and output quality, are not "new" but may just need some extra attention.
Final Words.
There are many opportunities for optimizing processes using AI across many domains. Some will be more impactful than others, but the baseline when working with AI should be to work in a responsible manner. Make sure that for impactful projects, the decision-making models are transparent and explainable, and that human hands are able to fact-check, correct, and adjust when required.
Be Safe, Stay Frosty.
Cheers E
If you find this article helpful, you are welcome to follow me because I write more about Data Science topics. If you are thinking of taking a Medium membership, you can support my work a bit by using my referral link. It is the same price as a coffee but allows you to read unlimited articles monthly!
Let's connect!
References
1. E. Taskesen, The Path to Success in Data Science Is About Your Ability to Learn. But What to Learn? Medium. https://towardsdatascience.com/the-path-to-success-in-data-science-is-about-your-ability-to-learn-but-what-to-learn-92efe11e34bf
2. Faker, https://github.com/faker-ruby/faker
3. E. Taskesen, A Guide to Find the Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting, Medium, Aug. 2022
4. E. Taskesen, Explore and understand your data with a network of significant associations, Medium, Aug. 2021
5. Dan Hendrycks et al., An Overview of Catastrophic AI Risks, arXiv, 2023