Create an A.I. Driven Product with Computer Vision and ChatGPT
TL;DR
In this article you will learn how to
- Train custom and pre-trained Computer Vision models
- Build an LLM Language System for a medical use case
- Walk through the use of the app
- Take any dataset and create an A.I. driven product
Who is This Article for?
This article is intended for the following audiences:
- Students new to data science who want to learn how to combine technology to create something useful within the scope of a data science bootcamp.
- Data Scientists who want to learn how to integrate GPT models into their projects.
- Employers looking to hire a Data Scientist or Machine Learning Engineer. You can contact me on LinkedIn.
Code
All code for this project can be found in my GitHub portfolio.
my_portfolio/melanoma_cancer at main · Alexander-Barriga/my_portfolio
There is also a short demo video in the README file, from which the above screenshot was taken.
! Disclaimer !
Before we get started with the main portion of the article, please note that this melanoma classification prototype is for educational purposes only! Real-life melanoma diagnosis requires expert evaluation from trained medical professionals and laboratory testing of skin samples.
You can read more about the rigorous medical processes here on the Dana-Farber Cancer Institute site.
Furthermore, the classification solution provided in this article has limitations that include, but are not limited to:
- Machine learning models are brittle algorithms: they require careful preparation of both the data used to train them and the data they are given when reliable predictions are expected.
- Uploading an image of an ordinary mole could trick the model into thinking it's seeing malignant melanoma. You could even give the model a non-skin image and it will still return a prediction.
As machine learning practitioners we must remember the limitations of our tools. This is especially important when our work affects the health and lives of the living, humans and animals alike.
Motivation
I've taught 100+ data science students at General Assembly and BloomTech combined. At the end of the course, students complete a capstone project that combines and showcases all the skills they've learned.
At BloomTech I taught the Deep Learning 101 course that included the various applications of deep learning, including Natural Language Processing (NLP) and Computer Vision (CV).
In this article, I will show you how to create a prototype of an A.I. driven software medical product. We will use a real-world dataset of melanoma images to train classification models to predict if the image shows signs of benign or malignant melanoma, a type of skin cancer.
Then I will show you how to create a custom LLM language system that incorporates Retrieval Augmented Generation (RAG) for the specific use case of melanoma information.
Lastly, I will show you how to plug your CV and GPT models into the backend of a dashboard app using Dash.
I believe that health is the #1 wealth investment that we can make in our lives. This project aligns with my personal values, investing me in its success.
Data Science Students
- I'd like you to notice that this is about the level of complexity (maybe a bit above) you can expect from a capstone project at a data science bootcamp.
Working Data Scientists
- I'd like you to notice how relatively straightforward it is to prototype a new idea, making your pitch to PMs and other stakeholders that much more persuasive.
Computer Vision
In this section we will introduce the data and our computer vision tools as well as discuss model evaluation after training.
Deep Learning Libraries
There are many Python deep learning libraries to choose from; popular options include TensorFlow, PyTorch, and Keras.
Keras is popular with data science students and working professionals alike due to its easy-to-use API, which empowers you to quickly iterate through experiments and demonstrate a Proof of Concept (POC). It is for this reason that we'll be using Keras for this project.
The dataset we'll be using is called Melanoma Skin Cancer Dataset of 10000 Images and can be found on Kaggle. It is a binary dataset containing 10K images of either benign or malignant melanoma. The two classes are closely balanced, and the images are provided at 300×300 resolution.

import tensorflow as tf

# TRAIN_DIR, TEST_DIR, and SEED are defined earlier in the notebook
batch_size = 32

train_gen = tf.keras.utils.image_dataset_from_directory(
    TRAIN_DIR,
    labels='inferred',          # labels come from the class subdirectory names
    label_mode='int',           # integer labels, inferred in alphabetical order
    class_names=None,
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(300, 300),
    shuffle=True,
    seed=SEED,
    interpolation='bilinear',
    crop_to_aspect_ratio=True,
)

test_gen = tf.keras.utils.image_dataset_from_directory(
    TEST_DIR,
    labels='inferred',
    label_mode='int',
    class_names=None,
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(300, 300),
    shuffle=True,
    seed=SEED,
    interpolation='bilinear',
    crop_to_aspect_ratio=True,
)
With a 10,000-image dataset we need to think about memory management. Fortunately, Keras provides utility functions that take the directory where the images are stored and load them into memory in batches via a generator.
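As a quick sanity check (a minimal sketch, assuming the generators defined above), we can pull a single batch and inspect it:

# Pull one batch from the training generator and inspect it.
# With class_names=None, Keras infers labels from the class folder names
# in alphabetical order (assuming 'benign' and 'malignant' folders,
# benign maps to 0 and malignant to 1).
for images, labels in train_gen.take(1):
    print(images.shape)         # (32, 300, 300, 3): batch, height, width, channels
    print(labels.numpy()[:10])  # integer labels, e.g. [0 1 1 0 ...]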
Models
We will train two types of models: a simple AlexNet-style model that I built from scratch and a few larger pre-trained models from Keras' pre-trained CNN model library.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, Input,
                                     MaxPooling2D, Rescaling)

def create_alexnet_model(input_shape, batch_size):
    model = Sequential([
        Input(shape=input_shape, batch_size=batch_size),
        Rescaling(1./255),  # scale pixel values to [0, 1]
        Conv2D(96, (11, 11), strides=(4, 4), activation='relu'),
        MaxPooling2D((3, 3), strides=(2, 2)),
        Conv2D(256, (5, 5), padding='same', activation='relu'),
        MaxPooling2D((3, 3), strides=(2, 2)),
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        MaxPooling2D((3, 3), strides=(2, 2)),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(2096, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')  # binary classification
    ])
    return model
The pre-trained models that we'll use are
- DenseNet121
- InceptionResNetV2
- Xception
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Rescaling, Resizing

def transfer_learning_model(input_shape, batch_size):
    # Load the pre-trained model without its top (fully connected) layers
    base_model = Xception(weights='imagenet', include_top=False, input_shape=input_shape)

    # Freeze the base model so its ImageNet weights are not updated during training
    base_model.trainable = False

    # Add custom top layers for binary classification
    model = tf.keras.Sequential([
        # Input(shape=input_shape, batch_size=batch_size),
        Resizing(299, 299, crop_to_aspect_ratio=True),  # Xception expects 299x299 inputs
        Rescaling(1./255),
        base_model,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(256, activation='relu'),  # additional Dense layer
        keras.layers.Dropout(0.5),                   # dropout for regularization
        keras.layers.Dense(128, activation='relu'),  # another Dense layer
        keras.layers.Dropout(0.5),                   # dropout for regularization
        keras.layers.Dense(1, activation='sigmoid')  # final classification layer
    ])
    return model
Note that each pre-trained CNN has its own input requirements. Xception requires input images at 299×299 resolution, so a Resizing layer is included.
We have the option of including the original top layers (i.e. the fully connected, feed-forward portion of the model that ends in the classification layer) and of training the base model (i.e. the convolutional layers).
While you're free to experiment with these options, freezing the base model leads to faster training. Also, most of these pre-trained models were trained on ImageNet, a large-scale dataset with over a million images across 1,000 classes, and therefore 1,000 nodes in the classification layer. Our binary task requires only 1 node. This is why we create our own top layers: to ensure the model predicts exactly 1 of 2 classes, benign or malignant.
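To make the training step concrete, here is a hedged sketch of how one of these models might be compiled and trained. The optimizer, learning rate, and epoch count are illustrative assumptions, not values taken from the project's code:

# Illustrative compile/fit sketch; hyperparameters are assumptions.
# input_shape is (299, 299, 3) so the Xception base matches the output
# of the Resizing layer.
model = transfer_learning_model(input_shape=(299, 299, 3), batch_size=batch_size)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',  # single sigmoid output -> binary cross-entropy
    metrics=['accuracy'],
)
history = model.fit(train_gen, validation_data=test_gen, epochs=10)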
CPUs vs GPUs vs TPUs
It is common to use GPUs or TPUs in place of CPUs when training deep learning models. This is especially true in computer vision: larger images typically mean more weights to train, and therefore longer training times.
GPUs were originally designed for graphics processing but are well suited to the matrix multiplication at the heart of deep learning algorithms, dramatically increasing the speed of training.
TPUs, on the other hand, were created by Google specifically with deep learning in mind. You can read about them here. Google describes a TPU as
a matrix processor specialized for neural network workloads. TPUs can't run word processors, control rocket engines, or execute bank transactions, but they can handle the massive multiplications and additions for neural networks, at blazingly fast speeds while consuming much less power and inside a smaller physical footprint.
Access GPUs and TPUs
In industry, AWS is commonly used to spin up a training environment with access to virtually all the computational resources you could possibly need to train large models on large datasets.
However, a quick, easy, and free way to take advantage of GPU and TPU acceleration when training your deep learning models is Google Colab. The free version puts limits on GPU, TPU, and memory access, but the low-cost Pro version grants you more computational resources, such as uninterrupted access to GPUs and TPUs.
This project took advantage of Google Colab's Pro version. Just upload your notebook, select your processor of choice, and hit 'run'.
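If you select a TPU runtime in Colab, a little boilerplate is needed to connect to it. The following is a hedged sketch of the standard TensorFlow TPU setup, not code from this project's repo:

# Connect to the Colab TPU runtime and build the model under a
# distribution strategy; falls back to the default CPU/GPU strategy.
import tensorflow as tf

try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except ValueError:
    strategy = tf.distribute.get_strategy()

with strategy.scope():
    model = create_alexnet_model(input_shape=(300, 300, 3), batch_size=batch_size)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])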
Model Evaluation

After training all four models on a TPU, we now need to evaluate their performance. The ROC curve above shows the tradeoff between true positive and false positive predictions for each model on the test set.
The legend in the bottom right corner shows the AUC (Area Under the Curve) score for each model. You can see from the curves and the AUC scores that all models perform about the same, with the transfer models performing slightly better than AlexNet. Recall that we didn't re-train the base models to get these results. As a prototype, these results are great. However, before moving a model into production we would want to explore re-training the base models to see how close to 100% performance we can push them; this is a lifesaving medical application, after all.
The main benefit of plotting ROC curves is that we can see which model to scan for the optimum probability threshold, the value that maximizes True Positives while minimizing False Positives.
To move forward, we will scan for the optimum threshold value and pick a model based on the Confusion Matrix outcomes. This isn't something you can do by simply looking at the plot; you have to write a selection algorithm to do it programmatically.
import numpy as np
from sklearn.metrics import roc_curve

def get_best_threshold(y_true, y_pred_prob):
    # Compute ROC points from the true labels and predicted probabilities
    fpr, tpr, thresholds = roc_curve(y_true, y_pred_prob)

    # Compute the False Negative Rate (FNR) for each threshold
    fnr = 1 - tpr
    fpr_max = np.maximum(fpr, 0.1)  # floor the FPR at 0.1 to avoid degenerate thresholds

    # Find the threshold that minimizes the difference between FNR and (floored) FPR
    difference = fnr - fpr_max
    best_threshold_index = np.argmin(np.abs(difference))

    # Get the threshold value corresponding to that index
    best_threshold = thresholds[best_threshold_index]

    print("Best threshold:", best_threshold)
    print("TPR:", tpr[best_threshold_index])
    print("FPR:", fpr[best_threshold_index])

    return best_threshold, tpr[best_threshold_index], fpr[best_threshold_index]
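To call this function we need the true labels and predicted probabilities side by side. Here is a hedged usage sketch; eval_gen and xception_model are illustrative names, and the evaluation dataset must be built with shuffle=False so y_true stays aligned with the rows of model.predict():

import numpy as np

# Gather aligned labels and probabilities from a non-shuffled dataset
y_true = np.concatenate([y.numpy() for _, y in eval_gen])
y_pred_prob = xception_model.predict(eval_gen).ravel()

best_threshold, tpr_best, fpr_best = get_best_threshold(y_true, y_pred_prob)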
For simplicity, we'll pick one of the transfer models' predicted probabilities for the scan, since all the transfer models perform about the same. Passing in Xception's predicted probabilities gives the following:
Best threshold: 0.3677
TPR: 0.897
FPR: 0.068

By selecting Xception's optimum probability threshold of 0.3677 and reclassifying the test set images for all 4 models, we arrive at the above confusion matrices.
We can see that they all perform very similarly. For our use case, we prioritize minimizing False Negatives: we don't want to falsely inform someone who has malignant melanoma that it's benign. For this reason, we will move forward with the DenseNet model, which has the lowest number of False Negatives.
DenseNet has the fewest False Negatives (Type 2 errors) but the highest False Positive (Type 1 error) count of all the models (tied with AlexNet). The balance between the two error types can be adjusted based on the needs of the project.
A False Negative means a patient will be told they don't have cancer when in fact they do; a False Positive means a patient will be told they have cancer when in fact they do not. Subjectively, False Negatives are the worse error, since untreated melanoma can be fatal, while a False Positive leads to a scare over a benign skin mark.
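For reference, here is a hedged sketch of how the thresholded reclassification and the metrics reported below might be computed with scikit-learn (variable names are illustrative, carried over from the earlier sketch):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Reclassify the test set with the tuned threshold
y_pred = (y_pred_prob >= best_threshold).astype(int)

# Confusion matrix counts for the binary task
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"False Negatives (missed malignant): {fn}, False Positives: {fp}")

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")
print(f"F1-Score:  {f1_score(y_true, y_pred):.4f}")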
The following metrics are calculated for DenseNet on the test set using the new threshold.
Accuracy: 0.9176
Precision: 0.9403
Recall: 0.8909
F1-Score: 0.9149
The threshold we chose has led to the above binary classification metric values, all near or above 90%. Roughly 9 out of 10 predictions are correct (accuracy). About 9 out of 10 positive predictions are truly positive (precision). The proportion of actual positive cases the model correctly identifies is about 89% (recall), which means roughly 1 in 10 patients with melanoma would be classified as not having it. Finally, the F1-score is the harmonic mean of precision and recall, telling us the model gets around 90% of predictions right without a major imbalance between precision and recall.
Excellent! Now that we have validated our trained models' performance and selected DenseNet for our app, we can move on to the Chatbot portion of our medical product.
Customize OpenAI's ChatGPT LLM
import os

# Note: import paths may vary slightly across LangChain versions
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

class chatbot():
    def __init__(self):
        self.llm = ChatOpenAI()
        self.n_docs_retrieve = 10
        self.embeddings = OpenAIEmbeddings()
        self.__load_docs()
        self.__set_prompt()

    def __load_docs(self):
        docs = []
        for doc in os.listdir(os.getenv("PDF_FOLDER_PATH")):
            if doc.endswith('.pdf'):
                pdf_path = os.path.join(os.getenv("PDF_FOLDER_PATH"), doc)
                loader = PyPDFLoader(pdf_path)
                docs.extend(loader.load_and_split())
        self.faiss_index = FAISS.from_documents(docs, self.embeddings)

    def __set_prompt(self):
        self.prompt = ChatPromptTemplate.from_template("""Your role is to provide information regarding melanoma.
Attempt to answer the following question based only on the provided context:
{context}
If the context doesn't provide an answer, provide an answer from your own knowledge.
Question: {input}""")

    def get_retrieval_augmented_answer(self, query):
        # Find documents similar to the user's query
        docs = self.faiss_index.similarity_search(query, k=self.n_docs_retrieve)
        text_splitter = RecursiveCharacterTextSplitter()
        documents = text_splitter.split_documents(docs)
        # Index the retrieved chunks in a vector store as embeddings
        vector = FAISS.from_documents(documents, self.embeddings)
        document_chain = create_stuff_documents_chain(self.llm, self.prompt)
        # Create the retriever and chain it to the document chain
        retriever = vector.as_retriever()
        retrieval_chain = create_retrieval_chain(retriever, document_chain)
        # Get the response for the prompt
        self.response = retrieval_chain.invoke({"input": query})
        return self.response["answer"]
The 2nd A.I. backend plugin for our medical product is the medical assistant chatbot. In the code above, you can see how we've incorporated OpenAI's GPT model into a Retrieval Augmented Generation (RAG) language system. The framework we're using to create the language model system is LangChain.
RAG means that we have included reference documentation for the LLM to consult when answering a user's question. This is a highly desirable feature for two reasons:
- It minimizes the possibility of the LLM hallucinating.
- It provides the LLM with data that it might not have seen during its training.
While melanoma is a pretty well-understood type of cancer, we don't want the LLM providing false answers to highly sensitive medical questions. Let's take a closer look at portions of our chatbot.
Loading Documents
def __load_docs(self):
    docs = []
    for doc in os.listdir(os.getenv("PDF_FOLDER_PATH")):
        if doc.endswith('.pdf'):
            pdf_path = os.path.join(os.getenv("PDF_FOLDER_PATH"), doc)
            loader = PyPDFLoader(pdf_path)
            docs.extend(loader.load_and_split())
    self.faiss_index = FAISS.from_documents(docs, self.embeddings)
This is where we load our curated documents about melanoma, split them into smaller chunks, create vector embeddings, and store them in a vector store. The reason we store these PDFs as vector embeddings is faster lookup when searching for information relevant to a user's question. It also provides a way to store unstructured data.
The vector store we're using is called FAISS and was created by Meta. We load two publicly available PDF documents about melanoma. You can view document 1 and document 2 if you'd like.
Set the Prompt
def __set_prompt(self):
    self.prompt = ChatPromptTemplate.from_template("""Your role is to provide information regarding melanoma.
Attempt to answer the following question based only on the provided context:
{context}
If the context doesn't provide an answer, provide an answer from your own knowledge.
Question: {input}""")
Here we set up the prompt to use whatever context we provide to answer a question. If we ask something like "How does radiation therapy help treat malignant melanoma?", the language system will look up the relevant information, embed it into the prompt, and paraphrase it to answer the question.
In the prompt above, we tell the LLM to fall back on the melanoma knowledge it acquired during its training if the context is insufficient. As a safety protocol against potentially inaccurate information, we could instead tell the LLM not to answer any question for which there is no provided context, as sketched below.
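For example, a stricter version of the prompt (a sketch of the alternative, not the version used in this project) might read:

def __set_prompt(self):
    # Refuse to answer when the retrieved context is insufficient
    self.prompt = ChatPromptTemplate.from_template("""Your role is to provide information regarding melanoma.
Answer the following question using ONLY the provided context:
{context}
If the context does not contain the answer, say that you cannot answer
from the available documents. Do not guess.
Question: {input}""")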
Retrieval Augmented Generation
def get_retrieval_augmented_answer(self, query):
    # Find documents similar to the user's query
    docs = self.faiss_index.similarity_search(query, k=self.n_docs_retrieve)
    text_splitter = RecursiveCharacterTextSplitter()
    documents = text_splitter.split_documents(docs)
    # We have the PDF data indexed in a vector store as embeddings
    vector = FAISS.from_documents(documents, self.embeddings)
    document_chain = create_stuff_documents_chain(self.llm, self.prompt)
    # Create the retriever and chain it to the document chain
    retriever = vector.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    # Get the response for the prompt
    self.response = retrieval_chain.invoke({"input": query})
    return self.response["answer"]
This is the main method that does the actual retrieval. First, the query is passed into the vector store to perform a similarity search between the user's query (i.e. question) and the existing context (i.e. melanoma documents). Then the prompt, LLM, and retrieved documents are put into a chain (i.e. an order of operations). Finally, the generated answer is returned to the user.
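Putting it all together, here is a hedged usage sketch (it assumes OPENAI_API_KEY and PDF_FOLDER_PATH are set in the environment before the class is instantiated):

bot = chatbot()
print(bot.get_retrieval_augmented_answer(
    "How does radiation therapy help treat malignant melanoma?"
))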
The Dashboard

The CV model and LLM are plugged into an app written with Dash, a framework for building simple web apps. We won't break down the full dashboard code here, to keep the article from becoming prohibitively long; instead we'll walk through how to use the app, followed by a minimal sketch of the wiring after the steps below. Recall that you can view a demo video of the walkthrough (and all the code) in my GitHub repo.
- Upload an image of suspected melanoma (any size) by clicking the Upload File button in the top left corner.
- View DenseNet's predicted probability that the image shows malignant or benign melanoma in the top right corner.
- View the reference images of benign and malignant melanoma on the bottom left.
- Write any questions you have about melanoma to the AI chatbot on the bottom right and review the response.
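To give a flavor of the wiring without a full breakdown, here is a minimal hedged sketch of how the chatbot might be hooked into a Dash callback. The component ids and layout are illustrative, not the repo's actual dashboard code:

from dash import Dash, Input, Output, State, dcc, html

bot = chatbot()  # the RAG chatbot defined earlier

app = Dash(__name__)
app.layout = html.Div([
    dcc.Input(id='user-question', type='text', placeholder='Ask about melanoma...'),
    html.Button('Ask', id='ask-button'),
    html.Div(id='chatbot-answer'),
])

@app.callback(
    Output('chatbot-answer', 'children'),
    Input('ask-button', 'n_clicks'),
    State('user-question', 'value'),
    prevent_initial_call=True,
)
def answer_question(n_clicks, question):
    # Forward the user's question to the chatbot and render the answer
    return bot.get_retrieval_augmented_answer(question)

if __name__ == '__main__':
    app.run(debug=True)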
That's it! This is how we can bring together a few technologies to create an A.I. driven software app. In this example I decided to address cancer because of my personal interest in health.
Tips on Starting a Project
Whether you are a data science student thinking of a capstone project or a data scientist thinking of starting a new personal project, I'd like to leave you with a few tips on how to make the brainstorming process a bit easier.
- Think about your values
What do you value in life? Is it fitness, travel, dating, clean energy, video games, movies, maybe even philosophy and literature? This is a good starting point because you are your values. If you can see yourself in your work, you'll find the motivation, significance, and importance you need to clear the clutter and uncertainty in your mental space and see the value in potential projects.
- Solutions to problems that other people are willing to pay for
Now you want to find the intersection between what you value and what other people value. In this article I chose health, and cancer specifically, but another example would be travel. A lot of people love to travel. If you could create an app that makes securing travel logistics easier, itinerary creation easier, and so on, that would be worth something to a lot of people.
- Technology
Now that you've defined a problem to solve and a market, think about which technologies are best suited to solving the problem. In the travel example, I can see LLMs being useful for translation, itinerary creation, and much more.
Conclusion
We have seen how to take open-source data and technology and create a prototype medical application for the diagnosis of, and education about, melanoma skin cancer. We used Computer Vision to build an image classifier. We used OpenAI's GPT model to create a Retrieval Augmented Generation language system. We then plugged both AI models into a dashboard built with Dash.
If you've made it this far and enjoyed the article, please clap for it and share it with anyone you believe would benefit from it. That would help me reach more people.
About the Author
Alexander Barriga is a Data Scientist who has taught and applied Data Science across a variety of industries and organizations, including clean energy, insurtech, edtech, and NASA.
He is currently looking for a Data Science or Machine Learning Engineering role. Reach out on LinkedIn.
