Using AI to Classify Videos for Alzheimer’s (First Attempt)
Using Keras to Aid Researchers in the Race to Find a Cure for Alzheimer’s
NOTE: This article is my first attempt. For my most recent attempt, refer to the link below: https://towardsdatascience.com/detecting-precursors-of-alzheimers-by-utilizing-deep-learning-a6de0ee0e2d2
Alzheimer’s
We all forget stuff. At some point, we’ve all walked into a room and simply forgotten why we went there. For most people this doesn’t happen every day, but for some people it does. For around 44 million people, forgetting where they are, who they are with, or what they just said is a fact of life. Alzheimer’s is a progressive disorder caused by brain cells dying out. Symptoms include a decline in cognitive ability, memory loss, disturbed sleep, and loss of speech. One in 10 people will get Alzheimer’s in their lifetime, and what makes the condition worse is that there is no cure. The treatments currently available can only mildly reduce symptoms, which doesn’t sit well with me.
Why don’t we have a cure? It’s because scientists are still trying to figure out what exactly causes it. I decided to go on a little research adventure to learn more about the causes of Alzheimer’s. I stumbled upon a paper which stated that Alzheimer’s is caused by a lack of blood flow to the cerebrum. Alzheimer’s patients constantly live with a feeling similar to when you stand up too quickly and feel dizzy. That’s terrible. What’s interesting is that this connection to Alzheimer’s has been known for many years.
So why haven’t scientists targeted blood flow? Well, it’s because scientists don’t have an exact understanding of how a lack of blood flow impacts memory. Some researchers suggest that it may be caused by white blood cells sticking to capillaries, restricting blood flow. I went deeper into the rabbit hole and found an organization called StallCatchers, which collects videos of blood flowing in the brains of mice and classifies them as either stalled or flowing. They are committed to classifying thousands of videos, which go to scientists who use them to learn more about the stalling and try to figure out ways to treat it. The videos come from Cornell researchers who use two-photon excitation microscopy to take images, strung into videos, of live brain tissue in mice. Being an AI enthusiast, I thought that there had to be a better way to classify these videos with artificial intelligence. To my surprise, I found that MathWorks, a company big in data science, was hosting a competition to build a video classifier to help out StallCatchers. Naturally, I signed up, and here we are.
Model Overview
My idea was to use a Convolutional Neural Net (CNN) to classify each frame, take the average of the predictions across all the frames in a video, and output the class based on that average. Yes, I know this is very crude, but I plan on updating the model in the weeks to come.
Before I get into the code, here’s a very brief overview of CNNs:
Convolutional Neural Nets are generally used to classify images. Computers read images in the form of an array, and CNNs pass this array through multiple layers. These layers include: convolutional layers, pooling layers, and fully connected layers. Convolutional and pooling layers slide different-sized kernels over the image array, each window outputting a single new value into a smaller matrix. Convolutional layers apply weights and biases to the pixel values, whereas pooling layers can, for example, take the highest value in the kernel’s range. Think of those layers as a flashlight passing over the image, outputting the most important (or brightest) values. Fully connected layers are similar to regular neural nets, as they have nodes which apply weights and biases, ultimately giving an output class prediction.
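To make the pooling idea concrete, here’s a tiny worked example (illustrative only, not from this project’s code) of a 2x2 max pool sliding over a 4x4 array:

```python
import numpy as np

# A toy 4x4 "image" of pixel intensities
image = np.array([[1, 3, 2, 0],
                  [4, 8, 1, 5],
                  [7, 2, 9, 6],
                  [0, 1, 3, 4]])

# 2x2 max pooling with stride 2: keep only the "brightest" value in each window
pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[8 5]
#  [7 9]]
```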
*Note: if you want a more in-depth explanation of how the CNN I used works, check out my previous article: Translating Sign Language in Real Time With AI*
Alright, now that we know what a CNN is, let’s dive into the first step: Data Preparation!
Data Preparation
I would say the most tedious part of this project was the data preparation. It involved organizing the data and making sure that the model was only training on the area of interest.
ROI Cropping and Frame Extraction
The first step is to extract each frame from the videos, because the CNN trains on images. For best results, the images should capture the most important part of each frame, i.e. where the two classes show differences the algorithm can identify. In these frames, a red circle marks the region of interest (ROI). It makes the most sense to focus only on this area of the picture and forget about the rest. I combined these two steps (frame extraction and cropping) to make things easier.
Here OpenCV (cv2) is used to read in the specific videos I want to extract frames from and then crop out the region of interest. To crop out the area inside the red circle, thresholding was used. Thresholding is a technique that keeps only certain colors in the image, making one area (the red circle) stand out. A rectangle was then drawn around the red circle and used to crop the original image based on the rectangle’s corner coordinates. Because the ROIs come in different sizes, padding is required to make each frame 48x48.
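The original cropping code isn’t shown here, but a minimal sketch of this step with cv2 could look like the following; the HSV threshold values, paths, and padding details are my assumptions for illustration:

```python
import cv2
import numpy as np

def extract_roi_frames(video_path, out_dir, size=48):
    """Read a video, crop each frame to the red-circle ROI, pad to size x size."""
    cap = cv2.VideoCapture(video_path)
    i = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Threshold in HSV so the red outline stands out (range values are illustrative)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            continue  # no red circle found in this frame
        # Bounding rectangle around the red pixels, then crop to it
        x, y = xs.min(), ys.min()
        w, h = xs.max() - x, ys.max() - y
        crop = frame[y:y + h, x:x + w]
        # Pad with black borders so every frame comes out size x size
        # (a final resize would handle ROIs larger than 48 pixels)
        top = max((size - crop.shape[0]) // 2, 0)
        left = max((size - crop.shape[1]) // 2, 0)
        crop = cv2.copyMakeBorder(crop, top, max(size - crop.shape[0] - top, 0),
                                  left, max(size - crop.shape[1] - left, 0),
                                  cv2.BORDER_CONSTANT)
        cv2.imwrite(f"{out_dir}/frame_{i:04d}.png", crop)
        i += 1
    cap.release()
```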
Organization
Once all the frames in the training videos are extracted, they need to be organized into training and validation folders. I chose to go with an 80–20 split: 80% of the frames in training and 20% in validation.
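A short sketch of that split, assuming the extracted frames are already grouped into flowing/ and stalled/ folders (the folder names and paths are assumptions):

```python
import os
import random
import shutil

def split_frames(src_dir, dst_dir, val_frac=0.2, seed=42):
    """Copy 80% of each class's frames to train/ and 20% to validation/."""
    random.seed(seed)
    for label in ("flowing", "stalled"):
        frames = os.listdir(os.path.join(src_dir, label))
        random.shuffle(frames)
        n_val = int(len(frames) * val_frac)
        for i, name in enumerate(frames):
            split = "validation" if i < n_val else "train"
            out = os.path.join(dst_dir, split, label)
            os.makedirs(out, exist_ok=True)
            shutil.copy(os.path.join(src_dir, label, name), out)
```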
ROI Cropping on Test Videos
The final step is to crop each test video down to the region of interest. This differs slightly from the previous step in that we don’t want individual frames at the end; we want a video. In addition, we want all the videos/frames to be the same size.
For this step, cv2 is used again to take each frame, crop it to the ROI, pad it to 48x48, and finally write the frames to a .avi video file instead of saving them individually. I chose to define the ROI rectangle coordinates on the first frame instead of recomputing them on every frame because the video turned out smoother that way.
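Here’s a hedged sketch of what that can look like; `find_red_roi` is a hypothetical helper standing in for the thresholding step shown earlier:

```python
import cv2

def crop_test_video(in_path, out_path, size=48):
    """Crop a test video to the ROI found on the first frame and write a .avi."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"MJPG"),
                             fps, (size, size))
    roi = None
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if roi is None:
            # Locate the red circle once, on the first frame only,
            # so the cropped video comes out smooth
            roi = find_red_roi(frame)  # hypothetical helper (thresholding as above)
        x, y, w, h = roi
        # resize stands in for the pad-to-48x48 step described in the text
        crop = cv2.resize(frame[y:y + h, x:x + w], (size, size))
        writer.write(crop)
    cap.release()
    writer.release()
```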
Now it’s time to train!
Training the Model
The CNN model I used was almost the same as the one from my previous project. The model is only slightly different: instead of predicting 28 different classes, it’s predicting and training on two (flowing or stalled). This prompted me to change the output activation function to a sigmoid and the loss function to binary cross-entropy, both standard choices when there are two classes. To accommodate the change, the last layer in the model now has only one node, which gives an output ranging from 0 to 1. If the prediction is equal to or above 0.5, the final prediction is stalled.
The model has 4 convolutional layers, which apply filters of various sizes and counts to the images to distill them down to their most important parts. 2 fully connected layers are then used to classify the now-simplified array/image. Finally, the output is run through a sigmoid activation function to give a prediction.
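The exact hyperparameters aren’t listed in this article, so the sketch below only mirrors the described shape (4 convolutional layers, 2 fully connected layers, a single sigmoid output, binary cross-entropy); the filter counts and kernel sizes are my assumptions:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # 4 convolutional layers (with pooling) distill the 48x48 ROI crops
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    # 2 fully connected layers; the single sigmoid node outputs a value in [0, 1]
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # >= 0.5 means "stalled"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```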
Results
The model reached a suspiciously high accuracy of 96%. Looking at the loss and accuracy graphs, there may be some overfitting, as the validation loss is a bit higher than the training loss. That would mean the algorithm is not general enough and will have low accuracy when tested on new videos.
Testing
Now the model has to classify new data that it hasn’t seen before. This is where we truly get to see if the model has learned.
Prediction
To test on the videos, cv2 is once again used to read each frame. model.predict() is used to get the predicted class of that frame via forward propagation (sending the image through the model). This prediction is then added to a running “total” variable, and a counter “n” is incremented each time a frame is read. Once all the frames in a video have been read, the average is calculated by simply dividing the two.
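A minimal sketch of that loop (the 1/255 input scaling is an assumption carried over from typical Keras image pipelines):

```python
import cv2
import numpy as np

def classify_video(video_path, model, size=48):
    """Average the per-frame predictions to score a whole video."""
    cap = cv2.VideoCapture(video_path)
    total, n = 0.0, 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        x = frame.astype("float32") / 255.0              # assumed training-time scaling
        pred = model.predict(x[np.newaxis], verbose=0)   # forward propagation
        total += float(pred[0][0])
        n += 1
    cap.release()
    return total / n if n else 0.0
```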
Writing to CSV File
The average classification is then run through a simple if statement: if the average is greater than or equal to 0.5, the classification is “1”, and if it isn’t, the classification is “0”. To make a submission in the competition, a csv file with the filename and prediction must be created. Using csv_writer, the function “append_row” opens the csv file and writes in two columns: the filename and the prediction (either 0 or 1).
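A sketch of that helper using Python’s built-in csv module (the original used a csv_writer utility; this stands in for it):

```python
import csv

def append_row(csv_path, filename, average):
    """Append one submission row: video filename and its 0/1 prediction."""
    prediction = 1 if average >= 0.5 else 0   # 1 = stalled, 0 = flowing
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([filename, prediction])
```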
Submission and Rank Based on Accuracy
After submitting, the accuracy of the prediction is scored using the Matthews correlation coefficient (MCC). It spits out a value between -1 and +1, with +1 representing perfect prediction and 0 representing prediction no better than random guessing. My score turned out to be 0.05, which is quite low, but it is still in the top 25. When evaluating the model with model.evaluate, a test accuracy of 51% is shown, which is very low and a big sign that improvement is necessary.
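For reference, MCC is easy to compute with scikit-learn; the labels below are toy values, not competition data:

```python
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1]   # toy model predictions
print(matthews_corrcoef(y_true, y_pred))  # a value between -1 and +1
```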
Next Steps
Evidently, if I want higher accuracy, I am going to have to make modifications to my model. My plan is to implement an LSTM on the end of the CNN, because LSTMs are built for time series data, and a video is exactly that: a sequence of frames. A rough sketch of the idea follows.
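This isn’t implemented yet, and the sequence length and layer sizes below are placeholders; the sketch just shows how TimeDistributed can wrap the convolutional layers so the LSTM sees one feature vector per frame:

```python
from tensorflow.keras import layers, models

frames_per_video = 100  # placeholder sequence length

model = models.Sequential([
    # Run the same small CNN over every frame in the clip
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu"),
                           input_shape=(frames_per_video, 48, 48, 3)),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.Flatten()),
    # The LSTM reads the per-frame features as a time series
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```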
Thanks for reading, and stay tuned for more revisions in the coming weeks!
Connect with me:
Linkedin: https://www.linkedin.com/in/vikram-menon-986a67193
Email: vikrammenon03@gmail.com