Healthcare Analytics:

A Breast Cancer Application

By Marlene Cardin and Monica Salib | September 18, 2019

According to global statistics, breast cancer is one of the most common cancers among women. Despite awareness campaigns and high visibility, the number of women dying of metastatic breast cancer has remained unchanged since 1970 [1].

What if advanced analytics and personalized treatment plans could help?

This article details a multivariate analysis of a publicly available dataset of mammogram images, and discusses the data that would be required to develop personalized treatment plans for breast cancer patients.

Artificial intelligence (AI) and digital image classification techniques can quantify information from images that is usually not detectable by the human eye, and this information can complement conventional clinical decision making. The main application of AI in breast cancer research is feature extraction and image classification to determine whether a tumor is Benign (B – non-cancerous) or Malignant (M – cancerous) based on mammogram images.

Using MVA For Image Classification

Various AI algorithms can be applied to classify mammogram images. This article demonstrates Multivariate Analysis (MVA), which is one of the simplest yet most accurate classification approaches.

The dataset examined had 570 images with 30 extracted features: 10 features measured on each cell nucleus, each summarized by its mean, standard deviation, and “worst” value (the mean of the three highest values). ProSensus built a PCA (principal component analysis) model with two components on this dataset.

In other words, PCA summarizes the initial 30 variables (extracted features) in this dataset into two independent summary variables/components that capture most of the variation in the data. This variable reduction helps in model interpretation and reduces model complexity.
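To make the variable reduction concrete, here is a minimal sketch of a two-component PCA computed with NumPy. It is not the software or the data used for this article: the `fit_pca` helper is hypothetical, the data is synthetic (200 rows and 30 correlated columns generated from two hidden factors), and autoscaling each column before the decomposition is an assumption.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit PCA on autoscaled data via SVD (illustrative helper, not the article's tool)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # autoscale: zero mean, unit variance
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T             # shape (n_features, n_components)
    scores = Z @ loadings                      # shape (n_samples, n_components)
    explained = (S ** 2) / np.sum(S ** 2)      # fraction of variance per component
    return mean, std, loadings, scores, explained[:n_components]

# Synthetic stand-in data: 30 columns driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 30))
X = hidden @ mixing + 0.1 * rng.normal(size=(200, 30))

mean, std, loadings, scores, explained = fit_pca(X)
print(scores.shape)       # (200, 2): each row is one point on a score plot
print(explained.sum())    # share of total variance captured by the two components
```

Each row of `scores` corresponds to one point in a score plot, and the two score columns are uncorrelated with each other, which is the sense in which the components are independent summaries.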

The Results

[Figure: Breast Cancer - Score Plot PCA]

The distinct clustering in the score plot (where each point represents one of the 570 images) shows a clear separation between B and M masses.

This result indicates that our model effectively captures the variation among the 30 extracted features that is correlated with the outcome (B vs M).

Feature Differences – B vs M

The next question is, what is different between the extracted features of B vs M masses?

[Figure: Mammogram Images. Comparison of sample mammograms of benign (left) and malignant (right) masses [2].]

Though the untrained human eye may have difficulty distinguishing the Benign and Malignant mammogram images above, a contribution plot on the PCA model can readily summarize the key distinguishing features.

The contribution plot below shows that M (cancerous) images had higher concavity, radius, area, and compactness compared to B (non-cancerous) ones.

This plot is extremely useful: not only does it identify the key feature differences between M and B tumors, but it also suggests the required features for conclusive image classification. For instance, we would likely not be able to properly classify the tissue masses if we were to rely only on texture, symmetry, and smoothness features.

[Figure: Breast Cancer - Contributions]
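As a rough illustration of how a group-to-group contribution can be computed, the sketch below takes the difference between the two class means in autoscaled units and keeps only the part that the PCA model explains. Contribution plots are defined in several ways, so this is one plausible convention rather than the calculation behind the figure; the `group_contributions` helper and the three-feature synthetic data are hypothetical.

```python
import numpy as np

def group_contributions(X, labels, n_components=2):
    """Per-feature contribution to the gap between groups "M" and "B" (one convention)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # autoscale all features
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                              # PCA loadings
    delta = Z[labels == "M"].mean(axis=0) - Z[labels == "B"].mean(axis=0)
    # Project the mean difference onto the model plane: the part PCA "sees".
    return P @ (P.T @ delta)

# Synthetic data: features 0 and 1 are shifted upward for "M"; feature 2 is pure noise.
rng = np.random.default_rng(0)
labels = np.array(["B"] * 200 + ["M"] * 200)
X = rng.normal(size=(400, 3))
X[labels == "M", :2] += 3.0

contrib = group_contributions(X, labels)
for name, value in zip(["feature_0", "feature_1", "feature_2"], contrib):
    print(f"{name}: {value:+.2f}")
```

Features with large positive bars are higher in the M group; a feature with a near-zero bar, like the noise column here, does little to separate the classes.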

Model Validation

Before claiming any success, one should test the model on new data to ensure that the results still hold. Therefore, we excluded 30% of the images to use as a testing set.
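A hold-out split of this kind can be sketched in a few lines. The article does not say how the 30% was selected, so drawing the test set separately within each class (a stratified split) is an assumption, and the `holdout_split` helper and class counts below are made up for illustration.

```python
import random

def holdout_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)                          # random order within the class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["B"] * 60 + ["M"] * 40                  # made-up class counts
train_idx, test_idx = holdout_split(labels)
print(len(train_idx), len(test_idx))              # 70 30
```

Stratifying keeps the benign-to-malignant ratio the same in both sets, so the test results are not skewed by an unlucky draw.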

[Figure: Breast Cancer - Score Plot Prediction]

As is evident in the score plot, most of the test images (94%) were classified as expected; in other words, the Malignant test data fell within the region of the Malignant training data (right side of the score plot), while the Benign test data fell within the region of the Benign training data (left side of the score plot).
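The idea of test points "falling on" the training clusters can be sketched as follows: scale new observations with the training mean and standard deviation, project them onto the training loadings, and assign each to the nearest class centroid in score space. The nearest-centroid rule is an assumed stand-in, since the article does not state its decision rule, and the helper names and synthetic data are hypothetical.

```python
import numpy as np

def fit_model(X_train, y_train, n_components=2):
    """Fit PCA on training data and record each class centroid in score space."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T
    T = Z @ P
    centroids = {c: T[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    return mean, std, P, centroids

def classify(X_new, mean, std, P, centroids):
    """Project new rows with TRAINING statistics, then pick the nearest centroid."""
    T_new = ((X_new - mean) / std) @ P
    classes = list(centroids)
    dist = np.stack(
        [np.linalg.norm(T_new - centroids[c], axis=1) for c in classes], axis=1
    )
    return np.array(classes)[dist.argmin(axis=1)]

rng = np.random.default_rng(1)

def make(n, shift):
    """Synthetic rows: 5 features, the first 3 shifted by `shift`."""
    X = rng.normal(size=(n, 5))
    X[:, :3] += shift
    return X

X_train = np.vstack([make(70, 0.0), make(70, 3.0)])
y_train = np.array(["B"] * 70 + ["M"] * 70)
X_test = np.vstack([make(30, 0.0), make(30, 3.0)])
y_test = np.array(["B"] * 30 + ["M"] * 30)

model = fit_model(X_train, y_train)
y_pred = classify(X_test, *model)
print((y_pred == y_test).mean())   # fraction of test rows classified as expected
```

The key design point is that the test data never influences the scaling or the loadings; it is only projected onto a model that was fixed using the training set.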

With this simple model, we see that 14% of malignancies are not classified correctly. For borderline cases, additional clinical testing is of course required. Nonetheless, this preliminary and straightforward analysis demonstrates some of the potential of advanced analytics in the medical field.
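Overall accuracy and the share of missed malignancies are different measures, which is why a high overall figure can coexist with a noticeably higher miss rate for the malignant class. The sketch below computes both from a list of true and predicted labels; the `summarize` helper and the ten labels are made up for illustration and are not the article's results.

```python
def summarize(y_true, y_pred, positive="M"):
    """Return (overall accuracy, fraction of positive cases that were missed)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    missed = sum(p != positive for p in positives)
    return correct / len(y_true), missed / len(positives)

# Made-up example: 8 benign and 2 malignant cases, with one malignancy missed.
y_true = ["B"] * 8 + ["M"] * 2
y_pred = ["B"] * 8 + ["M", "B"]

accuracy, miss_rate = summarize(y_true, y_pred)
print(accuracy, miss_rate)   # 0.9 0.5
```

In a screening context the miss rate for malignant cases is usually the more important number, since a missed malignancy is more costly than a benign mass flagged for follow-up.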

What’s Next – Advanced Modeling & Personalized Treatment Plans

Ultimately, the goal is to reach the next milestone in the medical field – personalized treatment plans once cancer has been diagnosed. This requires a large dataset of past patients in order to build and validate an advanced predictive model. While such databases may exist, it is unlikely that they have been fully evaluated for these purposes.


An advanced predictive model can use these datasets to relate cancer survival rates to a patient’s lifestyle features, stage at diagnosis, unique genetic mutations, mammogram images, and treatment plan. After such a model has been validated, it can be used to back-calculate an optimal personalized treatment plan that maximizes the patient’s chance of survival and perhaps even minimizes unpleasant side effects.


  1. Westervelt, A. (2019). Why Are So Many Women Still Dying of Breast Cancer? | Dame Magazine. Retrieved 12 August 2019, from
  2. Levenson, R., Krupinski, E., Navarro, V., & Wasserman, E. (2015). Pigeons (Columba livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images. PLoS ONE, 10. doi:10.1371/journal.pone.0141357

About the Author: Marlene Cardin

Marlene Cardin, MASc
Director of Projects
Since 2007, Marlene has helped many Fortune 500 companies in the pharmaceutical, food & beverage, speciality chemical, and oil & gas industries get actionable insights from their big data. In addition to working closely with our consulting clients, Marlene oversees the growth of our ProVision division which offers fully turnkey machine vision systems for real-time quality monitoring & control. Marlene has a Bachelor’s degree in Chemical Engineering and a Master’s Degree in Applied Science from McMaster University (Hamilton, Ontario). Marlene completed her Master’s degree under the supervision of ProSensus founder, Dr. John F. MacGregor.

About the Author: Monica Salib

Monica Salib, B.Eng.
Project Engineer
Monica holds a chemical engineering degree from McMaster University. She has been involved with client projects in rapid product development, troubleshooting analyses, in-house courses, and advanced modeling sessions. Monica has impressed clients with her attention to detail and ability to organize large datasets. Meet Monica at the 2019 International Elastomer Conference in Cleveland.