Dr. John F. MacGregor, President & Founder of ProSensus, is a world-renowned chemical engineer and statistician and a Distinguished University Professor Emeritus at McMaster University.
Here, he answers a few questions that you might have about multivariate analysis in manufacturing.
“I have no doubt that the use of the massively increasing amounts of data that we are collecting on processes will be the most important factor influencing control engineering practice over the next few decades.”
Dr. John F. MacGregor, President & Founder, ProSensus (CONTROL magazine, March 2013)
Q: As a chemical engineer, how did you end up spending so much of your career using statistics?
A: I saw that so much of chemical engineering involved interpreting plant and research data, so I became very interested in the analysis of data. During my graduate work in chemical engineering at the University of Wisconsin, I became fascinated with the work of Professors George Box and William Hunter in the statistics department, and so I did my Ph.D. in statistics. My research career at McMaster University included polymer reaction engineering and advanced process control, but I always continued the engineering statistics research as well, until it eventually became my main research focus.
Q: Can you describe multivariate analysis, in a nutshell?
A: I sometimes like to say that multivariate analysis is like music. In the movie Amadeus, the Emperor says to Mozart “There are simply too many notes, that’s all.” It’s the same with data – the human brain can only comprehend so many data points. But multivariate analysis makes sense of the data, the same way that the ear makes sense of so many notes in a piece of music.
Q: What is your point of view on fundamental models vs. statistical models?
A: As a chemical engineer who worked in advanced control theory and polymer reaction engineering, I’m a big believer in using a fundamental model, if you have one. Often you don’t, and even if you do, it generally only covers a very small part of the whole process. The benefit of statistical models is that they can include all of the theoretical calculations as well as raw data to come up with a much more complete interpretation. This is one of the reasons we use multivariate (latent variable) models in reduced dimensions as opposed to standard regression methods: these measurements and calculations are not independent of one another.
Q: Is multivariate analysis just another fad like neural networks?
A: Definitely not. The reason I’ve focused on multivariate models is that they handle real data from industry: large, highly correlated data sets with maybe only four or five independent dimensions, or latent variables. The latent variable methods are based on, and take advantage of, those features. Latent variable methods also allow us to easily handle missing data, which is always a feature of industrial data sets. Even when you build models on historical plant data, latent variable methods still produce unique, cause-and-effect models in the reduced dimensional space, which allows you to do control and optimization. This latter point is very important because we always have large amounts of historical data that have not necessarily come from designed experiments. Neural networks and other traditional regression methods are very useful for prediction as long as the correlation structure of the data never changes, but they provide neither unique nor causal models, nor interpretability, and they are much less able to handle missing data.
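A minimal sketch of the "few independent dimensions" idea, assuming principal component analysis (PCA) as a stand-in for the latent variable methods mentioned above. The data are simulated, and every name here (`simulate_plant_data`, `top_eigenvalues`, the choice of 8 measurements and 2 hidden factors) is hypothetical, chosen only for illustration. The eigenvalues are found by plain power iteration so the example needs nothing beyond the standard library.

```python
import random

def simulate_plant_data(n_rows=200, n_vars=8, n_latent=2, noise=0.05, seed=0):
    """Simulated (hypothetical) data: n_vars measurements driven by n_latent hidden factors."""
    rng = random.Random(seed)
    # Mixing weights: how strongly each hidden factor shows up in each measurement.
    weights = [[rng.uniform(-1, 1) for _ in range(n_vars)] for _ in range(n_latent)]
    rows = []
    for _ in range(n_rows):
        factors = [rng.gauss(0, 1) for _ in range(n_latent)]
        rows.append([
            sum(factors[a] * weights[a][j] for a in range(n_latent)) + rng.gauss(0, noise)
            for j in range(n_vars)
        ])
    return rows

def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def top_eigenvalues(cov, k, iters=500):
    """Estimate the k largest eigenvalues by power iteration with deflation."""
    m = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    values = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(m)]  # arbitrary non-zero starting vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for direction v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
        values.append(lam)
        # Deflate: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return values

data = simulate_plant_data()
cov = covariance(data)
total_variance = sum(cov[j][j] for j in range(len(cov)))
explained = sum(top_eigenvalues(cov, k=2)) / total_variance
print(f"Fraction of variance captured by 2 latent dimensions: {explained:.3f}")
```

Because the eight simulated measurements are all driven by the same two hidden factors, two latent dimensions account for nearly all of the variation. Real plant data would need scaling, a principled choice of the number of components, and a proper linear algebra library.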
Q: What does it mean when the correlation structure changes?
A: It means the underlying process is behaving differently, maybe due to raw material variations, or environmental factors, or equipment failure somewhere in the process, etc. The multivariate models are very good at detecting these situations whereas traditional regression methods just can’t.
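One common way to picture this, sketched here under stated assumptions: fit a one-component PCA model to two measurements that normally move together, then compute the squared prediction error (SPE) of a new observation, that is, its squared distance from the model's latent direction. The data, the function names (`fit_one_component`, `spe`) and the limit of three times the largest training SPE are all hypothetical choices for illustration, not a method described in the interview.

```python
import math

def fit_one_component(rows):
    """Fit a one-component PCA model; returns (column means, unit loading vector)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [1.0] + [0.5] * (m - 1)  # arbitrary starting vector for power iteration
    for _ in range(200):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def spe(obs, means, loading):
    """Squared prediction error: squared distance from the latent direction."""
    x = [o - mu for o, mu in zip(obs, means)]
    score = sum(a * b for a, b in zip(x, loading))
    residual = [a - score * b for a, b in zip(x, loading)]
    return sum(r * r for r in residual)

# Hypothetical training data: two scaled measurements that rise and fall together.
training = [(i / 5, i / 5 + 0.05 * (-1) ** i) for i in range(-10, 11)]
means, loading = fit_one_component(training)

# Illustrative limit only (assumption): three times the largest training SPE.
limit = 3 * max(spe(row, means, loading) for row in training)

normal_obs = (1.0, 1.0)     # follows the usual correlation
abnormal_obs = (1.0, -1.0)  # each value is within its usual range, but the pair is not

spe_normal = spe(normal_obs, means, loading)
spe_abnormal = spe(abnormal_obs, means, loading)
print(f"limit={limit:.4f}  normal={spe_normal:.6f}  abnormal={spe_abnormal:.3f}")
```

The second observation would pass a check on each variable separately, since both values lie inside the range seen in training. Only the relationship between them has changed, and the SPE picks that up.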
Q: Before ProSensus was spun off in 2004, you already worked very closely with industry, solving practical problems with multivariate analysis. What was your motivation to spin off a company?
A: There was lots of great research being done on multivariate methods within the McMaster Advanced Control Consortium, and I wanted to create a company that could develop these ideas further and apply them even more broadly in industry. I had students who wanted to continue working on multivariate methods as well, and starting ProSensus created an opportunity for them to continue high-tech research in engineering without having to spread out across the globe.
Q: Why is now the right time for companies to embrace multivariate analysis?
A: We’re in a world of ever-increasing amounts of data, and to really optimize our processes and develop new products, we need to make use of that data.
Q: Some companies would like to develop multivariate expertise in-house. What’s the advantage if they choose to work with ProSensus instead?
A: Two things: Number one, we strongly encourage companies to train some of their people in multivariate analysis because it greatly enhances the company’s ability to work with ProSensus and maintain continuity of the programs they initiate. A working knowledge of the methods is necessary to be able to champion multivariate analysis within a company. And number two, it’s foolish for companies to try to do all their work in-house and not rely on any expertise from outside. This is true not just for multivariate analysis but also for advanced control systems and many other aspects of manufacturing. At ProSensus, our team has many years of advanced development and industrial experience in the application of multivariate analysis, so we can help our clients get to the right solution, quickly.