Everybody’s talking about Big Data, but do they really know what Big Data is, and are they using it to solve real problems and gain a competitive advantage?
Do they even have a strategy to implement Big Data? The Wall Street Journal says that for many companies, it’s all hype.
ProSensus Clients are Different
You’re looking for actionable insights in your data, whatever the current buzzword, and we’re here to find those insights so you can improve product quality, impress your customers, and save money.
Big Data isn’t a Synonym for Data Analysis
Wikipedia says Big Data is “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”.
Industry heavyweights like IBM say data is big if it has 4 V’s – volume, variety, velocity, and veracity, and you need a distributed solution like Hadoop® to process it.
And Quartz reports that “Most data isn’t ‘big’, and businesses are wasting money pretending it is”.
Industrial Process Data has at least 3 V’s of Big Data
- Variety. Industrial process data includes:
  - real-time measurements, like temperatures, pressures, and flows
  - periodic lab measurements, like viscosity of fluids or counts of living cells
  - array data from spectral instruments like near-infrared or Raman spectrometers
- Velocity. Real-time monitoring of industrial process data implies a velocity that depends on the system dynamics. And as manufacturing equipment becomes more highly instrumented and connected (the industrial “internet of things”), there will be more data streams to analyze.
- Veracity. In Big Data terms, veracity refers to problems with data accuracy and integrity. Industrial process data has noisy measurements and missing values. Data goes missing because of connectivity issues, sensor malfunctions, or sporadic testing. But that’s OK! We use multivariate analysis methods, which handle noise and missing data implicitly.
- Volume. In typical areas of Big Data, there are huge numbers of observations like phone calls being made, internet searches being done, or cars on the highway. Industrial process data can have many observations too, but it also has many variables – hundreds of process sensors, raw material data, and QA lab measurements. We have yet to encounter a data set that requires distributed computing, but from a traditional statistics perspective, industrial data is big and messy. Traditional methods can’t handle:
  - diverse data blocks that must be combined for analysis (process measurements, QA data, raw material properties)
  - huge numbers of highly correlated measurements
  - simultaneous prediction of multiple y-variables
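To make the Veracity point concrete, here is a minimal sketch of one common way a multivariate method can work around missing values: a NIPALS-style principal component fit that skips missing entries instead of discarding whole rows. This is an illustration only, not a description of the specific algorithms ProSensus uses. The function name, the three-sensor data set, and its values are all made up for the example.

```python
import math

def nipals_first_component(X, max_iter=500, tol=1e-12):
    """Fit one principal component to X (a list of rows), skipping None entries.

    Returns (column means, unit-length loading vector p, scores t).
    Assumes every column has at least one observed value and that the
    first column is not constant.
    """
    n, m = len(X), len(X[0])

    # Center each column using only its observed values.
    means = []
    for j in range(m):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed))
    Xc = [[None if v is None else v - means[j] for j, v in enumerate(row)]
          for row in X]

    # Start the scores from the first column (missing entries start at zero).
    t = [0.0 if row[0] is None else row[0] for row in Xc]
    p = [0.0] * m
    for _ in range(max_iter):
        # Loadings: regress each column on t, using observed rows only.
        p = []
        for j in range(m):
            rows = [i for i in range(n) if Xc[i][j] is not None]
            num = sum(Xc[i][j] * t[i] for i in rows)
            den = sum(t[i] ** 2 for i in rows)
            p.append(num / den if den > 0 else 0.0)
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]

        # Scores: regress each row on p, using observed columns only.
        t_new = []
        for i in range(n):
            cols = [j for j in range(m) if Xc[i][j] is not None]
            num = sum(Xc[i][j] * p[j] for j in cols)
            den = sum(p[j] ** 2 for j in cols)
            t_new.append(num / den if den > 0 else 0.0)

        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return means, p, t

# Hypothetical data: three correlated sensors, with the second sensor
# missing in the first and last rows (None marks a missing value).
X = [
    [98.0,  None, 12.0],
    [99.0,  48.0, 11.0],
    [100.0, 50.0, 10.0],
    [101.0, 52.0,  9.0],
    [102.0, None,  8.0],
]
means, p, t = nipals_first_component(X)

# Model-based estimates of the two missing readings of the second sensor.
estimates = [means[1] + t[i] * p[1] for i in (0, 4)]
print(estimates)
```

Because the three columns move together, the rows with a missing reading still get a score from the sensors that were observed, and the model can then suggest what the missing reading probably was.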
Multivariate Analysis to the Rescue
Our experienced engineers can make sense of the variety of data from industrial processes. First we structure the data according to the problem at hand. Then we build multivariate models that simultaneously predict multiple y-variables while managing the veracity problems of noise and missing data and the inevitable correlations that come with high volumes of data. We help you use the models to troubleshoot a problem or implement real-time multivariate monitoring to handle the velocity of your process data stream.
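As a rough illustration of what real-time multivariate monitoring can look like, the sketch below checks each new observation against an existing one-component model using its squared prediction error (SPE), a statistic widely used in multivariate process monitoring. This is a simplified example under stated assumptions, not ProSensus's implementation: the model values, the alarm limit `SPE_LIMIT`, and the function name are all hypothetical.

```python
import math

# Hypothetical one-component model of three correlated sensors:
# column means and a unit-length loading vector.
means = [100.0, 50.0, 10.0]
loading = [1 / math.sqrt(6), 2 / math.sqrt(6), -1 / math.sqrt(6)]

# Hypothetical alarm limit; in practice it would be derived from
# historical data collected during normal operation.
SPE_LIMIT = 1.0

def score_and_spe(x, means, loading):
    """Project one observation onto the model.

    Returns (score, squared prediction error). A large SPE means the
    observation breaks the correlation pattern the model expects.
    """
    centered = [v - m for v, m in zip(x, means)]
    score = sum(c * p for c, p in zip(centered, loading))
    residual = [c - score * p for c, p in zip(centered, loading)]
    spe = sum(r * r for r in residual)
    return score, spe

# One reading that follows the usual pattern, and one where the
# second sensor moves against it.
for reading in ([101.0, 52.0, 9.0], [101.0, 48.0, 9.0]):
    score, spe = score_and_spe(reading, means, loading)
    status = "ALARM" if spe > SPE_LIMIT else "ok"
    print(reading, round(score, 3), round(spe, 3), status)
```

Note that each sensor value in the second reading is within its normal range on its own; it is only the combination that is unusual, which is what a multivariate check can catch and single-variable limits can miss.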
We can Call it Big Data if you Like
Our founder Dr. John F. MacGregor has been applying multivariate analysis to industrial process problems for decades, and a new buzzword doesn’t change what we do. We’re here to help you find actionable insights in your data so you can monitor, predict, and improve product quality, and ultimately impress your customers.