A Deeper Look at the Evolution of Deep Learning
“Deep learning” and “machine learning” are often presented as new techniques made possible by the dramatic increases in computing power of recent years. In the avalanche of information about new uses for artificial intelligence (AI), it is sometimes forgotten that neural-network calibration itself is not new; it is the tasks that have become much more ambitious.
Fundamental developments in feedforward artificial neural networks over the preceding thirty years were examined in a 1990 review article [1]. It describes the history, origination, operating characteristics, and basic theory of several supervised neural-network training algorithms, including the perceptron rule, the least-mean-square algorithm, three Madaline rules, and the backpropagation technique. Neural-network calibration was one of the alternative techniques available to chemometricians in the 1990s for correlating spectral data with chemical properties. A twenty-plus-year-old article [2], for example, used infrared spectral data points as inputs to distinguish adulterated from non-adulterated instant coffees. At that time the method was called an artificial neural network (ANN), but the idea behind it was the same as in modern deep learning: connecting inputs and outputs via layers of interconnected “neurons”, loosely modeled on the human brain. Glucose, starch and/or chicory were added to samples of instant coffee, then diffuse reflectance Fourier transform infrared and ATR spectra were obtained in the 800-2000 cm⁻¹ region and used as inputs. “Pure” and “adulterated” were encoded as the 0 or 1 output variable for each measurement in the set. Standard or stochastic back-propagation techniques were applied to obtain the weights of the hidden layers.
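To make the setup concrete, here is a minimal sketch in Python of this kind of calibration; it is not the authors’ original implementation, and it assumes scikit-learn’s MLPClassifier, randomly generated arrays standing in for the measured spectra, and arbitrarily chosen layer size and learning rate. A single hidden layer is fitted by stochastic gradient descent, i.e. stochastic backpropagation, to map spectral intensities to a 0/1 adulteration label.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real data: 146 "spectra", each with 601 points
# covering 800-2000 cm-1 (the real study used measured DRIFT/ATR spectra).
rng = np.random.default_rng(0)
n_samples, n_points = 146, 601
X = rng.normal(size=(n_samples, n_points))   # spectral intensities
y = rng.integers(0, 2, size=n_samples)       # 0 = pure, 1 = adulterated

# One hidden layer of interconnected "neurons"; the weights are obtained by
# stochastic gradient descent, i.e. stochastic backpropagation.
model = MLPClassifier(hidden_layer_sizes=(8,), solver="sgd",
                      learning_rate_init=0.01, max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X[:5]))                  # predicted 0/1 labels
```

With random numbers in place of real spectra the network cannot learn anything meaningful, of course; the sketch only shows the shape of the calibration problem: many input variables per measurement, one binary output.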
The authors warned of the danger of overfitting, i.e. calibrating on one specific data set under specific conditions so closely that the model fits poorly on a different set, measured on a different day or under slightly different conditions. They therefore held out a validation set: part of the available data was not used for calibration at all, only to validate the model. For this relatively simple task, they used 50 of the 146 spectra for validation and reported 100% correct classification.
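That hold-out idea can be sketched as follows, again with placeholder data and an assumed scikit-learn model: 50 of the 146 spectra are withheld from calibration entirely, and accuracy on that held-out set, rather than on the calibration set alone, is what reveals overfitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(146, 601))     # placeholder spectra
y = rng.integers(0, 2, size=146)    # placeholder pure/adulterated labels

# Withhold 50 of the 146 spectra from calibration, as in the paper.
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=50,
                                              random_state=0, stratify=y)

model = MLPClassifier(hidden_layer_sizes=(8,), solver="sgd",
                      max_iter=2000, random_state=0)
model.fit(X_cal, y_cal)

# Accuracy on spectra never seen during calibration is the honest estimate;
# calibration-set accuracy alone can hide overfitting.
print("calibration accuracy:", accuracy_score(y_cal, model.predict(X_cal)))
print("validation accuracy: ", accuracy_score(y_val, model.predict(X_val)))
```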
It is important to note that modern deep learning generally uses more sophisticated network architectures and training strategies, better algorithms, more powerful computers and far more input variables, but the fundamental problems we face are much the same as they were twenty years ago. Calibration, i.e. learning, still means relating input variables to output variables. The variables may be more complex, such as high-resolution images or hyperspectral data cubes, but the need for proper validation and for avoiding overfitting is the same as ever. Therefore, to build a successful deep learning model it is critically important to understand the inner workings of the model, the input variables (features) and examples you are learning from, and the kind of outcome you are trying to predict.
[1] B. Widrow and M.A. Lehr, “30 years of adaptive neural networks: perceptron, Madaline and backpropagation,” Proc. IEEE 78 (9) (1990)
[2] R. Briandet, E.K. Kemsley and R.H. Wilson, “Approaches to Adulteration Detection in Instant Coffees using Infrared Spectroscopy and Chemometrics,” J. Sci. Food Agric. 71, 359-366 (1996)