12.4: Neural Networks for automatic model construction
Authors: Group E - Ardemis Boghossian, James Brown, Sara Zak
Introduction
Multiple Input Multiple Output (MIMO) systems are systems that take multiple inputs and generate multiple outputs. MIMOs are controlled by controllers that combine multiple input readings in an algorithm to generate multiple output signals. MIMOs can be used with a variety of algorithms; the most versatile algorithm used to date is the neural network. Neural networks, which were initially designed to imitate human neurons, work to store, analyze, and identify patterns in input readings to generate output signals. In chemical engineering, neural networks are used to predict the outputs of systems such as distillation columns and CSTRs. This article will discuss how neural networks work, the advantages and disadvantages of neural networks, and some common applications of the networks.
MIMOs
As mentioned, Multiple Inputs Multiple Outputs (MIMOs) are systems that require multiple inputs and generate multiple outputs, as shown schematically in Figure 1.
As shown in the figure, MIMOs are controlled by controllers that combine multiple input readings in an algorithm to generate multiple output signals. Typically, MIMOs do not require that the number of inputs and outputs be the same; any number of input readings can be used to generate any number of output signals.
Neural Networks
Various types of controllers can be used to control MIMOs. One of the most accurate and versatile controllers used for MIMOs is the neural network. Neural networks are controllers that crudely imitate the human neuron. Initially, these networks were designed to model neural brain activity. However, as people began to recognize the advantages of neural networks, these networks were applied to controller algorithm design. Like a human neuron, neural networks work to store, analyze, and identify patterns in data by performing learning tasks. The ability of these networks to "learn" parallels the human neuron's ability to learn, making these automatic controllers the closest analog to a human controller.
Neurons
Like neurons in the body, network neurons receive inputs, store this data, and transmit outputs to either another neuron or directly to the MIMO. In order to transmit this data, the neuron must relate the multiple inputs to the multiple outputs. A simple mathematical representation of this relationship is shown below.
\[y=f\left(w_{1} a_{1}+w_{2} a_{2}+w_{3} a_{3}+\ldots+w_{n} a_{n}\right)=f\left(\sum_{i=1}^{n} w_{i} a_{i}\right) \nonumber \]
where
- \(w_i\) = weight
- \(a_i\) = input
- \(y\) = output
- \(f\) = sigmoid function (any nonlinear function)
According to this relationship, each of the input parameters is multiplied by its corresponding weight factor, \(w_i\). These weight factors "weigh" the significance of each input, scaling each input proportionally to the effect it will have on the output. These weighted inputs are then added, and the sum is input into the sigmoid function to generate an output. This output can then be sent to multiple neurons that, in turn, each generate their own output.
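As a rough sketch of the relationship above (not part of the original article), a single neuron can be written in a few lines of Python; here `math.tanh` stands in for the nonlinear sigmoid function \(f\), and the weight and input values are arbitrary illustrative choices:

```python
import math

def neuron_output(inputs, weights, f=math.tanh):
    # Weighted sum of the inputs: sum of w_i * a_i
    s = sum(w * a for w, a in zip(weights, inputs))
    # Pass the sum through the nonlinear "sigmoid" function f
    return f(s)

# Two inputs, with the first input weighted three times as heavily
y = neuron_output([1.0, 2.0], [0.6, 0.2])
```

Any nonlinear function with a similar shape could replace `math.tanh` here; the choice depends on which function best fits the system's data.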
The sigmoid function in this relationship is a nonlinear, empirical function that relates the input readings to the output signals. This empirical function can take on many forms depending on the data set. The equation that is best able to predict the outputs for the given system will be used (polynomial, sine, logarithmic, etc.). For example, one form this function may take is the hyperbolic sine function, where
\[f(x)=\sinh (\alpha x) \nonumber \]
where
- \(x\) = sum of weighted inputs, \( \sum_{i=1}^{n} w_{i} a_{i} \)
- \(\alpha\) = empirical parameter
In this sigmoid function, α is an empirical parameter that adjusts the function outputs. The effect of α on this sigmoid function is shown in Figure 2.
As shown in Figure 2, increasing α increases the output of this particular sigmoid function for a given input. Like most empirical functions, the hyperbolic sine sigmoid function should only be used within the range of x values over which it was fit to the data; in this case, that range depends on the value of α.
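A minimal numeric sketch of this activation function (the α values here are illustrative, not taken from Figure 2):

```python
import math

def sinh_activation(x, alpha):
    # f(x) = sinh(alpha * x); alpha scales how quickly the output grows
    return math.sinh(alpha * x)

# Increasing alpha increases the output magnitude for the same input
out_small = sinh_activation(1.0, 0.5)
out_large = sinh_activation(1.0, 2.0)
```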
Combining Neurons into Neural Networks
Once neurons have been programmed to correlate input and output data, they can be connected in a feedforward series to produce a neural network, or neural net (NN). A schematic diagram of a neural network is shown in Figure 3.
Figure 3 shows that MIMO parameters, such as temperature, pressure, and flow readings, are first processed in the first layer of neurons. The outputs of the first layer of neurons then serve as the inputs to the second layer. The outputs of the second layer then become the inputs to the third layer, and so on, until the final output of the network is used to directly affect MIMO controls such as valves. The layers of neurons between the initial and final layers are known as hidden layers.
The way neurons within hidden layers correlate inputs and outputs is analogous to the way individual neurons correlate these variables. As shown in the flowchart inset of the diagram, a neuron within the network receives multiple inputs. Each of these input parameters is then weighted, with the most influential parameters weighted most heavily. These weighted input values are added, and the sum is input into the particular sigmoid function the neuron is programmed to follow. The output of this function is then sent to other neurons as input, where the values are reweighted for each receiving neuron. This process continues until the final output of the neural network is used to adjust the desired controls. Although the diagram shows only one hidden layer, neural networks can consist of multiple hidden layers. Although almost all continuous functions can be approximated with a single hidden layer, incorporating multiple hidden layers decreases the number of weights used. Since more layers result in more parameters, each individual parameter is weighted less. With more parameters and smaller weights, the system becomes more sensitive to parameter change (a greater "rippling" effect within the network).
The "rippling" effect of a neural network makes the system difficult to model analytically: a small change in a single input variable results in multiple changes throughout the entire network. Modeling these complex networks is beyond the scope of the class; only a basic, qualitative understanding of how neural networks function is necessary to analyze neural network controllers and their effects on input and output parameters.
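The layer-by-layer flow described above can be sketched in code; the layer sizes and weight values below are arbitrary illustrative choices, and `math.tanh` again stands in for the neurons' sigmoid function:

```python
import math

def layer_forward(inputs, weight_matrix, f=math.tanh):
    # One neuron per row of weights; each neuron sees every input
    return [f(sum(w * a for w, a in zip(row, inputs)))
            for row in weight_matrix]

def network_forward(inputs, layers, f=math.tanh):
    # The outputs of each layer become the inputs to the next layer
    signal = inputs
    for weight_matrix in layers:
        signal = layer_forward(signal, weight_matrix, f)
    return signal

# 3 inputs (e.g. temperature, pressure, flow) -> 2 hidden neurons -> 1 output
layers = [
    [[0.1, 0.2, 0.3],    # hidden neuron 1
     [0.4, 0.5, 0.6]],   # hidden neuron 2
    [[0.7, 0.8]],        # output neuron
]
outputs = network_forward([1.0, 0.5, -0.2], layers)
```

Changing any single input perturbs every hidden neuron, and therefore every downstream neuron as well, which is the "rippling" effect described above.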
Learning Process
The ability of neural networks to learn distinguishes them from most automatic controllers. Like humans, neural networks learn by example, and thus need to be trained. Neural networks are usually configured to specific applications and have the ability to process large amounts of data. Complex trends and patterns can be detected by neural networks that would otherwise be imperceptible to humans or other computing programs. Within neural networks, there are learning procedures that allow the device to recognize a certain pattern and carry out a specific task. These learning procedures consist of an algorithm that enables the network to determine the weighting parameters in order to match the given data (inputs and outputs) with a function. In this iterative procedure, the initial input values are used to generate initial output values. Based on these input and output values, the weights within the network are adjusted to match the data. These adjusted weights are then used to correlate the next pair of input and output values. Again, these values are used to adjust weights. This process continues until the network obtains a good fit for the data. A flow chart summarizing this iterative process is shown in Figure 4.
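One simple, concrete instance of the iterative weight adjustment described above is the least-mean-squares (delta) rule applied to a single linear neuron. This is only a sketch of the general idea, not the specific training algorithm used by any particular network; the sample data and learning rate are made up for illustration:

```python
def train_neuron(samples, n_weights, rate=0.1, epochs=200):
    # samples: list of (inputs, target_output) pairs from the system
    weights = [0.0] * n_weights
    for _ in range(epochs):
        for inputs, target in samples:
            # 1) Use the current weights to generate an output
            prediction = sum(w * a for w, a in zip(weights, inputs))
            # 2) Compare it to the known output
            error = target - prediction
            # 3) Adjust each weight in proportion to its input and the error
            weights = [w + rate * error * a for w, a in zip(weights, inputs)]
    return weights

# Input/output pairs generated by the rule y = 2*a1 - 1*a2
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = train_neuron(samples, 2)  # converges toward weights [2, -1]
```

Each pass over the data nudges the weights toward values that reproduce the known input/output pairs, which is the iterative loop summarized in Figure 4.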
The neural network learns by tracing the path of an object over a specific area and sequence. For this application, the inputs to the neural network include the object's position, velocity, and direction. Once the neural network has completed its training, it uses what it learned to predict trajectories: given the position, velocity, and direction of the object, the network can predict where the object will move.
The trajectories shown in these videos demonstrate an important characteristic of neural networks: the network can only predict outputs for objects that behave similarly to the objects it encountered during training. For instance, if a person were to start running forward, then stop, spin in a circle, and start running backwards, the neural network output would be unpredictable.
Once the neural network becomes capable of predicting system outputs, it is run like any other controller. In real time, this typically involves the chemical process itself and a computer system of some sort. Often, such as in LabVIEW, the chemical engineer is presented with a user-friendly data acquisition program that allows the user to set desired temperatures, flow rates, etc., and that displays the system's inputs and outputs. Although the neural network continually sends signals to the system controllers (such as valves), the network algorithm is embedded in the acquisition program, so once the system is up and running, the user does not directly see the algorithm working.
Advantages and Disadvantages
Given the interesting, human-like behavior of neural networks, one would expect all process applications to be controlled by neural networks. However, the advantages and disadvantages of neural networks limit their use in applications. The following lists summarize these advantages and disadvantages.
Advantages
- Neural networks are very general and can capture a variety of patterns very accurately
- The static, nonlinear function used by neural networks provides a method to fit the parameters of a particular function to a given set of data.
- A wide variety of functions can be used to fit a given set of data
- Neural networks do not require excessive statistical training
- There is no need to assume an underlying input data distribution when programming a neural network
- Neural networks can detect complex nonlinear relationships between inputs and outputs
- Neural networks are the closest thing to having an actual human operate a system (i.e. they can "learn")
Disadvantages
- Neural networks are difficult to design. One must determine the optimal number of nodes, hidden layers, sigmoid function, etc.
- Neural networks are difficult to model analytically because a small change in a single input will affect the entire network
- The performance of a neural network is limited by its training: if the network is trained poorly, it will operate poorly, and the outputs cannot be guaranteed.
- There is a great computational burden associated with neural networks
- Neural networks require a large sample size in order to empirically fit data
- Neural networks have a “black box” nature. Therefore, errors within the complex network are difficult to target.
- Outside of their data training range, neural networks are unpredictable. This occurs because neural networks may "overfit" data. For instance, during training, a neural network may fit a 10th order polynomial to only 5 data points. When using this model to make predictions of values outside this 5-point range, the neural network behaves unpredictably.
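The overfitting pitfall in the last point can be demonstrated by forcing an exact polynomial through five data points (Lagrange interpolation, a stand-in for an overfit network) and then extrapolating beyond them; the data values here are made up for illustration:

```python
def lagrange_predict(xs, ys, x):
    # Evaluate the unique degree-(n-1) polynomial that passes exactly
    # through every training point (xs[i], ys[i]) -- a deliberately overfit model
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Five training points following a gentle, roughly linear trend
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.1, 1.9, 3.2, 3.8]

inside = lagrange_predict(xs, ys, 2.5)    # within the training range: sensible
outside = lagrange_predict(xs, ys, 10.0)  # far outside: wildly wrong
```

Inside the 0-4 training range the model predicts values near the trend, but at x = 10 the high-order polynomial swings to a large negative value, even though the trend suggests an output near 10 — the same failure mode an overtrained network exhibits outside its training range.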
Applications of Neural Networks
Each neural network will generate a different algorithm based on the inputs and outputs to the system. Because neural networks fit a specific function to the given data, they can be used in a variety of applications. Within chemical engineering, neural networks are frequently used to predict how changing one input (such as pressure, temperature, etc.) on a distillation column will influence the compositions and flow rates of the streams exiting the column. The network training is performed on various inputs to the column, and thus can predict how changing one input will affect the product streams. Within a CSTR, neural networks can be used to determine the effect of one input parameter such as temperature or pressure on the products.
In addition to their applications to chemical equipment, neural networks can also be applied to model material responses as a function of various loads under various conditions. These models can then be used in product development to create a device for a particular application, or to improve an existing device. For example, by modeling the corrosion of steel under different temperature and pH conditions, implanted biomedical devices can be manufactured or improved.
Neural networks are also often used in biology and biological applications to predict the outcome of a certain event. For example, neural networks can be used to predict the growth of cells and bacteria in cell culture labs, given a set of varying conditions, such as temperature and pH. In addition, neural network models have been used to predict the mortality rate in intensive care units in hospitals. Data was collected from different patients, and a neural network model was created to predict the mortality of future patients given a set of specified conditions. Neural networks have also been used to diagnose breast cancer in patients by predicting the effects of a tumor given a set of specified input conditions for the patient.
Neural networks are also used in applications beyond the chemical aspect of controllers. For instance, neural networks are used to predict travel time based on different travel conditions. Signs on the highways that give estimated travel times are examples of neural networks. They predict the amount of time required to reach a certain destination given the varying traffic volume, road conditions, and weather conditions.
Neural networks are not limited to the applications listed above. They can be used to model most predictable events, and the complexity of the network will increase depending on the situation. For more information on the uses listed above, refer to the journal articles listed in the references section of this wiki.
Hypothetical Industries has expanded to now include a biology lab! As an employee, you are working on developing a new antibiotic, and thus have been assigned to predict the growth of bacteria. Your boss wants to know how the growth of the bacteria is affected by different conditions, such as temperature, pH, and provided nutrients. You don't really feel like manipulating all these conditions, and then sitting and watching bacteria grow on a petri dish, so you decide to come up with a way to predict how the bacteria will grow. Using the information presented in the wiki, determine what the inputs and outputs to this neural network are.
Solution
The inputs to the neural network are each of the parameters that may affect the growth of the bacteria. In this situation, the inputs are the temperature, pH, and nutrients provided (such as sugars, amino acids, and antibiotics). The outputs from this system include the growth of the bacteria.
As seen in the example above, a neural network can be used to predict bacterial growth. Given the information presented in the wiki, explain the advantages and disadvantages of using a neural network to model bacterial growth.
Solution
Advantages
- Because there are so many inputs (temperature, pH, etc.), a neural network fits a function to this data that is able to predict how future conditions would affect the bacterial growth.
- Neural networks provide a mechanical method of modeling bacterial growth that is extremely similar to having an actual human predict the growth of bacteria.
- Someone monitoring the system does not need much statistical training to use the neural network.
Disadvantages
- You must first run a large number of samples with varying conditions in order to obtain a good fit for the neural network. The number of samples required depends on how well you would like your network to predict bacterial growth: the network will only function as well as it is trained, so if a high degree of accuracy is desired, more data inputs will be required. This becomes time consuming and expensive.
- Because of the black box nature of the neural networks, it is difficult to determine how the individual parameters such as temperature or pH will affect the bacterial growth.
- Neural networks cannot be used to predict the growth patterns of the bacteria outside of the given data ranges.
Why would someone want to increase the number of hidden layers when combining neurons into a neural network?
- To decrease the amount of programming needed
- To decrease the number of weights required
- To increase the cost associated with the system
- To increase the aesthetic structure of the system
What does the neural network output if its inputs are outside its training range?
- Zero
- Input^2
- sqrt(input)
- The outputs outside of range are unpredictable
References
- Van Zuylen, H. Accurate freeway travel time prediction with state-space neural networks under missing data. Transportation Research Part C, vol. 13, no. 5-6, 2005, pp. 347-369.
- Astrom, K.J., Hagglund, T. Advanced PID Control. ISA - The Instrumentation, Systems, and Automation Society.
- Campos, Lucio P.A., Silva, Aristofanes C., Barros, Allan Kardec. Diagnosis of breast cancer in digital mammograms using independent component analysis and neural networks. Lecture Notes in Computer Science, vol. 3773, 2005, pp. 460-469.
- Chan, C.H., Chow, P.Y. Application of artificial neural networks to establish a predictive mortality risk model in children admitted to a pediatric intensive care unit. Singapore Medical Journal, vol. 47, no. 11, 2006, pp. 928-934.
- Yu, C., Davidson, V.J., Yang, S.X. A neural network approach to predict survival/death and growth/no-growth interfaces for Escherichia coli O157:H7. Food Microbiology, vol. 23, no. 6, Sep. 2006, pp. 552-560.