Search Results
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.02%3A_Universality_with_one_input_and_one_output
  You'll see that the graph changes shape when this happens, since we have moved from a situation where the top hidden neuron is the first to be activated to a situation where the bottom hidden neuron is the first to be activated. In other words, we can directly manipulate the function appearing in the graph on the right, and see that reflected in the \(h\) values on the left.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.04%3A_Extension_beyond_sigmoid_neurons
  Recall that in a sigmoid neuron the inputs \(x_1,x_2,…\) result in the output \(σ(\sum_j{w_jx_j+b})\), where \(w_j\) are the weights, \(b\) is the bias, and \(σ\) is the sigmoid function: We can use this activation function to get a step function, just as we did with the sigmoid. Just as with the sigmoid, this causes the activation function to contract, and ultimately it becomes a very good approximation to a step function. (A hedged Python sketch of this step-function construction appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.06%3A_Section_6-
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/07%3A_Appendix-_Is_there_a_simple_algorithm_for_intelligence
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/02%3A_How_the_Backpropagation_Algorithm_Works/2.02%3A_Warm_up-_a_fast_matrix-based_approach_to_computing_the_output_from_a_neural_network
  Here's a preview of the ways we'll delve more deeply into the equations later in the chapter: I'll give a short proof of the equations, which helps explain why they are true; we'll restate the equations in algorithmic form as pseudocode, and see how the pseudocode can be implemented as real, running Python code; and, in the final section of the chapter, we'll develop an intuitive picture of what the backpropagation equations mean, and how someone might discover them from scratch.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.06%3A_Learning_with_gradient_descent
  Here, \(w\) denotes the collection of all weights in the network, \(b\) all the biases, \(n\) is the total number of training inputs, \(a\) is the vector of outputs from the network when \(x\) is input, and the sum is over all training inputs, \(x\). (A hedged Python sketch of this quadratic cost appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/06%3A_Deep_Learning/6.02%3A_Convolutional_neural_networks_in_practice
  In particular: (1) Using convolutional layers greatly reduces the number of parameters in those layers, making the learning problem much easier; (2) Using more powerful regularization techniques (notably dropout and convolutional layers) to reduce overfitting, which is otherwise more of a problem in more complex networks; (3) Using rectified linear units instead of sigmoid neurons, to speed up training - empirically, often by a factor of \(3-5\); (4) Using GPUs and being willing to train for a …
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/00%3A_Front_Matter/Acknowledgements
  The book grew out of a set of notes I prepared for an online study group on neural networks and deep learning. Many thanks to all the participants in that study group: Paul Bloore, Chris Dawson, Andrew Doherty, Ilya Grigorik, Alex Kosorukoff, Chris Olah, and Rob Spekkens. I am particularly grateful to Rob, for providing so many insightful questions and ideas, and to Chris, who has continued to share his rapidly expanding knowledge of neural networks.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/05%3A_Why_are_deep_neural_networks_hard_to_train/5.01%3A_The_vanishing_gradient_problem
  *Back in Chapter 2 we referred to this as the error, but here we'll adopt the informal term "gradient". I say "informal" because of course this doesn't explicitly include the partial derivatives of the cost with respect to the weights, \(∂C/∂w\). We can think of the gradient \(δ^1\) as a vector whose entries determine how quickly the first hidden layer learns, and \(δ^2\) as a vector whose entries determine how quickly the second hidden layer learns. (A toy Python sketch comparing these gradient sizes appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.01%3A_Introduction
  Of course, if the point of the chapter was only to write a computer program to recognize handwritten digits, then the chapter would be much shorter! But along the way we'll develop many key ideas about neural networks, including two important types of artificial neuron (the perceptron and the sigmoid neuron), and the standard learning algorithm for neural networks, known as stochastic gradient descent.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.05%3A_A_simple_network_to_classify_handwritten_digits
  In a similar way, let's suppose for the sake of argument that the second, third, and fourth neurons in the hidden layer detect whether or not the following images are present: Of course, that's not the only sort of evidence we can use to conclude that the image was a \(0\). We could legitimately get a \(0\) in many other ways (say, through translations of the above images, or slight distortions).
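
The universality excerpts above describe how a single sigmoid hidden neuron, given a large enough weight, behaves like a step function with step point \(s = -b/w\). The sketch below is only an illustration of that idea, not code from the book; the weight values, the bias choice, and the sample points are assumptions picked to make the contraction visible.

```python
import numpy as np

def sigmoid(z):
    """Standard sigmoid activation, sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(x, w, b):
    """Output of a single sigmoid hidden neuron with weight w and bias b."""
    return sigmoid(w * x + b)

# The step point is s = -b / w: for large |w| the neuron's output jumps
# from roughly 0 to roughly 1 as x crosses s, approximating a step function.
xs = np.linspace(0.0, 1.0, 11)
for w in (5.0, 50.0, 500.0):      # increasing the weight "contracts" the transition
    b = -0.4 * w                  # keep the step point fixed at s = -b/w = 0.4
    outputs = hidden_output(xs, w, b)
    print(f"w = {w:6.1f}, step point s = {-b / w:.2f}:", np.round(outputs, 3))
```

Running this prints the neuron's output on a grid of inputs; as the weight grows, the transition around \(x = 0.4\) becomes sharper, which is the contraction toward a step function that the excerpt describes.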
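
The gradient-descent excerpt defines the quadratic cost \(C(w,b) = \frac{1}{2n}\sum_x \|y(x) - a\|^2\). Here is a minimal sketch of that formula; the network outputs and desired outputs are made-up values used purely for illustration.

```python
import numpy as np

def quadratic_cost(outputs, desired):
    """C = (1 / 2n) * sum_x ||y(x) - a||^2, where n is the number of training
    inputs, a is the network's output vector for input x, and y(x) is the
    desired output."""
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, desired)) / (2 * n)

# Hypothetical network outputs a for three training inputs, and the
# corresponding desired outputs y(x).
outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.6]), np.array([0.1, 0.9])]
desired = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 1.0])]

print("quadratic cost:", quadratic_cost(outputs, desired))
```

The cost is small when the outputs are close to the desired values for every training input, which is why gradient descent on \(C\) drives the network toward correct classifications.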
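
The vanishing-gradient excerpt compares the sizes of the gradient vectors \(δ^1\) and \(δ^2\) in successive hidden layers. The toy chain below, with one sigmoid neuron per layer, only sketches the effect under assumed random weights, biases, and input; it is not the book's analysis, just a small demonstration of why early layers tend to get smaller gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# A chain of sigmoid neurons, one per layer.  Weights and biases are drawn
# at random purely for illustration.
num_layers = 6
weights = rng.normal(size=num_layers)
biases = rng.normal(size=num_layers)

# Forward pass: record each weighted input z_l.
a = 0.5                      # assumed input activation
zs = []
for w, b in zip(weights, biases):
    z = w * a + b
    zs.append(z)
    a = sigmoid(z)

# Backward pass: delta_l = sigma'(z_l) * w_{l+1} * delta_{l+1},
# starting from an assumed output-layer gradient of 1.0.
delta = 1.0 * sigmoid_prime(zs[-1])
deltas = [delta]
for l in range(num_layers - 2, -1, -1):
    delta = sigmoid_prime(zs[l]) * weights[l + 1] * delta
    deltas.append(delta)
deltas.reverse()

for l, d in enumerate(deltas, start=1):
    print(f"layer {l}: |delta| = {abs(d):.2e}")
```

Because each extra factor \(σ'(z_l)\,w_{l+1}\) usually has magnitude well below \(1\) (\(σ'\) never exceeds \(1/4\)), the gradient shrinks as it is propagated back, so the earliest layers typically learn most slowly, matching the excerpt's description.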