Search Results
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.02%3A_Universality_with_one_input_and_one_output
  You'll see that the graph changes shape when this happens, since we have moved from a situation where the top hidden neuron is the first to be activated to a situation where the bottom hidden neuron is the first to be activated. In other words, we can directly manipulate the function appearing in the graph on the right, and see that reflected in the \(h\) values on the left.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.04%3A_Extension_beyond_sigmoid_neurons
  Recall that in a sigmoid neuron the inputs \(x_1,x_2,…\) result in the output \(σ(\sum_j{w_jx_j+b})\), where \(w_j\) are the weights, \(b\) is the bias, and \(σ\) is the sigmoid function: We can use this activation function to get a step function, just as we did with the sigmoid. Just as with the sigmoid, this causes the activation function to contract, and ultimately it becomes a very good approximation to a step function. (A hedged Python sketch of this step-function construction appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/04%3A_A_visual_proof_that_neural_nets_can_compute_any_function/4.06%3A_Section_6-
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/07%3A_Appendix-_Is_there_a_simple_algorithm_for_intelligence
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/02%3A_How_the_Backpropagation_Algorithm_Works/2.02%3A_Warm_up-_a_fast_matrix-based_approach_to_computing_the_output_from_a_neural_network
  Here's a preview of the ways we'll delve more deeply into the equations later in the chapter: I'll give a short proof of the equations, which helps explain why they are true; we'll restate the equations in algorithmic form as pseudocode, and see how the pseudocode can be implemented as real, running Python code; and, in the final section of the chapter, we'll develop an intuitive picture of what the backpropagation equations mean, and how someone might discover them from scratch.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.06%3A_Learning_with_gradient_descent
  Here, \(w\) denotes the collection of all weights in the network, \(b\) all the biases, \(n\) is the total number of training inputs, \(a\) is the vector of outputs from the network when \(x\) is input, and the sum is over all training inputs, \(x\). (A hedged Python sketch of this quadratic cost appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/06%3A_Deep_Learning/6.02%3A_Convolutional_neural_networks_in_practice
  In particular: (1) Using convolutional layers greatly reduces the number of parameters in those layers, making the learning problem much easier; (2) Using more powerful regularization techniques (notably dropout and convolutional layers) to reduce overfitting, which is otherwise more of a problem in more complex networks; (3) Using rectified linear units instead of sigmoid neurons, to speed up training - empirically, often by a factor of \(3-5\); (4) Using GPUs and being willing to train for a …
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/00%3A_Front_Matter/Acknowledgements
  The book grew out of a set of notes I prepared for an online study group on neural networks and deep learning. Many thanks to all the participants in that study group: Paul Bloore, Chris Dawson, Andrew Doherty, Ilya Grigorik, Alex Kosorukoff, Chris Olah, and Rob Spekkens. I am particularly grateful to Rob, for providing so many insightful questions and ideas, and to Chris, who has continued to share his rapidly expanding knowledge of neural networks.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/05%3A_Why_are_deep_neural_networks_hard_to_train/5.01%3A_The_vanishing_gradient_problem
  *Back in Chapter 2 we referred to this as the error, but here we'll adopt the informal term "gradient". I say "informal" because of course this doesn't explicitly include the partial derivatives of the cost with respect to the weights, \(∂C/∂w\). We can think of the gradient \(δ^1\) as a vector whose entries determine how quickly the first hidden layer learns, and \(δ^2\) as a vector whose entries determine how quickly the second hidden layer learns. (A toy Python sketch comparing these gradient sizes appears after the result list.)
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.01%3A_Introduction
  Of course, if the point of the chapter was only to write a computer program to recognize handwritten digits, then the chapter would be much shorter! But along the way we'll develop many key ideas about neural networks, including two important types of artificial neuron (the perceptron and the sigmoid neuron), and the standard learning algorithm for neural networks, known as stochastic gradient descent.
- https://eng.libretexts.org/Bookshelves/Computer_Science/Applied_Programming/Neural_Networks_and_Deep_Learning_(Nielsen)/01%3A_Using_neural_nets_to_recognize_handwritten_digits/1.05%3A_A_simple_network_to_classify_handwritten_digits
  In a similar way, let's suppose for the sake of argument that the second, third, and fourth neurons in the hidden layer detect whether or not the following images are present: Of course, that's not the only sort of evidence we can use to conclude that the image was a \(0\). We could legitimately get a \(0\) in many other ways (say, through translations of the above images, or slight distortions).
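
The universality excerpts above describe how a single sigmoid hidden neuron, given a large enough weight, behaves like a step function with step point \(s = -b/w\). The sketch below is only an illustration of that idea, not code from the book; the weight values, the bias choice, and the sample points are assumptions picked to make the contraction visible.

```python
import numpy as np

def sigmoid(z):
    """Standard sigmoid activation, sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(x, w, b):
    """Output of a single sigmoid hidden neuron with weight w and bias b."""
    return sigmoid(w * x + b)

# The step point is s = -b / w: for large |w| the neuron's output jumps
# from roughly 0 to roughly 1 as x crosses s, approximating a step function.
xs = np.linspace(0.0, 1.0, 11)
for w in (5.0, 50.0, 500.0):      # increasing the weight "contracts" the transition
    b = -0.4 * w                  # keep the step point fixed at s = -b/w = 0.4
    outputs = hidden_output(xs, w, b)
    print(f"w = {w:6.1f}, step point s = {-b / w:.2f}:", np.round(outputs, 3))
```

Running this prints the neuron's output on a grid of inputs; as the weight grows, the transition around \(x = 0.4\) becomes sharper, which is the contraction toward a step function that the excerpt describes.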
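
The gradient-descent excerpt defines the quadratic cost \(C(w,b) = \frac{1}{2n}\sum_x \|y(x) - a\|^2\). Here is a minimal sketch of that formula; the network outputs and desired outputs are made-up values used purely for illustration.

```python
import numpy as np

def quadratic_cost(outputs, desired):
    """C = (1 / 2n) * sum_x ||y(x) - a||^2, where n is the number of training
    inputs, a is the network's output vector for input x, and y(x) is the
    desired output."""
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, desired)) / (2 * n)

# Hypothetical network outputs a for three training inputs, and the
# corresponding desired outputs y(x).
outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.6]), np.array([0.1, 0.9])]
desired = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 1.0])]

print("quadratic cost:", quadratic_cost(outputs, desired))
```

The cost is small when the outputs are close to the desired values for every training input, which is why gradient descent on \(C\) drives the network toward correct classifications.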
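
The vanishing-gradient excerpt compares the sizes of the gradient vectors \(δ^1\) and \(δ^2\) in successive hidden layers. The toy chain below, with one sigmoid neuron per layer, only sketches the effect under assumed random weights, biases, and input; it is not the book's analysis, just a small demonstration of why early layers tend to get smaller gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# A chain of sigmoid neurons, one per layer.  Weights and biases are drawn
# at random purely for illustration.
num_layers = 6
weights = rng.normal(size=num_layers)
biases = rng.normal(size=num_layers)

# Forward pass: record each weighted input z_l.
a = 0.5                      # assumed input activation
zs = []
for w, b in zip(weights, biases):
    z = w * a + b
    zs.append(z)
    a = sigmoid(z)

# Backward pass: delta_l = sigma'(z_l) * w_{l+1} * delta_{l+1},
# starting from an assumed output-layer gradient of 1.0.
delta = 1.0 * sigmoid_prime(zs[-1])
deltas = [delta]
for l in range(num_layers - 2, -1, -1):
    delta = sigmoid_prime(zs[l]) * weights[l + 1] * delta
    deltas.append(delta)
deltas.reverse()

for l, d in enumerate(deltas, start=1):
    print(f"layer {l}: |delta| = {abs(d):.2e}")
```

Because each extra factor \(σ'(z_l)\,w_{l+1}\) usually has magnitude well below \(1\) (\(σ'\) never exceeds \(1/4\)), the gradient shrinks as it is propagated back, so the earliest layers typically learn most slowly, matching the excerpt's description.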