The reason, of course, is understanding. At the heart of backpropagation is an expression for the partial derivative $$\partial C / \partial w$$ of the cost function $$C$$ with respect to any weight $$w$$ (or bias $$b$$) in the network. The expression tells us how quickly the cost changes when we change the weights and biases. And while the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. And so backpropagation isn't just a fast algorithm for learning. It actually gives us detailed insights into how changing the weights and biases changes the overall behaviour of the network. That's well worth studying in detail.
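To make the interpretation of $$\partial C / \partial w$$ concrete, here is a minimal sketch (not from the text itself) for the simplest possible case: a single sigmoid neuron with quadratic cost $$C = (a - y)^2 / 2$$. It compares the analytic partial derivative against a finite-difference estimate, showing directly that $$\partial C / \partial w$$ measures how quickly the cost changes when we nudge the weight. The particular values of `w`, `b`, `x`, and `y` are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, y):
    """Quadratic cost C = (a - y)^2 / 2 for one sigmoid neuron."""
    a = sigmoid(w * x + b)   # the neuron's output
    return 0.5 * (a - y) ** 2

# Hypothetical weight, bias, input, and target, for illustration only
w, b, x, y = 0.6, 0.9, 1.0, 0.0

# Analytic derivative via the chain rule:
# dC/dw = (a - y) * sigma'(z) * x, with sigma'(z) = a * (1 - a)
z = w * x + b
a = sigmoid(z)
analytic = (a - y) * a * (1 - a) * x

# Finite-difference estimate: how much does C change when w changes a little?
eps = 1e-6
numeric = (cost(w + eps, b, x, y) - cost(w, b, x, y)) / eps

print(analytic, numeric)  # the two values agree closely
```

Running this prints two nearly identical numbers, which is exactly the sense in which $$\partial C / \partial w$$ tells us how quickly the cost changes as the weight changes. Backpropagation computes the same quantity for every weight and bias at once, without the expense of perturbing each parameter separately.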