Suppose we have scaled the inputs for our one parameter linear regression problem and ...

x = -0.05             # our input variable
y = -0.05             # our output variable
w = 1.00              # the actual parameter
w_hat = 0.90          # our current estimate of the parameter
learning_rate = 0.1

a) If y_hat = w_hat * x, what is the value of the mean squared error (mse) loss function for this example?

   y_hat = w_hat * x
         = 0.9 * -0.05
         = -0.045

   ... so ...

   mse = (y - y_hat)**2
       = (-0.05 - (-0.045))**2
       = (-0.005)**2
       = 0.000025

b) What is the gradient of the mean squared error loss with respect to the weight estimate w_hat? Don't forget to use the chain rule:

   gradient = (partial derivative of loss with respect to activation)
            * (partial derivative of activation with respect to product)
            * (partial derivative of product with respect to weight)

   gradient(loss, w_hat) = gradient(loss, activation) * gradient(activation, product) * gradient(product, weight)
                         = (2 * (y_hat - y)) * 1 * x
                         = (2 * (-0.045 - (-0.05))) * 1 * (-0.05)
                         = (2 * 0.005) * 1 * (-0.05)
                         = 0.01 * 1 * (-0.05)
                         = -0.0005

c) What is the updated estimate of w_hat? Don't forget that we are using gradient descent; i.e. new_weight = old_weight - learning_rate * gradient.

   new_weight = old_weight - learning_rate * gradient(loss, w_hat)
              = 0.9 - 0.1 * (-0.0005)
              = 0.9 + 0.00005
              = 0.90005

d) What is the value of the mean squared error loss function for this example, after updating the weight?

   y_hat = w_hat * x
         = 0.90005 * -0.05
         = -0.0450025

   ... so ...

   mse = (y - y_hat)**2
       = (-0.05 - (-0.0450025))**2
       = (-0.0049975)**2
       = 0.00002497500625

e) Has "learning" reduced the loss function?

   Yes: the mean squared error has been reduced from 0.00002500000000 to 0.00002497500625, reducing the error by 0.00000002499375 [a 0.099975% reduction in error, almost a tenth of one percent].
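As a quick sanity check (not part of the original exercise), the whole calculation above can be reproduced in a few lines of Python. The variable names mirror the worksheet; the values in the comments are the exact decimal results worked out above, which Python's floats will match to roughly 15 significant digits.

# Reproduce the single-example, single-step worked calculation above.
x = -0.05             # input
y = -0.05             # target
w_hat = 0.90          # current estimate of the parameter
learning_rate = 0.1

# a) forward pass and mse loss for this one example
y_hat = w_hat * x                        # -0.045
mse_before = (y - y_hat) ** 2            # 0.000025

# b) gradient of the loss with respect to w_hat, via the chain rule;
#    the activation here is the identity, so its derivative is 1
gradient = (2 * (y_hat - y)) * 1 * x     # -0.0005

# c) one gradient descent update
w_hat = w_hat - learning_rate * gradient # 0.90005

# d) recompute the loss with the updated weight
y_hat = w_hat * x                        # -0.0450025
mse_after = (y - y_hat) ** 2             # 0.00002497500625

# e) confirm that learning reduced the loss
print(mse_before, mse_after, mse_before - mse_after)

Running this prints values agreeing (up to floating-point rounding in the last digits) with the hand-worked answers: the loss drops by about 2.499e-08, the 0.099975% reduction computed in part e).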