Gradients#
- class ivy.data_classes.array.gradients._ArrayWithGradients[source]#
Bases: ABC
- adam_step(mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)[source]#
ivy.Array instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - mw (Union[Array, NativeArray]) – running average of the gradients.
  - vw (Union[Array, NativeArray]) – running average of second moments of the gradients.
  - step (Union[int, float]) – training step.
  - beta1 (float, default: 0.9) – gradient forgetting factor.
  - beta2 (float, default: 0.999) – second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – divisor during the adam update, preventing division by zero.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The adam step delta.
Examples
With ivy.Array inputs:

>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = dcdw.adam_step(mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105,0.22187898,0.24144873]), ivy.array([1.,1.10000002,1.20000005]), ivy.array([1.,1.00300002,1.00800002]))
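For readers who want to connect the printed numbers to the underlying arithmetic, the sketch below recomputes the first entry of the example by hand for a single scalar weight. It assumes the usual bias-corrected Adam moment formulas; it is an illustration of the math, not the library's implementation.

import math

# Hedged sketch of the Adam step arithmetic for the first weight of the
# example above (dcdw = 1, mw = 1, vw = 1, step = 3), assuming the standard
# bias-corrected first and second moments.
beta1, beta2, eps = 0.9, 0.999, 1e-7
dcdw, mw, vw, step = 1.0, 1.0, 1.0, 3

mw_new = beta1 * mw + (1 - beta1) * dcdw         # running average of gradients -> 1.0
vw_new = beta2 * vw + (1 - beta2) * dcdw ** 2    # running average of squared gradients -> 1.0
mw_hat = mw_new / (1 - beta1 ** step)            # bias-corrected first moment
vw_hat = vw_new / (1 - beta2 ** step)            # bias-corrected second moment
delta = mw_hat / (math.sqrt(vw_hat) + eps)       # ~0.2020, the first delta entry above

print(delta, mw_new, vw_new)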
- adam_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.adam_update. This method simply wraps the function, and so the docstring for ivy.adam_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – running average of second moments of the gradients, from the previous time-step.
  - step (int) – training step.
  - beta1 (float, default: 0.9) – gradient forgetting factor.
  - beta2 (float, default: 0.999) – second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – divisor during the adam update, preventing division by zero.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The new function weights ws_new, and also new mw and vw, following the adam updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3.])
>>> dcdw = ivy.array([0.2,0.1,0.3])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
(ivy.array([0.92558753, 1.92558873, 2.92558718]), ivy.array([0.02, 0.01, 0.03]), ivy.array([4.00000063e-05, 1.00000016e-05, 9.00000086e-05]))
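To see how the first updated weight in the example arises, the hedged, scalar-only sketch below combines an Adam step with a plain descent update. It assumes the same moment formulas shown for adam_step above and is only illustrative, not the library source.

# Hedged sketch of adam_update for the first weight of the example above
# (w = 1.0, dcdw = 0.2, lr = 0.1, zero previous moments, step = 2).
beta1, beta2, eps = 0.9, 0.999, 1e-7
w, dcdw, lr, step = 1.0, 0.2, 0.1, 2
mw_tm1, vw_tm1 = 0.0, 0.0

mw = beta1 * mw_tm1 + (1 - beta1) * dcdw           # -> 0.02
vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2      # -> 4e-05
mw_hat = mw / (1 - beta1 ** step)                  # bias correction
vw_hat = vw / (1 - beta2 ** step)
w_new = w - lr * mw_hat / (vw_hat ** 0.5 + eps)    # ~0.9256, matching the example

print(w_new, mw, vw)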
- gradient_descent_update(dcdw, lr, /, *, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.gradient_descent_update. This method simply wraps the function, and so the docstring for ivy.gradient_descent_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The new weights, following the gradient descent updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([[1., 2, 3],
...                [4, 6, 1],
...                [1, 0, 7]])
>>> dcdw = ivy.array([[0.5, 0.2, 0.1],
...                   [0.3, 0.6, 0.4],
...                   [0.4, 0.7, 0.2]])
>>> lr = ivy.array(0.1)
>>> new_weights = w.gradient_descent_update(dcdw, lr, stop_gradients=True)
>>> print(new_weights)
ivy.array([[ 0.95,  1.98,  2.99],
           [ 3.97,  5.94,  0.96],
           [ 0.96, -0.07,  6.98]])
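The rule behind this example is the plain descent step w_new = w - lr * dcdw. The short sketch below reproduces the first row of the printed result and is purely illustrative.

# Hedged sketch of a vanilla gradient descent step for the first row of the
# example above: each weight moves against its gradient, scaled by lr.
w = [1.0, 2.0, 3.0]
dcdw = [0.5, 0.2, 0.1]
lr = 0.1
w_new = [wi - lr * gi for wi, gi in zip(w, dcdw)]   # [0.95, 1.98, 2.99]
print(w_new)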
- lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.lamb_update. This method simply wraps the function, and so the docstring for ivy.lamb_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – running average of second moments of the gradients, from the previous time-step.
  - step (int) – training step.
  - beta1 (float, default: 0.9) – gradient forgetting factor.
  - beta2 (float, default: 0.999) – second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – divisor during the adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The new function weights ws_new, following the LAMB updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]), ivy.array([0.05, 0.02, 0.01]), ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))
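LAMB rescales the Adam step by a layer-wise trust ratio. As a rough check on the example above (where the bias-corrected Adam delta works out to all ones), the hedged sketch below applies the ratio ||w|| / ||delta||, clipped to max_trust_ratio; the exact norm and weight-decay handling inside the library may differ.

# Hedged sketch of the LAMB trust-ratio scaling for the example above,
# with decay_lambda = 0 and the Adam delta taken as [1, 1, 1].
w = [1.0, 2.0, 3.0]
delta = [1.0, 1.0, 1.0]
lr, max_trust_ratio = 0.1, 10

w_norm = sum(x * x for x in w) ** 0.5            # ~3.742
d_norm = sum(d * d for d in delta) ** 0.5        # ~1.732
ratio = min(w_norm / d_norm, max_trust_ratio)    # ~2.16

w_new = [wi - ratio * lr * di for wi, di in zip(w, delta)]   # ~[0.784, 1.784, 2.784]
print(w_new)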
- lars_update(dcdw, lr, /, *, decay_lambda=0, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.lars_update. This method simply wraps the function, and so the docstring for ivy.lars_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate, the rate at which the weights should be updated relative to the gradient.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The new function weights ws_new, following the LARS updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([[3., 1, 5],
...                [7, 2, 9]])
>>> dcdw = ivy.array([[0.3, 0.1, 0.2],
...                   [0.1, 0.2, 0.4]])
>>> lr = ivy.array(0.1)
>>> new_weights = w.lars_update(dcdw, lr, stop_gradients=True)
>>> print(new_weights)
ivy.array([[2.34077978, 0.78025991, 4.56051969],
           [6.78026009, 1.56051981, 8.12103939]])
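LARS scales the learning rate by the ratio of the weight norm to the gradient norm before taking a descent step. The hedged sketch below reproduces the first entries of the example with decay_lambda = 0; it illustrates the scaling and is not the library source.

# Hedged sketch of the LARS local learning rate for the example above,
# using the flattened weights and gradients and decay_lambda = 0.
w = [3.0, 1.0, 5.0, 7.0, 2.0, 9.0]
dcdw = [0.3, 0.1, 0.2, 0.1, 0.2, 0.4]
lr = 0.1

w_norm = sum(x * x for x in w) ** 0.5        # 13.0
g_norm = sum(g * g for g in dcdw) ** 0.5     # ~0.5916
local_lr = lr * w_norm / g_norm              # ~2.197

w_new = [wi - local_lr * gi for wi, gi in zip(w, dcdw)]   # first entry ~2.3408
print(w_new)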
- optimizer_update(effective_grad, lr, /, *, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.optimizer_update. This method simply wraps the function, and so the docstring for ivy.optimizer_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - effective_grad (Union[Array, NativeArray]) – Effective gradients of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The new function weights ws_new, following the optimizer updates.
Examples
>>> w = ivy.array([1., 2., 3.])
>>> effective_grad = ivy.zeros(3)
>>> lr = 3e-4
>>> ws_new = w.optimizer_update(effective_grad, lr)
>>> print(ws_new)
ivy.array([1., 2., 3.])
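optimizer_update applies an effective gradient that some optimizer has already computed, so with a zero gradient the weights come back unchanged, as in the example. A minimal, purely illustrative sketch of that arithmetic:

# Hedged sketch: w_new = w - lr * effective_grad, shown with the zero
# effective gradient from the example above, which leaves w unchanged.
w = [1.0, 2.0, 3.0]
effective_grad = [0.0, 0.0, 0.0]
lr = 3e-4
w_new = [wi - lr * gi for wi, gi in zip(w, effective_grad)]   # [1.0, 2.0, 3.0]
print(w_new)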
- stop_gradient(*, preserve_type=True, out=None)[source]#
ivy.Array instance method variant of ivy.stop_gradient. This method simply wraps the function, and so the docstring for ivy.stop_gradient also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Array for which to stop the gradient.
  - preserve_type (bool, default: True) – Whether to preserve gradient computation on ivy.Array instances.
  - out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Array
- Returns: ret – The same array x, but with no gradient information.
Examples
>>> x = ivy.array([1., 2., 3.])
>>> y = x.stop_gradient(preserve_type=True)
>>> print(y)
ivy.array([1., 2., 3.])
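In practice stop_gradient is useful when an intermediate result should be treated as a constant during differentiation: the values pass through unchanged, only the gradient information is dropped. The snippet below is a hedged usage sketch; the target array and the subtraction are illustrative additions, not taken from the original example.

import ivy

# Hedged usage sketch (not from the original docs): stop_gradient leaves the
# values untouched and only detaches gradient tracking, so "frozen" behaves
# like a constant in any downstream differentiation.
x = ivy.array([1., 2., 3.])
frozen = x.stop_gradient()
target = ivy.array([0., 0., 0.])   # illustrative target, not in the original example
residual = frozen - target
print(residual)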
This should hopefully have given you an overview of the gradients submodule. If you have any questions, please feel free to reach out on our Discord!