Gradients#
- class ivy.data_classes.container.gradients._ContainerWithGradients(dict_in=None, queues=None, queue_load_sizes=None, container_combine_method='list_join', queue_timeout=None, print_limit=10, key_length_limit=None, print_indent=4, print_line_spacing=0, ivyh=None, default_key_color='green', keyword_color_dict=None, rebuild_child_containers=False, types_to_iteratively_nest=None, alphabetical_keys=True, dynamic_backend=None, build_callable=False, **kwargs)[source]#
Bases: ContainerBase
- _abc_impl = <_abc._abc_data object>#
- static _static_stop_gradient(x, /, *, preserve_type=True, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)[source]#
ivy.Container static method variant of ivy.stop_gradient. This method simply wraps the function, and so the docstring for ivy.stop_gradient also applies to this method with minimal changes.
- Parameters:
  - x (Union[Container, Array, NativeArray]) – Array or Container for which to stop the gradient.
  - key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.
  - to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.
  - prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.
  - map_sequences (Union[bool, Container], default: False) – Whether to also map method to sequences (lists, tuples). Default is False.
  - preserve_type (Union[bool, Container], default: True) – Whether to preserve gradient computation on ivy.Array instances. Default is True.
  - out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The same array x, but with no gradient information.
Examples
With one ivy.Container input:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> y = ivy.Container.static_stop_gradient(x, preserve_type=False)
>>> print(y)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With multiple ivy.Container inputs:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> ivy.Container.static_stop_gradient(x, preserve_type=True, out=x)
>>> print(x)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
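Because this is a Container method, the usual key_chains machinery also applies, so only part of a container can be detached. A minimal sketch based on the documented parameters (the gradient behaviour of the untouched leaves depends on the active backend):

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> # apply stop_gradient only to the leaves under "a"; "b" is kept as-is
>>> # because prune_unapplied defaults to False
>>> y = ivy.Container.static_stop_gradient(x, key_chains=["a"])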
- adam_step(mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)[source]#
ivy.Container instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.
- Parameters:
  - self (Container) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - mw (Union[Array, NativeArray, Container]) – running average of the gradients.
  - vw (Union[Array, NativeArray, Container]) – running average of second moments of the gradients.
  - step (Union[int, float, Container]) – training step.
  - beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).
  - beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).
  - epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).
  - out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The adam step delta.
Examples
With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})
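For intuition, the three returned containers correspond to the bias-corrected Adam step and the two updated moment estimates. The following is only an illustrative sketch of the standard Adam-step formulas using ordinary container arithmetic (assuming scalar beta values and a Python int/float step), not ivy's exact internal implementation:

>>> def adam_step_sketch(dcdw, mw, vw, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
...     # updated first- and second-moment estimates of the gradient
...     mw_new = beta1 * mw + (1 - beta1) * dcdw
...     vw_new = beta2 * vw + (1 - beta2) * dcdw ** 2
...     # bias-correct both moments for the current training step
...     mw_hat = mw_new / (1 - beta1 ** step)
...     vw_hat = vw_new / (1 - beta2 ** step)
...     # the step delta, which adam_update later scales by the learning rate
...     return mw_hat / (vw_hat ** 0.5 + epsilon), mw_new, vw_new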
- adam_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the ADAM update (reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam).
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – running average of second moments of the gradients, from the previous time-step.
  - step (Union[int, Container]) – training step.
  - beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).
  - beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).
  - epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The new function weights ws_new, and also new mw and vw, following the adam updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([1., 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(updated_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.1 , 0.02, 0.04]), ivy.array([0.01099, 0.01003, 0.01015]))

With multiple ivy.Container inputs:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1,0.3,0.3]),
...                      b=ivy.array([0.3,0.2,0.2]))
>>> lr = ivy.array(0.001)
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                 beta2=beta2, epsilon=epsilon,
...                                 stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})
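In a training loop, the returned moment estimates are fed back in on the next iteration. A minimal sketch, where compute_gradients is a hypothetical placeholder for however you obtain dc/dw (it is not an ivy function):

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> mw = ivy.Container(a=ivy.zeros(3), b=ivy.zeros(3))   # first-moment state
>>> vw = ivy.Container(a=ivy.zeros(3), b=ivy.zeros(3))   # second-moment state
>>> lr = 1e-3
>>> for step in range(1, 4):
...     dcdw = compute_gradients(w)   # hypothetical helper returning a matching Container
...     w, mw, vw = w.adam_update(dcdw, lr, mw, vw, step)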
- gradient_descent_update(dcdw, lr, /, *, stop_gradients=True, out=None)[source]#
ivy.Container instance method variant of ivy.gradient_descent_update. This method simply wraps the function, and so the docstring for ivy.gradient_descent_update also applies to this method with minimal changes.
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - key_chains – The key-chains to apply or not apply the method to. Default is None.
  - to_apply – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.
  - prune_unapplied – Whether to prune key_chains for which the function was not applied. Default is False.
  - map_sequences – Whether to also map method to sequences (lists, tuples). Default is False.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The new weights, following the gradient descent updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.3)
>>> w_new = w.gradient_descent_update(dcdw, lr)
>>> print(w_new)
{
    a: ivy.array([0.85, 1.94, 2.97]),
    b: ivy.array([3.33, 5.66, 1.95])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.Container(a=ivy.array([0.5, 0.2, 0.1]),
...                      b=ivy.array([2., 3.42, 1.69]))
>>> lr = ivy.array(0.3)
>>> w_new = w.gradient_descent_update(dcdw, lr)
>>> print(w_new)
{
    a: ivy.array([0.85, 1.94, 2.97]),
    b: ivy.array([2.88, 4.69, 1.47])
}
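The rule being applied is plain gradient descent, w_new = w - lr * dcdw, broadcast across every leaf of the container. The first example above can be reproduced with ordinary container arithmetic (a sketch only, assuming elementwise operators on ivy.Container behave as expected):

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.3)
>>> # should match w.gradient_descent_update(dcdw, lr) above
>>> w_manual = w - lr * dcdw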
- lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – running average of second moments of the gradients, from the previous time-step.
  - step (Union[int, Container]) – training step.
  - beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).
  - beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).
  - epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).
  - max_trust_ratio (Union[int, float, Container], default: 10) – The maximum value for the trust ratio. Default is 10.
  - decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay. Default is zero.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The new function weights ws_new, following the LAMB updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = w.lamb_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                             beta2=beta2, epsilon=epsilon,
...                             max_trust_ratio=max_trust_ratio,
...                             decay_lambda=decay_lambda,
...                             stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})
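Loosely, LAMB takes an Adam-style step per leaf and rescales it by a layer-wise trust ratio before applying the learning rate. The sketch below follows one common textbook formulation of that rescaling (with weight decay folded into the step); ivy's exact numerics and clipping details may differ, and it assumes ivy.vector_norm and ivy.minimum map over container leaves:

>>> def lamb_rescale_sketch(w, adam_delta, lr, decay_lambda=0., max_trust_ratio=10):
...     # fold weight decay into the proposed step (adam_delta could come from ivy.adam_step)
...     update = adam_delta + decay_lambda * w
...     # layer-wise trust ratio, clipped at max_trust_ratio
...     ratio = ivy.minimum(ivy.vector_norm(w) / ivy.vector_norm(update), max_trust_ratio)
...     return w - lr * ratio * update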
- lars_update(dcdw, lr, /, *, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the Layerwise Adaptive Rate Scaling (LARS) method.
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate, the rate at which the weights should be updated relative to the gradient.
  - decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay. Default is zero.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Returns: ret – The new function weights ws_new, following the LARS updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([3.2, 2.6, 1.3]),
...                   b=ivy.array([1.4, 3.1, 5.1]))
>>> dcdw = ivy.array([0.2, 0.4, 0.1])
>>> lr = ivy.array(0.1)
>>> new_weights = w.lars_update(dcdw, lr)
>>> print(new_weights)
{
    a: ivy.array([3.01132035, 2.22264051, 1.2056601]),
    b: ivy.array([1.1324538, 2.56490755, 4.96622658])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([3.2, 2.6, 1.3]),
...                   b=ivy.array([1.4, 3.1, 5.1]))
>>> dcdw = ivy.Container(a=ivy.array([0.2, 0.4, 0.1]),
...                      b=ivy.array([0.3,0.1,0.2]))
>>> lr = ivy.array(0.1)
>>> new_weights = w.lars_update(dcdw, lr)
>>> print(new_weights)
{
    a: ivy.array([3.01132035, 2.22264051, 1.2056601]),
    b: ivy.array([0.90848625, 2.93616199, 4.77232409])
}
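LARS scales the learning rate per layer (i.e. per container leaf) by the ratio of the weight norm to the gradient norm. A schematic sketch of one common formulation, not necessarily ivy's exact numerics, again assuming ivy.vector_norm maps over container leaves:

>>> def lars_step_sketch(w, dcdw, lr, decay_lambda=0.):
...     w_norm = ivy.vector_norm(w)
...     g_norm = ivy.vector_norm(dcdw)
...     # layer-wise learning rate, with optional weight decay
...     local_lr = lr * w_norm / (g_norm + decay_lambda * w_norm)
...     return w - local_lr * (dcdw + decay_lambda * w)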
- optimizer_update(effective_grad, lr, /, *, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the true or effective derivatives of some cost c with respect to ws, [dc/dw for w in ws].
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - effective_grad (Union[Array, NativeArray, Container]) – Effective gradients of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The new function weights ws_new, following the optimizer updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.array([0., 0., 0.])
>>> lr = 3e-4
>>> ws_new = w.optimizer_update(effective_grad, lr)
>>> print(ws_new)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.Container(a=ivy.array([0., 0., 0.]),
...                                b=ivy.array([0., 0., 0.]))
>>> lr = 3e-4
>>> ws_new = w.optimizer_update(effective_grad, lr, out=w)
>>> print(w)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.Container(a=ivy.array([0., 0., 0.]),
...                                b=ivy.array([0., 0., 0.]))
>>> lr = ivy.array([3e-4])
>>> ws_new = w.optimizer_update(effective_grad, lr, stop_gradients=False)
>>> print(ws_new)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
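This is the simplest of the update rules: the effective gradient is assumed to already carry any optimizer-specific scaling (for example the output of adam_step or a LAMB trust-ratio step), so only the learning-rate scaling and subtraction remain, which is why the zero effective gradients above leave the weights unchanged. Continuing from the last example, the same result can be sketched with plain container arithmetic (assuming elementwise operators on ivy.Container):

>>> # equivalent leaf-wise arithmetic (sketch): w_new = w - lr * effective_grad
>>> ws_manual = w - lr * effective_grad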
- stop_gradient(*, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, preserve_type=True, out=None)[source]#
ivy.Container instance method variant of ivy.stop_gradient. This method simply wraps the function, and so the docstring for ivy.stop_gradient also applies to this method with minimal changes.
- Parameters:
  - self (Container) – Container for which to stop the gradient.
  - key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.
  - to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.
  - prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.
  - map_sequences (Union[bool, Container], default: False) – Whether to also map method to sequences (lists, tuples). Default is False.
  - preserve_type (Union[bool, Container], default: True) – Whether to preserve gradient computation on ivy.Array instances. Default is True.
  - out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type: Container
- Returns: ret – The same container, but with no gradient information.
Examples
With one ivy.Container input:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> y = x.stop_gradient(preserve_type=False)
>>> print(y)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With multiple ivy.Container inputs:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> x.stop_gradient(preserve_type=True, out=x)
>>> print(x)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
This should have hopefully given you an overview of the gradients submodule. If you have any questions, please feel free to reach out on our Discord!