Gradients#

class ivy.data_classes.container.gradients._ContainerWithGradients(dict_in=None, queues=None, queue_load_sizes=None, container_combine_method='list_join', queue_timeout=None, print_limit=10, key_length_limit=None, print_indent=4, print_line_spacing=0, ivyh=None, default_key_color='green', keyword_color_dict=None, rebuild_child_containers=False, types_to_iteratively_nest=None, alphabetical_keys=True, dynamic_backend=None, build_callable=False, **kwargs)[source]#

Bases: ContainerBase

static _static_stop_gradient(x, /, *, preserve_type=True, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)[source]#

ivy.Container static method variant of ivy.stop_gradient. This method simply wraps the function, and so the docstring for ivy.stop_gradient also applies to this method with minimal changes.

Parameters:
  • x (Union[Container, Array, NativeArray]) – Array or Container for which to stop the gradient.

  • key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.

  • to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.

  • prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.

  • map_sequences (Union[bool, Container], default: False) – Whether to also map method to sequences (lists, tuples). Default is False.

  • preserve_type (Union[bool, Container], default: True) – Whether to preserve gradient computation on ivy.Array instances. Default is True.

  • out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The same array x, but with no gradient information.

Examples

With one ivy.Container input:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> y = ivy.Container._static_stop_gradient(x, preserve_type=False)
>>> print(y)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With one ivy.Container input and the out argument:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> ivy.Container._static_stop_gradient(x, preserve_type=True, out=x)
>>> print(x)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
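
The key_chains, to_apply, and prune_unapplied arguments control which leaves of the container the wrapped function is applied to. A minimal sketch, assuming the hypothetical choice of applying the method to the 'a' leaf only (the printed values are unchanged either way, since stop_gradient does not modify array contents):

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> # only the 'a' leaf is passed through stop_gradient; 'b' is kept as-is
>>> # because prune_unapplied defaults to False
>>> y = ivy.Container._static_stop_gradient(x, key_chains=['a'])
>>> print(y)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
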
adam_step(mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)[source]#

ivy.Container instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.

Parameters:
  • self (Container) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • mw (Union[Array, NativeArray, Container]) – running average of the gradients.

  • vw (Union[Array, NativeArray, Container]) – running average of second moments of the gradients.

  • step (Union[int, float, Container]) – training step.

  • beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).

  • beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).

  • epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).

  • out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The adam step delta.

Examples

With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                         b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                     epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                        b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                     epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})
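
The returned values above follow the familiar bias-corrected Adam moment updates. A rough sketch of the per-leaf arithmetic, assuming the standard formulation in which epsilon is added to the square root of the raw second moment (plain ivy ops, shown here for the 'a' leaf of the second example):

>>> dcdw = ivy.array([0., 1., 2.])
>>> mw, vw = ivy.array([0., 0., 0.]), ivy.array([0.])
>>> step, beta1, beta2, epsilon = 3.4, 0.87, 0.976, 1e-5
>>> mw_new = beta1 * mw + (1 - beta1) * dcdw             # updated first moment
>>> vw_new = beta2 * vw + (1 - beta2) * dcdw ** 2        # updated second moment
>>> correction = (1 - beta2 ** step) ** 0.5 / (1 - beta1 ** step)
>>> delta = mw_new * correction / (vw_new ** 0.5 + epsilon)
>>> # delta is approximately ivy.array([0., 0.626, 0.626]), matching the 'a' leaf above
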
adam_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#

Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the Adam update. Reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam

Parameters:
  • self (Container) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • mw_tm1 (Union[Array, NativeArray, Container]) – running average of the gradients, from the previous time-step.

  • vw_tm1 (Union[Array, NativeArray, Container]) – running average of second moments of the gradients, from the previous time-step.

  • step (Union[int, Container]) – training step.

  • beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).

  • beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).

  • epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).

  • stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The new function weights ws_new, and also new mw and vw, following the adam updates.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([1., 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
({
    a: ivy.array([0.99256, 1.99256, 2.99256]),
    b: ivy.array([3.99256, 4.99256, 5.99256])
}, ivy.array([0.1, 0.02, 0.04]), ivy.array([0.001, 0.00004, 0.00016]))

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1,0.3,0.3]),
...                      b=ivy.array([0.3,0.2,0.2]))
>>> lr = ivy.array(0.001)
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                beta2=beta2, epsilon=epsilon,
...                                stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})
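
Conceptually, adam_update combines adam_step with a plain learning-rate-scaled weight update. A rough sketch of that decomposition (an assumption for illustration, not the library's internal code path):

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1, 0.3, 0.3]), b=ivy.array([0.3, 0.2, 0.2]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]), b=ivy.array([0., 0., 0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]), b=ivy.array([0., 0., 0.]))
>>> lr, step = 0.001, 3
>>> delta, mw, vw = dcdw.adam_step(mw_tm1, vw_tm1, step)   # Adam step delta plus new moments
>>> w_new = w - delta * lr                                 # roughly what adam_update returns as ws_new
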
gradient_descent_update(dcdw, lr, /, *, stop_gradients=True, out=None)[source]#

ivy.Container instance method variant of ivy.gradient_descent_update. This method simply wraps the function, and so the docstring for ivy.gradient_descent_update also applies to this method with minimal changes.

Parameters:
  • self (Container) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The new weights, following the gradient descent updates.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                      b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.3)
>>> w_new = w.gradient_descent_update(dcdw, lr)
>>> print(w_new)
{
    a: ivy.array([0.85, 1.94, 2.97]),
    b: ivy.array([3.33, 5.66, 1.95])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                      b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.Container(a=ivy.array([0.5, 0.2, 0.1]),
...                         b=ivy.array([2., 3.42, 1.69]))
>>> lr = ivy.array(0.3)
>>> w_new = w.gradient_descent_update(dcdw, lr)
>>> print(w_new)
{
    a: ivy.array([0.85, 1.94, 2.97]),
    b: ivy.array([2.88, 4.69, 1.47])
}
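
The update itself is ordinary gradient descent, so the first example above can be reproduced with elementwise container arithmetic. A minimal sketch:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([3.48, 5.72, 1.98]))
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.3)
>>> w_new = w - dcdw * lr   # same values as w.gradient_descent_update(dcdw, lr)
>>> # a: [0.85, 1.94, 2.97], b: [3.33, 5.66, 1.95]
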
lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#

Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.

Parameters:
  • self (Container) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • mw_tm1 (Union[Array, NativeArray, Container]) – running average of the gradients, from the previous time-step.

  • vw_tm1 (Union[Array, NativeArray, Container]) – running average of second moments of the gradients, from the previous time-step.

  • step (Union[int, Container]) – training step.

  • beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor (Default value = 0.9).

  • beta2 (Union[float, Container], default: 0.999) – second moment of gradient forgetting factor (Default value = 0.999).

  • epsilon (Union[float, Container], default: 1e-07) – divisor during adam update, preventing division by zero (Default value = 1e-7).

  • max_trust_ratio (Union[int, float, Container], default: 10) – The maximum value for the trust ratio. Default is 10.

  • decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay. Default is zero.

  • stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The new function weights ws_new, following the LAMB updates.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
({
    a: ivy.array([-1.16, -0.16, 0.84]),
    b: ivy.array([-1.07, -0.066, 0.934])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([0.009, 0.016, 0.025]))

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                      b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                         b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                           b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                           b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                beta2=beta2, epsilon=epsilon,
...                                max_trust_ratio=max_trust_ratio,
...                                decay_lambda=decay_lambda,
...                                stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})
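
LAMB rescales the Adam step for each leaf by a trust ratio, the norm of that leaf's weights divided by the norm of its update, capped at max_trust_ratio. A rough per-leaf sketch under that assumption, using the 'a' leaf of the second example, where the bias-corrected Adam step works out to roughly 0.612 per element:

>>> w_leaf = ivy.array([1., 3., 5.])
>>> adam_delta = ivy.array([0.612, 0.612, 0.612])   # bias-corrected Adam step, as in adam_step
>>> lr = 0.5
>>> trust = ivy.vector_norm(w_leaf) / ivy.vector_norm(adam_delta)   # about 5.58, under the max_trust_ratio cap of 10
>>> w_new = w_leaf - lr * trust * adam_delta
>>> # w_new is approximately ivy.array([-0.708, 1.29, 3.29]), the 'a' leaf above
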
lars_update(dcdw, lr, /, *, decay_lambda=0, stop_gradients=True, out=None)[source]#

Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the Layerwise Adaptive Rate Scaling (LARS) method.

Parameters:
  • self (Container) – Weights of the function to be updated.

  • dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate, the rate at which the weights should be updated relative to the gradient.

  • decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay. Default is zero.

  • stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The new function weights ws_new, following the LARS updates.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([3.2, 2.6, 1.3]),
...                    b=ivy.array([1.4, 3.1, 5.1]))
>>> dcdw = ivy.array([0.2, 0.4, 0.1])
>>> lr = ivy.array(0.1)
>>> new_weights = w.lars_update(dcdw, lr)
>>> print(new_weights)
{
    a: ivy.array([3.01132035, 2.22264051, 1.2056601]),
    b: ivy.array([1.1324538, 2.56490755, 4.96622658])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([3.2, 2.6, 1.3]),
...                    b=ivy.array([1.4, 3.1, 5.1]))
>>> dcdw = ivy.Container(a=ivy.array([0.2, 0.4, 0.1]),
...                       b=ivy.array([0.3,0.1,0.2]))
>>> lr = ivy.array(0.1)
>>> new_weights = w.lars_update(dcdw, lr)
>>> print(new_weights)
{
    a: ivy.array([3.01132035, 2.22264051, 1.2056601]),
    b: ivy.array([0.90848625, 2.93616199, 4.77232409])
}
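
LARS scales each leaf's learning rate by the ratio of the weight norm to the gradient norm before taking a plain gradient step. The first example above can be reproduced with that formulation (a sketch, assuming decay_lambda=0):

>>> w_leaf = ivy.array([3.2, 2.6, 1.3])
>>> dcdw = ivy.array([0.2, 0.4, 0.1])
>>> lr = ivy.array(0.1)
>>> local_lr = lr * ivy.vector_norm(w_leaf) / ivy.vector_norm(dcdw)   # layerwise adaptive rate
>>> w_new = w_leaf - local_lr * dcdw
>>> # w_new is approximately ivy.array([3.0113, 2.2226, 1.2057]), the 'a' leaf above
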
optimizer_update(effective_grad, lr, /, *, stop_gradients=True, out=None)[source]#

Update weights ws of some function, given the true or effective derivatives of some cost c with respect to ws, [dc/dw for w in ws].

Parameters:
  • self (Container) – Weights of the function to be updated.

  • effective_grad (Union[Array, NativeArray, Container]) – Effective gradients of the cost c with respect to the weights ws, [dc/dw for w in ws].

  • lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.

  • stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.

  • out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The new function weights ws_new, following the optimizer updates.

Examples

With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                    b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.array([0., 0., 0.])
>>> lr = 3e-4
>>> ws_new = w.optimizer_update(effective_grad, lr)
>>> print(ws_new)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.Container(a=ivy.array([0., 0., 0.]),
...                                   b=ivy.array([0., 0., 0.]))
>>> lr = 3e-4
>>> ws_new = w.optimizer_update(effective_grad, lr, out=w)
>>> print(w)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                    b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.Container(a=ivy.array([0., 0., 0.]),
...                                b=ivy.array([0., 0., 0.]))
>>> lr = ivy.array([3e-4])
>>> ws_new = w.optimizer_update(effective_grad, lr, stop_gradients=False)
>>> print(ws_new)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
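
optimizer_update is the generic final step shared by the other update methods on this page: the effective gradient, scaled by the learning rate, is subtracted from the weights, and gradients are optionally stopped on the result. A rough sketch with a nonzero gradient (hypothetical values, assuming that formulation):

>>> w = ivy.Container(a=ivy.array([0., 1., 2.]),
...                   b=ivy.array([3., 4., 5.]))
>>> effective_grad = ivy.Container(a=ivy.array([1., 1., 1.]),
...                                b=ivy.array([1., 1., 1.]))
>>> lr = 3e-4
>>> w_new = w - effective_grad * lr   # each element decreases by 3e-4
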
stop_gradient(*, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, preserve_type=True, out=None)[source]#

ivy.Container instance method variant of ivy.stop_gradient. This method simply wraps the function, and so the docstring for ivy.stop_gradient also applies to this method with minimal changes.

Parameters:
  • self (Container) – Container for which to stop the gradient.

  • key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.

  • to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.

  • prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.

  • map_sequences (Union[bool, Container], default: False) – Whether to also map method to sequences (lists, tuples). Default is False.

  • preserve_type (Union[bool, Container], default: True) – Whether to preserve gradient computation on ivy.Array instances. Default is True.

  • out (Optional[Container], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The same container, but with no gradient information.

Examples

With one ivy.Container input:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> y = x.stop_gradient(preserve_type=False)
>>> print(y)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}

With one ivy.Container input and the out argument:

>>> x = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> x.stop_gradient(preserve_type=True, out=x)
>>> print(x)
{
    a: ivy.array([0., 1., 2.]),
    b: ivy.array([3., 4., 5.])
}
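
Putting a few of these methods together, a hypothetical mini training loop over a weight container might look as follows (the gradients are supplied by hand for brevity; in practice they would come from ivy's gradient utilities):

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> lr = ivy.array(0.1)
>>> for _ in range(3):
...     dcdw = ivy.Container(a=ivy.array([0.1, 0.1, 0.1]),
...                          b=ivy.array([0.1, 0.1, 0.1]))   # stand-in gradients
...     w = w.gradient_descent_update(dcdw, lr)              # stop_gradients=True by default
>>> # after three steps every element of w has decreased by roughly 0.03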

This should hopefully have given you an overview of the gradients submodule. If you have any questions, please feel free to reach out on our Discord!