adam_update#
- ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the Adam update [reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam].
- Parameters:
  - w (Union[Array, NativeArray]) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of the second moments of the gradients from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the new function weights ws_new to. It must have a shape that the inputs broadcast to.
- Return type:
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
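For reference, the returned ws_new, mw, and vw follow the standard Adam recurrence linked above. The lines below are a minimal sketch written in terms of the parameter names, assuming element-wise array arithmetic; the helper names mw_hat and vw_hat are illustrative only, and the backend implementation may differ in numerical details such as float32 rounding:

mw = beta1 * mw_tm1 + (1 - beta1) * dcdw
vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2
mw_hat = mw / (1 - beta1 ** step)      # bias-corrected first moment
vw_hat = vw / (1 - beta2 ** step)      # bias-corrected second moment
ws_new = w - lr * mw_hat / (vw_hat ** 0.5 + epsilon)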
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = 1
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
(ivy.array([0.90000075, 1.90000164, 2.9000032 ]),
 ivy.array([0.05, 0.02, 0.01]),
 ivy.array([2.50000012e-04, 4.00000063e-05, 1.00000016e-05]))
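To see where these numbers come from, the example above can be reproduced with plain NumPy (not part of the ivy API shown here); the result agrees with the float32 output above up to rounding in the last few decimals:

>>> import numpy as np
>>> w = np.array([1., 2., 3.])
>>> dcdw = np.array([0.5, 0.2, 0.1])
>>> lr, step = 0.1, 1
>>> mw_tm1, vw_tm1 = np.zeros(3), np.zeros(3)
>>> beta1, beta2, epsilon = 0.9, 0.999, 1e-7
>>> mw = beta1 * mw_tm1 + (1 - beta1) * dcdw        # [0.05, 0.02, 0.01]
>>> vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2   # [2.5e-04, 4e-05, 1e-05]
>>> mw_hat = mw / (1 - beta1 ** step)               # bias correction, large at step 1
>>> vw_hat = vw / (1 - beta2 ** step)
>>> w_new = w - lr * mw_hat / (np.sqrt(vw_hat) + epsilon)
>>> print(w_new)
[0.90000002 1.90000005 2.9000001 ]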
>>> w = ivy.array([[1., 2, 3], [4, 2, 4], [6, 4, 2]])
>>> dcdw = ivy.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.1], [0.1, 0.5, 0.3]])
>>> lr = ivy.array(0.1)
>>> mw_tm1 = ivy.zeros((3,3))
>>> vw_tm1 = ivy.zeros(3)
>>> step = 2
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> out = ivy.zeros_like(w)
>>> stop_gradients = True
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step,
...                                   beta1=beta1, beta2=beta2,
...                                   epsilon=epsilon, out=out,
...                                   stop_gradients=stop_gradients)
>>> print(updated_weights)
(ivy.array([[0.92558873, 1.92558754, 2.92558718],
            [3.92558694, 1.92558682, 3.92558861],
            [5.92558861, 3.92558694, 1.92558718]]),
 ivy.array([[0.01, 0.02, 0.03],
            [0.04, 0.05, 0.01],
            [0.01, 0.05, 0.03]]),
 ivy.array([[1.00000016e-05, 4.00000063e-05, 9.00000086e-05],
            [1.60000025e-04, 2.50000012e-04, 1.00000016e-05],
            [1.00000016e-05, 2.50000012e-04, 9.00000086e-05]]))
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([0.5, 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = ivy.adam_update(w, dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(updated_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.05, 0.02, 0.04]), ivy.array([0.01024, 0.01003, 0.01015]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1, 0.3, 0.3]),
...                      b=ivy.array([0.3, 0.2, 0.2]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]),
...                        b=ivy.array([0., 0., 0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.]),
...                        b=ivy.array([0.]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> lr = ivy.array(0.001)
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step,
...                                   beta1=beta1, beta2=beta2,
...                                   epsilon=epsilon,
...                                   stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})
- Array.adam_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.adam_update. This method simply wraps the function, and so the docstring for ivy.adam_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of the second moments of the gradients from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3.])
>>> dcdw = ivy.array([0.2, 0.1, 0.3])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
(ivy.array([0.92558753, 1.92558873, 2.92558718]),
 ivy.array([0.02, 0.01, 0.03]),
 ivy.array([4.00000063e-05, 1.00000016e-05, 9.00000086e-05]))
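Because the instance method simply wraps the functional API, the call above is equivalent to the functional form shown earlier, with the array passed explicitly as the first positional argument:

>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step)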
- Container.adam_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the Adam update [reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam].
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – Running average of the second moments of the gradients from the previous time-step.
  - step (Union[int, Container]) – Training step.
  - beta1 (Union[float, Container], default: 0.9) – Gradient forgetting factor.
  - beta2 (Union[float, Container], default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (Union[float, Container], default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Container], default: None) – Optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([1., 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(updated_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.1, 0.02, 0.04]), ivy.array([0.01099, 0.01003, 0.01015]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1, 0.3, 0.3]),
...                      b=ivy.array([0.3, 0.2, 0.2]))
>>> lr = ivy.array(0.001)
>>> mw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]),
...                        b=ivy.array([0., 0., 0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.]),
...                        b=ivy.array([0.]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                 beta2=beta2, epsilon=epsilon,
...                                 stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})