adam_step

ivy.adam_step(dcdw, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

Compute the Adam step delta from the derivatives of some cost c with respect to the weights ws, using the Adam update. `[reference] <https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam>`_
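As a rough guide to what the function computes, the update can be written as below. This is a sketch inferred from the example outputs on this page and the standard Adam formulation, not a transcription of the library source. In particular, the exact placement of epsilon is an assumption. Here g stands for dcdw, t for step, and primes mark updated values.

```latex
\begin{aligned}
m_w' &= \beta_1\, m_w + (1-\beta_1)\, g \\
v_w' &= \beta_2\, v_w + (1-\beta_2)\, g^2 \\
\Delta &= \frac{\sqrt{1-\beta_2^{\,t}}}{1-\beta_1^{\,t}} \cdot \frac{m_w'}{\sqrt{v_w'} + \epsilon}
\end{aligned}
```

Under this reading, the three returned values are Δ, m_w' and v_w', in that order.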

Parameters:

dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
mw (Union[Array, NativeArray]) – running average of the gradients.
vw (Union[Array, NativeArray]) – running average of the second moments of the gradients.
step (Union[int, float]) – training step.
beta1 (float, default: 0.9) – gradient forgetting factor.
beta2 (float, default: 0.999) – forgetting factor for the second moment of the gradient.
epsilon (float, default: 1e-07) – small value added to the divisor during the Adam update to prevent division by zero.
out (Optional[Array], default: None) – optional output array to write the effective gradient of adam_step to. It must have a shape that the inputs broadcast to.
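The step, beta1 and beta2 parameters interact through Adam's bias correction. A minimal sketch, assuming the standard combined factor sqrt(1 - beta2**step) / (1 - beta1**step); the helper name bias_correction is hypothetical and not part of ivy:

```python
import math


def bias_correction(step, beta1=0.9, beta2=0.999):
    """Combined Adam bias-correction factor (assumed standard form)."""
    # Corrects for the running averages starting out biased towards zero.
    return math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step)


early = bias_correction(3)       # early in training the factor is well below 1
late = bias_correction(10_000)   # late in training it approaches 1
print(early, late)
```

With the defaults, the factor at step 3 is about 0.202, which matches the first entry of the effective gradient in the first example below, where the ratio of the updated mw to the square root of the updated vw is 1.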

Return type:

Tuple[Array, Array, Array]

Returns:

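ret – A tuple of three arrays: the Adam step delta (the effective gradient), the updated running average of the gradients, and the updated running average of their second moments.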
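To make the structure of the returned tuple concrete, here is a pure-Python sketch of one step on plain lists. It is an illustration inferred from the example outputs on this page, not the library implementation: adam_step_sketch is a hypothetical name, the inputs must already have equal length (no broadcasting), and the placement of epsilon is an assumption.

```python
import math


def adam_step_sketch(dcdw, mw, vw, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """One Adam step on equal-length lists of floats (hypothetical helper)."""
    # Exponential moving averages of the gradient and of its square.
    new_mw = [beta1 * m + (1 - beta1) * g for g, m in zip(dcdw, mw)]
    new_vw = [beta2 * v + (1 - beta2) * g * g for g, v in zip(dcdw, vw)]
    # Combined bias-correction factor for the two running averages.
    scale = math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step)
    # Effective gradient: corrected first moment over root of second moment.
    delta = [scale * m / (math.sqrt(v) + epsilon) for m, v in zip(new_mw, new_vw)]
    return delta, new_mw, new_vw


# Same inputs as the first example below, with vw expanded to three elements.
delta, new_mw, new_vw = adam_step_sketch([1.0, 2.0, 3.0], [1.0] * 3, [1.0] * 3, 3)
print(delta, new_mw, new_vw)
```

Called this way, the sketch agrees with the values printed in the first example below to roughly five decimal places.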

Examples

With ivy.Array inputs:

>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105 , 0.22187898, 0.24144873]),
 ivy.array([0.99999998, 1.09999998, 1.19999998]),
 ivy.array([1.00000001, 1.00300001, 1.00800001]))

>>> dcdw = ivy.array([[1., 4., -3.], [2., 3., 0.5]])
>>> mw = ivy.zeros((2,3))
>>> vw = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.86
>>> beta2 = 0.95
>>> epsilon = 1e-6
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
(ivy.array([[ 1.,  1., -1.],
            [ 1.,  1.,  1.]]),
 ivy.array([[ 0.14,  0.56, -0.42],
            [ 0.28,  0.42,  0.07]]),
 ivy.array([[0.05  , 0.8   , 0.45  ],
            [0.2   , 0.45  , 0.0125]]))

>>> dcdw = ivy.array([0.1, -0.7, 2])
>>> mw = ivy.ones(1)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3.6)
>>> out = ivy.zeros_like(dcdw)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, out=out)
>>> print(out)
ivy.array([0.17294501, 0.15770318, 0.20863818])

With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

Array.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

ivy.Array instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.

Parameters:

self (Array) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
mw (Union[Array, NativeArray]) – running average of the gradients.
vw (Union[Array, NativeArray]) – running average of the second moments of the gradients.
step (Union[int, float]) – training step.
beta1 (float, default: 0.9) – gradient forgetting factor.
beta2 (float, default: 0.999) – forgetting factor for the second moment of the gradient.
epsilon (float, default: 1e-07) – small value added to the divisor during the Adam update to prevent division by zero.
out (Optional[Array], default: None) – optional output array to write the result to. It must have a shape that the inputs broadcast to.

Return type:

Array

Returns:

ret – The adam step delta.

Examples

With ivy.Array inputs:

>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = dcdw.adam_step(mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105, 0.22187898, 0.24144873]),
 ivy.array([1., 1.10000002, 1.20000005]),
 ivy.array([1., 1.00300002, 1.00800002]))

Container.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)

ivy.Container instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.

Parameters:

self (Container) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
mw (Union[Array, NativeArray, Container]) – running average of the gradients.
vw (Union[Array, NativeArray, Container]) – running average of the second moments of the gradients.
step (Union[int, float, Container]) – training step.
beta1 (Union[float, Container], default: 0.9) – gradient forgetting factor.
beta2 (Union[float, Container], default: 0.999) – forgetting factor for the second moment of the gradient.
epsilon (Union[float, Container], default: 1e-07) – small value added to the divisor during the Adam update to prevent division by zero.
out (Optional[Container], default: None) – optional output container to write the result to. It must have a shape that the inputs broadcast to.

Return type:

Container

Returns:

ret – The adam step delta.

Examples

With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})