adam_step
- ivy.adam_step(dcdw, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)
Compute the Adam step delta, given the derivatives of some cost c with respect to the weights ws, using the Adam update. `[reference] <https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam>`_
- Parameters:
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - mw (Union[Array, NativeArray]) – Running average of the gradients.
  - vw (Union[Array, NativeArray]) – Running average of second moments of the gradients.
  - step (Union[int, float]) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - out (Optional[Array], default: None) – Optional output array to write the effective gradient of adam_step to. It must have a shape that the inputs broadcast to.
- Return type:
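- Returns:
  ret – The adam step delta. In the examples on this page, the call returns a tuple of three values: the delta, followed by the updated mw and the updated vw.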
Examples
With ivy.Array inputs:

>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105 , 0.22187898, 0.24144873]),
 ivy.array([0.99999998, 1.09999998, 1.19999998]),
 ivy.array([1.00000001, 1.00300001, 1.00800001]))

>>> dcdw = ivy.array([[1., 4., -3.], [2., 3., 0.5]])
>>> mw = ivy.zeros((2,3))
>>> vw = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.86
>>> beta2 = 0.95
>>> epsilon = 1e-6
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
(ivy.array([[ 1.,  1., -1.],
            [ 1.,  1.,  1.]]),
 ivy.array([[ 0.14,  0.56, -0.42],
            [ 0.28,  0.42,  0.07]]),
 ivy.array([[0.05  , 0.8   , 0.45  ],
            [0.2   , 0.45  , 0.0125]]))

>>> dcdw = ivy.array([0.1, -0.7, 2])
>>> mw = ivy.ones(1)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3.6)
>>> out = ivy.zeros_like(dcdw)
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, out=out)
>>> print(out)
ivy.array([0.17294501, 0.15770318, 0.20863818])

With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = ivy.adam_step(dcdw, mw, vw, step, beta1=beta1, beta2=beta2,
...                                 epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})
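To make the arithmetic behind these examples concrete, the following is a minimal pure-Python sketch of a bias-corrected Adam step. It is not taken from ivy's source: the helper name adam_step_sketch and the exact placement of epsilon are assumptions, chosen because they reproduce the output of the first ivy.Array example for ivy.adam_step. Unlike the ivy call, it works on equal-length Python lists and does no broadcasting.

```python
import math


def adam_step_sketch(dcdw, mw, vw, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """Element-wise Adam step on plain lists (hypothetical helper, not ivy's code)."""
    step = float(step)
    # Bias-correction factor, shared by every element (assumed form).
    alpha = math.sqrt(1 - beta2**step) / (1 - beta1**step + epsilon)
    deltas, new_mw, new_vw = [], [], []
    for g, m, v in zip(dcdw, mw, vw):
        m = beta1 * m + (1 - beta1) * g      # running average of the gradients
        v = beta2 * v + (1 - beta2) * g * g  # running average of the squared gradients
        deltas.append(alpha * m / (math.sqrt(v) + epsilon))
        new_mw.append(m)
        new_vw.append(v)
    return deltas, new_mw, new_vw


# Same inputs as the first ivy.Array example, with mw and vw written out per element.
delta, mw, vw = adam_step_sketch([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 3)
print(delta)
print(mw)
print(vw)
```

The three returned lists correspond, in order, to the delta, the updated mw and the updated vw in the printed tuples of the examples.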
- Array.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)
ivy.Array instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - mw (Union[Array, NativeArray]) – Running average of the gradients.
  - vw (Union[Array, NativeArray]) – Running average of second moments of the gradients.
  - step (Union[int, float]) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - out (Optional[Array], default: None) – Optional output array to write the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The adam step delta.
Examples
With ivy.Array inputs:

>>> dcdw = ivy.array([1, 2, 3])
>>> mw = ivy.ones(3)
>>> vw = ivy.ones(1)
>>> step = ivy.array(3)
>>> adam_step_delta = dcdw.adam_step(mw, vw, step)
>>> print(adam_step_delta)
(ivy.array([0.2020105,0.22187898,0.24144873]),
 ivy.array([1.,1.10000002,1.20000005]),
 ivy.array([1.,1.00300002,1.00800002]))
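In a training loop, the returned moments are normally fed back in as mw and vw on the next call. The scalar sketch below illustrates this with a hypothetical helper adam_step_scalar and a constant gradient of 2.0. The formula is an assumption that matches the documented outputs, not ivy's implementation.

```python
import math


def adam_step_scalar(g, m, v, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """Scalar Adam step (hypothetical helper; formula assumed, not ivy's code)."""
    m = beta1 * m + (1 - beta1) * g      # updated first moment
    v = beta2 * v + (1 - beta2) * g * g  # updated second moment
    alpha = math.sqrt(1 - beta2**step) / (1 - beta1**step + epsilon)
    return alpha * m / (math.sqrt(v) + epsilon), m, v


m, v = 0.0, 0.0  # both moments start at zero
history = []
# The step counter starts at 1; in this sketch, step 0 would make alpha zero.
for step in range(1, 6):
    delta, m, v = adam_step_scalar(2.0, m, v, step)  # constant gradient of 2.0
    history.append(delta)

print(history)
```

With moments initialised to zero and a constant gradient, the bias correction keeps every delta close to 1.0, the sign of the gradient. This agrees with the ivy.adam_step example that starts from zero moments at step 1 and returns entries of magnitude 1.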
- Container.adam_step(self, mw, vw, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, out=None)
ivy.Container instance method variant of ivy.adam_step. This method simply wraps the function, and so the docstring for ivy.adam_step also applies to this method with minimal changes.
- Parameters:
  - self (Container) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - mw (Union[Array, NativeArray, Container]) – Running average of the gradients.
  - vw (Union[Array, NativeArray, Container]) – Running average of second moments of the gradients.
  - step (Union[int, float, Container]) – Training step.
  - beta1 (Union[float, Container], default: 0.9) – Gradient forgetting factor.
  - beta2 (Union[float, Container], default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (Union[float, Container], default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - out (Optional[Container], default: None) – Optional output container to write the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The adam step delta.
Examples
With one ivy.Container input:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.array([1., 4., 9.])
>>> vw = ivy.array([0.,])
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([6.49e+04, 1.74e+01, 1.95e+01]),
    b: ivy.array([2.02, 4.82, 8.17])
}, {
    a: ivy.array([0.87, 3.61, 8.09]),
    b: ivy.array([1.26, 4., 8.48])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})

With multiple ivy.Container inputs:

>>> dcdw = ivy.Container(a=ivy.array([0., 1., 2.]),
...                      b=ivy.array([3., 4., 5.]))
>>> mw = ivy.Container(a=ivy.array([0., 0., 0.]),
...                    b=ivy.array([0., 0., 0.]))
>>> vw = ivy.Container(a=ivy.array([0.,]),
...                    b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.87
>>> beta2 = 0.976
>>> epsilon = 1e-5
>>> adam_step_delta = dcdw.adam_step(mw, vw, step, beta1=beta1, beta2=beta2,
...                                  epsilon=epsilon)
>>> print(adam_step_delta)
({
    a: ivy.array([0., 0.626, 0.626]),
    b: ivy.array([0.626, 0.626, 0.626])
}, {
    a: ivy.array([0., 0.13, 0.26]),
    b: ivy.array([0.39, 0.52, 0.65])
}, {
    a: ivy.array([0., 0.024, 0.096]),
    b: ivy.array([0.216, 0.384, 0.6])
})
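The container examples suggest that the step is applied to each leaf independently, with plain array arguments shared across all leaves. Below is a rough pure-Python sketch of that behaviour, using dicts of lists in place of ivy.Container. The helpers adam_step_lists and adam_step_container are hypothetical, and their formula is an assumption that matches the documented numbers rather than ivy's implementation.

```python
import math


def _expand(x, n):
    """Repeat a length-1 list n times, a stand-in for array broadcasting."""
    return x * n if len(x) == 1 else x


def adam_step_lists(dcdw, mw, vw, step, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """Element-wise Adam step on lists (hypothetical helper, not ivy's code)."""
    n = len(dcdw)
    mw, vw = _expand(mw, n), _expand(vw, n)
    alpha = math.sqrt(1 - beta2**step) / (1 - beta1**step + epsilon)
    new_mw = [beta1 * m + (1 - beta1) * g for g, m in zip(dcdw, mw)]
    new_vw = [beta2 * v + (1 - beta2) * g * g for g, v in zip(dcdw, vw)]
    delta = [alpha * m / (math.sqrt(v) + epsilon) for m, v in zip(new_mw, new_vw)]
    return delta, new_mw, new_vw


def adam_step_container(dcdw, mw, vw, step, **kwargs):
    """Apply the step to every leaf of a dict; list arguments are shared by all leaves."""

    def pick(arg, key):
        # A dict argument is indexed per leaf; anything else is reused as-is.
        return arg[key] if isinstance(arg, dict) else arg

    results = {
        key: adam_step_lists(grad, pick(mw, key), pick(vw, key), step, **kwargs)
        for key, grad in dcdw.items()
    }
    # Regroup into (delta, mw, vw), each a dict with the same keys as dcdw.
    return tuple({key: res[i] for key, res in results.items()} for i in range(3))


# Inputs mirror the "one ivy.Container input" example: mw and vw are shared lists.
dcdw = {"a": [0.0, 1.0, 2.0], "b": [3.0, 4.0, 5.0]}
delta, new_mw, new_vw = adam_step_container(
    dcdw, [1.0, 4.0, 9.0], [0.0], 3.4, beta1=0.87, beta2=0.976, epsilon=1e-5
)
print(new_mw)
print(new_vw)
```

The very large first entry of a in the documented delta comes from a zero gradient meeting a zero second moment: the denominator reduces to epsilon alone while the shared mw entry is non-zero.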