adam_update#
- ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the Adam update [reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam].
- Parameters:
  - w (Union[Array, NativeArray]) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of the second moments of the gradients from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the new function weights ws_new to. It must have a shape that the inputs broadcast to.
- Return type:
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
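For reference, the returned ws_new, mw, and vw follow the standard Adam recurrence linked above. The lines below are a minimal sketch written in terms of the parameter names, assuming element-wise array arithmetic; the helper names mw_hat and vw_hat are illustrative only, and the backend implementation may differ in numerical details such as float32 rounding:

mw = beta1 * mw_tm1 + (1 - beta1) * dcdw
vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2
mw_hat = mw / (1 - beta1 ** step)      # bias-corrected first moment
vw_hat = vw / (1 - beta2 ** step)      # bias-corrected second moment
ws_new = w - lr * mw_hat / (vw_hat ** 0.5 + epsilon)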
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5, 0.2, 0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = 1
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
(ivy.array([0.90000075, 1.90000164, 2.9000032 ]),
 ivy.array([0.05, 0.02, 0.01]),
 ivy.array([2.50000012e-04, 4.00000063e-05, 1.00000016e-05]))
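To see where these numbers come from, the example above can be reproduced with plain NumPy (not part of the ivy API shown here); the result agrees with the float32 output above up to rounding in the last few decimals:

>>> import numpy as np
>>> w = np.array([1., 2., 3.])
>>> dcdw = np.array([0.5, 0.2, 0.1])
>>> lr, step = 0.1, 1
>>> mw_tm1, vw_tm1 = np.zeros(3), np.zeros(3)
>>> beta1, beta2, epsilon = 0.9, 0.999, 1e-7
>>> mw = beta1 * mw_tm1 + (1 - beta1) * dcdw        # [0.05, 0.02, 0.01]
>>> vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2   # [2.5e-04, 4e-05, 1e-05]
>>> mw_hat = mw / (1 - beta1 ** step)               # bias correction, large at step 1
>>> vw_hat = vw / (1 - beta2 ** step)
>>> w_new = w - lr * mw_hat / (np.sqrt(vw_hat) + epsilon)
>>> print(w_new)
[0.90000002 1.90000005 2.9000001 ]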
>>> w = ivy.array([[1., 2, 3], [4, 2, 4], [6, 4, 2]])
>>> dcdw = ivy.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.1], [0.1, 0.5, 0.3]])
>>> lr = ivy.array(0.1)
>>> mw_tm1 = ivy.zeros((3,3))
>>> vw_tm1 = ivy.zeros(3)
>>> step = 2
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> out = ivy.zeros_like(w)
>>> stop_gradients = True
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step,
...                                   beta1=beta1, beta2=beta2,
...                                   epsilon=epsilon, out=out,
...                                   stop_gradients=stop_gradients)
>>> print(updated_weights)
(ivy.array([[0.92558873, 1.92558754, 2.92558718],
            [3.92558694, 1.92558682, 3.92558861],
            [5.92558861, 3.92558694, 1.92558718]]),
 ivy.array([[0.01, 0.02, 0.03],
            [0.04, 0.05, 0.01],
            [0.01, 0.05, 0.03]]),
 ivy.array([[1.00000016e-05, 4.00000063e-05, 9.00000086e-05],
            [1.60000025e-04, 2.50000012e-04, 1.00000016e-05],
            [1.00000016e-05, 2.50000012e-04, 9.00000086e-05]]))
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([0.5, 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = ivy.adam_update(w, dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(updated_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.05, 0.02, 0.04]), ivy.array([0.01024, 0.01003, 0.01015]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1, 0.3, 0.3]),
...                      b=ivy.array([0.3, 0.2, 0.2]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]),
...                        b=ivy.array([0., 0., 0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.]),
...                        b=ivy.array([0.]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> lr = ivy.array(0.001)
>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step,
...                                   beta1=beta1, beta2=beta2,
...                                   epsilon=epsilon,
...                                   stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})
- Array.adam_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.adam_update. This method simply wraps the function, and so the docstring for ivy.adam_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of the second moments of the gradients from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3.])
>>> dcdw = ivy.array([0.2, 0.1, 0.3])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(updated_weights)
(ivy.array([0.92558753, 1.92558873, 2.92558718]),
 ivy.array([0.02, 0.01, 0.03]),
 ivy.array([4.00000063e-05, 1.00000016e-05, 9.00000086e-05]))
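Because the instance method simply wraps the functional API, the call above is equivalent to the functional form shown earlier, with the array passed explicitly as the first positional argument:

>>> updated_weights = ivy.adam_update(w, dcdw, lr, mw_tm1, vw_tm1, step)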
- Container.adam_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, using the Adam update [reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam].
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – Running average of the gradients from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – Running average of the second moments of the gradients from the previous time-step.
  - step (Union[int, Container]) – Training step.
  - beta1 (Union[float, Container], default: 0.9) – Gradient forgetting factor.
  - beta2 (Union[float, Container], default: 0.999) – Second-moment gradient forgetting factor.
  - epsilon (Union[float, Container], default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Container], default: None) – Optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The new function weights ws_new, together with the new mw and vw, following the Adam update.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([1., 0.2, 0.4])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(0.01)
>>> step = 2
>>> updated_weights = w.adam_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(updated_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.1, 0.02, 0.04]), ivy.array([0.01099, 0.01003, 0.01015]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]),
...                   b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.Container(a=ivy.array([0.1, 0.3, 0.3]),
...                      b=ivy.array([0.3, 0.2, 0.2]))
>>> lr = ivy.array(0.001)
>>> mw_tm1 = ivy.Container(a=ivy.array([0., 0., 0.]),
...                        b=ivy.array([0., 0., 0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.]),
...                        b=ivy.array([0.]))
>>> step = 3
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> stop_gradients = False
>>> updated_weights = w.adam_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                                 beta2=beta2, epsilon=epsilon,
...                                 stop_gradients=stop_gradients)
>>> print(updated_weights)
({
    a: ivy.array([0.99936122, 1.99936116, 2.99936128]),
    b: ivy.array([3.99936128, 4.99936104, 5.99936104])
}, {
    a: ivy.array([0.01, 0.03, 0.03]),
    b: ivy.array([0.03, 0.02, 0.02])
}, {
    a: ivy.array([1.00000016e-05, 9.00000086e-05, 9.00000086e-05]),
    b: ivy.array([9.00000086e-05, 4.00000063e-05, 4.00000063e-05])
})