lamb_update#
- ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
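For orientation, LAMB performs an Adam-style step with bias correction and then rescales it per layer by a trust ratio clipped to max_trust_ratio. The NumPy sketch below is illustrative only, assuming the standard LAMB formulation rather than Ivy's exact implementation; it reproduces the first ivy.Array example shown further down.

import numpy as np

def lamb_update_sketch(w, dcdw, lr, mw_tm1, vw_tm1, step,
                       beta1=0.9, beta2=0.999, epsilon=1e-7,
                       max_trust_ratio=10, decay_lambda=0):
    # Adam-style first and second moment estimates, with bias correction.
    mw = beta1 * mw_tm1 + (1 - beta1) * dcdw
    vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2
    mw_hat = mw / (1 - beta1 ** step)
    vw_hat = vw / (1 - beta2 ** step)
    update = mw_hat / (np.sqrt(vw_hat) + epsilon) + decay_lambda * w

    # Layer-wise trust ratio, clipped at max_trust_ratio.
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust = min(w_norm / u_norm, max_trust_ratio) if w_norm > 0 and u_norm > 0 else 1.0

    return w - lr * trust * update, mw, vw

# Matches the first ivy.Array example below.
w_new, mw, vw = lamb_update_sketch(np.array([1., 2., 3.]),
                                   np.array([0.5, 0.2, 0.1]),
                                   lr=0.1, mw_tm1=np.zeros(3),
                                   vw_tm1=np.zeros(3), step=1)
print(w_new)  # ~[0.784, 1.784, 2.784]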
- Parameters:
  - w (Union[Array, NativeArray]) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients, from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the new function weights ws_new to. It must have a shape that the inputs broadcast to.
- Return type:
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
... ivy.array([0.05, 0.02, 0.01]),
... ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))

>>> w = ivy.array([[1., 2, 3],[4, 6, 1],[1, 0, 7]])
>>> dcdw = ivy.array([[0.5, 0.2, 0.1],[0.3, 0.6, 0.4],[0.4, 0.7, 0.2]])
>>> lr = ivy.array(0.1)
>>> mw_tm1 = ivy.zeros((3,3))
>>> vw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> out = ivy.zeros_like(w)
>>> stop_gradients = True
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda, out=out,
...                               stop_gradients=stop_gradients)
>>> print(out)
ivy.array([[ 0.639, 1.64 , 2.64 ],
...        [ 3.64 , 5.64 , 0.639],
...        [ 0.639, -0.361, 6.64 ]])
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = ivy.lamb_update(w, dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda,
...                               stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})
- Array.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.lamb_update. This method simply wraps the function, and so the docstring for ivy.lamb_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients, from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
 ivy.array([0.05, 0.02, 0.01]),
 ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))
- Container.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – Running average of second moments of the gradients, from the previous time-step.
  - step (Union[int, Container]) – Training step.
  - beta1 (Union[float, Container], default: 0.9) – Gradient forgetting factor.
  - beta2 (Union[float, Container], default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (Union[float, Container], default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float, Container], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Container], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = w.lamb_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                             beta2=beta2, epsilon=epsilon,
...                             max_trust_ratio=max_trust_ratio,
...                             decay_lambda=decay_lambda,
...                             stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})