lamb_update#
- ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
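For orientation, LAMB performs an Adam-style step with bias correction and then rescales it per layer by a trust ratio clipped to max_trust_ratio. The NumPy sketch below is illustrative only, assuming the standard LAMB formulation rather than Ivy's exact implementation; it reproduces the first ivy.Array example shown further down.

import numpy as np

def lamb_update_sketch(w, dcdw, lr, mw_tm1, vw_tm1, step,
                       beta1=0.9, beta2=0.999, epsilon=1e-7,
                       max_trust_ratio=10, decay_lambda=0):
    # Adam-style first and second moment estimates, with bias correction.
    mw = beta1 * mw_tm1 + (1 - beta1) * dcdw
    vw = beta2 * vw_tm1 + (1 - beta2) * dcdw ** 2
    mw_hat = mw / (1 - beta1 ** step)
    vw_hat = vw / (1 - beta2 ** step)
    update = mw_hat / (np.sqrt(vw_hat) + epsilon) + decay_lambda * w

    # Layer-wise trust ratio, clipped at max_trust_ratio.
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust = min(w_norm / u_norm, max_trust_ratio) if w_norm > 0 and u_norm > 0 else 1.0

    return w - lr * trust * update, mw, vw

# Matches the first ivy.Array example below.
w_new, mw, vw = lamb_update_sketch(np.array([1., 2., 3.]),
                                   np.array([0.5, 0.2, 0.1]),
                                   lr=0.1, mw_tm1=np.zeros(3),
                                   vw_tm1=np.zeros(3), step=1)
print(w_new)  # ~[0.784, 1.784, 2.784]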
- Parameters:
  - w (Union[Array, NativeArray]) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients, from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the new function weights ws_new to. It must have a shape that the inputs broadcast to.
- Return type:
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
... ivy.array([0.05, 0.02, 0.01]),
... ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))

>>> w = ivy.array([[1., 2, 3],[4, 6, 1],[1, 0, 7]])
>>> dcdw = ivy.array([[0.5, 0.2, 0.1],[0.3, 0.6, 0.4],[0.4, 0.7, 0.2]])
>>> lr = ivy.array(0.1)
>>> mw_tm1 = ivy.zeros((3,3))
>>> vw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> out = ivy.zeros_like(w)
>>> stop_gradients = True
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda, out=out,
...                               stop_gradients=stop_gradients)
>>> print(out)
ivy.array([[ 0.639, 1.64 , 2.64 ],
...        [ 3.64 , 5.64 , 0.639],
...        [ 0.639, -0.361, 6.64 ]])
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = ivy.lamb_update(w, dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = ivy.lamb_update(w, dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                               beta2=beta2, epsilon=epsilon,
...                               max_trust_ratio=max_trust_ratio,
...                               decay_lambda=decay_lambda,
...                               stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})
- Array.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
ivy.Array instance method variant of ivy.lamb_update. This method simply wraps the function, and so the docstring for ivy.lamb_update also applies to this method with minimal changes.
- Parameters:
  - self (Array) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray]) – Running average of second moments of the gradients, from the previous time-step.
  - step (int) – Training step.
  - beta1 (float, default: 0.9) – Gradient forgetting factor.
  - beta2 (float, default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (float, default: 0) – The factor used for weight decay.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Array], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With ivy.Array inputs:

>>> w = ivy.array([1., 2, 3])
>>> dcdw = ivy.array([0.5,0.2,0.1])
>>> lr = ivy.array(0.1)
>>> vw_tm1 = ivy.zeros(1)
>>> mw_tm1 = ivy.zeros(3)
>>> step = ivy.array(1)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step)
>>> print(new_weights)
(ivy.array([0.784, 1.78 , 2.78 ]),
 ivy.array([0.05, 0.02, 0.01]),
 ivy.array([2.5e-04, 4.0e-05, 1.0e-05]))
- Container.lamb_update(self, dcdw, lr, mw_tm1, vw_tm1, step, /, *, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, stop_gradients=True, out=None)[source]#
Update weights ws of some function, given the derivatives of some cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
- Parameters:
  - self (Container) – Weights of the function to be updated.
  - dcdw (Union[Array, NativeArray, Container]) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
  - lr (Union[float, Array, NativeArray, Container]) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
  - mw_tm1 (Union[Array, NativeArray, Container]) – Running average of the gradients, from the previous time-step.
  - vw_tm1 (Union[Array, NativeArray, Container]) – Running average of second moments of the gradients, from the previous time-step.
  - step (Union[int, Container]) – Training step.
  - beta1 (Union[float, Container], default: 0.9) – Gradient forgetting factor.
  - beta2 (Union[float, Container], default: 0.999) – Second moment of gradient forgetting factor.
  - epsilon (Union[float, Container], default: 1e-07) – Divisor during the Adam update, preventing division by zero.
  - max_trust_ratio (Union[int, float, Container], default: 10) – The maximum value for the trust ratio.
  - decay_lambda (Union[float, Container], default: 0) – The factor used for weight decay.
  - stop_gradients (Union[bool, Container], default: True) – Whether to stop the gradients of the variables after each gradient step.
  - out (Optional[Container], default: None) – Optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The new function weights ws_new, following the LAMB updates.
Examples
With one ivy.Container input:

>>> w = ivy.Container(a=ivy.array([1., 2., 3.]), b=ivy.array([4., 5., 6.]))
>>> dcdw = ivy.array([3., 4., 5.])
>>> mw_tm1 = ivy.array([0., 0., 0.])
>>> vw_tm1 = ivy.array([0.])
>>> lr = ivy.array(1.)
>>> step = ivy.array([2])
>>> new_weights = w.lamb_update(dcdw, mw_tm1, vw_tm1, lr, step)
>>> print(new_weights)
({
    a: ivy.array([1., 2., 3.]),
    b: ivy.array([4., 5., 6.])
}, ivy.array([0.3, 0.4, 0.5]), ivy.array([1.01, 1.01, 1.02]))
With multiple ivy.Container inputs:

>>> w = ivy.Container(a=ivy.array([1.,3.,5.]),
...                   b=ivy.array([3.,4.,2.]))
>>> dcdw = ivy.Container(a=ivy.array([0.2,0.3,0.6]),
...                      b=ivy.array([0.6,0.4,0.7]))
>>> mw_tm1 = ivy.Container(a=ivy.array([0.,0.,0.]),
...                        b=ivy.array([0.,0.,0.]))
>>> vw_tm1 = ivy.Container(a=ivy.array([0.,]),
...                        b=ivy.array([0.,]))
>>> step = ivy.array([3.4])
>>> beta1 = 0.9
>>> beta2 = 0.999
>>> epsilon = 1e-7
>>> max_trust_ratio = 10
>>> decay_lambda = 0
>>> stop_gradients = True
>>> lr = ivy.array(0.5)
>>> new_weights = w.lamb_update(dcdw, lr, mw_tm1, vw_tm1, step, beta1=beta1,
...                             beta2=beta2, epsilon=epsilon,
...                             max_trust_ratio=max_trust_ratio,
...                             decay_lambda=decay_lambda,
...                             stop_gradients=stop_gradients)
>>> print(new_weights)
({
    a: ivy.array([-0.708, 1.29, 3.29]),
    b: ivy.array([1.45, 2.45, 0.445])
}, {
    a: ivy.array([0.02, 0.03, 0.06]),
    b: ivy.array([0.06, 0.04, 0.07])
}, {
    a: ivy.array([4.0e-05, 9.0e-05, 3.6e-04]),
    b: ivy.array([0.00036, 0.00016, 0.00049])
})