Optimizers#
Collection of Ivy optimizers.
- class ivy.stateful.optimizers.Adam(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Bases: Optimizer
- __init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Construct an ADAM optimizer.
- Parameters:
  - lr (float, default: 0.0001) – Learning rate, default is 1e-4.
  - beta1 (float, default: 0.9) – Gradient forgetting factor, default is 0.9.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor, default is 0.999.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero, default is 1e-07.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
  - device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1', 'cpu'. Default is None.
- set_state(state)[source]#
Set state of the optimizer.
- Parameters:
  - state (Container) – Nested state to update.
- property state#
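A minimal training-step sketch follows. It assumes the trainable parameters are held in an ivy.Container, that ivy.execute_with_gradients returns the loss together with a matching Container of gradients, and that the optimizer exposes a step(v, grads) method returning the updated variables; the stepping method is not listed in this reference, so check the Optimizer base class for the exact call.

```python
import ivy

ivy.set_backend("torch")  # any backend with autograd support

# trainable parameters held in an ivy.Container, as the stateful API expects
v = ivy.Container({"w": ivy.array([1.0, 2.0, 3.0]), "b": ivy.array([0.0])})

optimizer = ivy.stateful.optimizers.Adam(lr=1e-3)

def loss_fn(v):
    # toy quadratic loss over the parameters
    return ((v["w"] * 2.0 + v["b"]) ** 2).sum()

# assumed to return the loss value and a Container of gradients w.r.t. v
loss, grads = ivy.execute_with_gradients(loss_fn, v)

# assumed to return the updated parameter Container (or update in-place,
# depending on the inplace flag and the backend)
v = optimizer.step(v, grads)
```

The running first- and second-moment estimates are exposed through the state property and can be restored with set_state, which is useful when checkpointing training.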
- class ivy.stateful.optimizers.AdamW(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, weight_decay=0.0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Bases: Adam
- __init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, weight_decay=0.0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Construct an ADAMW optimizer.
- Parameters:
  - lr (float, default: 0.0001) – Learning rate, default is 1e-4.
  - beta1 (float, default: 0.9) – Gradient forgetting factor, default is 0.9.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor, default is 0.999.
  - epsilon (float, default: 1e-07) – Divisor during the AdamW update, preventing division by zero, default is 1e-07.
  - weight_decay (float, default: 0.0) – Weight decay coefficient, default is 0.0.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
  - device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1', 'cpu'. Default is None.
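Since AdamW subclasses Adam, it is constructed and stepped in exactly the same way; the only additional argument is the decoupled weight-decay coefficient. A construction sketch (values are illustrative):

```python
import ivy

# same interface as Adam, plus a decoupled weight-decay term
optimizer = ivy.stateful.optimizers.AdamW(
    lr=1e-3,
    beta1=0.9,
    beta2=0.999,
    weight_decay=1e-2,  # with 0.0 this reduces to plain Adam
)
```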
- class ivy.stateful.optimizers.LAMB(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Bases: Optimizer
- __init__(lr=0.0001, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False, device=None)[source]#
Construct a LAMB optimizer.
- Parameters:
  - lr (float, default: 0.0001) – Learning rate, default is 1e-4.
  - beta1 (float, default: 0.9) – Gradient forgetting factor, default is 0.9.
  - beta2 (float, default: 0.999) – Second-moment gradient forgetting factor, default is 0.999.
  - epsilon (float, default: 1e-07) – Divisor during the Adam update, preventing division by zero, default is 1e-07.
  - max_trust_ratio (float, default: 10) – The maximum value of the trust ratio; the ratio between the norm of the layer weights and the norm of the gradient update. Default is 10.
  - decay_lambda (float, default: 0) – The factor used for weight decay. Default is 0.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
  - device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1', 'cpu'. Default is None.
- set_state(state)[source]#
Set state of the optimizer.
- Parameters:
  - state (Container) – Nested state to update.
- property state#
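A construction sketch using the LAMB-specific arguments; the values are illustrative, not recommendations.

```python
import ivy

# LAMB rescales each layer's Adam-style update by a trust ratio
# (layer-weight norm / update norm), capped at max_trust_ratio
optimizer = ivy.stateful.optimizers.LAMB(
    lr=1e-3,
    max_trust_ratio=10,  # upper bound on the layer-wise trust ratio
    decay_lambda=0.01,   # weight-decay factor; 0 disables decay
)
```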
- class ivy.stateful.optimizers.LARS(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#
Bases: Optimizer
- __init__(lr=0.0001, decay_lambda=0, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#
Construct a Layer-wise Adaptive Rate Scaling (LARS) optimizer.
- Parameters:
  - lr (float, default: 0.0001) – Learning rate, default is 1e-4.
  - decay_lambda (float, default: 0) – The factor used for weight decay. Default is 0.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
- set_state(state)[source]#
Set state of the optimizer.
- Parameters:
  - state (Container) – Nested state to update.
- property state#
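LARS takes only a learning rate and an optional weight-decay factor; like the other optimizers it is then stepped with the variables and their gradients. A construction sketch:

```python
import ivy

# layer-wise adaptive scaling of plain gradient-descent updates
optimizer = ivy.stateful.optimizers.LARS(
    lr=1e-2,
    decay_lambda=1e-4,  # weight-decay factor; 0 disables decay
)
```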
- class ivy.stateful.optimizers.Optimizer(lr, inplace=True, stop_gradients=True, init_on_first_step=False, trace_on_next_step=False, fallback_to_non_traced=False, device=None)[source]#
Bases: ABC
- __init__(lr, inplace=True, stop_gradients=True, init_on_first_step=False, trace_on_next_step=False, fallback_to_non_traced=False, device=None)[source]#
Construct a general Optimizer. This is an abstract class and must be derived from.
- Parameters:
  - lr (Union[float, Callable]) – Learning rate.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - init_on_first_step (bool, default: False) – Whether the optimizer is initialized on the first step. Default is False.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
  - fallback_to_non_traced (bool, default: False) – Whether to fall back to a non-traced forward call if an error is raised during the traced forward pass. Default is False.
  - device (Optional[Union[Device, NativeDevice]], default: None) – Device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1', 'cpu'. Default is None.
- abstract set_state(state)[source]#
Set state of the optimizer.
- Parameters:
  - state (Container) – Nested state to update.
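Optimizer itself cannot be instantiated; it only fixes the shared construction and stepping interface. Because every concrete optimizer in this module derives from it, downstream code can select them interchangeably, for example via a small hypothetical factory helper like the one below:

```python
import ivy

def make_optimizer(name, lr=1e-4):
    # all of these classes derive from ivy.stateful.optimizers.Optimizer,
    # so training code can treat them uniformly
    optimizers = {
        "sgd": ivy.stateful.optimizers.SGD,
        "adam": ivy.stateful.optimizers.Adam,
        "adamw": ivy.stateful.optimizers.AdamW,
        "lamb": ivy.stateful.optimizers.LAMB,
        "lars": ivy.stateful.optimizers.LARS,
    }
    return optimizers[name](lr=lr)

optimizer = make_optimizer("adam", lr=1e-3)
```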
- class ivy.stateful.optimizers.SGD(lr=0.0001, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#
Bases: Optimizer
- __init__(lr=0.0001, inplace=True, stop_gradients=True, trace_on_next_step=False)[source]#
Construct a Stochastic-Gradient-Descent (SGD) optimizer.
- Parameters:
  - lr (float, default: 0.0001) – Learning rate, default is 1e-4.
  - inplace (bool, default: True) – Whether to update the variables in-place, or to create new variable handles. This is only relevant for frameworks with stateful variables such as PyTorch. Default is True, provided the backend framework supports it.
  - stop_gradients (bool, default: True) – Whether to stop the gradients of the variables after each gradient step. Default is True.
  - trace_on_next_step (bool, default: False) – Whether to trace the optimizer on the next step. Default is False.
- set_state(state)[source]#
Set state of the optimizer.
- Parameters:
  - state (Container) – Nested state to update.
- property state#
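A minimal sketch of a single SGD update with gradients that were computed elsewhere. The step(v, grads) call and its v - lr * grads semantics are assumptions based on the shared Optimizer interface rather than something documented on this page:

```python
import ivy

v = ivy.Container({"w": ivy.array([1.0, -2.0])})
grads = ivy.Container({"w": ivy.array([0.5, 0.5])})  # gradients from elsewhere

optimizer = ivy.stateful.optimizers.SGD(lr=0.1)

# assumed to apply v <- v - lr * grads and return the updated Container
v = optimizer.step(v, grads)
```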
This should hopefully have given you an overview of the optimizers submodule. If you have any questions, please feel free to reach out on our Discord!