Layers#

Collection of Ivy neural network layers as stateful classes.

class ivy.stateful.layers.AdaptiveAvgPool1d(*args, **kwargs)[source]#

Bases: Module

__init__(output_size, device=None, dtype=None)[source]#

Class for applying 1D adaptive average pooling over a mini-batch of inputs.

Parameters:
  • output_size – An integer, or a tuple/list containing a single integer, specifying the size of the output along the pooled dimension.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.
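To make the role of output_size concrete, here is a minimal pure-Python sketch of adaptive average pooling over one 1D sequence. It does not call Ivy. The bin boundaries (start rounded down, end rounded up) follow a convention used by other frameworks and are an assumption here, and `adaptive_avg_pool1d` is a hypothetical helper name.

```python
import math


def adaptive_avg_pool1d(values, output_size):
    """Average `values` into exactly `output_size` bins (illustrative only)."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        # Assumed bin boundaries: the start rounds down, the end rounds up,
        # so neighbouring bins may overlap when n is not a multiple of output_size.
        start = (i * n) // output_size
        end = math.ceil((i + 1) * n / output_size)
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


result = adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
```

Whatever the input length, the output always has output_size entries, which is what distinguishes adaptive pooling from fixed-kernel pooling.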

class ivy.stateful.layers.AdaptiveAvgPool2d(*args, **kwargs)[source]#

Bases: Module

__init__(output_size, /, *, data_format='NHWC', device=None, dtype=None)[source]#

Class for applying 2D adaptive average pooling over a mini-batch of inputs.

Parameters:
  • output_size – The target output size of the image.

  • data_format (default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.

class ivy.stateful.layers.AvgPool1D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, stride, padding, /, *, data_format='NWC')[source]#

Class for applying Average Pooling over a mini-batch of inputs.

Parameters:
  • kernel_size – The size of the window to take an average over.

  • stride – The stride of the window. Default value: 1

  • padding – Implicit zero padding to be added on both sides.

  • data_format (default: 'NWC') – “NCW” or “NWC”. Defaults to “NWC”.

class ivy.stateful.layers.AvgPool2D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, stride, padding, /, *, data_format='NHWC', device=None, v=None, dtype=None)[source]#

Class for applying Average Pooling over a mini-batch of inputs.

Parameters:
  • kernel_size – The size of the window to take an average over.

  • stride – The stride of the window. Default value: 1

  • padding – Implicit zero padding to be added on both sides.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.

class ivy.stateful.layers.AvgPool3D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, strides, padding, /, *, data_format='NDHWC', count_include_pad=False, ceil_mode=False, divisor_override=None)[source]#

Class for applying Average Pooling over a mini-batch of inputs.

Parameters:
  • kernel_size – The size of the window to take an average over.

  • strides – The stride of the window. Default value: 1

  • padding – Implicit zero padding to be added on both sides.

  • data_format (default: 'NDHWC') – “NDHWC” or “NCDHW”. Defaults to “NDHWC”.

  • count_include_pad (default: False) – Whether to include padding in the averaging calculation.

  • ceil_mode (default: False) – Whether to use ceil or floor for creating the output shape.

  • divisor_override (default: None) – If specified, it will be used as the divisor; otherwise kernel_size will be used.
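The effect of count_include_pad and divisor_override is easiest to see in one dimension. The following pure-Python sketch is not Ivy code: `avg_pool1d` and its integer `pad` argument are hypothetical, it assumes `pad < kernel_size`, and it does not model ceil_mode.

```python
def avg_pool1d(values, kernel_size, stride, pad=0,
               count_include_pad=False, divisor_override=None):
    """Average pooling over a 1D list with `pad` zeros added on each side."""
    padded = [0.0] * pad + list(values) + [0.0] * pad
    n = len(values)
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        if divisor_override is not None:
            # An explicit divisor replaces any window-size based count.
            divisor = divisor_override
        elif count_include_pad:
            # Padded zeros count as elements of the window.
            divisor = kernel_size
        else:
            # Count only positions that fall on real (unpadded) input.
            lo = max(start, pad)
            hi = min(start + kernel_size, pad + n)
            divisor = hi - lo
        out.append(sum(window) / divisor)
    return out


data = [2.0, 4.0, 6.0, 8.0]
excluding_pad = avg_pool1d(data, 2, 2, pad=1)
including_pad = avg_pool1d(data, 2, 2, pad=1, count_include_pad=True)
fixed_divisor = avg_pool1d(data, 2, 2, pad=1, divisor_override=4)
```

Only the border windows differ between the first two calls, because those are the only windows that overlap the padding.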

class ivy.stateful.layers.Conv1D(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_size, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, data_format='NWC', dilations=1, device=None, v=None, dtype=None)[source]#

1D convolutional layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_size – Size of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • data_format (default: 'NWC') – “NWC” or “NCW”. Defaults to “NWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.
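As a rough guide to how strides, padding and dilations interact, the sketch below computes the output length along one spatial dimension. It assumes the TensorFlow-style meaning of “SAME” and “VALID”; `conv_output_length` is a hypothetical helper, not an Ivy function, and explicit per-dimension padding lists are not covered.

```python
import math


def conv_output_length(input_length, filter_size, stride, padding, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    # Effective filter extent once the gaps introduced by dilation are included.
    effective = dilation * (filter_size - 1) + 1
    if padding == "SAME":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(input_length / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((input_length - effective + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID' in this sketch")


same_len = conv_output_length(10, 3, 2, "SAME")
valid_len = conv_output_length(10, 3, 2, "VALID")
```

The same arithmetic applies independently to each spatial dimension of the 2D and 3D layers.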

class ivy.stateful.layers.Conv1DTranspose(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_size, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, output_shape=None, data_format='NWC', dilations=1, device=None, v=None, dtype=None)[source]#

1D transpose convolutional layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_size – Size of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • output_shape (default: None) – Shape of the output. Default is None.

  • data_format (default: 'NWC') – “NWC” or “NCW”. Defaults to “NWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.
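When output_shape is left as None, the output length of a transpose convolution has to be inferred. The sketch below shows one common, TensorFlow-style convention for a single spatial dimension. Whether Ivy uses exactly this rule is an assumption, and `conv_transpose_output_length` is a hypothetical name.

```python
def conv_transpose_output_length(input_length, filter_size, stride, padding,
                                 dilation=1):
    """Inferred output length of a transpose convolution along one dimension."""
    # Effective filter extent once the gaps introduced by dilation are included.
    effective = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == "SAME":
        return input_length * stride
    if padding == "VALID":
        # The filter overhang beyond the stride extends the output.
        return input_length * stride + max(effective - stride, 0)
    raise ValueError("padding must be 'SAME' or 'VALID' in this sketch")


same_len = conv_transpose_output_length(5, 3, 2, "SAME")
valid_len = conv_transpose_output_length(5, 3, 2, "VALID")
```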

class ivy.stateful.layers.Conv2D(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_shape, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, data_format='NHWC', dilations=1, device=None, v=None, dtype=None)[source]#

2D convolutional layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_shape – Shape of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • data_format (default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Conv2DTranspose(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_shape, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, output_shape=None, data_format='NHWC', dilations=1, device=None, v=None, dtype=None)[source]#

2D convolutional transpose layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_shape – Shape of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • output_shape (default: None) – Shape of the output. Default is None.

  • data_format (default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Conv3D(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_shape, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, data_format='NDHWC', dilations=1, device=None, v=None, dtype=None)[source]#

3D convolutional layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_shape – Shape of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • data_format (default: 'NDHWC') – “NDHWC” or “NCDHW”. Defaults to “NDHWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Conv3DTranspose(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, filter_shape, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, output_shape=None, data_format='NDHWC', dilations=1, device=None, v=None, dtype=None)[source]#

3D convolutional transpose layer.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • filter_shape – Shape of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • output_shape (default: None) – Shape of the output. Default is None.

  • data_format (default: 'NDHWC') – “NDHWC” or “NCDHW”. Defaults to “NDHWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Dct(*args, **kwargs)[source]#

Bases: Module

__init__(*, type=2, n=None, axis=-1, norm=None, device=None, dtype=None)[source]#

Class for applying the Discrete Cosine Transform over a mini-batch of inputs.

Parameters:
  • x – The input signal.

  • type (default: 2) – The type of the DCT. Must be 1, 2, 3 or 4.

  • n (default: None) – The length of the transform. If n is less than the input signal length, x is truncated; if n is larger, x is zero-padded.

  • axis (default: -1) – The axis to compute the DCT along.

  • norm (default: None) – The type of normalization to be applied. Must be either None or “ortho”.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.
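For reference, the default type=2 transform and the effect of norm can be written out directly. This pure-Python sketch assumes the SciPy-style definition of the type-2 DCT and its “ortho” scaling. `dct2` is a hypothetical helper and does not handle the n or axis arguments.

```python
import math


def dct2(signal, norm=None):
    """Type-2 DCT of a 1D sequence (assumed SciPy-style definition)."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(signal)
        )
        value = 2.0 * total
        if norm == "ortho":
            # Orthonormal scaling: the k == 0 term gets its own factor.
            value *= math.sqrt(1.0 / (4 * n)) if k == 0 else math.sqrt(1.0 / (2 * n))
        out.append(value)
    return out


plain = dct2([1.0, 1.0, 1.0, 1.0])
ortho = dct2([1.0, 1.0, 1.0, 1.0], norm="ortho")
```

A constant signal puts all of its energy in the first coefficient, which makes the two scalings easy to compare.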

class ivy.stateful.layers.DepthwiseConv2D(*args, **kwargs)[source]#

Bases: Module

__init__(num_channels, filter_shape, strides, padding, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, data_format='NHWC', dilations=1, device=None, v=None, dtype=None)[source]#

Depthwise 2D convolutional layer.

Parameters:
  • num_channels – Number of input channels for the layer.

  • filter_shape – Shape of the convolutional filter.

  • strides – The stride of the sliding window for each dimension of input.

  • padding – “SAME” or “VALID” indicating the algorithm, or a list indicating the per-dimension paddings.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • data_format (default: 'NHWC') – “NHWC” or “NCHW”. Defaults to “NHWC”.

  • dilations (default: 1) – The dilation factor for each dimension of input. Default is 1.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the conv layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Dropout(prob, scale=True, dtype=None, training=True)[source]#

Bases: Module

__init__(prob, scale=True, dtype=None, training=True)[source]#

Dropout layer. The layer randomly zeroes some of the elements of the input tensor with probability prob, using samples from a Bernoulli distribution.

Parameters:
  • prob – The probability of zeroing out each array element.

  • scale (bool, default: True) – Whether to scale the output by 1/(1-prob), default is True.

  • dtype (default: None) – the desired data type of the internal variables to be created. Default is None.

  • training (bool, default: True) – Turn on dropout if training, turn off otherwise. Default is True.
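The interaction of prob, scale and training can be sketched in a few lines of plain Python. This is an illustration of the technique (often called inverted dropout), not Ivy's implementation. The `rng` argument is a hypothetical addition for reproducibility, and the sketch assumes `0 <= prob < 1`.

```python
import random


def dropout(values, prob, scale=True, training=True, rng=None):
    """Zero each element with probability `prob`; optionally rescale the rest."""
    if not training:
        # Dropout is switched off outside training: pass the input through.
        return list(values)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - prob) if scale else 1.0
    out = []
    for x in values:
        # Bernoulli draw: drop the element with probability `prob`.
        if rng.random() < prob:
            out.append(0.0)
        else:
            out.append(x * keep_scale)
    return out


dropped = dropout([1.0] * 8, 0.5, rng=random.Random(0))
untouched = dropout([1.0, 2.0, 3.0], 0.5, training=False)
```

Scaling the surviving elements by 1/(1-prob) keeps the expected value of each element unchanged, so no rescaling is needed at inference time.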

class ivy.stateful.layers.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, device=None, v=None, dtype=None)[source]#

Bases: Module

__init__(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, device=None, v=None, dtype=None)[source]#

Class for embedding indices into a dense representation. The Embedding layer is a simple lookup table for dense vectors. It’s typically used to store word embeddings and query them using indices.

Parameters:
  • num_embeddings (int) – Number of embeddings.

  • embedding_dim (int) – Dimension of the embeddings.

  • padding_idx (int) – If given, pads the output with zeros whenever it encounters the index.

  • max_norm (float) – If given, each embedding vector with L2 norm larger than max_norm is renormalized to have norm max_norm.

  • weight_initializer (Initializer) – Initializer for the weights.

  • device (str) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.

  • v (dict) – The variables for the embedding layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.
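The lookup behaviour described above, including padding_idx and max_norm, can be sketched without any framework. The function below is a hypothetical illustration, not Ivy's implementation: `table` stands in for the layer's weight matrix with one row per embedding.

```python
import math


def embed(indices, table, padding_idx=None, max_norm=None):
    """Look up rows of `table`; zero the padding row and clip large norms."""
    out = []
    for idx in indices:
        row = list(table[idx])
        if padding_idx is not None and idx == padding_idx:
            # The padding index always maps to a zero vector.
            row = [0.0] * len(row)
        elif max_norm is not None:
            norm = math.sqrt(sum(v * v for v in row))
            if norm > max_norm:
                # Rescale so the row's L2 norm equals max_norm.
                row = [v * max_norm / norm for v in row]
        out.append(row)
    return out


table = [[3.0, 4.0], [0.3, 0.4], [1.0, 0.0]]
vectors = embed([0, 1, 2], table, padding_idx=2, max_norm=1.0)
```

Here the first row (norm 5) is rescaled to norm 1, the second row is already within the limit, and the third is the padding index.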

class ivy.stateful.layers.FFT(*args, **kwargs)[source]#

Bases: Module

__init__(dim, /, *, norm='backward', n=None, out=None, device=None, dtype=None)[source]#

Class for applying FFT to input.

Parameters:
  • dim (int) – Dimension along which to take the FFT.

  • norm (str) – Normalization mode. Default: ‘backward’

  • n (int) – Size of the FFT. Default: None

  • out (int) – Size of the output. Default: None
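The three normalization modes differ only in a constant factor. The sketch below uses a direct O(n²) discrete Fourier transform rather than a fast algorithm, and assumes the NumPy-style meaning of “backward”, “ortho” and “forward” for the forward transform. `dft` is a hypothetical helper that ignores the dim, n and out arguments.

```python
import cmath
import math


def dft(signal, norm="backward"):
    """Direct discrete Fourier transform of a 1D sequence with a norm mode."""
    n = len(signal)
    factors = {
        "backward": 1.0,             # no scaling on the forward transform
        "ortho": 1.0 / math.sqrt(n),  # symmetric scaling
        "forward": 1.0 / n,          # all scaling on the forward transform
    }
    if norm not in factors:
        raise ValueError("norm must be 'backward', 'ortho' or 'forward'")
    out = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        out.append(total * factors[norm])
    return out


backward = dft([1.0, 1.0, 1.0, 1.0])
ortho = dft([1.0, 1.0, 1.0, 1.0], norm="ortho")
forward = dft([1.0, 1.0, 1.0, 1.0], norm="forward")
```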

class ivy.stateful.layers.IDct(*args, **kwargs)[source]#

Bases: Module

__init__(*, type=2, n=None, axis=-1, norm=None, device=None, dtype=None)[source]#

Class for applying the Inverse Discrete Cosine Transform over a mini-batch of inputs.

Parameters:
  • x – The input signal.

  • type (default: 2) – The type of the IDCT. Must be 1, 2, 3 or 4.

  • n (default: None) – The length of the transform. If n is less than the input signal length, x is truncated; if n is larger, x is zero-padded.

  • axis (default: -1) – The axis to compute the IDCT along.

  • norm (default: None) – The type of normalization to be applied. Must be either None or “ortho”.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.

extra_repr()[source]#

class ivy.stateful.layers.IFFT(*args, **kwargs)[source]#

Bases: Module

__init__(dim, /, *, norm='backward', n=None, out=None, device=None, dtype=None)[source]#

Class for applying IFFT to input.

Parameters:
  • dim (int) – Dimension along which to take the IFFT.

  • norm (str) – Optional argument indicating the normalization mode. Possible values: “backward”, “ortho” or “forward”. “backward” indicates no normalization, “ortho” indicates normalization by 1/sqrt(n), and “forward” indicates normalization by 1/n. Default: “backward”

  • n (int) – Optional argument indicating the sequence length. If given, the input is zero-padded or truncated to length n before performing the IFFT. Should be an integer greater than 1. Default: None

  • out (int) – Size of the output. Default: None

class ivy.stateful.layers.Identity(*args, **kwargs)[source]#

Bases: Module

__init__()[source]#

Identity layer. The layer is argument insensitive and returns the input argument as output when called.

It’s typically used as a placeholder when no operation is to be performed. It doesn’t have any learnable parameters.

class ivy.stateful.layers.LSTM(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, num_layers=1, return_sequence=True, return_state=True, device=None, v=None, dtype=None)[source]#

LSTM layer, which is a set of stacked LSTM cells.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • num_layers (default: 1) – Number of LSTM cells in the LSTM layer. Default is 1.

  • return_sequence (default: True) – Whether to return the entire output sequence or just the latest timestep. Default is True.

  • return_state (default: True) – Whether or not to return the latest hidden and cell states. Default is True.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for each of the LSTM cells, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

get_initial_state(batch_shape, dtype=None)[source]#

Get the initial state of the hidden and cell states, if not provided explicitly.

Parameters:
  • batch_shape

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.

class ivy.stateful.layers.Linear(*args, **kwargs)[source]#

Bases: Module

__init__(input_channels, output_channels, /, *, weight_initializer=<ivy.stateful.initializers.GlorotUniform object>, bias_initializer=<ivy.stateful.initializers.Zeros object>, with_bias=True, device=None, v=None, dtype=None)[source]#

Linear layer, also referred to as dense or fully connected. The layer receives tensors whose last dimension is input_channels and returns a new tensor whose last dimension is output_channels, obtained by matrix multiplication with the weight matrix followed by addition of the bias vector.

Parameters:
  • input_channels – Number of input channels for the layer.

  • output_channels – Number of output channels for the layer.

  • weight_initializer (default: <ivy.stateful.initializers.GlorotUniform object>) – Initializer for the weights. Default is GlorotUniform.

  • bias_initializer (default: <ivy.stateful.initializers.Zeros object>) – Initializer for the bias. Default is Zeros.

  • with_bias (default: True) – Whether or not to include a bias term. Default is True.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – The variables for the linear layer, as a container; constructed internally by default.

  • dtype (default: None) – The desired data type of the internal variables to be created if not provided. Default is None.
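The computation a linear layer performs can be spelled out with nested lists. In this sketch the weight is assumed to be laid out as one row per output channel, with shape [output_channels][input_channels]. That layout and the helper name `linear` are assumptions for illustration, not a statement about how Ivy stores its variables.

```python
def linear(inputs, weight, bias=None):
    """Apply y = x @ W^T + b to a batch of rows.

    inputs: [batch][input_channels]
    weight: [output_channels][input_channels]  (assumed layout)
    bias:   [output_channels] or None
    """
    outputs = []
    for row in inputs:
        out_row = []
        for j, w_row in enumerate(weight):
            # Dot product of the input row with the j-th weight row.
            value = sum(x * w for x, w in zip(row, w_row))
            if bias is not None:
                value += bias[j]
            out_row.append(value)
        outputs.append(out_row)
    return outputs


weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
with_bias = linear([[1.0, 2.0, 3.0]], weight, bias=[0.5, -0.5])
without_bias = linear([[1.0, 2.0, 3.0]], weight)
```

Passing `bias=None` corresponds to constructing the layer with with_bias=False.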

class ivy.stateful.layers.MaxPool1D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, stride, padding, /, *, data_format='NWC', device=None, v=None, dtype=None)[source]#

Class for applying Max Pooling over a mini-batch of inputs.

Parameters:
  • kernel_size – The size of the window to take a max over.

  • stride – The stride of the window. Default value: 1

  • padding – Implicit zero padding to be added on both sides.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.
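A minimal sketch of what kernel_size and stride mean for max pooling, written for a single unpadded 1D sequence. `max_pool1d` is a hypothetical helper rather than Ivy code, and padding is left out for brevity.

```python
def max_pool1d(values, kernel_size, stride):
    """Slide a window of `kernel_size` over `values` and keep each maximum."""
    return [
        max(values[start:start + kernel_size])
        for start in range(0, len(values) - kernel_size + 1, stride)
    ]


non_overlapping = max_pool1d([1, 3, 2, 5, 4, 0], kernel_size=2, stride=2)
overlapping = max_pool1d([1, 3, 2, 5, 4, 0], kernel_size=3, stride=1)
```

A stride smaller than the kernel size makes the windows overlap, so the output is longer and neighbouring outputs can share the same maximum.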

class ivy.stateful.layers.MaxPool2D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, stride, padding, /, *, data_format='NHWC', device=None, v=None, dtype=None)[source]#

Class for applying Max Pooling over a mini-batch of inputs.

Parameters:
  • kernel_size – The size of the window to take a max over.

  • stride – The stride of the window. Default value: 1

  • padding – Implicit zero padding to be added on both sides.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’.

class ivy.stateful.layers.MaxPool3D(*args, **kwargs)[source]#

Bases: Module

__init__(kernel_size, stride, padding, /, *, data_format='NDHWC', device=None, dtype=None)[source]#

Class for applying 3D Max Pooling over 5D inputs.

Parameters:
  • kernel_size – The size of the window to take a max over.

  • stride – The stride of the window.

  • padding – Implicit zero padding to be added on both sides.

class ivy.stateful.layers.MultiHeadAttention(*args, **kwargs)[source]#

Bases: Module

__init__(embed_dim=None, /, *, key_dim=None, value_dim=None, num_heads=8, head_dim=None, dropout_rate=0.0, use_proj_bias=True, attention_axes=None, scale=None, device=None, v=None, build_mode='on_init', dtype=None, training=True)[source]#

Multi Head Attention layer.

Parameters:
  • embed_dim (default: None) – The expected feature size in the input and output.

  • key_dim (default: None) – The input feature size for key. If None, assumed equal to embed_dim. Default None.

  • value_dim (default: None) – The input feature size for value. If None, assumed equal to embed_dim. Default None.

  • num_heads (default: 8) – Number of parallel attention heads. Note that embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim // num_heads). Default is 8.

  • head_dim (default: None) – Size of each attention head for query and key. Note that only two out of (embed_dim, num_heads, and head_dim) should be provided. Default is None.

  • dropout_rate (default: 0.0) – The dropout probability used on attention weights to drop some attention targets. 0 for no dropout. Default is 0.

  • use_proj_bias (default: True) – If True, adds bias to the input / output projection layers. Default is True.

  • attention_axes (default: None) – Axes over which the attention is applied. None means attention over all axes except batch, heads, and features. Default is None.

  • scale (default: None) – The value by which to scale the query-key similarity measure. Default is head_dim^-0.5.

  • device (default: None) – Device on which to create the layer’s variables, e.g. ‘cuda:0’, ‘cuda:1’, ‘cpu’. Default is cpu.

  • v (default: None) – the variables for the attention layer, as a container, constructed internally by default.

  • build_mode (default: 'on_init') – How the Module is built, either on initialization (now), explicitly by the user by calling build(), or the first time the __call__ method is run. Default is on initialization.

  • dtype (default: None) – the desired data type of the internal variables to be created if not provided. Default is None.

  • training (default: True) – If True, dropout is used, otherwise dropout is not activated.
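To illustrate how embed_dim, num_heads and scale relate, the sketch below computes the per-head size and runs scaled dot-product attention for a single head on plain lists. It omits the projections, dropout and multi-head concatenation that the layer performs, and both helper names are hypothetical.

```python
import math


def head_dim_for(embed_dim, num_heads):
    """Per-head feature size when embed_dim is split across num_heads."""
    return embed_dim // num_heads


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(queries, keys, values, scale=None):
    """Scaled dot-product attention for one head (no projections, no dropout)."""
    if scale is None:
        # Default scale documented above: head_dim^-0.5.
        scale = len(queries[0]) ** -0.5
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one feature at a time.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


per_head = head_dim_for(512, 8)
attended = single_head_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Because both keys are identical, the attention weights are uniform and the output is simply the mean of the two value vectors.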

This should have given you an overview of the layers submodule. If you have any questions, please feel free to reach out on our Discord!