Base Autoregressive Model
Policy
- class rl4co.models.zoo.common.autoregressive.policy.AutoregressivePolicy(env_name, encoder=None, decoder=None, init_embedding=None, context_embedding=None, dynamic_embedding=None, embedding_dim=128, num_encoder_layers=3, num_heads=8, normalization='batch', mask_inner=True, use_graph_context=True, force_flash_attn=False, train_decode_type='sampling', val_decode_type='greedy', test_decode_type='greedy', **unused_kw)
Bases: Module
Base Auto-regressive policy for NCO construction methods. The policy performs the following steps:
1. Encode the environment initial state into node embeddings
2. Decode (autoregressively) to construct the solution to the NCO problem
Based on the policy from Kool et al. (2019) and extended for common use on multiple models in RL4CO.
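The encode-then-decode structure can be sketched without any library code. This is a toy illustration under stated assumptions, not RL4CO's implementation: `toy_encode`, `toy_decode`, and `toy_policy` are hypothetical names, the "embedding" is just the raw coordinate pair, and a greedy nearest-neighbour rule stands in for the learned attention decoder.

```python
import math


def toy_encode(coords):
    """Step 1: map the initial state to one embedding per node (identity here)."""
    return [tuple(c) for c in coords]


def toy_decode(embeddings, start=0):
    """Step 2: build a tour one node at a time, masking visited nodes."""
    visited = [False] * len(embeddings)
    current = start
    visited[current] = True
    actions = [current]
    while not all(visited):
        # Score only feasible (unvisited) nodes; closer nodes score higher.
        scores = {
            j: -math.dist(embeddings[current], embeddings[j])
            for j in range(len(embeddings))
            if not visited[j]
        }
        current = max(scores, key=scores.get)
        visited[current] = True
        actions.append(current)
    return actions


def toy_policy(coords):
    embeddings = toy_encode(coords)   # encode once
    actions = toy_decode(embeddings)  # decode autoregressively
    return {"actions": actions}


coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
result = toy_policy(coords)
```

The point of the split is that the encoder runs once per instance, while the decoder runs once per construction step and reuses the same embeddings.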
Note
We recommend providing the decoding method as a keyword argument to the decoder during actual testing. The {phase}_decode_type arguments are meant only for the main training loop. See the evaluation scripts for examples.
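The interaction between the phase defaults and an explicit keyword can be pictured as follows. This is a minimal sketch that assumes an explicitly passed `decode_type` takes precedence over the `{phase}_decode_type` default; `resolve_decode_type` is a hypothetical helper written for illustration and is not part of the library.

```python
def resolve_decode_type(phase, phase_defaults, **decoder_kwargs):
    """Pick the decode type: an explicit keyword wins over the phase default."""
    explicit = decoder_kwargs.get("decode_type")
    if explicit is not None:
        return explicit
    return phase_defaults[f"{phase}_decode_type"]


# Defaults mirror the constructor signature documented above.
defaults = {
    "train_decode_type": "sampling",
    "val_decode_type": "greedy",
    "test_decode_type": "greedy",
}

training = resolve_decode_type("train", defaults)
testing = resolve_decode_type("test", defaults, decode_type="multistart_greedy")
```

Under this assumption, evaluation code states its decoding method explicitly instead of relying on whatever the policy was constructed with.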
- Parameters:
env_name (str) – Name of the environment used to initialize embeddings
encoder (Optional[Module]) – Encoder module. Can be passed by sub-classes.
decoder (Optional[Module]) – Decoder module. Can be passed by sub-classes.
init_embedding (Optional[Module]) – Model to use for the initial embedding. If None, use the default embedding for the environment
context_embedding (Optional[Module]) – Model to use for the context embedding. If None, use the default embedding for the environment
dynamic_embedding (Optional[Module]) – Model to use for the dynamic embedding. If None, use the default embedding for the environment
embedding_dim (int) – Dimension of the node embeddings
num_encoder_layers (int) – Number of layers in the encoder
num_heads (int) – Number of heads in the attention layers
normalization (str) – Normalization type in the attention layers
mask_inner (bool) – Whether to mask the inner diagonal in the attention layers
use_graph_context (bool) – Whether to use the initial graph context to modify the query
force_flash_attn (bool) – Whether to force the use of flash attention in the attention layers
train_decode_type (str) – Type of decoding during training
val_decode_type (str) – Type of decoding during validation
test_decode_type (str) – Type of decoding during testing
**unused_kw – Unused keyword arguments
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(td, env=None, phase='train', return_actions=False, return_entropy=False, return_init_embeds=False, **decoder_kwargs)
Forward pass of the policy.
- Parameters:
td (TensorDict) – TensorDict containing the environment state
env (Union[str, RL4COEnvBase, None]) – Environment to use for decoding
phase (str) – Phase of the algorithm (train, val, test)
return_actions (bool) – Whether to return the actions
return_entropy (bool) – Whether to return the entropy
decoder_kwargs – Keyword arguments for the decoder
- Returns:
Dictionary containing the reward, log likelihood, and optionally the actions and entropy
- Return type:
out
Encoder
- class rl4co.models.zoo.common.autoregressive.encoder.GraphAttentionEncoder(env_name, num_heads, embedding_dim, num_layers, normalization='batch', feed_forward_hidden=512, force_flash_attn=False, init_embedding=None)
Bases: Module
Graph Attention Encoder as in Kool et al. (2019).
- Parameters:
env_name (str) – Environment name to solve
num_heads (int) – Number of heads for the attention
embedding_dim (int) – Dimension of the embeddings
num_layers (int) – Number of layers for the encoder
normalization (str) – Normalization to use for the attention
feed_forward_hidden (int) – Hidden dimension for the feed-forward network
force_flash_attn (bool) – Whether to force the use of flash attention. If True, cast to fp16
init_embedding (Optional[Module]) – Model to use for the initial embedding. If None, use the default embedding for the environment
Initializes internal Module state, shared by both nn.Module and ScriptModule.
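For readers unfamiliar with the attention encoder of Kool et al. (2019), the core operation of each layer is self-attention over the node embeddings. The sketch below is a deliberately reduced, pure-Python illustration and not the library's code: it uses a single head, identity query/key/value projections, and omits the normalization and feed-forward sub-layers. `self_attention` and `encoder_layer` are hypothetical names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(h[0])
    out = []
    for q in h:
        # Compatibility of this node's query with every node's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in h]
        weights = softmax(scores)
        # Weighted sum of values (the values are the embeddings themselves here).
        out.append([sum(w * v[d] for w, v in zip(weights, h)) for d in range(dim)])
    return out


def encoder_layer(h):
    """Attention sub-layer plus skip connection (normalization and feed-forward omitted)."""
    attended = self_attention(h)
    return [[a + b for a, b in zip(row, att)] for row, att in zip(h, attended)]


node_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for _ in range(3):  # plays the role of num_layers=3 in this toy example
    node_embeddings = encoder_layer(node_embeddings)
```

Each layer keeps the shape of the embeddings unchanged, which is what allows `num_layers` identical layers to be stacked.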
Decoder
- class rl4co.models.zoo.common.autoregressive.decoder.AutoregressiveDecoder(env_name, embedding_dim, num_heads, use_graph_context=True, select_start_nodes_fn=<function select_start_nodes>, linear_bias=False, context_embedding=None, dynamic_embedding=None, **logit_attn_kwargs)
Bases: Module
Auto-regressive decoder for constructing solutions for combinatorial optimization problems. Given the environment state and the embeddings, compute the logits and sample actions autoregressively until all the environments in the batch have reached a terminal state. Multi-start decoding is also supported here, because handling it inside the decoder lets the attention computation be performed natively and is therefore more efficient.
Note
There are major differences between this decoding and most RL problems. The most important one is that reward is not defined for partial solutions, hence we have to wait for the environment to reach a terminal state before we can compute the reward with env.get_reward().
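To see why the reward only exists at the end, consider a routing problem such as the TSP: the objective is the length of the closed tour, which is undefined until every node has been visited. The following is a minimal sketch, assuming a TSP-style reward equal to the negative tour length; `tour_reward` is a hypothetical helper used for illustration and is not `env.get_reward()`.

```python
import math


def tour_reward(coords, actions):
    """Negative length of the closed tour; refuses partial solutions."""
    if sorted(actions) != list(range(len(coords))):
        raise ValueError("reward is only defined for a complete tour")
    length = 0.0
    for i, node in enumerate(actions):
        nxt = actions[(i + 1) % len(actions)]  # wrap around to close the tour
        length += math.dist(coords[node], coords[nxt])
    return -length


coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
reward = tour_reward(coords, [0, 1, 2, 3])
```

A partial action sequence such as `[0, 1]` raises an error in this sketch, mirroring the fact that there is no per-step reward to collect during decoding.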
Warning
We assume that environments in the done state are still available for sampling. This is because in NCO we need to wait for all the environments to reach a terminal state before we can stop the decoding process. This is in contrast with the TorchRL framework (at the moment), where the env.rollout function automatically resets. You can follow the work on tighter integration with TorchRL here: https://github.com/kaist-silab/rl4co/issues/72.
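The consequence of this assumption for a batched rollout can be sketched with a toy loop. Everything here is illustrative and assumed: `decode_batch` is a hypothetical function, `steps_needed` stands in for instances that finish at different times, and the no-op action `0` (for example, staying at a depot) is an assumption about how a finished instance might keep stepping.

```python
def decode_batch(steps_needed):
    """Roll out a batch until every instance is done.

    steps_needed[i] is a toy stand-in for how many real actions instance i needs.
    """
    batch_size = len(steps_needed)
    remaining = list(steps_needed)
    actions = [[] for _ in range(batch_size)]
    done = [r == 0 for r in remaining]
    while not all(done):
        for i in range(batch_size):
            if done[i]:
                # Finished instances are still stepped with a no-op action,
                # so every action sequence in the batch keeps the same length.
                actions[i].append(0)
            else:
                actions[i].append(remaining[i])  # toy "real" action
                remaining[i] -= 1
                done[i] = remaining[i] == 0
    return actions


actions = decode_batch([2, 4, 3])
```

In this sketch the loop stops only once the slowest instance terminates, and all action sequences have the same length, which is what makes it possible to stack them into a single rectangular tensor.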
- Parameters:
env_name (str) – Environment name to solve
embedding_dim (int) – Dimension of the embeddings
num_heads (int) – Number of heads for the attention
use_graph_context (bool) – Whether to use the initial graph context to modify the query
select_start_nodes_fn (callable) – Function to select the start nodes for multi-start decoding
linear_bias (bool) – Whether to use a bias in the linear projection of the embeddings
context_embedding (Optional[Module]) – Module to compute the context embedding. If None, the default is used
dynamic_embedding (Optional[Module]) – Module to compute the dynamic embedding. If None, the default is used
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(td, embeddings, env=None, decode_type='sampling', num_starts=None, softmax_temp=None, calc_reward=True)
Forward pass of the decoder. Given the environment state and the pre-computed embeddings, compute the logits and sample actions.
- Parameters:
td (TensorDict) – Input TensorDict containing the environment state
embeddings (Tensor) – Precomputed embeddings for the nodes
env (Union[str, RL4COEnvBase, None]) – Environment to use for decoding. If None, the environment is instantiated from env_name. Note that it is more efficient to pass an already instantiated environment each time for fine-grained control
decode_type (str) – Type of decoding to use. Can be one of:
- "sampling": sample from the logits
- "greedy": take the argmax of the logits
- "multistart_sampling": same as sampling, but with multi-start decoding
- "multistart_greedy": same as greedy, but with multi-start decoding
num_starts (Optional[int]) – Number of multi-starts to use. If None, will be calculated from the action mask
softmax_temp (Optional[float]) – Temperature for the softmax. If None, default softmax is used from the LogitAttention module
calc_reward (bool) – Whether to calculate the reward for the decoded sequence
- Returns:
outputs: Tensor of shape (batch_size, seq_len, num_nodes) containing the logits
actions: Tensor of shape (batch_size, seq_len) containing the sampled actions
td: TensorDict containing the environment state after decoding
- Return type:
outputs
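The difference between the "sampling" and "greedy" decode types, and the role of the softmax temperature, can be illustrated for a single decoding step. This is a hedged, pure-Python sketch rather than the decoder's actual code: `masked_softmax` and `select_action` are hypothetical names, the mask convention (True means feasible) is an assumption, at least one action is assumed to be feasible, and the multi-start variants are not covered.

```python
import math
import random


def masked_softmax(logits, mask, temperature=1.0):
    """Softmax over feasible actions only; mask[i] is True when action i is allowed."""
    scaled = [x / temperature if m else -math.inf for x, m in zip(logits, mask)]
    top = max(scaled)  # finite as long as at least one action is feasible
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, mask, decode_type="sampling", temperature=1.0, rng=random):
    """Choose one action for one decoding step."""
    probs = masked_softmax(logits, mask, temperature)
    if decode_type == "greedy":
        return max(range(len(probs)), key=probs.__getitem__)
    if decode_type == "sampling":
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    raise ValueError(f"unknown decode_type: {decode_type}")


logits = [2.0, 5.0, 1.0]
mask = [True, False, True]  # action 1 is infeasible despite its high logit
greedy_action = select_action(logits, mask, decode_type="greedy")
sampled_action = select_action(logits, mask, decode_type="sampling", rng=random.Random(0))
```

Masked actions receive zero probability under both decode types, and lowering the temperature concentrates the sampling distribution on the highest-scoring feasible action.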