Shortcuts

Routing Problems

Asymmetric Traveling Salesman Problem (ATSP)

class rl4co.envs.routing.atsp.ATSPEnv(num_loc=10, min_dist=0, max_dist=1, tmat_class=True, td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Asymmetric Traveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.

Parameters:
  • num_loc (int) – number of locations (cities) in the TSP

  • td_params (TensorDict) – parameters of the environment

  • seed – seed for the environment

  • device – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

get_reward(td, actions)[source]

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:

TensorDict

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'atsp'

Capacitated Vehicle Routing Problem (CVRP)

class rl4co.envs.routing.cvrp.CVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Capacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:
  • num_loc (int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)

  • min_loc (float) – minimum value for the location coordinates

  • max_loc (float) – maximum value for the location coordinates

  • min_demand (float) – minimum value for the demand of each customer

  • max_demand (float) – maximum value for the demand of each customer

  • vehicle_capacity (float) – capacity of the vehicle

  • capacity (float) – capacity of the vehicle

  • td_params (TensorDict) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

static get_action_mask(td)[source]

Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise

Return type:

Tensor

get_reward(td, actions)[source]

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:

TensorDict

static load_data(fpath, batch_size=[])[source]

Dataset loading from file Normalize demand by capacity to be in [0, 1]

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'cvrp'

Multiple Traveling Salesman Problem (mTSP)

class rl4co.envs.routing.mtsp.MTSPEnv(num_loc=20, min_loc=0, max_loc=1, min_num_agents=5, max_num_agents=5, cost_type='minmax', td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Multiple Traveling Salesman Problem environment At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways:

  • minmax: (default) the reward is the maximum of the path lengths of all the agents

  • sum: the cost is the sum of the path lengths of all the agents

Reward is - cost, so the goal is to maximize the reward (minimize the cost).

Parameters:
  • num_loc (int) – number of locations (cities) to visit

  • min_loc (float) – minimum value of the locations

  • max_loc (float) – maximum value of the locations

  • min_num_agents (int) – minimum number of agents

  • max_num_agents (int) – maximum number of agents

  • cost_type (str) – type of cost to use, either minmax or sum

  • td_params (TensorDict) – parameters for the TensorDict specs

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

get_reward(td, actions=None)[source]

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:

TensorDict

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'mtsp'

Orienteering Problem (OP)

class rl4co.envs.routing.op.OPEnv(num_loc=20, min_loc=0, max_loc=1, max_length=None, prize_type='dist', td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Orienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.

Parameters:
  • num_loc (int) – number of locations (cities) in the OP

  • min_loc (float) – minimum value of the locations

  • max_loc (float) – maximum value of the locations

  • max_length (Union[float, Tensor]) – maximum length of the path

  • prize_type (str) – type of prize to collect. Can be: - “dist”: the prize is the distance from the previous location - “unif”: the prize is a uniform random variable - “const”: the prize is a constant

  • td_params (TensorDict) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions, add_distance_to_depot=True)[source]

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded. If add_distance_to_depot if True, then the distance to the depot is added to max length since by default, the max length is modified in the reset function to account for the distance to the depot.

generate_data(batch_size, prize_type=None)[source]

Dataset generation

Return type:

TensorDict

static get_action_mask(td)[source]

Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.

Return type:

Tensor

get_reward(td, actions)[source]

Reward is the sum of the prizes of visited nodes

Return type:

TensorDict

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'op'

Pickup and Delivery Problem (PDP)

class rl4co.envs.routing.pdp.PDPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Pickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):

  • 1 depot

  • num_loc / 2 pickup locations

  • num_loc / 2 delivery locations

The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location

Parameters:
  • num_loc (int) – number of locations (cities) in the TSP

  • td_params (TensorDict) – parameters of the environment

  • seed – seed for the environment

  • device – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

static get_reward(td, actions)[source]

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:

TensorDict

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'pdp'

Prize Collecting Traveling Salesman Problem (PCTSP)

class rl4co.envs.routing.pctsp.PCTSPEnv(num_loc=10, min_loc=0, max_loc=1, penalty_factor=3, prize_required=1, check_solution=False, td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Prize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.

Parameters:
  • num_loc (int) – Number of locations

  • min_loc (float) – Minimum location value

  • max_loc (float) – Maximum location value

  • penalty_factor (float) – Penalty factor

  • prize_required (float) – Minimum prize required to visit a node

  • check_solution (bool) – Set to False by default for small bug happening around 0.01% of the time (TODO: fix)

  • td_params (TensorDict) – Parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]

Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

static get_action_mask(td)[source]

Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes

Return type:

Tensor

get_reward(td, actions)[source]

Reward is saved penalties - (total length + penalty)

Return type:

Tensor

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'pctsp'
property stochastic

Split Delivery Vehicle Routing Problem (SDVRP)

class rl4co.envs.routing.sdvrp.SDVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]

Bases: CVRPEnv

Split Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:
  • num_loc (int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)

  • min_loc (float) – minimum value for the location coordinates

  • max_loc (float) – maximum value for the location coordinates

  • min_demand (float) – minimum value for the demand of each customer

  • max_demand (float) – maximum value for the demand of each customer

  • vehicle_capacity (float) – capacity of the vehicle

  • capacity (float) – capacity of the vehicle

  • td_params (TensorDict) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]

Check that the solution is valid (all demand is satisfied)

static get_action_mask(td)[source]

Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise

Return type:

Tensor

name = 'sdvrp'

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)

class rl4co.envs.routing.spctsp.SPCTSPEnv(*args, **kwargs)[source]

Bases: PCTSPEnv

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.

Note

The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).

Initializes internal Module state, shared by both nn.Module and ScriptModule.

name = 'spctsp'
property stochastic

Traveling Salesman Problem (TSP)

class rl4co.envs.routing.tsp.TSPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]

Bases: RL4COEnvBase

Traveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:
  • num_loc (int) – number of locations (cities) in the TSP

  • td_params (TensorDict) – parameters of the environment

  • seed – seed for the environment

  • device – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]

Check that solution is valid: nodes are visited exactly once

generate_data(batch_size)[source]

Dataset generation

Return type:

TensorDict

get_reward(td, actions)[source]

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:

TensorDict

static render(td, actions=None, ax=None)[source]

Render the environment

name = 'tsp'