Routing Problems¶

Asymmetric Traveling Salesman Problem (ATSP)¶

class rl4co.envs.atsp.ATSPEnv(num_loc=10, min_dist=0, max_dist=1, tmat_class=True, td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Asymmetric Traveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.

Parameters:

num_loc¶ (int) – number of locations (cities) in the TSP
td_params¶ (Optional[TensorDict]) – parameters of the environment
seed¶ – seed for the environment
device¶ – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

get_reward(td, actions)[source]¶

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:: TensorDict

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'atsp'¶

Capacitated Vehicle Routing Problem (CVRP)¶

class rl4co.envs.cvrp.CVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Capacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:

num_loc¶ (int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
min_loc¶ (float) – minimum value for the location coordinates
max_loc¶ (float) – maximum value for the location coordinates
min_demand¶ (float) – minimum value for the demand of each customer
max_demand¶ (float) – maximum value for the demand of each customer
vehicle_capacity¶ (float) – capacity of the vehicle
capacity¶ (Optional[float]) – capacity of the vehicle
td_params¶ (Optional[TensorDict]) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]¶: Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

static get_action_mask(td)[source]¶

Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise

Return type:: Tensor

get_reward(td, actions)[source]¶

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:: TensorDict

static load_data(fpath, batch_size=[])[source]¶: Dataset loading from file Normalize demand by capacity to be in [0, 1]

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'cvrp'¶

Multiple Traveling Salesman Problem (mTSP)¶

class rl4co.envs.mtsp.MTSPEnv(num_loc=20, min_loc=0, max_loc=1, min_num_agents=5, max_num_agents=5, cost_type='minmax', td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Multiple Traveling Salesman Problem environment At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways:

minmax: (default) the reward is the maximum of the path lengths of all the agents

sum: the cost is the sum of the path lengths of all the agents

Reward is - cost, so the goal is to maximize the reward (minimize the cost).

Parameters:

num_loc¶ (int) – number of locations (cities) to visit
min_loc¶ (float) – minimum value of the locations
max_loc¶ (float) – maximum value of the locations
min_num_agents¶ (int) – minimum number of agents
max_num_agents¶ (int) – maximum number of agents
cost_type¶ (str) – type of cost to use, either minmax or sum
td_params¶ (Optional[TensorDict]) – parameters for the TensorDict specs

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

get_reward(td, actions=None)[source]¶

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:: TensorDict

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'mtsp'¶

Orienteering Problem (OP)¶

class rl4co.envs.op.OPEnv(num_loc=20, min_loc=0, max_loc=1, max_length=None, prize_type='dist', td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Orienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.

Parameters:

num_loc¶ (int) – number of locations (cities) in the OP
min_loc¶ (float) – minimum value of the locations
max_loc¶ (float) – maximum value of the locations
max_length¶ (Union[float, Tensor, None]) – maximum length of the path
prize_type¶ (str) – type of prize to collect. Can be: - “dist”: the prize is the distance from the previous location - “unif”: the prize is a uniform random variable - “const”: the prize is a constant
td_params¶ (Optional[TensorDict]) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions, add_distance_to_depot=True)[source]¶: Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded. If add_distance_to_depot if True, then the distance to the depot is added to max length since by default, the max length is modified in the reset function to account for the distance to the depot.

generate_data(batch_size, prize_type=None)[source]¶

Dataset generation

Return type:: TensorDict

static get_action_mask(td)[source]¶

Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.

Return type:: Tensor

get_reward(td, actions)[source]¶

Reward is the sum of the prizes of visited nodes

Return type:: TensorDict

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'op'¶

Pickup and Delivery Problem (PDP)¶

class rl4co.envs.pdp.PDPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Pickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):

1 depot

num_loc / 2 pickup locations

num_loc / 2 delivery locations

The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location

Parameters:

num_loc¶ (int) – number of locations (cities) in the TSP
td_params¶ (Optional[TensorDict]) – parameters of the environment
seed¶ – seed for the environment
device¶ – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

static get_reward(td, actions)[source]¶

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:: TensorDict

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'pdp'¶

Prize Collecting Traveling Salesman Problem (PCTSP)¶

class rl4co.envs.pctsp.PCTSPEnv(num_loc=10, min_loc=0, max_loc=1, penalty_factor=3, prize_required=1, check_solution=False, td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Prize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.

Parameters:

num_loc¶ (int) – Number of locations
min_loc¶ (float) – Minimum location value
max_loc¶ (float) – Maximum location value
penalty_factor¶ (float) – Penalty factor
prize_required¶ (float) – Minimum prize required to visit a node
check_solution¶ (bool) – Set to False by default for small bug happening around 0.01% of the time (TODO: fix)
td_params¶ (Optional[TensorDict]) – Parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]¶: Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

static get_action_mask(td)[source]¶

Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes

Return type:: Tensor

get_reward(td, actions)[source]¶

Reward is saved penalties - (total length + penalty)

Return type:: Tensor

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'pctsp'¶

property stochastic¶

Split Delivery Vehicle Routing Problem (SDVRP)¶

class rl4co.envs.sdvrp.SDVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]¶

Bases: CVRPEnv

Split Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:

num_loc¶ (int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
min_loc¶ (float) – minimum value for the location coordinates
max_loc¶ (float) – maximum value for the location coordinates
min_demand¶ (float) – minimum value for the demand of each customer
max_demand¶ (float) – maximum value for the demand of each customer
vehicle_capacity¶ (float) – capacity of the vehicle
capacity¶ (Optional[float]) – capacity of the vehicle
td_params¶ (Optional[TensorDict]) – parameters of the environment

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static check_solution_validity(td, actions)[source]¶: Check that the solution is valid (all demand is satisfied)

static get_action_mask(td)[source]¶

Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise

Return type:: Tensor

name = 'sdvrp'¶

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)¶

class rl4co.envs.spctsp.SPCTSPEnv(*args, _inplace_update=False, _batch_locked=True, **kwargs)[source]¶

Bases: PCTSPEnv

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.

Note

The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).

Initializes internal Module state, shared by both nn.Module and ScriptModule.

name = 'spctsp'¶

property stochastic¶

Traveling Salesman Problem (TSP)¶

class rl4co.envs.tsp.TSPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]¶

Bases: RL4COEnvBase

Traveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Parameters:

num_loc¶ (int) – number of locations (cities) in the TSP
td_params¶ (Optional[TensorDict]) – parameters of the environment
seed¶ – seed for the environment
device¶ – device to use. Generally, no need to set as tensors are updated on the fly

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)[source]¶

Dataset generation

Return type:: TensorDict

static get_reward(td, actions)[source]¶

Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks

Return type:: TensorDict

static render(td, actions=None, ax=None)[source]¶: Render the environment

name = 'tsp'¶