Routing Problems¶
Asymmetric Traveling Salesman Problem (ATSP)¶
- class rl4co.envs.atsp.ATSPEnv(num_loc=10, min_dist=0, max_dist=1, tmat_class=True, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseAsymmetric Traveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.
- Parameters:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'atsp'¶
Capacitated Vehicle Routing Problem (CVRP)¶
- class rl4co.envs.cvrp.CVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseCapacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
- Parameters:
num_loc¶ (
int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)min_loc¶ (
float) – minimum value for the location coordinatesmax_loc¶ (
float) – maximum value for the location coordinatesmin_demand¶ (
float) – minimum value for the demand of each customermax_demand¶ (
float) – maximum value for the demand of each customervehicle_capacity¶ (
float) – capacity of the vehiclecapacity¶ (
Optional[float]) – capacity of the vehicletd_params¶ (
Optional[TensorDict]) – parameters of the environment
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static check_solution_validity(td, actions)[source]¶
Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded
- static get_action_mask(td)[source]¶
Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise
- Return type:
Tensor
- get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- static load_data(fpath, batch_size=[])[source]¶
Dataset loading from file Normalize demand by capacity to be in [0, 1]
- name = 'cvrp'¶
Multiple Traveling Salesman Problem (mTSP)¶
- class rl4co.envs.mtsp.MTSPEnv(num_loc=20, min_loc=0, max_loc=1, min_num_agents=5, max_num_agents=5, cost_type='minmax', td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseMultiple Traveling Salesman Problem environment At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways:
minmax: (default) the reward is the maximum of the path lengths of all the agents
sum: the cost is the sum of the path lengths of all the agents
Reward is - cost, so the goal is to maximize the reward (minimize the cost).
- Parameters:
num_loc¶ (
int) – number of locations (cities) to visitmin_loc¶ (
float) – minimum value of the locationsmax_loc¶ (
float) – maximum value of the locationsmin_num_agents¶ (
int) – minimum number of agentsmax_num_agents¶ (
int) – maximum number of agentscost_type¶ (
str) – type of cost to use, either minmax or sumtd_params¶ (
Optional[TensorDict]) – parameters for the TensorDict specs
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- get_reward(td, actions=None)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'mtsp'¶
Orienteering Problem (OP)¶
- class rl4co.envs.op.OPEnv(num_loc=20, min_loc=0, max_loc=1, max_length=None, prize_type='dist', td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseOrienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.
- Parameters:
num_loc¶ (
int) – number of locations (cities) in the OPmin_loc¶ (
float) – minimum value of the locationsmax_loc¶ (
float) – maximum value of the locationsmax_length¶ (
Union[float,Tensor,None]) – maximum length of the pathprize_type¶ (
str) – type of prize to collect. Can be: - “dist”: the prize is the distance from the previous location - “unif”: the prize is a uniform random variable - “const”: the prize is a constanttd_params¶ (
Optional[TensorDict]) – parameters of the environment
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static check_solution_validity(td, actions, add_distance_to_depot=True)[source]¶
Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded. If add_distance_to_depot if True, then the distance to the depot is added to max length since by default, the max length is modified in the reset function to account for the distance to the depot.
- static get_action_mask(td)[source]¶
Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.
- Return type:
Tensor
- get_reward(td, actions)[source]¶
Reward is the sum of the prizes of visited nodes
- Return type:
TensorDict
- name = 'op'¶
Pickup and Delivery Problem (PDP)¶
- class rl4co.envs.pdp.PDPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBasePickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):
1 depot
num_loc / 2 pickup locations
num_loc / 2 delivery locations
The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location
- Parameters:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'pdp'¶
Prize Collecting Traveling Salesman Problem (PCTSP)¶
- class rl4co.envs.pctsp.PCTSPEnv(num_loc=10, min_loc=0, max_loc=1, penalty_factor=3, prize_required=1, check_solution=False, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBasePrize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.
- Parameters:
num_loc¶ (
int) – Number of locationsmin_loc¶ (
float) – Minimum location valuemax_loc¶ (
float) – Maximum location valuepenalty_factor¶ (
float) – Penalty factorprize_required¶ (
float) – Minimum prize required to visit a nodecheck_solution¶ (
bool) – Set to False by default for small bug happening around 0.01% of the time (TODO: fix)td_params¶ (
Optional[TensorDict]) – Parameters of the environment
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static check_solution_validity(td, actions)[source]¶
Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited
- static get_action_mask(td)[source]¶
Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes
- Return type:
Tensor
- get_reward(td, actions)[source]¶
Reward is saved penalties - (total length + penalty)
- Return type:
Tensor
- name = 'pctsp'¶
- property stochastic¶
Split Delivery Vehicle Routing Problem (SDVRP)¶
- class rl4co.envs.sdvrp.SDVRPEnv(num_loc=20, min_loc=0, max_loc=1, min_demand=1, max_demand=10, vehicle_capacity=1.0, capacity=None, td_params=None, **kwargs)[source]¶
Bases:
CVRPEnvSplit Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
- Parameters:
num_loc¶ (
int) – number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)min_loc¶ (
float) – minimum value for the location coordinatesmax_loc¶ (
float) – maximum value for the location coordinatesmin_demand¶ (
float) – minimum value for the demand of each customermax_demand¶ (
float) – maximum value for the demand of each customervehicle_capacity¶ (
float) – capacity of the vehiclecapacity¶ (
Optional[float]) – capacity of the vehicletd_params¶ (
Optional[TensorDict]) – parameters of the environment
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static check_solution_validity(td, actions)[source]¶
Check that the solution is valid (all demand is satisfied)
- static get_action_mask(td)[source]¶
Function to compute the action mask (feasible actions) for the current state Action mask is 1 if the action is feasible, 0 otherwise
- Return type:
Tensor
- name = 'sdvrp'¶
Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)¶
- class rl4co.envs.spctsp.SPCTSPEnv(*args, _inplace_update=False, _batch_locked=True, **kwargs)[source]¶
Bases:
PCTSPEnvStochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.
Note
The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- name = 'spctsp'¶
- property stochastic¶
Traveling Salesman Problem (TSP)¶
- class rl4co.envs.tsp.TSPEnv(num_loc=20, min_loc=0, max_loc=1, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseTraveling Salesman Problem environment At each step, the agent chooses a city to visit. The reward is the -infinite unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
- Parameters:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- static get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'tsp'¶