Scheduling Problems¶
Flexible Flow Shop Problem (FFSP)¶
- class rl4co.envs.scheduling.ffsp.FFSPEnv(num_stage, num_machine, num_job, min_time=0.1, max_time=1.0, batch_size=[50], **kwargs)[source]¶
Bases:
RL4COEnvBaseFlexible Flow Shop Problem (FFSP) environment. The goal is to schedule a set of jobs on a set of machines such that the makespan is minimized.
- Parameters:
Note
[IMPORTANT] This version of ffsp requires the number of machines in each stage to be the same
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'ffsp'¶
Single Machine Total Weighted Tardiness Problem (SMTWTP)¶
- class rl4co.envs.scheduling.smtwtp.SMTWTPEnv(num_job=10, min_time_span=0, max_time_span=None, min_job_weight=0, max_job_weight=1, min_process_time=0, max_process_time=1, td_params=None, **kwargs)[source]¶
Bases:
RL4COEnvBaseSingle Machine Total Weighted Tardiness Problem environment as described in DeepACO (https://arxiv.org/pdf/2309.14032.pdf) SMTWTP is a scheduling problem in which a set of jobs must be processed on a single machine. Each job i has a processing time, a weight, and a due date. The objective is to minimize the sum of the weighted tardiness of all jobs, where the weighted tardiness of a job is defined as the product of its weight and the duration by which its completion time exceeds its due date. At each step, the agent chooses a job to process. The reward is 0 unless the agent processes all the jobs. In that case, the reward is (-)objective value of the processing order: maximizing the reward is equivalent to minimizing the objective.
- Parameters:
num_job¶ (
int) – number of jobsmin_time_span¶ (
float) – lower bound of jobs’ due time. By default, jobs’ due time is uniformly sampled from (min_time_span, max_time_span)max_time_span¶ (
float) – upper bound of jobs’ due time. By default, it will be set to num_job / 2min_job_weight¶ (
float) – lower bound of jobs’ weights. By default, jobs’ weights are uniformly sampled from (min_job_weight, max_job_weight)max_job_weight¶ (
float) – upper bound of jobs’ weightsmin_process_time¶ (
float) – lower bound of jobs’ process time. By default, jobs’ process time is uniformly sampled from (min_process_time, max_process_time)max_process_time¶ (
float) – upper bound of jobs’ process timetd_params¶ (
TensorDict) – parameters of the environmentseed¶ – seed for the environment
device¶ – device to use. Generally, no need to set as tensors are updated on the fly
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- get_reward(td, actions)[source]¶
Function to compute the reward. Can be called by the agent to compute the reward of the current state This is faster than calling step() and getting the reward from the returned TensorDict at each time for CO tasks
- Return type:
TensorDict
- name = 'smtwtp'¶