Scheduling Problems

Flexible Flow Shop Problem (FFSP)

class rl4co.envs.scheduling.ffsp.FFSPEnv(num_stage, num_machine, num_job, min_time=0.1, max_time=1.0, batch_size=[50], **kwargs)

Bases: RL4COEnvBase

Flexible Flow Shop Problem (FFSP) environment. The goal is to schedule a set of jobs on a set of machines such that the makespan is minimized.

Parameters:

num_stage (int) – number of stages
num_machine (int) – number of machines in each stage
num_job (int) – number of jobs
min_time (float) – minimum processing time of a job
max_time (float) – maximum processing time of a job
batch_size (list) – batch size of the problem

Note

[IMPORTANT] This version of FFSP requires every stage to have the same number of machines.
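To make the makespan objective concrete, here is a minimal pure-Python sketch. It is not FFSPEnv's implementation: the `proc_time[stage][job][machine]` layout, the `ffsp_makespan` helper, and the earliest-finish dispatch rule are all assumptions chosen for illustration. Every job passes through the stages in order, and each stage has the same number of machines, as the note above requires.

```python
def ffsp_makespan(proc_time, job_order):
    """Makespan of a simple list schedule for a flexible flow shop.

    proc_time[s][j][m] is the (hypothetical) processing time of job j on
    machine m of stage s. Every stage has the same number of machines.
    """
    # Time at which each job is ready to enter the next stage.
    ready = {j: 0.0 for j in job_order}
    for stage in proc_time:
        num_machine = len(stage[0])
        machine_free = [0.0] * num_machine  # time at which each machine becomes idle
        # Serve jobs in order of readiness; sorted() is stable, so ties keep job_order.
        for j in sorted(job_order, key=lambda job: ready[job]):
            # Pick the machine on which this job would finish earliest.
            finish, m = min(
                (max(ready[j], machine_free[m]) + stage[j][m], m)
                for m in range(num_machine)
            )
            machine_free[m] = finish
            ready[j] = finish
    # The makespan is the completion time of the last job in the last stage.
    return max(ready.values())


# Illustrative instance: 2 stages, 2 machines per stage, 3 jobs.
proc_time = [
    [[2, 3], [1, 4], [3, 1]],  # stage 0: one row per job, one column per machine
    [[1, 2], [2, 2], [2, 1]],  # stage 1
]
makespan = ffsp_makespan(proc_time, job_order=[0, 1, 2])
print(makespan)
```

A learned policy would replace the fixed dispatch rule with its own choice of job and machine at each decision point; the objective being minimized stays the same.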

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)

Dataset generation

Return type: TensorDict

get_reward(td, actions)

Compute the reward. The agent can call this function to obtain the reward of the current state. For combinatorial optimization (CO) tasks, this is faster than calling step() and reading the reward from the returned TensorDict at every step.

Return type: TensorDict

render(td)

Render the environment

name = 'ffsp'

Single Machine Total Weighted Tardiness Problem (SMTWTP)

class rl4co.envs.scheduling.smtwtp.SMTWTPEnv(num_job=10, min_time_span=0, max_time_span=None, min_job_weight=0, max_job_weight=1, min_process_time=0, max_process_time=1, td_params=None, **kwargs)

Bases: RL4COEnvBase

Single Machine Total Weighted Tardiness Problem (SMTWTP) environment, as described in DeepACO (https://arxiv.org/pdf/2309.14032.pdf).

SMTWTP is a scheduling problem in which a set of jobs must be processed on a single machine. Each job i has a processing time, a weight, and a due date. The objective is to minimize the total weighted tardiness over all jobs, where the weighted tardiness of a job is the product of its weight and the amount of time by which its completion time exceeds its due date.

At each step, the agent chooses a job to process. The reward is 0 until the agent has processed all the jobs. At that point, the reward is the negative of the objective value of the processing order, so maximizing the reward is equivalent to minimizing the objective.
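The objective and terminal reward described above can be sketched in plain Python. This is an illustration of the definition, not the environment's own get_reward. The helper names and list-based inputs are assumptions, and tardiness is taken as max(0, completion time - due date).

```python
def total_weighted_tardiness(process_time, due_time, weight, order):
    """Sum of weight * tardiness over all jobs, processed in the given order."""
    completion = 0.0
    total = 0.0
    for job in order:
        # On a single machine, jobs run back to back.
        completion += process_time[job]
        # A job that finishes on time contributes no tardiness.
        tardiness = max(0.0, completion - due_time[job])
        total += weight[job] * tardiness
    return total


def terminal_reward(process_time, due_time, weight, order):
    """Negative objective value, so that a higher reward means a better schedule."""
    return -total_weighted_tardiness(process_time, due_time, weight, order)


# Illustrative three-job instance.
process_time = [2, 1, 3]
due_time = [2, 4, 3]
weight = [1, 2, 3]

in_index_order = total_weighted_tardiness(process_time, due_time, weight, [0, 1, 2])
heavy_job_first = total_weighted_tardiness(process_time, due_time, weight, [2, 0, 1])
print(in_index_order, heavy_job_first)
```

In this instance, processing the heavily weighted job 2 first lowers the objective, which is the kind of ordering decision the agent has to learn.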

Parameters:

num_job (int) – number of jobs
min_time_span (float) – lower bound of jobs’ due time. By default, jobs’ due time is uniformly sampled from (min_time_span, max_time_span)
max_time_span (float) – upper bound of jobs’ due time. If None (the default), it is set to num_job / 2
min_job_weight (float) – lower bound of jobs’ weights. By default, jobs’ weights are uniformly sampled from (min_job_weight, max_job_weight)
max_job_weight (float) – upper bound of jobs’ weights
min_process_time (float) – lower bound of jobs’ process time. By default, jobs’ process time is uniformly sampled from (min_process_time, max_process_time)
max_process_time (float) – upper bound of jobs’ process time
td_params (TensorDict) – parameters of the environment
seed – seed for the environment
device – device to use. Generally, no need to set as tensors are updated on the fly
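The sampling ranges listed above can be illustrated with a small standard-library sketch. It only mirrors the documented defaults (uniform sampling, and max_time_span falling back to num_job / 2). The function name, the dictionary keys, and the use of Python's random module are assumptions and do not reflect how SMTWTPEnv.generate_data is implemented.

```python
import random


def sample_smtwtp_instance(
    num_job=10,
    min_time_span=0,
    max_time_span=None,
    min_job_weight=0,
    max_job_weight=1,
    min_process_time=0,
    max_process_time=1,
    seed=None,
):
    """Sample one instance using the documented uniform ranges (illustrative only)."""
    if max_time_span is None:
        # Documented default for the upper bound of the due time.
        max_time_span = num_job / 2
    rng = random.Random(seed)
    return {
        # Key names are hypothetical, chosen for readability.
        "due_time": [rng.uniform(min_time_span, max_time_span) for _ in range(num_job)],
        "weight": [rng.uniform(min_job_weight, max_job_weight) for _ in range(num_job)],
        "process_time": [
            rng.uniform(min_process_time, max_process_time) for _ in range(num_job)
        ],
    }


instance = sample_smtwtp_instance(num_job=10, seed=0)
print(len(instance["due_time"]), max(instance["due_time"]) <= 5.0)
```

With the defaults, ten jobs get due times in the range 0 to 5 and processing times in the range 0 to 1, so some jobs are typically late under any order.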

Initializes internal Module state, shared by both nn.Module and ScriptModule.

generate_data(batch_size)

Dataset generation

Return type: TensorDict

get_reward(td, actions)

Compute the reward. The agent can call this function to obtain the reward of the current state. For combinatorial optimization (CO) tasks, this is faster than calling step() and reading the reward from the returned TensorDict at every step.

Return type: TensorDict

static render(td, actions=None, ax=None)

Render the environment

name = 'smtwtp'