# RL4CO
An extensive Reinforcement Learning (RL) for Combinatorial Optimization (CO) benchmark. Our goal is to provide a unified framework for RL-based CO algorithms and to facilitate reproducible research in this field, decoupling the science from the engineering.
RL4CO is built upon:

- TorchRL: the official PyTorch framework for RL algorithms and vectorized environments on GPUs
- TensorDict: a library for easily handling heterogeneous data such as states, actions, and rewards
- PyTorch Lightning: a lightweight PyTorch wrapper for high-performance AI research
- Hydra: a framework for elegantly configuring complex applications
- Training: Checkpoints, Logging, and Callbacks
- New Environment: Creating and Modeling
  - Contents
  - Problem: TSP
  - Installation
  - Imports
  - Reset
  - Step
  - [Optional] Separate Action Mask Function
  - [Optional] Check Solution Validity
  - Reward function
  - Environment Action Specs
  - Data generator
  - Render function
  - Putting everything together
  - Init Embedding
  - Context Embedding
  - Dynamic Embedding
  - Rollout untrained model
  - Training loop
- RL4CO Decoding Strategies Notebook
- Transductive Methods
- Encoder Customization
- Hydra Configuration
- Base Environment
- EDA Problems
- Routing Problems
  - Asymmetric Traveling Salesman Problem (ATSP)
  - Capacitated Vehicle Routing Problem (CVRP)
  - Multiple Traveling Salesman Problem (mTSP)
  - Orienteering Problem (OP)
  - Pickup and Delivery Problem (PDP)
  - Prize Collecting Traveling Salesman Problem (PCTSP)
  - Split Delivery Vehicle Routing Problem (SDVRP)
  - Multi-Task Vehicle Routing Problem (MTVRP)
- Scheduling Problems
- Tasks: Train and Evaluate
- Decoding Strategies
- Data
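The environment tutorial above walks through the core pieces of a new environment: reset, step, an action mask, a solution-validity check, and a reward function, followed by a rollout. As a framework-free illustration of that loop, here is a minimal sketch in plain Python for the TSP, with no TorchRL or TensorDict and with hypothetical names (`TinyTSPEnv` is not part of RL4CO's API):

```python
import math
import random

class TinyTSPEnv:
    """Minimal TSP environment sketch: visit every city once.

    Mirrors the reset/step/action-mask/reward steps of the
    'New Environment' tutorial, without TorchRL or TensorDict.
    """

    def __init__(self, num_cities=5, seed=0):
        rng = random.Random(seed)
        self.coords = [(rng.random(), rng.random()) for _ in range(num_cities)]
        self.num_cities = num_cities

    def reset(self):
        # Start the tour at city 0; the state is just the partial tour.
        self.tour = [0]
        return self.tour

    def action_mask(self):
        # A city is a valid action iff it has not been visited yet.
        return [i not in self.tour for i in range(self.num_cities)]

    def step(self, action):
        # Solution-validity check: reject already-visited cities.
        assert self.action_mask()[action], "invalid action: city already visited"
        self.tour.append(action)
        done = len(self.tour) == self.num_cities
        return self.tour, done

    def reward(self):
        # Negative tour length (including the return leg), so that
        # maximizing reward means minimizing cost.
        def dist(a, b):
            (ax, ay), (bx, by) = self.coords[a], self.coords[b]
            return math.hypot(ax - bx, ay - by)
        legs = zip(self.tour, self.tour[1:] + self.tour[:1])
        return -sum(dist(a, b) for a, b in legs)

# Greedy rollout: always move to the nearest unvisited city.
env = TinyTSPEnv(num_cities=5)
env.reset()
done = False
while not done:
    mask = env.action_mask()
    cx, cy = env.coords[env.tour[-1]]
    candidates = [i for i, ok in enumerate(mask) if ok]
    action = min(candidates,
                 key=lambda i: math.hypot(env.coords[i][0] - cx,
                                          env.coords[i][1] - cy))
    _, done = env.step(action)

print(sorted(env.tour))   # every city visited exactly once
print(env.reward() < 0)   # reward is the negated tour length
```

In RL4CO itself these pieces are batched with TensorDict and run on GPU via TorchRL's vectorized environments, but the control flow is the same: reset produces an initial state, the action mask constrains decoding at each step, and the reward is computed from the completed solution.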