There is an interesting paper from Tassel et al. on using reinforcement learning for the job shop scheduling problem (JSSP), which is NP-hard. This blog post is essentially my notes on their paper.

Motivation for the paper

Reinforcement learning has been quite successful in recent times, from AlphaGo to Atari. The authors are interested in taking a technique that has worked well in one field and applying it to a different one. They propose four advantages of using RL for scheduling:

  • RL agents can offer more flexibility compared to traditional methods
  • Traditional methods like linear programming or constraint programming are deterministic and cannot model stochastic events such as machine failures, random processing times, etc.
  • In industrial settings where instances share a lot of similarity, lifelong learning can allow the agent to reuse what it has learnt from scheduling past instances
  • RL makes it possible to schedule incoming jobs incrementally as they appear in the queue, by considering the impact of the schedule for known jobs on the new ones, whereas traditional methods focus only on a fixed set of jobs

Background

Markov Decision Processes (MDP)

An MDP is a 4-tuple: M = (S, A, Pa, Ra)

  • S is the state space
  • A is the action space
  • Pa(s, s′) is the probability that taking action a in state s leads to state s′
  • Ra(s, s′) is the reward received for going from state s to s′ under action a
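
The 4-tuple above can be sketched as plain data structures. This is a minimal toy MDP of my own (two states, two actions — not from the paper), just to make Pa and Ra concrete:

```python
import random

# Pa: P[a][(s, s_next)] = probability of moving from s to s_next under action a.
# (Hypothetical toy values; each (a, s) row sums to 1.)
P = {
    "a0": {("s0", "s0"): 0.3, ("s0", "s1"): 0.7, ("s1", "s0"): 1.0},
    "a1": {("s0", "s1"): 1.0, ("s1", "s1"): 1.0},
}
# Ra: R[a][(s, s_next)] = reward received for that transition.
R = {
    "a0": {("s0", "s0"): 0.0, ("s0", "s1"): 1.0, ("s1", "s0"): 0.0},
    "a1": {("s0", "s1"): 2.0, ("s1", "s1"): 0.5},
}

def step(state, action, rng=random):
    """Sample s' from Pa(s, s') and return (s', reward)."""
    transitions = [(s2, p) for (s1, s2), p in P[action].items() if s1 == state]
    states, probs = zip(*transitions)
    next_state = rng.choices(states, weights=probs)[0]
    return next_state, R[action][(state, next_state)]
```

An RL agent interacts with such an MDP by repeatedly calling `step` and using the observed rewards to improve its policy; in the scheduling setting, states would encode the partial schedule and actions would dispatch jobs to machines.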