Optimal Control Policies of a Crystallization Process Using Inverse Reinforcement Learning

Abstract

Crystallization is widely used in the pharmaceutical industry to purify reaction intermediates and final active pharmaceutical ingredients. This work presents a novel implementation of an Inverse Reinforcement Learning (IRL) approach in which an agent observes an expert's optimal control policies for a crystallization process and attempts to mimic its performance. In essence, an Apprenticeship Learning (AL) setup is developed in which the expert demonstrates the control task to the IRL agent, enabling it to attain control performance comparable to that of the expert. This is achieved through repeated execution of “exploitation policies” that maximize the reward over consecutive IRL training episodes. The cooling crystallization of paracetamol is used as a case study, and both proportional-integral-derivative (PID) and model predictive control (MPC) strategies are considered as expert systems. A model-based IRL technique is implemented to achieve effective trajectory tracking, which ensures the target final crystal size, considered the critical quality attribute, by reducing deviations from the optimal reference trajectories, namely process temperature, supersaturation, and particle size. The performance of the trained IRL agent is validated against the PID and MPC controllers and tested in the presence of noisy measurements and model uncertainties.
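
To make the apprenticeship-learning loop summarized above concrete, the sketch below illustrates the generic feature-matching formulation of AL (in the style of Abbeel and Ng's projection method) on a toy discretized tracking problem standing in for the reference temperature trajectory. Everything here is a hypothetical illustration rather than the paper's actual implementation: the dynamics, the one-hot features, the expert policy, and the linear reward structure are all assumptions, and the agent's “exploitation” step is realized as value iteration under the current reward estimate.

```python
# Minimal sketch of apprenticeship learning via feature matching (Abbeel & Ng
# style projection) on a toy tracking MDP. All names, dynamics, and features
# are hypothetical illustrations, not the paper's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, GAMMA, HORIZON = 11, 0.95, 60   # states = discretized deviation bins
ACTIONS = [-1, 0, +1]                     # shift toward / hold / away from reference

def step(s, a):
    """Deterministic toy dynamics: the action shifts the deviation bin."""
    return int(np.clip(s + a, 0, N_STATES - 1))

def features(s):
    """One-hot state features; the reward is assumed linear in these."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

def feature_expectations(policy, n_rollouts=200):
    """Discounted feature expectations mu(pi), estimated by Monte Carlo."""
    mu = np.zeros(N_STATES)
    for _ in range(n_rollouts):
        s = int(rng.integers(N_STATES))
        for t in range(HORIZON):
            mu += (GAMMA ** t) * features(s)
            s = step(s, ACTIONS[policy[s]])
    return mu / n_rollouts

def value_iteration(w, n_iter=200):
    """Exploitation step: greedy policy for the current reward estimate w . phi."""
    V = np.zeros(N_STATES)
    for _ in range(n_iter):
        Q = np.array([[w[step(s, a)] + GAMMA * V[step(s, a)]
                       for a in ACTIONS] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Expert demonstration: always steer the deviation toward bin 0 (track the
# reference), holding once it is reached.
expert = np.array([1 if s == 0 else 0 for s in range(N_STATES)])
mu_E = feature_expectations(expert)

# Projection loop: alternate reward inference and exploitation until the
# agent's feature expectations approach the expert's.
policy = rng.integers(len(ACTIONS), size=N_STATES)
mu_bar = feature_expectations(policy)
for i in range(20):
    w = mu_E - mu_bar                     # reward weights from feature mismatch
    policy = value_iteration(w)
    mu = feature_expectations(policy)
    d = mu - mu_bar
    mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d + 1e-12) * d
    print(f"iter {i:2d}  ||mu_E - mu_bar|| = {np.linalg.norm(mu_E - mu_bar):.4f}")
```

In this simplified setting the printed mismatch norm shrinks as the agent's policy comes to match the expert's feature expectations, which is the sense in which the IRL agent “mimics” the expert; the paper's model-based setting replaces the toy MDP with the crystallization process model and the PID/MPC experts.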