Reinforcement learning for large language models is more of a systems problem than an ML problem.
Thanks to Joe Crobak and Adit Jain from Collinear for feedback that improved the blogpost.