Trying to implement Lightning OPD for our clinical NLP model - stuck on distillation loss

Efficient post-training for healthcare LLMs using offline on-policy distillation

Building a clinical note summarization model that needs to run efficiently on hospital edge devices. I read the Lightning OPD paper (https://arxiv.org/abs/2604.13010v1) on efficient post-training for large reasoning models and tried implementing their distillation approach, but I'm getting unstable loss curves when distilling from our 70B-parameter teacher to a 7B student for deployment. I've already experimented with different temperature schedules and KL-divergence weightings. Has anyone dealt with this? Happy to grab coffee.
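
For context, here's roughly the loss I'm computing. This is a minimal sketch of standard temperature-scaled KD (Hinton-style forward KL), not necessarily the exact Lightning OPD formulation; the function name, tensor shapes, and default hyperparameters are my own assumptions:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0,
                      kl_weight: float = 1.0) -> torch.Tensor:
    """Temperature-scaled forward-KL distillation loss over token logits.

    Teacher logits are precomputed offline, since the 70B teacher
    doesn't fit alongside the student on our training hardware.
    """
    vocab = student_logits.size(-1)
    # Flatten [batch, seq, vocab] -> [batch*seq, vocab] so "batchmean"
    # averages per token rather than per batch element.
    s = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    t = F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # KL(teacher || student); the T**2 factor keeps gradient magnitudes
    # comparable across temperatures -- dropping it makes the loss scale
    # swing with the temperature schedule, which can look like instability.
    return kl_weight * F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```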