Stuck on token counting for clinical note summaries - anyone else?
Claude token counting issues in healthcare NLP pipeline
Building an AI system that summarizes clinical notes for doctors using Claude Opus. I'm stuck on token-counting inconsistencies that break our batching pipeline. When processing 100+ patient notes daily, our local token counter reports different numbers than Claude's API, so batch jobs fail with "input too long" errors. Tried so far: Simon Willison's token counter tool, implementing our own BPE tokenizer, and widening buffer margins. Still seeing 5-10% variance on long medical narratives. Anyone dealt with this? Happy to grab coffee.
- 10:00 AM · Maya
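One workaround while waiting for a better answer: treat any local count as an estimate and bake the observed 5-10% variance into the batch budget, so a batch never gets within the error band of the hard limit. Below is a minimal sketch of that idea. The chars-per-token ratio and the helper names (`estimate_tokens`, `batch_notes`) are hypothetical illustrations, not anything from a real library; if you want an authoritative count, the Anthropic API exposes a server-side token-counting endpoint, which is the only counter guaranteed to match what the Messages API enforces.

```python
def estimate_tokens(text: str, chars_per_token: float = 3.2) -> int:
    """Rough local estimate; chars_per_token is an assumed heuristic,
    not Claude's real tokenizer."""
    return int(len(text) / chars_per_token) + 1

def batch_notes(notes: list[str], token_limit: int,
                safety_margin: float = 0.10) -> list[list[str]]:
    """Greedily pack notes into batches, reserving a safety margin
    (default 10%) to absorb estimator-vs-API variance."""
    budget = int(token_limit * (1 - safety_margin))
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for note in notes:
        t = estimate_tokens(note)
        if t > budget:
            # A single oversized note needs chunking before batching.
            raise ValueError("note exceeds per-batch budget; chunk it first")
        if used + t > budget:
            batches.append(current)
            current, used = [], 0
        current.append(note)
        used += t
    if current:
        batches.append(current)
    return batches
```

The point is that the margin, not the estimator, carries the correctness burden: as long as the estimator never undercounts by more than the margin, no batch can trip the API's limit, even though the local and server counts disagree.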