DSGym: a framework for training data science agents on 90+ scientific tasks

Together AI has published DSGym, a unified framework for training and evaluating LLM agents that perform data science tasks. It combines 90+ bioinformatics tasks drawn from the scientific literature with 92 Kaggle competitions. A 4B-parameter model trained on synthetic data achieved state-of-the-art results among open-source solutions. The framework targets a gap in existing benchmarks, which are incompatible with one another and do not require real data analysis.