This roundtable examines how data shapes AI models’ capabilities and risks, focusing on data collection and curation for training, responsible deployment, and evaluation. The discussion will address best practices and open challenges across the AI pipeline, including how data quality and diversity influence model performance, how evaluation data supports safety, privacy, fairness, and accountability, and how progress can be reliably measured. Participants will explore the future of AI and data generation, the impact of limited data availability, evolving definitions of “high-quality” data, and shifts in data needs as research advances. The conversation will also consider issues of data governance, intellectual property, global regulatory disparities, and the ethical balance between accuracy, diversity, and representativeness. The goal is to identify practical insights for developing AI systems that are both powerful and responsible.