Research Paper

Natural Language to SQL Query Conversion for Soccer Data Analysis

Authors: Sebastian Martinez, Naman Ahuja, Fenil Bardoliya, Chris Bryan, Vivek Gupta

Institution: Arizona State University

Date: July 9, 2025

Abstract

This paper presents a novel approach to converting natural language queries about soccer statistics into SQL queries using large language models. The system leverages Google's Gemini AI model to understand user questions about Premier League soccer data and generate appropriate database queries. The implementation demonstrates the potential of natural language interfaces for sports data analysis, making complex statistical queries accessible to non-technical users.

Key Contributions

Methodology

The research employed a combination of natural language processing techniques and database query optimization. The system was trained and evaluated using a comprehensive dataset of Premier League statistics, with particular focus on the accuracy of query generation and the relevance of returned results.
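The question-to-query flow described here can be illustrated with a minimal sketch. It is not the authors' implementation: the `player_stats` schema, the prompt wording, the helper names, and the sample rows are all hypothetical placeholders, and the language model call is replaced by a stub (`fake_model`) that returns a fixed query so the example runs without network access or an API key.

```python
import sqlite3

# Hypothetical schema; the paper does not specify its tables.
SCHEMA = """
CREATE TABLE player_stats (
    player TEXT,
    team TEXT,
    season TEXT,
    goals INTEGER,
    assists INTEGER
);
"""


def build_prompt(question: str, schema: str) -> str:
    """Combine the schema and the user's question into one prompt."""
    return (
        "You translate questions about soccer statistics into SQLite queries.\n"
        f"Schema:\n{schema.strip()}\n"
        f"Question: {question}\n"
        "Return a single SELECT statement and nothing else."
    )


def fake_model(prompt: str) -> str:
    """Stand-in for the language model call (an assumption, not a real client)."""
    return "SELECT player, goals FROM player_stats ORDER BY goals DESC LIMIT 1;"


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def answer(question, conn, generate_sql=fake_model):
    """Generate SQL for the question, check it, and run it."""
    sql = generate_sql(build_prompt(question, SCHEMA))
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


# Illustrative placeholder rows, not real Premier League statistics.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO player_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("Player A", "Team X", "2023-24", 20, 5),
        ("Player B", "Team Y", "2023-24", 27, 8),
    ],
)
rows = answer("Who scored the most goals?", conn)
print(rows)  # [('Player B', 27)]
```

Passing the model call in as the `generate_sql` parameter keeps the prompt construction, the read-only check, and the database execution testable without the model. The read-only check is a deliberately simple guard for this sketch; a production system would likely need a proper SQL parser and a database user restricted to read access.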

Results

The system achieved high accuracy in converting natural language queries to SQL, with particular success in handling complex statistical questions and comparative analyses. User testing demonstrated significant improvements in accessibility and efficiency compared to traditional query interfaces.
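The summary does not state how accuracy was measured. One common choice for text-to-SQL systems is execution accuracy: a generated query counts as correct when it returns the same rows as a reference query. The sketch below assumes that metric; the `team_points` table, its rows, and the query pairs are hypothetical and are not results from the paper.

```python
import sqlite3


def run(conn, sql):
    """Return the result rows as a sorted list, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, reference) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted, reference in pairs:
        expected = run(conn, reference)
        got = run(conn, predicted)
        if got is not None and got == expected:
            hits += 1
    return hits / len(pairs)


# Illustrative placeholder data, not real league standings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team_points (team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO team_points VALUES (?, ?)",
    [("Team X", 80), ("Team Y", 75)],
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT team FROM team_points WHERE points > 78",
     "SELECT team FROM team_points WHERE points >= 80"),
    # Valid SQL, different rows: counted as incorrect.
    ("SELECT team FROM team_points",
     "SELECT team FROM team_points WHERE points > 78"),
    # Invalid column: the query fails and is counted as incorrect.
    ("SELECT nope FROM team_points",
     "SELECT team FROM team_points"),
]
score = execution_accuracy(conn, pairs)
print(f"{score:.2f}")  # 0.33
```

Comparing results rather than SQL text credits queries that are phrased differently but equivalent on the test database. Because rows are sorted before comparison, this version ignores row order, so questions that depend on ordering would need an order-sensitive comparison instead.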

Future Work