Research Paper
Natural Language to SQL Query Conversion for Soccer Data Analysis
Authors: Sebastian Martinez, Naman Ahuja, Fenil Bardoliya, Chris Bryan, Vivek Gupta
Institution: Arizona State University
Date: July 9, 2025
Abstract
This paper presents a novel approach to converting natural language queries about soccer statistics into SQL queries using large language models. The system leverages Google's Gemini AI model to understand user questions about Premier League soccer data and generate appropriate database queries. The implementation demonstrates the potential of natural language interfaces for sports data analysis, making complex statistical queries accessible to non-technical users.
Key Contributions
- Development of a specialized prompt engineering approach for soccer data queries
- Implementation of a robust error-handling and query-validation system
- Integration of visualization generation for enhanced data interpretation
- Evaluation of the system's accuracy and effectiveness in real-world scenarios
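To make the first two contributions concrete, here is a minimal sketch of schema-grounded prompt construction and conservative validation of a generated query. It is an illustration under stated assumptions, not the authors' implementation: the `matches` table, its columns, the prompt wording, and the helper names `build_prompt` and `validate_sql` are all hypothetical, and the call to the language model is deliberately left out. SQLite stands in for whatever database the system actually uses.

```python
import sqlite3
from typing import Tuple

# Hypothetical schema: the paper does not publish its table layout.
SCHEMA = """
CREATE TABLE matches (
    season TEXT,
    home_team TEXT,
    away_team TEXT,
    home_goals INTEGER,
    away_goals INTEGER
);
"""


def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Embed the schema and the user's question in a single instruction prompt."""
    return (
        "You translate questions about Premier League data into SQLite SQL.\n"
        "Use only the tables and columns in this schema:\n"
        f"{schema.strip()}\n"
        "Return a single SELECT statement and nothing else.\n"
        f"Question: {question}\n"
        "SQL:"
    )


def validate_sql(sql: str, schema: str = SCHEMA) -> Tuple[bool, str]:
    """Accept only a single SELECT statement that compiles against the schema."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        return False, "empty query"
    # Conservative check: also rejects a semicolon inside a string literal.
    if ";" in statement:
        return False, "multiple statements are not allowed"
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN compiles the statement without running the query itself,
        # so unknown tables or columns surface as an error here.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        return False, f"invalid SQL: {exc}"
    finally:
        conn.close()
    return True, "ok"


prompt = build_prompt("Which team scored the most home goals?")
ok, reason = validate_sql("SELECT home_team FROM matches")
```

The validator is intentionally strict: it turns away anything that is not a plain SELECT, including queries that begin with a WITH clause, so a production system would likely need a real SQL parser rather than string checks.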
Methodology
The research employed a combination of natural language processing techniques and database query optimization. The system was trained and evaluated using a comprehensive dataset of Premier League statistics, with particular focus on the accuracy of query generation and the relevance of returned results.
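The paper does not state which accuracy metric was used. One common choice for text-to-SQL systems is execution accuracy: a generated query counts as correct when it returns the same rows as a hand-written reference query. The sketch below assumes that metric; the table, the toy rows, the query pairs, and the function names are hypothetical and are not data or results from the study.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the rows as a multiset (row order ignored), or None on a SQL error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (generated, reference) query pairs whose results match."""
    if not pairs:
        return 0.0
    correct = 0
    for generated, reference in pairs:
        expected = run_query(conn, reference)
        actual = run_query(conn, generated)
        if expected is not None and actual == expected:
            correct += 1
    return correct / len(pairs)


# Toy in-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "home_team TEXT, away_team TEXT, home_goals INTEGER, away_goals INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [("Team A", "Team B", 2, 1), ("Team B", "Team A", 0, 0), ("Team A", "Team C", 3, 0)],
)

pairs = [
    # Same result despite different formatting: counted as correct.
    ("SELECT SUM(home_goals) FROM matches WHERE home_team = 'Team A'",
     "SELECT SUM(home_goals) FROM matches WHERE home_team='Team A'"),
    # Runs, but returns a different count: counted as wrong.
    ("SELECT COUNT(*) FROM matches",
     "SELECT COUNT(*) FROM matches WHERE home_goals > away_goals"),
    # References a column that does not exist: counted as wrong.
    ("SELECT goals FROM matches",
     "SELECT home_goals FROM matches"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

Comparing result sets rather than SQL text avoids penalizing queries that are written differently but are equivalent on the test data. The trade-off is that two queries can agree on a small dataset by coincidence, so the test database needs enough variety to tell them apart.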
Results
The system achieved high accuracy in converting natural language queries to SQL, with particular success in handling complex statistical questions and comparative analyses. User testing demonstrated significant improvements in accessibility and efficiency compared to traditional query interfaces.
Future Work
- Expansion to other sports and statistical domains
- Integration of more advanced visualization techniques
- Development of a mobile application interface
- Implementation of multi-language support