Human playtesting confirms that reinforcement learning improves perceived balance in game levels
Florian Rupp, Alessandro Puddu, Christian Becker-Asano, and Kai Eckert present “It might be balanced, but is it actually good? An Empirical Evaluation of Game Level Balancing” at the 2024 IEEE Conference on Games (CoG).
Beyond Heuristic Evaluation
Achieving optimal balance in games is essential to their success, yet traditionally relies on extensive manual work and playtesting. Recent research has successfully applied the PCGRL (Procedural Content Generation via Reinforcement Learning) framework to improve game level balance automatically. However, a critical question remained unanswered: while these approaches can achieve balance according to computational metrics, do human players actually perceive the improvements?
The Gap Between Metrics and Perception
Previous work assessed balance heuristically through simulation-based metrics like win rates and game duration. While these metrics provide useful indicators, they may not capture the subjective experience of balance that human players perceive. A level might be technically balanced yet feel unfair, or vice versa.
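A minimal sketch of such a simulation-based heuristic, assuming a hypothetical `simulate_match` callable that plays out one match and reports the winner (the paper's actual metrics and simulation setup are not reproduced here):

```python
import random

def balance_score(simulate_match, n_games=200, seed=0):
    """Heuristic balance metric: how far player 1's win rate deviates
    from the 50% ideal over repeated simulated playthroughs.
    `simulate_match` is a hypothetical callable taking an RNG and
    returning the winning player (1 or 2)."""
    rng = random.Random(seed)  # fixed seed for reproducible evaluation
    wins = sum(1 for _ in range(n_games) if simulate_match(rng) == 1)
    win_rate = wins / n_games
    return abs(win_rate - 0.5)  # 0.0 = perfectly balanced, 0.5 = one-sided

# Toy simulator standing in for an unbalanced level where
# player 1 wins about 65% of the time.
score = balance_score(lambda rng: 1 if rng.random() < 0.65 else 2)
```

A score like this is cheap to compute inside a reinforcement-learning loop, which is precisely why prior work leaned on it; the open question the paper tackles is whether optimizing it matches what players feel.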
Rigorous Human Evaluation
This research addresses that gap with a comprehensive single-blind study that pairs a survey with human playtesting. The study design involved:
- Four Different Scenarios: Each with distinct level characteristics and balancing challenges
- Paired Comparisons: Participants played both unbalanced and balanced versions of levels
- Perception Assessment: Players reported their experiences with both versions
- Bidirectional Testing: Some participants played the unbalanced version first, others the balanced version first, to control for order effects
Positive Validation
Based on descriptive and statistical analysis, the findings indicate that PCGRL-based balancing positively influences players' perceived balance in most scenarios. This validates the approach and shows that improvements in computational metrics largely translate into better player experiences.
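The paper's exact statistical procedures are not reproduced here; as an illustration of a paired analysis on this kind of data, a two-sided exact sign test on hypothetical 1-to-7 perceived-balance ratings for the unbalanced versus balanced version of the same levels:

```python
from math import comb

def sign_test_p(before, after):
    """Two-sided exact sign test on paired ratings.
    Ties (no change) are discarded; under the null hypothesis the
    remaining differences are positive with probability 0.5."""
    diffs = [b - a for a, b in zip(before, after) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)          # improvements
    m = min(k, n - k)
    # Two-sided p-value: 2 * P(X <= m) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, i) for i in range(m + 1)) * 2 / 2 ** n
    return min(p, 1.0)

# Hypothetical ratings: unbalanced version first, balanced version second.
p = sign_test_p([3, 2, 4, 3, 2, 3, 4, 2], [5, 4, 4, 6, 5, 4, 6, 5])
# Seven of eight participants rate the balanced version higher (one tie).
```

With seven positive differences out of seven non-ties, the test rejects the null at the conventional 0.05 level, which is the shape of evidence a "perceived balance improved" claim rests on.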
Nuanced Results
Interestingly, the research also reveals differences in how balancing affects various aspects across scenarios. Not all scenarios showed uniform improvements across all measured dimensions, suggesting that:
- Balance perception is multifaceted
- Different types of imbalance may require different solutions
- Context matters in determining what constitutes “good” balance
Implications for Automated Design
These findings have important implications for automated game design tools. They demonstrate that:
- AI-based balancing works: The improvements are perceivable by actual players
- Metrics align with experience: Heuristic evaluations provide reasonable proxies for player perception
- Refinement opportunities exist: Understanding where and why certain scenarios show different patterns can guide future improvements
Bridging Research and Practice
By validating automated balancing through human evaluation, this work helps bridge the gap between research prototypes and practical game development tools. It provides confidence that AI-based balancing approaches can be trusted to enhance the player experience rather than merely satisfy mathematical definitions of balance. Even so, a final human evaluation of the resulting levels remains necessary.
Citation: Florian Rupp, Alessandro Puddu, Christian Becker-Asano, and Kai Eckert (2024): It might be balanced, but is it actually good? An Empirical Evaluation of Game Level Balancing. In 2024 IEEE Conference on Games (CoG).