Human playtesting confirms that reinforcement learning improves perceived balance in game levels
Florian Rupp, Alessandro Puddu, Christian Becker-Asano, and Kai Eckert present “It might be balanced, but is it actually good? An Empirical Evaluation of Game Level Balancing” at the 2024 IEEE Conference on Games (CoG).
Beyond Heuristic Evaluation
Achieving optimal balance in games is essential to their success, yet traditionally relies on extensive manual work and playtesting. Recent research has successfully applied the PCGRL (Procedural Content Generation via Reinforcement Learning) framework to improve game level balance automatically. However, a critical question remained unanswered: while these approaches can achieve balance according to computational metrics, do human players actually perceive the improvements?
The Gap Between Metrics and Perception
Previous work assessed balance heuristically through simulation-based metrics like win rates and game duration. While these metrics provide useful indicators, they may not capture the subjective experience of balance that human players perceive. A level might be technically balanced yet feel unfair, or vice versa.
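A minimal sketch of such a simulation-based heuristic, assuming a hypothetical `simulate_match` callable that plays out one match and reports the winner (the paper's actual metrics and simulation setup are not reproduced here):

```python
import random

def balance_score(simulate_match, n_games=200, seed=0):
    """Heuristic balance metric: how far player 1's win rate deviates
    from the 50% ideal over repeated simulated playthroughs.
    `simulate_match` is a hypothetical callable taking an RNG and
    returning the winning player (1 or 2)."""
    rng = random.Random(seed)  # fixed seed for reproducible evaluation
    wins = sum(1 for _ in range(n_games) if simulate_match(rng) == 1)
    win_rate = wins / n_games
    return abs(win_rate - 0.5)  # 0.0 = perfectly balanced, 0.5 = one-sided

# Toy simulator standing in for an unbalanced level where
# player 1 wins about 65% of the time.
score = balance_score(lambda rng: 1 if rng.random() < 0.65 else 2)
```

A score like this is cheap to compute inside a reinforcement-learning loop, which is precisely why prior work leaned on it; the open question the paper tackles is whether optimizing it matches what players feel.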
Rigorous Human Evaluation
This research addresses that gap with a comprehensive single-blind study that pairs a survey with human playtesting. The study design involved:
- Four Different Scenarios: Each with distinct level characteristics and balancing challenges
- Paired Comparisons: Participants played both unbalanced and balanced versions of levels
- Perception Assessment: Players reported their experiences with both versions
- Bidirectional Testing: Some participants played the unbalanced version first, others the balanced version first, to control for order effects
Positive Validation
Based on descriptive and statistical analysis, the findings indicate that PCGRL-based balancing positively influences players' perceived balance in most scenarios. This validates the approach and shows that improvements in computational metrics largely translate into better player experiences.
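The paper's exact statistical procedures are not reproduced here; as an illustration of a paired analysis on this kind of data, a two-sided exact sign test on hypothetical 1-to-7 perceived-balance ratings for the unbalanced versus balanced version of the same levels:

```python
from math import comb

def sign_test_p(before, after):
    """Two-sided exact sign test on paired ratings.
    Ties (no change) are discarded; under the null hypothesis the
    remaining differences are positive with probability 0.5."""
    diffs = [b - a for a, b in zip(before, after) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)          # improvements
    m = min(k, n - k)
    # Two-sided p-value: 2 * P(X <= m) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, i) for i in range(m + 1)) * 2 / 2 ** n
    return min(p, 1.0)

# Hypothetical ratings: unbalanced version first, balanced version second.
p = sign_test_p([3, 2, 4, 3, 2, 3, 4, 2], [5, 4, 4, 6, 5, 4, 6, 5])
# Seven of eight participants rate the balanced version higher (one tie).
```

With seven positive differences out of seven non-ties, the test rejects the null at the conventional 0.05 level, which is the shape of evidence a "perceived balance improved" claim rests on.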
Nuanced Results
Interestingly, the research also reveals differences in how balancing affects various aspects across scenarios. Not all scenarios showed uniform improvements across all measured dimensions, suggesting that:
- Balance perception is multifaceted
- Different types of imbalance may require different solutions
- Context matters in determining what constitutes “good” balance
Implications for Automated Design
These findings have important implications for automated game design tools. They demonstrate that:
- AI-based balancing works: The improvements are perceivable by actual players
- Metrics align with experience: Heuristic evaluations provide reasonable proxies for player perception
- Refinement opportunities exist: Understanding where and why certain scenarios show different patterns can guide future improvements
Bridging Research and Practice
By validating automated balancing through human evaluation, this work helps bridge the gap between research prototypes and practical game development tools. It provides confidence that AI-based balancing approaches can be trusted to enhance the player experience rather than merely satisfy mathematical definitions of balance. Even so, a final human evaluation of the resulting levels remains necessary.
Citation: Florian Rupp, Alessandro Puddu, Christian Becker-Asano, and Kai Eckert (2024): It might be balanced, but is it actually good? An Empirical Evaluation of Game Level Balancing. In 2024 IEEE Conference on Games (CoG).