WTC OPEN 2020 - World Team Championships - official T9A event


    • Squigkikka wrote:

      MrPieChee wrote:

      Selection by itself means very little.
      If that was true we should have a completely even spread across all armies, surely? Selection means a great deal, if nothing else just for the sake of composition!
      Why? You're saying the play style of all armies is equally liked by tournament players? The models that fit an army are equally liked? And that all armies synergize equally well with all other armies?


      Squigkikka wrote:

      @Squirrelloid I will admit freely that I don't have the in-depth statistical knowledge to analyze this or offer true rebuttal, but I also feel that looking at this result and saying "This means nothing, it is purely random and a fluke and not indicative of anything" feels... wrong.

      Player skill, for example, will always make it hard to see the true picture. So will matchups! Yet both of those are a big part of even singles tournaments; should that not also contaminate the data to the point it's "useless"?

      I know it's dangerous to base big assumptions on team tourneys - yet the overwhelming presence (or lack thereof) of certain armybooks in top/bottom placements seems to echo trends seen in singles.
      When you have enough data you can derive player skill and adjust for it, but I'm not sure the project will ever get to that stage.

      If you have lots of match ups of players across different versions and with different armies you can start to decide which wins and losses don't dictate power level.

      If you have a group of players that reliably finishes in the bottom 50% with a variety of armies each, over a variety of versions, and this group picks up the same army and then starts performing in the mid-to-top 50%, then you can fairly reliably say the army is OP. You have to consider the skill levels of the people they play and the RPS style of the match-ups, though.
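      To make the idea concrete, here is a minimal sketch of that kind of skill adjustment: subtract each player's personal baseline from their scores, then average the residuals per army. All names and numbers below are invented for illustration.

      ```python
      # Hypothetical sketch of skill-adjusted army performance: remove each
      # player's long-run baseline, then average the residuals per army.
      # All data below is invented.
      from collections import defaultdict

      # (player, army, score out of 100) -- invented example results
      games = [
          ("Alice", "DL", 62), ("Alice", "HbE", 48),
          ("Bob",   "DL", 55), ("Bob",   "OK",  40),
          ("Cara",  "OK", 52), ("Cara",  "HbE", 45),
      ]

      # Each player's baseline skill: their mean score across all armies.
      scores = defaultdict(list)
      for player, _, score in games:
          scores[player].append(score)
      baseline = {p: sum(s) / len(s) for p, s in scores.items()}

      # Army strength estimate: mean residual after removing player skill.
      residuals = defaultdict(list)
      for player, army, score in games:
          residuals[army].append(score - baseline[player])

      for army, res in sorted(residuals.items()):
          print(army, round(sum(res) / len(res), 1))
      ```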


    • MrPieChee wrote:

      Squigkikka wrote:

      MrPieChee wrote:

      Selection by itself means very little.
      If that was true we should have a completely even spread across all armies, surely? Selection means a great deal, if nothing else just for the sake of composition!
      Why? You're saying the play style of all armies is equally liked by tournament players? The models that fit an army are equally liked? And that all armies synergize equally well with all other armies?
      Of course not - it was a rhetorical question, meant to be just as absurd as the notion that there's no distinction between armies when it comes to the roles they can fulfill or the amount and type of tools they provide.
    • Squigkikka wrote:

      @Squirrelloid I will admit freely that I don't have the in-depth statistical knowledge to analyze this or offer true rebuttal, but I also feel that looking at this result and saying "This means nothing, it is purely random and a fluke and not indicative of anything" feels... wrong.

      Player skill, for example, will always make it hard to see the true picture. So will matchups! Yet both of those are a big part of even singles tournaments; should that not also contaminate the data to the point it's "useless"?
      Yes. No. Maybe. Perhaps. *shrugs*

      Making truly rigorous, defensible, and objective statements about t9a on the basis of the data that exists is unbelievably difficult.

      Making semi-defensible, reasonable but not watertight statements on the basis of that data is merely very hard, but it will always include an element of judgement calls and subjectivity.

      So the data can be used, and does contain information... but perhaps people should be more accepting that there will always be an element of subjectivity and judgement calls even when the project puts a lot of effort into being as objective as possible.
    • Of course data will be extracted from the WTC, and it will help.

      But as Squirrelloid pointed out, it will have to be treated differently from singles data.

      That means it will contribute, but it has to be combined and compared with other data - which is the case for all data we get.

    • Squigkikka wrote:

      Naturally, but that's my point. If we saw multiple team tournaments and DL was (is?) top dog in all of them, surely we can infer -something?-.
      Right, but everyone will infer something different.
      Some of those inferences will be more similar than others.
      I think @Squirrelloid's point (or at least the point I would make) is that none of those inferences is unarguably correct.
      Therefore the arguments start...
    • DanT wrote:

      Making truly rigorous, defensible, and objective statements about t9a on the basis of the data that exists is unbelievably difficult.
      :thumbup:

      DanT wrote:

      Making semi-defensible, reasonable but not watertight statements on the basis of that data is merely very hard, but it will always include an element of judgement calls and subjectivity.
      :thumbsup:

      And there you go. That sums it up nicely I think.
    • Squigkikka wrote:

      Naturally, but that's my point. If we saw multiple team tournaments and DL was (is?) top dog in all of them, surely we can infer -something?-.

      Edit: @Just_Flo As far as I can tell from Squirrelloid's OP and this post, the takeaway is that the data is useless?
      Well, will the team combine this with other team tournaments and use it? Yes.

      Should they? I believe we should ignore team tournament data entirely. It's not meaningful for balancing singles play, and we don't care about balance for team tournaments. (It's not the goal of balancing the game, and it would interfere with balancing singles play).

      I mean, the top DL list at WTC was specifically designed to beat other DL lists, and used the matching system to try to get those matches. Three of the five games it played were vs. DL. That was not an accident. That was not random. (Which isn't to say that it wasn't also good against some armies, but I'd bet good money there were matches it absolutely did not want, and the matching system kept those from happening.)

      Squigkikka wrote:

      @Squirrelloid I will admit freely that I don't have the in-depth statistical knowledge to analyze this or offer true rebuttal, but I also feel that looking at this result and saying "This means nothing, it is purely random and a fluke and not indicative of anything" feels... wrong.

      Player skill, for example, will always make it hard to see the true picture. So will matchups! Yet both of those are a big part of even singles tournaments; should that not also contaminate the data to the point it's "useless"?

      I know it's dangerous to base big assumptions on team tourneys - yet the overwhelming presence (or lack thereof) of certain armybooks in top/bottom placements seems to echo trends seen in singles.
      Look, 'results not significantly different from the null hypothesis' (that the army is balanced) is an objective statement about the data. It might 'feel wrong', but that's literally what the data says. We have statistics because feelings cannot be trusted to accurately assess the world.

      And this shouldn't even be surprising. DL has a 5-game average of ~56, which isn't that much higher than the expected 50, and a standard deviation of ~20! That means there's a ton of variation in performance across DL - they didn't all do well. It's also not significantly different from the worst average scoring army, which iirc was HbE, who had a ~42 5-game average and a standard deviation of ~19. For the statistically illiterate, a standard deviation describes the range of outcomes within which ~68% of the data falls around the mean. (ie, 68% of HbE players' 5-game averages at WTC are within 19pts of the 42pt 5-game average.) You'll note that the high end of that range is 42+19 = 61, higher than the DL 5-game average. So if we assume the DL 5-game average is real, it's easily possible that the underlying HbE 5-game average (as opposed to the measured one, which is just a sample of that underlying average) is the same as it. (Now, DL is also a measured average with a standard deviation, but to avoid any more technical description: its 68% area and HbE's 68% area have substantial overlap, making it likely they're both drawn from the same underlying performance population.)

      In order to say two armies actually performed differently from each other, the normal distributions around their measured averages need to be fairly distinct (mathematically, and assuming the data is normally distributed, their averages have to be farther apart than the sum of their standard deviations, approximately). So assuming DL and HbE have the same standard deviations as at WTC, their averages would need to be 39 points apart. (ie, if DL was 65 with a standard deviation of 20, HbE with a standard deviation of 19 would need to average only 26 for the performance of those two armies to be statistically different.)

      Having more data tends to bring the standard deviation down if your measured average is close to the real average, which is why data volume is important. Standard deviation is a measure of confidence - if it's large, we can't be confident in the measured average.
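      As a concrete illustration of the rough overlap heuristic described above (an approximation, not a formal significance test - see the correction further down the thread), here is a minimal sketch using the approximate WTC figures quoted in this post:

      ```python
      # The post's rough heuristic: treat two armies as distinguishable only
      # if their measured averages are at least the sum of their standard
      # deviations apart. An approximation, not a formal significance test.

      def clearly_different(mean_a, sd_a, mean_b, sd_b):
          return abs(mean_a - mean_b) >= sd_a + sd_b

      # Approximate WTC 5-game figures quoted in the post: (mean, sd).
      dl = (56, 20)    # Daemon Legions
      hbe = (42, 19)   # Highborn Elves

      print(clearly_different(*dl, *hbe))       # False: 14 < 39, overlap
      print(clearly_different(65, 20, 26, 19))  # True: the 39-point example
      ```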

      -----------

      Singles play:

      The thing about singles play is: yes, matches matter, but the player has no control over them. Over a large number of games, the player will hit most things. (Whereas in team play, the player has some control over matches. That means they can bias their matches towards situations where their army list shines and avoid their kryptonite. You're no longer measuring balance of play against the full range of possible opponents. And that means you can skew your list hard and will tend towards more matches where you're fighting the stuff your skew works against. My guess is that the primary thing measured by team tournament results is how good you are at picking match-ups.)

      This of course affects list building too - singles play puts more limits on skewing and rewards bringing more varied lists that can handle a greater variety of opponents. Sure, if you skew hard you might get good match-ups, but because it's random, eventually you'll get bad match-ups too.

      I mean, those are materially different list-building and balancing environments. The project has explicitly said the goal is to balance singles play, and that makes team tournament data... very questionable for answering any questions about singles balance. The lists are going to be quite different, as are the long-term average outcomes for a given list. A toy simulation of the match-up selection effect is sketched below.
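      The following toy simulation illustrates the direction of that bias, with invented RPS-style win probabilities (the numbers are not from any real data): the same skewed list scores very differently under random singles pairings versus team pairings it can steer.

      ```python
      # Toy simulation: a skew list with invented win probabilities scores
      # far better when its team can steer it toward favourable pairings
      # than under random pairings. Numbers are illustrative only.
      import random

      random.seed(0)

      # Invented win probabilities for a skew list vs. three opponent types.
      win_prob = {"prey": 0.80, "even": 0.50, "kryptonite": 0.20}
      opponents = list(win_prob)

      def avg_wins(steered, games=100_000):
          wins = 0
          for _ in range(games):
              offered = random.sample(opponents, 2)  # two possible pairings
              if steered:
                  foe = max(offered, key=win_prob.get)  # team picks matchup
              else:
                  foe = random.choice(offered)          # singles: random
              wins += random.random() < win_prob[foe]
          return wins / games

      print(f"random pairings:  {avg_wins(False):.2f}")  # ~0.50
      print(f"steered pairings: {avg_wins(True):.2f}")   # ~0.70
      ```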

    • Squigkikka wrote:

      Naturally, but that's my point. If we saw multiple team tournaments and DL was (is?) top dog in all of them, surely we can infer -something?-.

      Edit: @Just_Flo As far as I can tell from Squirrelloid's OP and this post, the takeaway is that the data is useless?
      On its own?

      It is impossible for a single tourney to be big enough to stand alone.

      All tourneys have to stand together to be meaningful.
      And we have to group apples together with apples and oranges together with oranges.
      Then we can say our analysis of apples makes it slightly more probable that fruit is healthy than not. When the separate analysis of oranges gives a comparable result, it is more likely that our statement is true than if the two analyses give us conflicting results.
      If they conflict, deeper and more complex looks and analyses - for example of team roles - have to be made.

      Of course, in the example there is quite some fruit that we will have to analyse before we can confirm or reject our hypothesis.
      The more analyses show comparable results, the more likely it is that they are near to the real result. A toy version of this cross-check is sketched below.
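      As a minimal sketch of that cross-checking idea (all numbers invented): run the same analysis on independent groups of events and only trust a conclusion that both groups support.

      ```python
      # Toy sketch of cross-checking independent event groups: agreement
      # strengthens a conclusion, conflict calls for deeper analysis.
      # All numbers below are invented.

      def mean(xs):
          return sum(xs) / len(xs)

      # Invented per-event average scores for one army, by event type.
      team_events = [56, 58, 53, 60]     # the "apples"
      single_events = [55, 57, 52]       # the "oranges"

      team_estimate = mean(team_events)
      single_estimate = mean(single_events)

      if (team_estimate > 50) == (single_estimate > 50):
          print("Groups agree: the army deviates the same way in both.")
      else:
          print("Groups conflict: deeper analysis (e.g. team roles) needed.")
      ```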

    • Congratulations @Frederick, volunteers, and the players at WTC. We sit across the world and marvel at what it must be like to basically have a convention of gaming with over 300 players. Mad kudos to you guys. One day, maybe I'll be able to get there and help pad your scores. :thumbsup:
      B. "Skunk Butt" Jones - Member of the CGL :oldmen:
      • YES! My conquest is complete! I now have every army, and my bias will be for every L.A.B. to be great! :thumbsup:


    • Squirrelloid explained many things in a way that is quite easy to understand.

      Team and singles data can't be thrown together, but comparing their differences and similarities can help to support or reject hypotheses.

    • Whatever theoretical approach you take on team data (I haven't fully read the walls of text here):

      There are literally no singles events that ever reach even 100 players, while 200+ player team events happen frequently.

      You should ask yourself which data is more relevant to draw conclusions from, as well as what the game should be balanced for.

      Fortunately I can speak freely now that I'm no longer on staff, and after plenty of experience with the project I can tell you for sure that all the discussion about team and singles data having to be treated differently is utter nonsense, from at least two different angles (first angle: the capability of staff to even draw any meaningful conclusion from the differentiation; second angle: that team and singles lists look different is a myth! Singles are played with lists for team events, since the main purpose of singles events is to test your lists for the big team events!)

      I claim that all the guys stating team data is less relevant both have no clue and, on top of that, most likely simply envy the ones who can visit big international team events frequently.

    • Frederick wrote:

      Whatever theoretical approach you take on team data (I haven't fully read the walls of text here):

      There are literally no singles events that ever reach even 100 players, while 200+ player team events happen frequently.

      You should ask yourself which data is more relevant to draw conclusions from, as well as what the game should be balanced for.

      Fortunately I can speak freely now that I'm no longer on staff, and after plenty of experience with the project I can tell you for sure that all the discussion about team and singles data having to be treated differently is utter nonsense, from at least two different angles (first angle: the capability of staff to even draw any meaningful conclusion from the differentiation; second angle: that team and singles lists look different is a myth! Singles are played with lists for team events, since the main purpose of singles events is to test your lists for the big team events!)

      I claim that all the guys stating team data is less relevant both have no clue and, on top of that, most likely simply envy the ones who can visit big international team events frequently.
      That's exactly how it looks in Poland - team events have like 2-3 times bigger attendance, and we treat singles only as a test for team events. The best way to play this game is to play team events, as this is the essence of the game - it's not individual, it's social...
    • msu117 wrote:

      Frederick wrote:

      There are literally no singles events that ever reach even 100 players, while 200+ player team events happen frequently.
      Buckeye Battles: 2016, 98 players; 2017, 102 players; 2018, 96 players.
      Of course there are (I'm even co-hosting one in 3 weeks' time), but compared to team events they're way smaller and much rarer.
    • Squirrelloid wrote:

      And this shouldn't even be surprising. DL has a 5-game average of ~56, which isn't that much higher than the expected 50, and a standard deviation of ~20! That means there's a ton of variation in performance across DL - they didn't all do well. It's also not significantly different from the worst average scoring army, which iirc was HbE, who had a ~42 5-game average and a standard deviation of ~19. For the statistically illiterate, a standard deviation describes the range of outcomes within which ~68% of the data falls around the mean. (ie, 68% of HbE players' 5-game averages at WTC are within 19pts of the 42pt 5-game average.) You'll note that the high end of that range is 42+19 = 61, higher than the DL 5-game average. So if we assume the DL 5-game average is real, it's easily possible that the underlying HbE 5-game average (as opposed to the measured one, which is just a sample of that underlying average) is the same as it. (Now, DL is also a measured average with a standard deviation, but to avoid any more technical description: its 68% area and HbE's 68% area have substantial overlap, making it likely they're both drawn from the same underlying performance population.)
      Sorry to correct you on this, but HbE do score significantly below average: standard deviation cannot be used directly to identify whether a data set has a certain mean value or not. It takes a combination of standard deviation and sample size.
      Normally you will calculate this by

      Z = (Mean - Actual value) / sqrt(Std^2 / Sample size)

      Z will then be compared to a table to find the corresponding chances.

      For HbE I calculated that at the WTC they are significantly worse than the null hypothesis of a mean value of 50 points (which must be the overall average), with Z being 1.58. That actually means we can say with 94% certainty that they are different from average (I think in reality a one-tailed calculation should be used, but I didn't have the formulas at hand). DL are well within statistical limits.
      As far as I can see, both DH and VC are also outside 90%, whereas DE is on the limit (92%).
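      As a worked version of that calculation (the HbE sample size isn't stated in the thread, so n = 14 below is a guess chosen purely because it reproduces Z ~ 1.58):

      ```python
      # Worked z-test for the HbE figures quoted above. The sample size is
      # not given in the thread; n = 14 is a guess that reproduces Z ~ 1.58.
      from math import sqrt
      from statistics import NormalDist

      mean_h0 = 50    # null hypothesis: an average army scores 50
      mean_obs = 42   # measured HbE 5-game average
      std = 19        # measured HbE standard deviation
      n = 14          # hypothetical sample size (assumption, not from the post)

      z = (mean_h0 - mean_obs) / sqrt(std**2 / n)
      one_tailed = NormalDist().cdf(z)   # certainty, single-tailed

      print(f"Z = {z:.2f}")                             # ~1.58
      print(f"one-tailed certainty: {one_tailed:.0%}")  # ~94%
      ```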

      For me personally, I rarely use anything stricter than 90%; going higher in professional life is too costly compared to the gains.

      Sorry for the nerdy post. Just could not resist.