Statistical Analysis: Performance & Matchmaking

Matheos · Jun 3, 2013

When PGI decided to start tracking advanced stats four months ago I immediately took advantage of the opportunity and began to track all of my mech builds over time. I was primarily interested in looking for trends to see what types of mechs best suited my playstyle, and to identify and "fix" underperforming models.

As of today I have tracked over 1300 games. Every one of these games is a solo drop (I have not grouped at all during this time) with random map & mode selection, so it also provided an opportunity for me to look at how well the matchmaking performed over a long time period. Each mech in this analysis has at minimum 50 games played, which I thought was enough time for a good build to get locked in and for lucky and unlucky matches to even out (with or against 4 mans, disconnects and drops, etc).

Wait, I played 1300 games without grouping once?

In all seriousness, I'm a numbers guy so I like this stuff. So, what did I happen to get out of this data? Here's an example of probably the simplest thing to track: average kills per game against average damage per game.

[img width=650 height=473]http://imageshack.us/a/img15/4154/killsdamage.png[/img]

In my book, killing blows have always been an unpredictable way to show off your "skills" but in many cases that's the only thing that's shown. MWO tracks damage too, which is a far more consistent metric. I stopped focusing on kills once I made this link. You can steal all my killing blows if you want, I really don't care.

From this graph, I decided to focus on the break-even point: how much damage do I need to do to get the equivalent of one kill? It would follow that if I can provide 1 "kill equivalent" per match, I will end up helping my team even if I end up dying. The crossing point for 1 kill on this plot is approximately 310-320 damage. In building my mechs, I began to keep this value in line as a good minimum to reach (unless I was making a specialty mech for some other core purpose).
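
For anyone who wants to find their own break-even point, here is a minimal sketch of one way to do it: fit a straight line through average kills versus average damage, then solve for the damage where the line hits 1 kill. The function names and the sample numbers are placeholders made up for illustration, not the data behind the plot above.

```python
# Least-squares fit of average kills against average damage, then solve for
# the damage at which the fitted line predicts exactly one kill.
# NOTE: the (damage, kills) pairs below are made-up placeholders, not the thread's data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def break_even_damage(slope, intercept, kills=1.0):
    """Damage at which the fitted line predicts the given number of kills."""
    return (kills - intercept) / slope

# Hypothetical per-mech averages: (average damage per game, average kills per game).
samples = [(250.0, 0.80), (300.0, 0.95), (350.0, 1.10), (400.0, 1.25)]
damage = [d for d, _ in samples]
kills = [k for _, k in samples]

slope, intercept = fit_line(damage, kills)
print(round(break_even_damage(slope, intercept), 1))
```

Swap in one (damage, kills) pair per mech from your own stats page and the printed value is your personal "kill equivalent" in damage.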

[img width=650 height=473]http://imageshack.us/a/img51/7504/tonnagedamage.png[/img]

Here we have a trend of tonnage versus average damage that is very clear: more tonnage, more weapons, more damage. There is one outlier on this plot that I should point out: the 80 ton mech averaging 322 damage. Which one is this? The Pretty Baby. If there is any indication of how difficult it is to outfit this mech due to its funky hardpoints, it's staring everyone in the face. It took me a good 70 games, but I did finally find a build that I started to do well in, which brought its average damage up from 305 to 320+. I'll probably post that in the Awesome section sometime in the future.

Other trends (or non trends) that I found were:

Lower speed leads to lower damage, kills, and deaths

No trend between damage done and deaths per game

Win percentage trends positive with damage and kills

Win percentage trends negative with deaths, and slightly negative with speed
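
A quick way to put a number on trends like these is a Pearson correlation coefficient: close to +1 or -1 means a strong positive or negative trend, close to 0 means no trend. This is only a sketch of that idea; the lists are hypothetical values for illustration and `pearson_r` is a name I made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-mech values (one entry per mech), NOT the thread's data.
avg_damage = [280.0, 320.0, 350.0, 410.0]
win_pct = [0.50, 0.53, 0.52, 0.58]

print(round(pearson_r(avg_damage, win_pct), 3))
```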

The links between kills, deaths, and wins are, in my opinion, an artifact of the game modes themselves, as we're essentially playing deathmatch. No surprise there.

So, let us take a look into the world of matchmaking with an important question: does tonnage affect win percentage?

[img width=650 height=473]http://imageshack.us/a/img46/4039/tonnagewins.png[/img]

The answer is yes, but barely. I think that the trend in this plot will go away once weight matching is added to matchmaking. The lack of this feature, however, isn't a serious problem: according to the trendline, I have a 2% greater chance to win a match if I play a 100 ton mech versus a 50 ton mech.

[img width=650 height=473]http://imageshack.us/a/img198/3096/damagewins.png[/img]

This is the graph that I've found to be most valuable out of all of the ones I generated. My average damage versus the winning percentage of the mech. Ideally, ELO should provide an equal win percentage across the board for all mechs regardless of weight class. We know from the tonnage plot that this is not the case, and the same trend (more mass, more weapons, more damage) carries over to this comparison.

To provide an example of how this plot has come in handy, the red square represents a build I was using on a TBT-5N after 25 games. I identified it as being under-performing and redid the build, pushing it to the blue diamond that the arrow points to. Approximately the same winning percentage, but much better overall performance.

There are two high win percentage outliers on this plot that I have no explanation for: the BJ-3 (330.5) and the CN9-AL (385.4). The Centurion (I use something similar to the Grimm build) was in a similar location after 50 games, so I played another 25 to see if it would regress towards the clump of my other mechs. Needless to say, it didn't. I've heard rumbles that there might be a damage hitbox problem with the Centurions when using a standard engine and this might have been an indication of that if I didn't run an XL engine in it half the time. As per the Blackjack, I might have just got lucky, or the JJ+Dual PPC combination is an IWIN button.

At the 50 game point, I had one mech that was below a 50% win rate: my founder's HBK-4G (349.5). I played another 25 games to see if it would normalize into the pack, and it didn't. The 4G isn't a great chassis, but I do happen to have a 4P variant (353.0) on the chart as well. Similar damage done, but widely different win percentage.

(This is where my discussion is going to take a little bit of a turn.)

I started to look upwards from the bottom to see what mechs were losing the most.

349.5: HBK-4G founder's mech.

276.8: CDA-3M. I guess ECM isn't an auto-win on every chassis that carries it.

301.5: BJ-1 with dual AC2s. Why is the BJ-3 so much higher than this one?

322.3: The Pretty Baby. See above commentary.

406.6: HGN-733P. JJs and PPCs like the BJ-3 but far less mobile.

420.0: Ilya Muromets. My non-Atlas damage champion, but in the bottom 5 in wins. Huh?

293.5: CDA-X5. Jumbo Jenner with a BAP.

353.0: HBK-4P laser boat.

364.1: Flame hero mech.

It was somewhere around this time that I realized that out of 21 mechs, only one of my Hero mechs was in the top ten in win percentage (Yen Lo Wang). This did not sit well with me at all, so I decided to use a t-test available in Excel to compare my hero mechs (with my single founder's mech added in) against my standard mechs. For those not familiar with a t-test, it compares two populations of values and gives the probability (the P value) of seeing a difference at least this large by random chance if the two groups were really the same. In general, a P value below 0.05 is considered a statistically significant difference between two populations.

Here are the P-values from the current data set. The "weighted deaths" value is one that I concocted that represents the likelihood that my mech dies, but my team still wins - it's a measure of how "risky" the mech is to play. My two "riskiest" mechs are the BJ-1 and the CTF-1X, which both use XL engines. My "riskiest" Hero mech is my Flame, which is an XL engine with a Gauss rifle in the torso. Self explanatory there.

Hero vs non-Hero average damage: 0.2539.

Hero vs non Hero "weighted deaths": 0.2877.

Hero vs non Hero win percentage: 0.0275. (without the founder's mech, this value is 0.0784)
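
If you want to run the same kind of comparison outside Excel, here is a rough sketch of a two-tailed, unequal-variance (Welch) t-test using only the standard library. Which t-test variant was used for the numbers above is not stated, so treat the Welch choice as an assumption; the win percentages in the example are hypothetical, and the function names are my own.

```python
import math

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples.
    Each sample needs at least two values and the samples must not both be constant."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se_a, se_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the probability mass between -|t| and +|t|,
    integrated numerically with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

# Hypothetical win percentages, one value per mech (NOT the thread's data).
hero_win_pct = [0.48, 0.51, 0.47, 0.52, 0.55, 0.50]
standard_win_pct = [0.55, 0.58, 0.53, 0.60, 0.57, 0.56, 0.54]

t_stat, dof = welch_t(hero_win_pct, standard_win_pct)
print(t_stat, dof, two_sided_p(t_stat, dof))
```

The last number printed is the P value: the smaller it is, the less likely the gap between the two groups is just luck.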

Why are these groups out of line?

1. The Hero mechs underperform compared to normal variants.
I would say that this is true with one of my Hero mechs (the Pretty Baby), but the others are all fairly good performers. My Muromets does more damage than any other mech outside my two Atlases, the X5 is considered to be the best Cicada by many (and has done better than my 3M), the Flame is good due to its torso ballistics slot, and the YLW is a 100 kph AC/20 platform. I haven't found the lack of missiles on my Death's Knell to be that much of a problem either.

My hero mechs seem to be doing good damage and aren't dying that much more than my normal mechs when my team wins. No statistical significance was found in either of those values. Only the win percentage is different. This leads me to a controversial proposition...

2. The matchmaking system treats Hero/Founder mechs differently from normal ones
My own experience and data analysis are pointing in this direction and I don't like it one bit. The discrepancy could still be completely due to chance and chance alone, but we'd be talking about a sub-10% chance of that occurring. One person seeing this type of trend isn't enough to draw a conclusion. I could continue to play and see if the values move towards the pack, but I probably won't be doing solo PUG drops forever.

If this was true, it would mean that the c-bill bonus mechs are treated as having an artificially higher ELO, thereby reducing their c-bill generation. I can deal with funky weapon balance and crashes, but this is something that would lead me to stop supporting the game.

So, back to non-tinfoil-hat-reality:

I would like to see other people's statistics to see how my own compare (the 1 kill average damage point, winning percentages, etc). My own results are really only comparable to those of other people doing random solo drops, so any grouped matches wouldn't be an accurate comparison. If you group all of the time, though, you could certainly run comparisons on your own data, but it would not match the conditions of mine.

Michael · Jun 3, 2013

I have nooooooooooooooo idea what most of all that means but I'm willing to share any of my own personal stats if you want to calculate them; bear in mind that for the last several months I have just been grinding mechs and not really paying attention to my overall stats as I am aware of the impending stat reset once achievements are implemented in game.

YaoYaoYiffy · Jun 3, 2013

Tonnage results are decent in the vacuum of "how does piloting size X mech affect my win rate?". I think a better (and far more frustrating to collect data on) measure on how Tonnage vs. Win% would be to compare the weight classes of entire teams. But that's an entirely different discussion and outside of the scope of what you already have here. I was actually compiling data on this myself, but after a couple nights of entering match end scoreboard screenshots into excel by hand I gave up.

More on topic, perhaps you could throw together a basic excel spreadsheet where people could enter in their own data and send them to you? I entirely solo drop and I've got around 400 matches worth of data I could share.

I've got some webspace that could host an excel file for people to download if need be, I've got "unlimited" (until they complain :wink: ) bandwidth from my host. Data analysis like this could go a LONG way towards increasing our understanding of the game, and give new insights into min-maxing.

Edit: A number of hero mechs use unique loadouts (like the Wang's AC20 or the Misery's ballistic hardpoint). I think a new test would be to run hero mechs vs non-hero variants with exactly the same (or as similar as possible) loadouts and compare the damage/win/etc. results. The biggest challenge here is coming up with loadouts that are viable on both a hero and non-hero variant of a mech. Obviously trying to remove the "loadout" variable from the problem.

Any chance you'd be willing to share the excel sheet with all your data?

Matheos · Jun 3, 2013

YaoYaoYiffy;9474 said:

Tonnage results are decent in the vacuum of "how does piloting size X mech affect my win rate?". I think a better (and far more frustrating to collect data on) measure on how Tonnage vs. Win% would be to compare the weight classes of entire teams. But that's an entirely different discussion and outside of the scope of what you already have here. I was actually compiling data on this myself, but after a couple nights of entering match end scoreboard screenshots into excel by hand I gave up.

More on topic, perhaps you could throw together a basic excel spreadsheet where people could enter in their own data and send them to you? I entirely solo drop and I've got around 400 matches worth of data I could share.

I've got some webspace that could host an excel file for people to download if need be, I've got "unlimited" (until they complain :wink: ) bandwidth from my host. Data analysis like this could go a LONG way towards increasing our understanding of the game, and give new insights into min-maxing.

Edit: A number of hero mechs use unique loadouts (like the Wang's AC20 or the Misery's ballistic hardpoint). I think a new test would be to run hero mechs vs non-hero variants with exactly the same (or as similar as possible) loadouts and compare the damage/win/etc. results. The biggest challenge here is coming up with loadouts that are viable on both a hero and non-hero variant of a mech. Obviously trying to remove the "loadout" variable from the problem.

Any chance you'd be willing to share the excel sheet with all your data?

I can do an open Google doc with all of my data in it. I think I'll put together a template on one sheet for others to enter data too. Maybe later tonight.

As per the tonnage, manually doing a per team one is a lot of work. I would hope that with enough games the average team tonnage between multiple mechs would even out.

Doing an effective loadout comparison is a good idea. I had thought of trying it with a normal HBK-4G to work against my Founder's. As per the heroes, could do a double LL Death's Knell and normal COM and the Muromets could be modified to work like a similar 3D build (sans JJs). I'm sure there's other ideas that could be done for those.

For the record, I hate min maxing. I hate it to the core of its being. Always have. It makes you over-dependent upon other players and removes your ability to adapt to dynamic situations.

Matheos · Jun 4, 2013

YaoYaoYiffy;9474 said:

Any chance you'd be willing to share the excel sheet with all your data?

Google Doc Link Here.

If anyone wishes to add their information to it, I can share the editable version with you. Send me a PM.

As per the founder's hunchback comparison, I'm going to run either a 4H or 4G chassis with an identical build as the 4G-F for a few days to see what happens. Both are mastered in my pilot bay and I can subtract out the current data on the 4G-F.

I was able to do a combination graph of the bottom two, in a way, in the Google docs. Might be too "busy".

Lan · Jun 4, 2013

Interesting analysis, I'm damaged from working 10+ years with business analysis myself.

Unfortunately I drop in premade 90% of the time with 3-4 players. Rare that I solodrop.

Regina Redshift · Jun 4, 2013

The tin-foil hat hypothesis scares me. If they base Elo on C-bill earnings, that means purchasing premium time and premium 'mechs is actually hurting your earnings.

I sincerely hope that, if C-bills or XP is a metric, that they use the raw values instead of the premium-adjusted values.

cs_kami · Jun 4, 2013

Archwright, interesting possible point on premium time since it is not in the calculations. Matheos, were all of your drops premium or regular?

I don't have nearly as much data. I'm only working on my 3rd chassis, but I master all of my mechs. Will contribute what I can.
Interesting that tonnage is proportional to average damage dealt, but not proportional to wins. Does it mean that on average, most drops have teams of somewhat equal weight already?

Matheos · Jun 4, 2013

cs_kami;9536 said:

Archwright, interesting possible point on premium time since it is not in the calculations. Matheos, were all of your drops premium or regular?

I don't have nearly as much data. I'm only working on my 3rd chassis, but I master all of my mechs. Will contribute what I can.
Interesting that tonnage is proportional to average damage dealt, but not proportional to wins. Does it mean that on average, most drops have teams of somewhat equal weight already?

Mix of both. I think the advanced stats have been out for 4-5 months? I've been on premium for two months out of that time. I can't say I've "felt" a difference when being on or off of premium.

Archwright;9534 said:

The tin-foil hat hypothesis scares me. If they base Elo on C-bill earnings, that means purchasing premium time and premium 'mechs is actually hurting your earnings.

I sincerely hope that, if C-bills or XP is a metric, that they use the raw values instead of the premium-adjusted values.

C-bill generation is dependent on pretty much all the metrics used for ELO (I assume): damage, kills, wins, losses, assists. I would not be surprised if there is a correlation between C-bills and rankings due to that.

The premium time and the hero mech bonuses are both multipliers, so the question is whether or not they come into play. As I said above, I don't think premium has an effect.

One thing that I have thought for a long time is that the ELO calculation is delayed and not done instantaneously. When I first started my X-5, I won my first 7 games and then lost my next 7. If it were getting calculated immediately, I doubt that streaks like that would occur. It could be on a set time scale (e.g. every 30 or 45 minutes) or after a certain number of matches (4-5?). It would be a logical strategy for the devs to do that sort of thing for either A) reducing server load or B) using a moving average so that a single bad or great game doesn't throw the number out of whack.

I played 20 games split between my 4G and 4H last night (same build) and after 10 games on each they were essentially identical in performance. I was dropping with one and then the other the whole time - perfect alternation. Tonight I'll do 10 straight on one and then 10 straight on the other to see what happens.

Matheos · Jun 5, 2013

Archwright;9534 said:

I sincerely hope that, if C-bills or XP is a metric, that they use the raw values instead of the premium-adjusted values.

This might not make you happy.

I decided to take my founder's 4G and a 4H and run them side by side. Same build: AC/20, 2 ML, 1 MPL, STD200. First, I alternated games on each one for a total of 20 to see if there was a discrepancy between the two. The stat lines were:

4G: 10 games, 5 wins, 3678 damage, 11 kills, 6 deaths
4H: 10 games, 4 wins, 3798 damage, 11 kills, 5 deaths

Essentially identical.
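
To make the comparison easier to read, the two stat lines above can be reduced to per-game numbers. A minimal sketch (the dictionary layout and function name are just my own choices):

```python
# Stat lines copied from the post: (games, wins, damage, kills, deaths).
stat_lines = {
    "4G": (10, 5, 3678, 11, 6),
    "4H": (10, 4, 3798, 11, 5),
}

def per_game(games, wins, damage, kills, deaths):
    """Convert raw totals into a win rate and per-game averages."""
    return {
        "win_pct": wins / games,
        "avg_damage": damage / games,
        "avg_kills": kills / games,
        "avg_deaths": deaths / games,
    }

summary = {name: per_game(*line) for name, line in stat_lines.items()}
for name, stats in summary.items():
    print(name, stats)
```

That works out to 367.8 vs. 379.8 average damage and 1.1 kills per game on both, with one win and one death separating the two over 10 games each.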

The next step was to do a set of games on each. Per your idea, I decided to track CB and XP gain for the 4G for a total of 20 games. After that, I did the same with the 4H.

For the most part, I noticed that with the 4H chassis the games I was put in tended to be more evenly matched. The games with the 4G tended to be extremely variable, as noted by the wide range and jaggedness of the blue line.

The green and purple lines represent a running average in these graphs. The CB gain for both chassis stabilized after about 10 games and remained in a rough window a little above the 100k CB point. The final average was 110k for the 4G and 105k for the 4H. I don't see a 25% difference here.
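
For anyone tracking their own matches, a running average like the one described above is just the mean of all matches played so far, recomputed after each game. A small sketch, with hypothetical payout numbers standing in for real match results:

```python
def running_average(values):
    """Return the cumulative mean after each value in the sequence."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages

# Hypothetical C-bill payouts for five consecutive matches (NOT the thread's data).
cb_per_match = [95_000, 130_000, 80_000, 115_000, 105_000]

for match_number, average in enumerate(running_average(cb_per_match), start=1):
    print(match_number, round(average))
```

The same function works for XP per match; the last entry in the returned list is the overall average.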

W/L records for these matches? 10/10 for the 4G and 13/7 for the 4H. Almost identical CB generation.

What about XP though?

I was winning more with the 4H so its total XP generation was almost 200 XP higher per game (694 average vs. 513 average).

The difference didn't show up when I was using both my founder's and a normal mech in tandem, but did show up when I was playing a single mech for a string of games. I can continue to run my mechs and chart data, but I'm only one person. If someone else can corroborate this behavior, we might have a big, big problem here. All you need to do is track XP and CB gain as you play matches with a single mech and calculate averages.

jay · Jun 21, 2013

Fascinated by the analysis. Very cool!

So I see your read-only template for your data, but how do I populate it with my own?

I almost exclusively PUG drop and have done close to 3,000 matches since they started tracking stats. Most of it was grinding out chassis (I've got every chassis Mastered except Commando and Quickdraw), but happy to share some of that data if it's useful.

Telxas · Jun 21, 2013

Awesome graphs and analysis, I would like to download these too.
Just one question though: what if the random mode selection changed something? I wouldn't be surprised if a chassis did better on assault than conquest and vice versa. What do you think?

But then again, thanks for your work!

Matheos · Jun 23, 2013

jay said:

Fascinated by the analysis. Very cool!

So I see your read-only template for your data, but how do I populate it with my own?

I almost exclusively PUG drop and have done close to 3,000 matches since they started tracking stats. Most of it was grinding out chassis (I've got every chassis Mastered except Commando and Quickdraw), but happy to share some of that data if it's useful.

I can send you a PM with the link to the editable worksheet. I didn't want to put the full editable out publicly. If you want it, let me know.

Update:

In the last couple weeks I started to track the XP and CB gain of my mechs through a series of 20 games much like I did the hunchbacks. I found some major discrepancies in how they were generating rewards from mech to mech. With the release of the Quickdraws this week, I also got clued-in to why it was occurring.

Two conclusions I have that are new:

The ELO system is attempting to balance by XP gain but doesn't do a great job in some cases.

The hero mech win/loss discrepancy is an artifact of something else. So the hero mechs are NOT being treated differently.

I will probably make a new clean thread on this issue later today or tomorrow.

Matheos · Jun 25, 2013

Alrighty here. With the release of the Quickdraws last week, I thought it would be a good opportunity to post an update.

These are the new updated plots for Damage vs. Win Percentage and Tonnage vs. Win Percentage with my current numbers.

My founder's Hunchback hasn't been getting played much (334.9), but we have a new low performer on the graph: the Quickdraw-5K. Also bringing up the rear are the Dragon Fang (268.5) which I picked up during the sale a couple weeks ago and the Quickdraw-4H (315.1) which, in their current state, are drastically underperforming.

[img width=640 height=466]http://imageshack.us/scaled/medium/833/gui8.png[/img]

There still isn't much of a trend in terms of tonnage and wins/losses, which is good to see. However, do you happen to notice a problem here? My new trio of 60 ton mechs are all doing terribly in the win column (the top value is my Flame Dragon).

I decided to check whether there was a trend within weight classes, as ELO is currently matched by weight class rather than the exact weight values themselves. To this end, I developed the idea of "tonnage differential": how many tons a mech is above or below the average value for its weight class. Here's the breakdown.

Lights: 20-35 (27.5 average)
Mediums: 40-55 (47.5 average)
Heavy: 60-75 (67.5 average)
Assault: 80-100 (90 average)

For example, a Centurion is 50 tons in a 47.5 average ton weight class. So it would have a tonnage differential of +2.5. Here's a plot of all my current mechs using this metric.
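
Here is a short sketch of the tonnage differential calculation, in case anyone wants to apply it to their own mech list. The class ranges are the ones listed above, with the "average" taken as the midpoint of each range; the function and variable names are my own.

```python
# Weight-class tonnage ranges as listed in the post: (lightest, heaviest).
WEIGHT_CLASSES = {
    "Light": (20, 35),
    "Medium": (40, 55),
    "Heavy": (60, 75),
    "Assault": (80, 100),
}

def weight_class(tonnage):
    """Return the name of the weight class whose range contains the tonnage."""
    for name, (low, high) in WEIGHT_CLASSES.items():
        if low <= tonnage <= high:
            return name
    raise ValueError(f"no weight class covers {tonnage} tons")

def tonnage_differential(tonnage):
    """Tons above (+) or below (-) the midpoint of the mech's weight class."""
    low, high = WEIGHT_CLASSES[weight_class(tonnage)]
    return tonnage - (low + high) / 2

print(tonnage_differential(50))  # Centurion example from the post: 2.5
```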

[img width=640 height=466]http://imageshack.us/scaled/medium/515/yye.png[/img]

Oh boy. There really isn't a trend in the overall tonnage graph, but there is a significant trend within the individual weight classes themselves! Also, none of my mechs with a -7.5 differential or lower break the .600 win barrier. The -2.5, 0, and +2.5 groupings are fairly evenly spread, while the +10 differential mechs are all above .550. The slope of the line indicates that there could be a 10% swing in win percentage between bottom and top tonnage mechs within a single weight class.

My original idea at the start of the thread was that Hero mechs might have been getting treated differently. That may only be partially true... but only by association.

My current hero mechs (with differentials) are: The X-5 (-7.5), Death's Knell (-2.5), Fang (-7.5), Flame (-7.5), Ilya Muromets (+2.5), Yen Lo Wang (+2.5), & Pretty Baby (-10). What are my top two performers in win percentage and damage done? The Muromets and Yen Lo Wang... the only hero mechs I own that have a positive tonnage differential.

So what's the takeaway from this?

The ELO and matchmaking system, as it stands, attempts to match opponents and calculate rankings by weight class. It does not take into account the fact that, say, a 75 ton heavy can be a far different animal than a 60 ton heavy, generally speaking. My results suggest (this could still be due to random effects) that mechs at the bottom of an individual weight class are essentially operating with a matchmaking "penalty". My hero mechs were simply stuck in the middle of an underlying issue.

I still don't really trust my founder's Hunchback given the match to match data I presented previously, but it is arguably the worst chassis of that mech. It gets a pass for now.

The -2.5 mechs seem to be okay and there are no -5 mechs available (85 tons), so only -7.5 and -10 mechs are suffering by my numbers: Spiders, Cicadas, Dragons, Quickdraws, and Awesomes. On the flip side, +7.5 and +10 mechs (the Jenner, Raven, and Atlas) are receiving a "passive benefit" of sorts that results in more beneficial matchmaking and a higher win percentage.

What could I recommend to help deal with this?

I don't expect the matchmaking to be changed anytime soon, so we're going to have to live with it for now. If you plan to play with a lighter mech within a weight class (20, 40, 60, 80 tons), dropping with friends would help counteract the matchmaking issues. Organization and good group play will trump any imbalances.

If you need cash or GXP and are going to solo drop, I'd stick to playing heavier mechs within a weight class (35, 50, 70, 100 tons). You should be winning more often due to the imperfections of the matchmaking system and generating more cash as a result. Once tonnage limits are implemented this perceived discrepancy should go away.
