we all know what the best type is (it’s psychic. prove me wrong.), but what does the data say?
sometimes, data science is about making hard choices, or earning a company tons of money. other times though, data science can just be a fun hobby on a rainy weekend.
this week i’ve been very busy with a college assignment: a few classmates and i have had to code the pagerank algorithm — the one google used to use for searches, before ai and nlp ate everything else.
what is pagerank?
pagerank is an algorithm used to get a ranking for connected parts of a system — perfect for ranking websites, its original purpose.
based on its original task, it measures and ranks the importance or influence of many nodes (websites) that link to each other (edges).
the algorithm takes a directed graph as its input, and returns a ranking of its nodes (along with a score between 0 and 1 for each one) based on a few criteria:
- you rank higher if more nodes link to you.
- linking to another node is less relevant if you link to more nodes.
- being linked by higher ranked nodes is better.
this is coherent with the idea that a big site, like medium, will be linked by a lot of sources, whereas being linked by a big site (say, facebook’s home page) also means your site’s pretty relevant. you could also use it to model relevance of scientific papers or publications (using citations as the links), or probabilities that an animal gets eaten in a given ecosystem (with a food chain as graph).
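to make the three criteria above concrete, here is a minimal sketch of pagerank as a power iteration. this is not the implementation from my assignment: the function name, the damping factor of 0.85 and the toy graph are all assumptions picked for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Score the nodes of a directed graph.

    `links` maps each node to the set of nodes it links to.
    The damping factor (0.85) is a commonly used default, assumed here.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform score

    for _ in range(max_iter):
        # Score held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        for source in nodes:
            targets = links.get(source, ())
            for target in targets:
                # A node splits its score among everything it links to, so each
                # link is worth less when the source links to many nodes.
                new_rank[target] += damping * rank[source] / len(targets)

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "c" is linked to by three nodes, "d" by none.
toy_links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}
scores = pagerank(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['c', 'a', 'b', 'd']
```

in the toy graph, "a" only has one incoming link, but it comes from the top-ranked node, which is enough to put it above "b".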
so i’d been meaning to write an article about something fun, and i had this pagerank implementation lying around… i couldn’t miss the chance.
getting the data
first i got down to getting the data.
since all i wanted to model was the type relationships, i was about to use bulbapedia (pokémon’s wiki), but then i figured someone else had probably already coded that bit. sure enough, a few seconds of searching got me to this awesome link, from which i took the python matrix of type advantages in the game.
this uses the types and relationships from the 6th generation. i haven’t gotten around to playing the 7th one yet, so i didn’t mind.
having the raw matrix already loaded in python, i had to give it the correct format: the links in the kind of graph pagerank takes as input carry no weight — they’re binary: either you link to something or you don’t (if there are different implementations of pagerank that address this, please let me know in the comments!).
in case you haven’t played pokémon or you’re fuzzy on the details, each type can either be neutral towards another type (most types are mutually neutral), have an advantage (does 2x the damage), a disadvantage (does 1/2 the damage) or an immunity (receives 0 damage).
the way i moved that format (4 different relationships) to a binary domain was simply doing two different graphs: one for attackers, and one for defenders. i also bunched immunity and resistance into the same category (defensive advantage, if you will) — i hope that won’t offend any hardcore fans.
there’s also another issue: pagerank doesn’t take into account sites linking to themselves, and so we can’t use data about a pokémon type being weak to itself or effective against itself. (full disclosure: i didn’t realize this in the first draft of this article, a reader made me notice my mistake. results have changed.)
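as a rough sketch of that conversion (not my original code), this is one way to turn a matrix of damage multipliers into two binary graphs without self-links. the chart below is a hypothetical four-type excerpt, `build_graphs` is a made-up name, and the direction of the defensive links is an assumption.

```python
# Hypothetical excerpt of a type chart: chart[attacker][defender] is the
# damage multiplier. Only four types are included to keep the example short.
chart = {
    "fire":   {"fire": 0.5, "water": 0.5, "grass": 2.0, "dragon": 0.5},
    "water":  {"fire": 2.0, "water": 0.5, "grass": 0.5, "dragon": 0.5},
    "grass":  {"fire": 0.5, "water": 2.0, "grass": 0.5, "dragon": 0.5},
    "dragon": {"fire": 1.0, "water": 1.0, "grass": 1.0, "dragon": 2.0},
}


def build_graphs(chart):
    """Split a multiplier chart into two binary, self-link-free graphs."""
    attack_links = {t: set() for t in chart}
    defense_links = {t: set() for t in chart}
    for attacker, row in chart.items():
        for defender, multiplier in row.items():
            if attacker == defender:
                continue  # self-links are dropped, as described above
            if multiplier > 1:
                # Attacker is very effective against defender.
                attack_links[attacker].add(defender)
            elif multiplier < 1:
                # Resistance (0.5) and immunity (0) are bunched together.
                # Assumed direction: the resisting type links to the attacker.
                defense_links[defender].add(attacker)
    return attack_links, defense_links


attack_links, defense_links = build_graphs(chart)
print(attack_links["dragon"])   # set(): dragon's only advantage here was itself
print(defense_links["dragon"])  # fire, water and grass, in some order
```

keep in mind that pagerank rewards the node being linked to, so flipping the direction of the edges changes which types benefit.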
here are the results i got from modeling attacking types. i only linked type a to type b if type a was very effective against type b, without linking any types to themselves (since that goes against pagerank’s preconditions).
as i anticipated, psychic came out in the top 3. dragon landed in a pretty bad spot, which surprised me a lot until i remembered that one of dragon’s advantages is being very effective against other dragons, which this algorithm can’t really capture. electric type came out last, though that should be evened out by water being the most frequent pokémon type.
meanwhile, the results for defending types were also quite surprising:
dragon comes out last, with ice close on its tail. on the other hand fighting and bug come out on top, being some of the types with the most resistances.
even though bug is not a very strong defensive type, it is resistant to three different types, and one of them is fighting, which came out first on the ranking, so that kinda explains why it would be so close to the top.
(disclaimer: in the previous draft i said ice was resistant to dragon, when it’s just dragon being weak to ice. sorry.)
i feel like the results aren’t too intuitive though, and maybe a different model would work better. for instance, this model can’t capture a type being weak against itself, nor does it deal with how much more frequently a water-type pokémon will appear than, say, a dragon type one.
here’s a link to the raw results, in case you wanna visualize them differently.
trying out different metrics
i am quite happy with my results on the attacking side, but there’s no way a ranking of defensive pokémon with fighting on the top can be that good. so i set out to look for other metrics.
i thought maybe a simple metric of
sum (damage modifier)*(#pokémon of that type) over every type
could yield more intuitive results, akin to an ‘expected damage’. let’s see what i got.
i found a redditor who kindly collected the quantity of pokémon of each type, so i am citing him as my source.
for every attacking type, each pokémon counts twice if it’s weak against that type, half if it’s resistant, zero if it’s immune, and once otherwise.
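the counting rule above could be sketched like this. the four-type chart is a hypothetical excerpt, and the pokémon counts are made-up placeholders, not the redditor’s numbers. like the metric itself, the sketch treats every pokémon as having a single type.

```python
# Hypothetical excerpt of a type chart: chart[attacker][defender] is the
# damage multiplier (2 = weak, 0.5 = resistant, 0 = immune, 1 = neutral).
chart = {
    "fire":   {"fire": 0.5, "water": 0.5, "grass": 2.0, "dragon": 0.5},
    "water":  {"fire": 2.0, "water": 0.5, "grass": 0.5, "dragon": 0.5},
    "grass":  {"fire": 0.5, "water": 2.0, "grass": 0.5, "dragon": 0.5},
    "dragon": {"fire": 1.0, "water": 1.0, "grass": 1.0, "dragon": 2.0},
}

# Placeholder counts of pokémon per type, invented for this example only.
counts = {"fire": 10, "water": 20, "grass": 15, "dragon": 4}


def expected_damage_dealt(chart, counts):
    """Sum of (damage modifier) * (number of pokémon) over every defending type."""
    return {
        attacker: sum(row[defender] * counts[defender] for defender in row)
        for attacker, row in chart.items()
    }


dealt = expected_damage_dealt(chart, counts)
attack_ranking = sorted(dealt, key=dealt.get, reverse=True)
print(dealt["grass"])   # 54.5
print(attack_ranking)   # ['grass', 'dragon', 'fire', 'water']
```

with these placeholder counts, grass comes out on top only because water is the most common type in the toy data, which is exactly the kind of effect the metric is meant to capture.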
- bug is the second weakest.
- grass comes out last. it came out first in the other ranking.
- rock, ground, ice and flying come as the very close top-4.
- rock has done well in both attacking metrics.
it would appear that even though grass-type is strong against many types, there are fewer pokémon of those types. rock seems to be consistently good though.
in the future, i may look again at these numbers without considering legendary pokémon, since they can’t be used in competitions anyway.
if i do the opposite, and calculate the ‘expected damage’ a pokémon should receive based on type alone, these are the key insights:
- steel is the best defending pokémon type, probably due to its immunity to poison and its many resistances. it beats the second-best type by a margin of over 20%.
- the worst defensive type is, ironically, ice, with 50% more expected damage than steel.
- ground and rock, which are consistently good damage dealers, come across as very bad defending types, which keeps the playing field level.
- unsurprisingly, bug is not only a bad attacker, it is also a bad defender.
personally, i find these rankings to be more intuitive than the first ones.
for those who want to see the entire results of this analysis, the numbers are available in this gist.
after seeing these numbers, i feel like i have a better intuition about the pokémon types. to be honest, what has surprised me the most is how fair this system is: the best types by some metrics are the worst by others, and most just fall in the middle. some of the assumptions most players make have been confirmed: bug is not a very strong type (its lack of strong attacks would make it even weaker, if that factored into the analysis), and rock and ground are very good glass cannons (so keep teaching earthquake to non-ground-type pokémon). also, steel makes for a pretty sweet tank.
in the future, i may come back to this and do another, more granular analysis looking into the state of the current meta, or maybe looking at pokémon stats by type, especially dividing them into special tanks, physical tanks, special dps and physical dps.
i hope you’ve found this entertaining, and maybe even thought-provoking.
i will gladly receive any criticisms or ideas. what do you think about this article, or these results? what other problems do you think could be well modeled by pagerank? what other analysis do you think would be fun to do?
please let me know the answers to all that in the comments!