GameInformer Review Analysis

Analyzing reviews of media is a good way to get acquainted with data analysis and data science ideas and techniques. Because reviewers tend to give numerical scores to the products they review, it can be fairly easy to visualize how other features correlate with these scores. This blog post will serve as a demonstration of how to analyze and explore data, using review data gathered from GameInformer.

I scraped the review data from GameInformer myself, and what follows are the results of the visualization and analysis. If you are curious about how the scraping was done:

First, the review URLs were collected from the main site.

Next, the review data was collected. Following this, some preprocessing was done to remove duplicate entries and non-standard characters from the review data.
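As a rough sketch of what that preprocessing step can look like in Pandas (this is not the actual scraping code, the rows are made up, and treating "non-standard" as "anything outside Latin-1" is my assumption for the example):

```python
import pandas as pd

# Hypothetical raw scrape: the column names mirror the dataset used below,
# but these rows are invented for illustration.
raw = pd.DataFrame({
    "title": ["Minit", "Minit", "Pok\u00e9mon Quest\u2122"],
    "score": [8, 8, 6],
    "author": ["Elise Favis", "Elise Favis", "Brian Shea"],
})

# Drop rows that exactly duplicate an earlier row.
deduped = raw.drop_duplicates().reset_index(drop=True)

def strip_non_latin1(text):
    """Remove characters that cannot be encoded as Latin-1 (assumed rule)."""
    return text.encode("latin-1", errors="ignore").decode("latin-1")

deduped["title"] = deduped["title"].map(strip_non_latin1)
print(deduped)
```

The accented "é" survives because it exists in Latin-1, while the trademark sign is dropped.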

This blog will also act as a quick primer on how to visualize data with Pandas and Seaborn.

Let’s start off by making all the necessary imports.

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from collections import Counter
import csv

Now we need to load in the data. We also need to get dummy representations of any non-numeric values.

GI_data = pd.read_csv('GIreviewcomplete.csv', encoding = "ISO-8859-1")
GI_data2 = pd.get_dummies(GI_data)
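To see what get_dummies actually does, here is a small sketch on a made-up frame (the toy rows are not from the GameInformer data). Numeric columns pass through unchanged, while each distinct value of a text column becomes its own indicator column:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "score": [9, 7, 8],
    "rating": ["Teen", "Mature", "Teen"],
})

dummies = pd.get_dummies(toy)
print(dummies.columns.tolist())
print(dummies)
```

Keep in mind that a free-text column such as a title would produce one indicator column per distinct value, so on a real dataset it may be worth passing the columns argument to encode only the categorical columns you need.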

We can get a violin plot for every reviewer. The width of a violin at a given score shows how often that score was assigned, while the vertical extent of the violin shows roughly the range of scores that reviewer has given.

#---display violin plot for author and avg score

sns.lmplot(x='author', y='score', data=GI_data, hue='author', fit_reg=False)
sns.violinplot(x='author',y='score', data=GI_data)
plt.xticks(rotation=-90)
plt.show()
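If you want the numbers that a violin plot summarizes, a grouped aggregation gives them directly. This is a sketch on invented data, with placeholder author names:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "author": ["A", "A", "A", "B", "B"],
    "score": [8, 9, 7, 6, 8],
})

# One row per author: how many reviews, and the spread of their scores.
summary = toy.groupby("author")["score"].agg(["count", "min", "median", "max"])
print(summary)
```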

It’s possible to filter the DataFrame by multiple criteria. For instance, we could return every review at or above a certain score by a certain reviewer.

EF_reviews = GI_data[(GI_data.author == 'Elise Favis') & (GI_data.score >= 8.0)]
print(EF_reviews)

     title                                             score  author  \
48   Heaven's Vault                                    8      Elise Favis
58   Photographs                                       8      Elise Favis
102  Bury Me, My Love                                  8      Elise Favis
158  Life Is Strange 2: Episode 1 ? Roads              8      Elise Favis
243  Minit                                             8      Elise Favis
262  Where The Water Tastes Like Wine                  9      Elise Favis
266  Florence                                          8      Elise Favis
276  Subnautica                                        8      Elise Favis
286  The Red Strings Club                              8      Elise Favis
405  Tacoma                                            8      Elise Favis
442  Perception                                        8      Elise Favis
485  Thimbleweed Park                                  8      Elise Favis
506  Night In The Woods                                8      Elise Favis
586  Phoenix Wright: Ace Attorney - Spirit of Justice  8      Elise Favis
678  Day of the Tentacle Remastered                    8      Elise Favis

     r_platform     o_platform  \
48   PC             PlayStation 4
58   PC             iOS, Android
102  Switch         PC, iOS, Android
158  PC             PlayStation 4, Xbox One
243  PC             PlayStation 4, Xbox One, Switch
262  PC             PlayStation 4, PC
266  iOS            Xbox One, PC
276  PC             PlayStation 4, Switch, PC
286  PC             Xbox One, Switch, PC
405  PC             Xbox One
442  PC             PlayStation 4, Xbox One
485  PC             Xbox One
506  PlayStation 4  PC
586  3DS            PC
678  PlayStation 4  PlayStation Vita, PC

     publisher                     developer  \
48   Inkle                         Inkle
58   EightyEight Games             EightyEight Games
102  Playdius                      The Pixel Hunt, Arte France, FIGS
158  Square Enix                   Dontnod Entertainment
243  Devolver Digital              JW, Kitty, Jukio, and Dom
262  Good Shepard Entertainment    Dim Bulb Games, Serenity Forge
266  Annapurna Interactive         Annapurna Interactive
276  Unknown Worlds Entertainment  Unknown Worlds Entertainment
286  Devolver Digital              Deconstructeam
405  Fullbright                    Fullbright
442  The Deep End Games            The Deep End Games
485  Terrible Toybox               Terrible Toybox
506  Finji                         Infinite Fall
586  Capcom                        Capcom
678  Double Fine Productions       Double Fine Productions

     release_date        rating
48   April 16, 2019      Teen
58   April 3, 2019       NaN
102  January 10, 2019    Everyone 10+
158  September 27, 2018  Mature
243  April 3, 2018       Everyone
262  February 28, 2018   Not rated
266  February 14, 2018   Everyone
276  January 23, 2018    Everyone 10+
286  January 22, 2018    Rating Pending
405  August 2, 2017      Teen
442  May 30, 2017        Mature
485  March 30, 2017      Teen
506  February 21, 2017   Teen
586  September 8, 2016   Teen
678  March 22, 2016      Teen
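Here is the same kind of two-condition filter on a tiny made-up frame, written once as a boolean mask and once with DataFrame.query. The parentheses around each comparison matter in the mask version, because & binds more tightly than == and >=. The titles here are placeholders:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C"],
    "author": ["Elise Favis", "Elise Favis", "Joe Juba"],
    "score": [8, 6, 9],
})

# Boolean-mask version: each condition is wrapped in parentheses.
mask = (toy["author"] == "Elise Favis") & (toy["score"] >= 8)
by_mask = toy[mask]

# Equivalent query-string version.
by_query = toy.query("author == 'Elise Favis' and score >= 8")

print(by_mask)
```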

Countplots and barplots are useful ways of visualizing data. A countplot shows how many observations fall into each category of your chosen variable, whereas a barplot shows an estimate (the mean, by default) of a numeric variable for each category of another variable.

# plt.show() displays the current figure; it does not take the Axes as an argument
plt.figure(figsize=(10, 4))
sns.countplot(x='score', data=GI_data2)
plt.show()

plt.figure(figsize=(10, 4))
sns.countplot(x="r_platform", data=GI_data)
plt.xticks(rotation=-90)
plt.show()

plt.figure(figsize=(10, 4))
sns.countplot(x="rating", data=GI_data)
plt.xticks(rotation=-90)
plt.show()

# barplot: mean score for each rating category
plt.figure(figsize=(10, 4))
sns.barplot(x="rating", y="score", data=GI_data)
plt.xticks(rotation=-90)
plt.show()
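The numbers behind these two plot types can be computed directly in Pandas, which is a handy sanity check. A sketch on invented data: value_counts gives what a countplot draws, and a grouped mean gives the bar heights of a default barplot.

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "rating": ["Teen", "Teen", "Mature", "Everyone", "Mature", "Teen"],
    "score": [8, 9, 7, 8, 9, 7],
})

# What a countplot of "rating" would show: observations per category.
counts = toy["rating"].value_counts()

# What a barplot of score by rating would show: mean score per category.
means = toy.groupby("rating")["score"].mean()

print(counts)
print(means)
```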

Seaborn also contains a handy distribution function, and in this case we can see the relative distribution of GameInformer review scores. The score they have assigned most often is an 8.

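# distplot is deprecated in recent Seaborn releases; histplot with kde=True is the replacement
sns.histplot(GI_data2['score'], kde=True)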
plt.show()
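To check a claim like "the most common score" without reading it off a plot, the relative frequencies and the mode can be computed directly. A sketch with made-up scores:

```python
import pandas as pd

# Made-up scores, for illustration only.
scores = pd.Series([8, 8, 9, 7, 8, 6, 9])

# Share of reviews at each score, ordered by score.
shares = scores.value_counts(normalize=True).sort_index()

# The most frequently assigned score.
most_common = scores.mode().iloc[0]

print(shares)
print("most common score:", most_common)
```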

Swarmplots show the individual instances of Y given X, or in this case of a certain score given a certain rating.

plt.figure(figsize=(16, 6))
sns.swarmplot(x="rating", y="score", data=GI_data, hue='rating')
plt.xticks(rotation=-90)
plt.show()

Let’s do some analysis of publisher and developer statistics. We’d want to start by getting a list of publishers and developers. Let’s just get the 100 most common.

publishers = Counter(GI_data['publisher'])
developers = Counter(GI_data['developer'])

pub_list = []
dev_list = []

for item, freq in publishers.most_common(100):
    pub_list.append(item)

for item, freq in developers.most_common(100):
    dev_list.append(item)
    
print(pub_list[:10])
print("--------------")
print(dev_list[:10])
[' Nintendo', ' Telltale Games', ' Square Enix', ' Ubisoft', ' Sony Computer Entertainment', ' Devolver Digital', ' Warner Bros. Interactive', ' Microsoft Game Studios', ' Electronic Arts', ' Bandai Namco']
--------------
[' Telltale Games', ' Nintendo', ' Capcom', ' Square Enix', ' EA Tiburon', ' Atlus', ' Visual Concepts', ' EA Canada', ' Dontnod Entertainment', ' Ubisoft Montreal']
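Pandas can produce the same "most common" list without Counter. A small sketch on invented names, comparing the two approaches:

```python
from collections import Counter

import pandas as pd

# Made-up publisher column, for illustration only.
publishers = pd.Series(
    [" Nintendo", " Capcom", " Nintendo", " Atlus", " Capcom", " Nintendo"]
)

# Counter-based version, as in the loop above.
top_counter = [name for name, _ in Counter(publishers).most_common(2)]

# Pandas version: value_counts sorts by frequency, most common first.
top_pandas = publishers.value_counts().head(2).index.tolist()

print(top_counter)
print(top_pandas)
```

One difference to be aware of: Counter counts missing values as a key of their own, while value_counts skips them by default.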

If we make a publisher and developer dataframe, we can transform those dataframes by getting the mean and median values of their scores. We can then merge those back into the original dataframe to get a dataframe sorted by publisher which also has the publisher’s mean and median scores. We could do the same thing for developers.

def add_score_stats(df, group_col):
    """Attach each group's mean and median score to every row of df."""
    stats = df.groupby(group_col)['score'].agg(['mean', 'median']).reset_index()
    # a left merge keeps the rows of df in their existing order
    return df.merge(stats, on=group_col, how='left')

# Keep only the rows that have a score. Writing `& (GI_data.score)` in the
# filter instead would bitwise-AND the boolean mask with the integer scores
# and silently drop every even score; the sample output below was produced
# that way, so it only reflects odd scores.
scored = GI_data[GI_data.score.notna()]

# collect the rows for each top publisher, most common publisher first
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
pub_df = pd.concat([scored[scored.publisher == pub] for pub in pub_list],
                   ignore_index=True)
pub_merged_df = add_score_stats(pub_df, 'publisher')
print(pub_merged_df.head(3))
pub_merged_df.to_csv('pub_merged.csv')

# same steps for the top developers
dev_df = pd.concat([scored[scored.developer == dev] for dev in dev_list],
                   ignore_index=True)
dev_merged_df = add_score_stats(dev_df, 'developer')
dev_merged_df.to_csv('dev_merged.csv')
print(dev_merged_df.head(3))
                                               title  score  \
0                          Fire Emblem: Three Houses      9   
1        Marvel Ultimate Alliance 3: The Black Order      7   
2  Cadence of Hyrule ? Crypt of the Necrodancer F...      7   

              author r_platform o_platform  publisher              developer  \
0  Kimberley Wallace     Switch        NaN   Nintendo    Intelligent Systems   
1      Andrew Reiner     Switch        NaN   Nintendo             Team Ninja   
2     Suriel Vazquez     Switch        NaN   Nintendo   Brace Yourself Games   

     release_date           rating     mean  median  
0   July 26, 2019              NaN  7.73913     7.0  
1   July 19, 2019             Teen  7.73913     7.0  
2   June 13, 2019   Rating Pending  7.73913     7.0  
                                              title  score             author  \
0                The Walking Dead: The Final Season      7  Kimberley Wallace   
1    The Walking Dead: The Final Season ? Episode 1      7  Kimberley Wallace   
2  Batman: The Enemy Within Episode 5 ? Same Stitch      9      Javy Gwaltney   

      r_platform                       o_platform        publisher  \
0  PlayStation 4             Xbox One, Switch, PC   Telltale Games   
1             PC          PlayStation 4, Xbox One   Telltale Games   
2  PlayStation 4  PlayStation 4, Xbox One, Switch   Telltale Games   

         developer      release_date           rating  mean  median  
0   Telltale Games    March 27, 2019   Rating Pending  7.16     7.0  
1   Telltale Games   August 14, 2018           Mature  7.16     7.0  
2   Telltale Games    March 27, 2018           Mature  7.16     7.0  
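Another way to attach per-group statistics to every row is groupby(...).transform, which returns a result aligned with the original rows. This is a sketch on invented data, with placeholder publisher names:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "publisher": ["N", "N", "T", "T", "T"],
    "score": [9, 7, 7, 7, 9],
})

grouped = toy.groupby("publisher")["score"]

# transform broadcasts each group's statistic back onto that group's rows.
toy["mean"] = grouped.transform("mean")
toy["median"] = grouped.transform("median")

print(toy)
```

Every row for the same publisher ends up with the same mean and median, which is the shape of the output shown above.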

Let’s select just the top developers and see what their mean and median scores are.

# the first column of the CSV is the saved index, so read it back as the index
dev_data = pd.read_csv('dev_merged.csv', index_col=0)

# every row for a developer carries the same mean and median, so keep only
# the first row per developer (DataFrame.append was removed in pandas 2.0)
dev_examples = dev_data.drop_duplicates(subset='developer')

#print(dev_examples.to_string())

dev_stats = dev_examples[['developer', 'mean', 'median']]
print(dev_stats.head(20))
                    developer      mean  median
0              Telltale Games  7.160000     7.0
25                   Nintendo  7.714286     7.0
39                     Capcom  7.857143     9.0
46                Square Enix  8.111111     9.0
55                 EA Tiburon  6.666667     7.0
61                      Atlus  7.666667     7.0
67            Visual Concepts  7.666667     7.0
76                  EA Canada  7.000000     7.0
80      Dontnod Entertainment  7.000000     7.0
83           Ubisoft Montreal  7.000000     7.0
87              Firaxis Games  8.428571     9.0
94                   TT Games  7.666667     7.0
97     Blizzard Entertainment  9.000000     9.0
104             From Software  8.714286     9.0
111                Game Freak  7.000000     7.0
113   Double Fine Productions  7.666667     7.0
116       Intelligent Systems  8.200000     9.0
121                      SEGA  8.600000     9.0
126            HAL Laboratory  7.000000     7.0
127            Spike Chunsoft  6.000000     6.0
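A table like this can also be built in one step and ranked, and including a review count makes it easier to judge how much weight a mean deserves. A sketch on invented data; the studio names and the minimum of two reviews are arbitrary choices for the example:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "developer": ["Studio A", "Studio A", "Studio B",
                  "Studio C", "Studio C", "Studio C"],
    "score": [9, 8, 10, 7, 7, 6],
})

# One row per developer with review count, mean and median score.
stats = toy.groupby("developer")["score"].agg(["count", "mean", "median"])

# Ignore developers with a single review, then rank by mean score.
ranked = stats[stats["count"] >= 2].sort_values("mean", ascending=False)
print(ranked)
```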

Another interesting thing we could do is get the worst-reviewed games: games that have received a 4 or lower.

# you can filter on a single criterion and then pull out only the columns you care about
# print(GI_data[GI_data.score <= 4].title)
# .loc is the preferred way to do this, since it selects rows and columns in one step
print(GI_data.loc[GI_data.score <= 4, 'title'])
69               R.B.I. Baseball 19
124     Overkill's The Walking Dead
134                   The Quiet Man
217               Tennis World Tour
220                           Agony
259               Fear Effect Sedna
301                  Hello Neighbor
343              Raid: World War II
483              R.B.I. Baseball 17
488                 Old Time Hockey
489                 Old Time Hockey
503                      1-2-Switch
589                    One Way Trip
612             Ghostbusters (2016)
637       Homefront: The Revolution
722                   Devil's Third
767                        Armikrog
815                        Godzilla
820     Payday 2: Crimewave Edition
933              Escape Dead Island
935       Sonic Boom: Rise of Lyric
1045          Rambo: The Video Game
1063                         Rekoil
Name: title, dtype: object
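If you want the N lowest-scored games rather than everything under a threshold, nsmallest is a convenient alternative. A sketch on invented data with placeholder titles:

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C", "Game D"],
    "score": [9, 3, 4, 2],
})

# The two lowest-scored rows, lowest first.
worst = toy.nsmallest(2, "score")[["title", "score"]]
print(worst)
```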

We could also get only the games published by Nintendo by using isin to keep the rows where ‘Nintendo’ appears in ‘publisher’.

# isin works on a Series and accepts a list of values to match
# the publisher names in this data start with a space, so be sure to include it

r = GI_data[GI_data['publisher'].isin([' Nintendo'])][:40]
print(r)

     title                                              score  \
0    Fire Emblem: Three Houses                          9
5    Marvel Ultimate Alliance 3: The Black Order        7
6    Dr. Mario World                                    8
19   Super Mario Maker 2                                8
23   Cadence of Hyrule ? Crypt of the Necrodancer F...  7
43   BoxBoy! + BoxGirl!                                 8
64   Yoshi's Crafted World                              8
79   Tetris 99                                          8
101  Mario & Luigi: Bowser's Inside Story + Bowser ...  8
103  New Super Mario Bros. U Deluxe                     8
112  Super Smash Bros. Ultimate                         9
123  Pokémon: Let's Go, Pikachu                         8
144  The World Ends With You: Final Remix               7
151  Super Mario Party                                  7
161  Xenoblade Chronicles 2: Torna - The Golden Cou...  7
193  Tetris 99                                          8
196  WarioWare Gold                                     8
199  Octopath Traveler                                  8
202  Captain Toad: Treasure Tracker                     8
203  Splatoon 2: Octo Expansion                         8
209  Mario Tennis Aces                                  8
212  Pokémon Quest                                      6
215  Sushi Striker: The Way of Sushido                  7
216  Dillon's Dead-Heat Breakers                        7
236  Donkey Kong Country: Tropical Freeze               9
246  Detective Pikachu                                  7
254  Kirby Star Allies                                  6
300  The Legend Of Zelda: Breath Of The Wild ? The ...  8
308  Xenoblade Chronicles 2                             7
335  Super Mario Odyssey                                9
338  Fire Emblem Warriors                               7
377  Metroid: Samus Returns                             9
378  Monster Hunter Stories                             8
408  Miitopia                                           7
410  Hey! Pikmin                                        6
413  Splatoon 2                                         8
418  The Legend Of Zelda: Breath Of The Wild ? Mast...  7
426  Ever Oasis                                         8
431  Arms                                               8
447  Fire Emblem Echoes: Shadows of Valentia            7

     author             r_platform  o_platform  \
0    Kimberley Wallace  Switch      NaN
5    Andrew Reiner      Switch      NaN
6    Ben Reeves         iOS         Android
19   Kyle Hilliard      Switch      NaN
23   Suriel Vazquez     Switch      NaN
43   Ben Reeves         Switch      NaN
64   Brian Shea         Switch      NaN
79   Kyle Hilliard      Switch      Xbox One, PC
101  Kyle Hilliard      3DS         Xbox One
103  Brian Shea         Switch      PC, iOS, Android
112  Jeff Cork          Switch      PlayStation 4, PC
123  Brian Shea         Switch      PlayStation 4
144  Kimberley Wallace  Switch      PC
151  Brian Shea         Switch      PlayStation 4, Xbox One, Switch
161  Joe Juba           Switch      PlayStation 4, Switch, PC, Mac
193  Kyle Hilliard      Switch      PlayStation 4, Xbox One, Switch
196  Kyle Hilliard      3DS         PlayStation 4, PlayStation Vita
199  Joe Juba           Switch      PlayStation 4, Xbox One, Switch, Mac, iOS
202  Ben Reeves         Switch      3DS
203  Brian Shea         Switch      3DS
209  Kyle Hilliard      Switch      PlayStation 4, PC
212  Brian Shea         Switch      iOS, Android
215  Kyle Hilliard      Switch      3DS
216  Kyle Hilliard      3DS         3DS
236  Kyle Hilliard      Switch      Wii U
246  Ben Reeves         3DS         PlayStation 4, Xbox One, Switch
254  Kyle Hilliard      Switch      Xbox One, PC
300  Suriel Vazquez     Switch      Xbox One, PC
308  Joe Juba           Switch      PlayStation 4, Xbox One
335  Andrew Reiner      Switch      Xbox One, Switch, PC
338  Javy Gwaltney      Switch      PlayStation 4, PC
377  Ben Reeves         3DS         Xbox One, PC
378  Daniel Tack        3DS         Xbox One, PC
408  Jeff Cork          3DS         Xbox One
410  Ben Reeves         3DS         PlayStation 4
413  Brian Shea         Switch      PC
418  Javy Gwaltney      Switch      Xbox One, PC, Mac, iOS, Android
426  Kyle Hilliard      3DS         Xbox One, PlayStation Vita
431  Brian Shea         Switch      PlayStation 3
447  Javy Gwaltney      3DS         Xbox One, Switch, PC

     publisher  developer                      release_date  \
0    Nintendo   Intelligent Systems            July 26, 2019
5    Nintendo   Team Ninja                     July 19, 2019
6    Nintendo   Nintendo                       July 10, 2019
19   Nintendo   Nintendo                       June 28, 2019
23   Nintendo   Brace Yourself Games           June 13, 2019
43   Nintendo   HAL Laboratory                 April 26, 2019
64   Nintendo   Good Feel                      March 29, 2019
79   Nintendo   Arika                          February 13, 2019
101  Nintendo   AlphaDream                     January 11, 2019
103  Nintendo   Nintendo                       January 11, 2019
112  Nintendo   Sora, Ltd                      December 7, 2018
123  Nintendo   Game Freak                     November 16, 2018
144  Nintendo   Square Enix, h.a.n.d.          October 12, 2018
151  Nintendo   Nintendo                       October 5, 2018
161  Nintendo   Monolith Soft                  September 14, 2018
193  Nintendo   Arika                          February 13, 2019
196  Nintendo   Nintendo, Intelligent Systems  August 3, 2018
199  Nintendo   Square Enix, Acquire           July 13, 2018
202  Nintendo   Nintendo                       July 13, 2018
203  Nintendo   Nintendo                       June 13, 2018
209  Nintendo   Camelot Software               June 22, 2018
212  Nintendo   Game Freak                     May 29, 2018
215  Nintendo   Nintendo, indies zero          June 8, 2018
216  Nintendo   Vanpool                        May 24, 2018
236  Nintendo   Retro Studios                  May 4, 2018
246  Nintendo   Creatures Inc.                 March 23, 2018
254  Nintendo   HAL Laboratory                 March 16, 2018
300  Nintendo   Nintendo                       December 7, 2017
308  Nintendo   Monolith Soft                  December 1, 2017
335  Nintendo   Nintendo                       October 27, 2017
338  Nintendo   Koei Tecmo                     TBA
377  Nintendo   MercurySteam                   September 15, 2017
378  Nintendo   Capcom                         September 8, 2017
408  Nintendo   Nintendo                       July 27, 2017
410  Nintendo   Nintendo                       July 28, 2017
413  Nintendo   Nintendo                       July 21, 2017
418  Nintendo   Nintendo                       July 30, 2017
426  Nintendo   Grezzo                         June 23, 2017
431  Nintendo   Nintendo                       June 16, 2017
447  Nintendo   Intelligent Systems            May 19, 2017

     rating
0    NaN
5    Teen
6    Everyone
19   NaN
23   Rating Pending
43   Everyone
64   Everyone
79   Everyone
101  Everyone
103  Everyone
112  Mature
123  Everyone
144  Teen
151  Everyone
161  Teen
193  Teen
196  Everyone 10+
199  Teen
202  Everyone
203  Everyone 10+
209  Everyone
212  Everyone
215  Everyone
216  Everyone
236  Everyone
246  Everyone
254  Everyone 10+
300  Everyone
308  Teen
335  Everyone 10+
338  Rating Pending
377  Everyone 10+
378  Everyone 10+
408  Everyone
410  Everyone 10+
413  Everyone 10+
418  Everyone
426  Everyone 10+
431  Everyone 10+
447  Rating Pending
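Since the leading space in the publisher names is easy to forget, one option is to strip whitespace once and then filter on the clean name. A sketch on invented rows with placeholder titles:

```python
import pandas as pd

# Toy data, invented for illustration; note the leading spaces.
toy = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C"],
    "publisher": [" Nintendo", " Capcom", " Nintendo"],
})

# Remove surrounding whitespace from every publisher name.
toy["publisher"] = toy["publisher"].str.strip()

# Now the filter no longer needs the leading space.
nintendo = toy[toy["publisher"].isin(["Nintendo"])]
print(nintendo)
```

isin becomes more useful than a plain == comparison once the list holds several publishers.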

Finally, let’s do a swarmplot of scores by Nintendo-associated developers. We’ll take the games published by Nintendo and plot their scores by developer.

plt.figure(figsize=(8, 10))
sns.swarmplot(x="developer", y="score", data=r, hue='developer')
plt.grid()
# rotate the developer labels; the tick positions come from the categories themselves
plt.xticks(rotation=-90)
plt.show()

If you’d like to get better acquainted with visualizing data, I suggest checking out the documentation of Pandas and Seaborn and trying to visualize some simple datasets, such as the ones below:

Iris Dataset

Pokemon with Stats

Mushroom Dataset

Did It Rain In Seattle?
