This is a summary of the GapMinder data and the chosen attributes/features for the analysis project.
Step 1: Describe your sample.
The dataset I am using is the GapMinder data, collected by the GapMinder foundation (created by Ola Rosling, Anna Rosling Rönnlund and Hans Rosling). The GapMinder dataset includes demographic data for all 192 member states of the United Nations (with Serbia and Montenegro aggregated into a single entry) as well as 24 other geographical regions, for a total of 215 regions.
The populations studied in the GapMinder data are those of the 215 geographic regions, and the level of analysis is the region: each observation is a set of aggregate statistics on that region's population. There are 215 observations in the dataset (the 192 UN members, with Serbia and Montenegro aggregated into a single entry, plus 24 other regions). I am using all 215 observations in my analysis, examining trends in their populations.
Step 2: Describe the procedures that were used to collect the data.
The GapMinder dataset contains information about a variety of population metrics such as income per person, CO2 emissions, employment rate, internet use rate, and life expectancy. The data were drawn from a variety of sources, including the US Census Bureau’s International Database, the World Bank, and the United Nations Statistics Division, and were gathered mainly through official data reporting and surveys. The data were collected from 2002 to 2011 and aggregated through government reports and surveys conducted by organizations like the World Bank. Collection covered the 215 geographical regions described above.
Step 3: Describe your variables.
I’m curious to see whether there is a relationship between internet use rate and life expectancy and, beyond that, whether there is a relationship between internet use rate and employment rate. The explanatory variable is therefore internet use rate, which tracks the percentage of a country’s population that has access to the internet. The response variables are life expectancy, which tracks the average lifetime of a person in each country, and employment rate, which tracks the percentage of a population that is employed. The scales for internet use rate and employment rate run from 0 to 100%, while life expectancy begins at 0 and runs upwards, with an upper bound of approximately 82 years. The explanatory and response variables were converted to numeric values, and any blanks were replaced with NaN values.
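The data management step described above (making the variables numeric and replacing blanks with NaN) could look roughly like the following. This is only a minimal sketch using the Python standard library, not necessarily the exact code used for the project. The column names internetuserate, lifeexpectancy and employrate, the three sample rows, and the helper to_number are assumptions made for illustration.

```python
import csv
import io
import math

# Hypothetical three-row excerpt standing in for the real data file.
# The column names and the values are assumptions for illustration only.
SAMPLE_CSV = """country,internetuserate,lifeexpectancy,employrate
Country A,3.65,48.67,55.7
Country B, ,76.92,51.4
Country C,12.5,,
"""

# Explanatory variable first, then the two response variables.
VARIABLES = ["internetuserate", "lifeexpectancy", "employrate"]


def to_number(text):
    """Return the cell as a float, or NaN when it is blank or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return math.nan


rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    for name in VARIABLES:
        record[name] = to_number(record[name])
    rows.append(record)

# Count how many observations are missing for each variable.
missing_counts = {
    name: sum(math.isnan(row[name]) for row in rows) for name in VARIABLES
}
print(missing_counts)
```

Counting the NaN values per variable right after the conversion shows how many of the 215 observations will drop out of any analysis involving that variable. If the data are loaded with pandas instead, pandas.to_numeric with errors="coerce" performs the same conversion on a whole column.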