This topic and these data pertain to the required readings: Acemoglu and Robinson, “Why
Nations Fail,” Chapter 1; and Acemoglu, Johnson and Robinson, “The Colonial Origins of
Comparative Development: An Empirical Investigation,” American Economic Review, 91
no 5 (2001): 1369-1401.
The dataset combines the data used in the Colonial Origins paper, a
second paper (Daron Acemoglu, Simon Johnson, James A. Robinson. “Reversal of Fortune:
Geography and Institutions in the Making of the Modern World Income Distribution.” The
Quarterly Journal of Economics, Vol. 117, No. 4 (Nov., 2002), pp. 1231-1294) and some
information on slave exports from a paper by Nathan Nunn, “The Long-Term Effects of
Africa’s Slave Trades,” Quarterly Journal of Economics (2008): 123(1): 139–176.
The observations are for 163 countries around the world for different time periods. Most of
the variables are from a fairly current time period (1995), while some are historic, dating
from the year 1500 or 1900.
The variables are listed alphabetically in the spreadsheet. A legend describing the variable
names appears on the second sheet of the Excel workbook. I have highlighted the historic
variables, which include the institutions proxy variables (constraint on executive; democracy
variables), settler mortality, population density, urbanization and the yellow-fever epidemics
variable. The French or English colony and ex-colony variables may also be relevant.
There are variables which reflect geography and climate, which are (at least cross-country)
likely fairly time invariant: humidity, temperature, soil classifications; landlocked & amount
of territory within 100km of coast; resources – zinc, silver, iron, oil, gold; latitude. There is
also a set of continent variables – Africa, Asia, Europe and the Americas dummy variables
and a continent variable identifying the various continents. Current economic variables
include GDP per capita measures, urbanization, life expectancy at birth and infant mortality
rate, as well as other measures, like malaria exposure and religion.
The objective of the assignment is to prepare you for the data work expected for the essay.
As described in the general instructions for the essays:
“The best papers are coherent – the graphs chosen relate to the literature described and
attempt to uncover patterns in the data. For instance, imagine you choose to examine the
relationship between birthweight and income. The literature you read pointed out that other
factors like gestation affect birthweight. It also noted gestation was correlated with income.
Hence the overall relationship might be biased by the gestation-income relationship – the
fact that average gestation varied by income. To investigate, you might show the overall
birthweight-income relationship in a graph, but then also illustrate the birthweight-gestation
relationship, the gestation-income relationship and finally the birthweight-income
relationship for different gestation categories. In other words, really explore that relationship
and find instructive ways to take into account the other factors that might influence the
relationship you are examining.”
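
As a rough illustration of the birthweight example in the quoted instructions, the sketch below uses a handful of made-up records (the numbers, group labels and the helper name `mean_by` are all assumptions for illustration, not real data). It is constructed so that birthweight differs by gestation only, yet high-income records are more often full-term, so the overall birthweight-income comparison shows a gap that disappears within each gestation category.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative records: (income group, gestation group, birthweight in grams).
records = (
    [("low", "short", 2800)] * 3
    + [("low", "full", 3400)]
    + [("high", "short", 2800)]
    + [("high", "full", 3400)] * 3
)


def mean_by(rows, key, value):
    """Group rows by key(row) and return the mean of value(row) in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {k: mean(v) for k, v in groups.items()}


# Overall birthweight-income relationship (ignores gestation).
overall = mean_by(records, key=lambda r: r[0], value=lambda r: r[2])

# Gestation-income relationship: share of full-term records in each income group.
full_share = mean_by(records, key=lambda r: r[0], value=lambda r: 1 if r[1] == "full" else 0)

# Birthweight-income relationship within each gestation category.
within = mean_by(records, key=lambda r: (r[1], r[0]), value=lambda r: r[2])

print("mean birthweight by income:", overall)
print("share full-term by income:", full_share)
print("mean birthweight by (gestation, income):", within)
```

In this constructed case the income gap in mean birthweight comes entirely from the different mix of gestation categories across income groups, which is the kind of pattern the steps below are designed to reveal.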
1. Choose three variables – an X & Y (the main relationship of your investigation) and
a Z. Choose Z such that you can argue that Z is also a determinant of Y
(Y=f(X,Z)) and is also correlated with X; hence, in order to carefully investigate the
X-Y relationship, you need to control for Z. Choose non-binary variables for X &
Y. Identify the variables (state them).
2. Graph the relationship between X and Y as a scatterplot.
3. Construct a table that reports the mean of Y for intervals of X [can use pivot table]
4. Graph Y&X with mean Y on Y axis & intervals of X (grouped X) on X axis. [using
information from #3 above].
5. Construct a table that reports the count of Y for (same) intervals of X
6. Graph #5 [histogram]
7. Now you will introduce Z: show that Y = f(Z): if Z is a binary (0,1) variable,
construct a table reporting mean Y and number of observations (count) for Z=0 and
Z=1; if Z is non-binary, produce a table reporting mean Y & count for grouped Z
(e.g., low Z, high Z – essentially transform the Z to a binary) [use pivot table and
group the row variable]. (note: scatter may also be useful with a non-binary Z)
8. Show that Z = f(X): construct a table reporting mean X and number of observations
(count) for Z=0 and Z=1. To better illustrate the variation in the data, produce a
table reporting the count (number of observations) of Z for the (same as above)
intervals of X – for Z = 0 and Z = 1. Show this table in a graph.
9. Now show Y = f(X, controlling for Z). Produce a (pivot) table of mean Y for
intervals of X by Z (for Z = 1 and Z = 0). Graph the information in the table –
include ‘grand total’ (overall relationship) and mean Y for Z=0 and mean Y for Z=1
(over intervals of X). In other words, show Y = f(X|Z=0), Y = f(X|Z=1) and
Y = f(X).
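
The assignment is written for Excel pivot tables, but for anyone who prefers a scripting tool, the tables in steps 3, 5, 7, 8 and 9 can be sketched in Python with pandas. This is only a minimal sketch under stated assumptions: the data are made up, the column names `x`, `y`, `z` and `x_bin` are placeholders rather than names from the workbook, and the bin edges are arbitrary. By construction Y depends on Z only, and Z=1 becomes more common as X rises.

```python
import pandas as pd

# Made-up data for illustration only; x, y and z are placeholder column names.
df = pd.DataFrame({
    "x": [2, 4, 6, 8, 12, 14, 16, 18, 22, 24, 26, 28],
    "z": [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1],
})
df["y"] = 10 + 20 * df["z"]  # by construction Y depends on Z only, not on X

# Group X into intervals (the pivot-table "group" step); the bin edges are arbitrary.
df["x_bin"] = pd.cut(
    df["x"], bins=[0, 10, 20, 30], labels=["0-10", "10-20", "20-30"]
).astype(str)

# Steps 3 and 5: mean of Y and number of observations for each interval of X.
y_by_x = df.groupby("x_bin")["y"].agg(["mean", "count"])

# Step 7: mean Y and count for Z=0 and Z=1.
y_by_z = df.groupby("z")["y"].agg(["mean", "count"])

# Step 8: mean X and count by Z, then counts of Z=0 and Z=1 within each interval of X.
x_by_z = df.groupby("z")["x"].agg(["mean", "count"])
z_counts = pd.crosstab(df["x_bin"], df["z"])

# Step 9: mean Y per interval of X for Z=0 and Z=1, plus the overall ("grand total") mean.
y_by_x_z = df.pivot_table(index="x_bin", columns="z", values="y", aggfunc="mean")
y_by_x_z.columns = [f"Z={c}" for c in y_by_x_z.columns]
y_by_x_z["Grand total"] = y_by_x["mean"]

print(y_by_x_z)
```

In this constructed example the Z=0 and Z=1 columns are flat across the intervals of X while the grand total rises, because the share of Z=1 observations grows with X. Each table could then be charted (for instance with the `plot` method of a DataFrame, which requires matplotlib) to produce the graphs asked for in steps 4, 6, 8 and 9.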
Graphs and tables should be labelled properly. Adjust axes in cases where the information
in the graph is unclear. Make sensible choices regarding graph type – line graphs if the X
variable is continuous; bar graphs if the variable is categorical (e.g., summer, winter, fall,