
**1.** Open the dataset “mortgage_rates_and_house_prices”, which contains the 30-year fixed-rate mortgage interest rate and an index that summarizes house prices in the U.S., observed quarterly between 1975 and 2023. Use these data to answer all of the following:

a. **Generate** the clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. **Name** this clock variable “time”. Then, **declare** the dataset as a time series using this clock variable.

b. Use the “time” variable in a simple **regression setting** to answer two separate questions concerning the linear trends of these two series:

i. What is the **change** in the house price index with each new quarter, on average?

ii. What is the **change** in the mortgage rate with each new quarter, on average?
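
As a rough illustration of what the trend regression in (b) estimates, the sketch below builds a clock variable and computes the least-squares slope on it in plain Python. The series values are invented for illustration and are not taken from the dataset, and the function name is hypothetical; in the actual assignment the same quantity is the coefficient on “time” in the regression output.

```python
def ols_slope_intercept(x, y):
    """Return (slope, intercept) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical quarterly index values (NOT the real data).
series = [100.0, 101.5, 103.2, 104.4, 106.1, 107.9]

# Clock variable: 1 in the first row, +1 in each subsequent row.
time = list(range(1, len(series) + 1))

slope, intercept = ols_slope_intercept(time, series)
print(f"average change per quarter: {slope:.3f}")
```

The slope is read as the average change in the series per one-unit step of the clock, which here means per quarter.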

c. **Depict** the mortgage rate and house price index on **two separate** plots. **Show** them. Using your **eyeballs only**, do those time series variables look stationary or non-stationary to you? **Why**? **How** can you tell? **Explain** in plain English.

d. Use **two different flavors of the Dickey-Fuller test** (the DF test **with trend** and the DF test **with drift**) to determine whether mortgage rates and house prices are non-stationary, trend-stationary, or difference-stationary. What do you **conclude**? **Explain** your reasoning.

i. **State the null and the alternative** hypotheses of the DF test with trend.

ii. **State the null and the alternative** hypotheses of the DF test with drift.

iii. **Explain** why you rejected or failed to reject the null hypotheses.

e. If both mortgage rates and house prices are non-stationary, check whether they are cointegrated **using the Engle-Granger procedure**. You will recall that a **regression** is involved in this procedure, so use good judgment to decide which series will be the dependent variable and which the independent variable.

i. Describe, in words, every step that you are taking in your .do file.
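
The Engle-Granger procedure has two steps: estimate the regression in levels, then test its residuals for a unit root. A minimal Python sketch of that logic is below. It uses invented numbers, hypothetical function names, and a Dickey-Fuller regression on the residuals with no constant and no lags. Which series is the dependent variable is the judgment call the question asks for; the sketch simply picks one. The resulting statistic has to be compared against Engle-Granger critical values rather than the standard DF tables, and those values are not included here.

```python
import math


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def df_tstat_no_constant(e):
    """t-statistic on gamma in: e_t - e_{t-1} = gamma * e_{t-1} + error."""
    lagged = e[:-1]
    diffs = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - gamma * l) ** 2 for l, d in zip(lagged, diffs))
    sigma2 = ssr / (len(diffs) - 1)  # one estimated parameter
    return gamma / math.sqrt(sigma2 / sxx)


# Invented series for illustration only (NOT the real data).
rates = [8.0, 7.6, 7.9, 7.1, 6.8, 7.0, 6.2, 5.9, 6.1, 5.4]
prices = [100.0, 104.0, 103.0, 109.0, 113.0, 112.0, 118.0, 122.0, 121.0, 127.0]

# Step 1: regression in levels (here prices on rates, one possible choice).
a, b, resid = ols_fit(rates, prices)

# Step 2: unit-root test statistic computed on the step-1 residuals.
t_stat = df_tstat_no_constant(resid)
print(f"slope = {b:.3f}, residual DF t-statistic = {t_stat:.3f}")
```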

f. Based on the Engle-Granger test output, would running a regression between the house price index and mortgage rates produce spurious results? **Why or why not? Explain**.

g. Based on the results of the DF tests, **conduct the appropriate treatment** to induce stationarity in mortgage rates and house prices. Then, **depict** their stationary versions on two separate time series plots. **Show** them.

h. Now, create a **detrended** version of house prices. **Name** it “detrended_prices”.

i. Then, **subject** the detrended house prices to the **DF test with drift.**

ii. **What** are the null and alternative hypotheses? What is the **conclusion of this test?**

iii. **Show** the detrended house prices on a time series plot.

i. Now, create the **differenced** version of house prices. **Name** it “differences_prices”.

i. Then, **subject** the differenced house prices to the **DF test with drift.**

ii. **What** are the null and alternative hypotheses? What is the **conclusion of this test?**

iii. **Show** the differenced house prices on a time series plot.
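
Parts (h) and (i) contrast two ways of removing a trend. As a hedged illustration on invented numbers (not the dataset), the Python sketch below detrends a series by taking the residuals from a regression on the clock variable, and differences it by subtracting the previous observation. The function names are hypothetical; the two result names follow the ones requested above.

```python
def detrend(y):
    """Residuals from a least-squares regression of y on a clock 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    mt, my = sum(t) / n, sum(y) / n
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum(
        (ti - mt) ** 2 for ti in t
    )
    a = my - b * mt
    return [yi - (a + b * ti) for ti, yi in zip(t, y)]


def first_difference(y):
    """y_t - y_{t-1}; the first observation has no predecessor and is dropped."""
    return [y[i] - y[i - 1] for i in range(1, len(y))]


prices = [100.0, 102.0, 105.0, 106.0, 110.0, 111.0]  # made-up values

detrended_prices = detrend(prices)
differences_prices = first_difference(prices)

print(detrended_prices)
print(differences_prices)
```

Detrending keeps every observation and removes only the fitted straight line, while differencing loses the first observation and works with period-to-period changes.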

j. Now, **extract a cointegrating relationship** between house prices in levels (raw form, not detrended or differenced) and mortgage rates in levels (raw form, not detrended or differenced) using a simple vector error correction (VEC) model. Again, use the appropriate variable as the dependent variable here, and the other as the independent variable.

i. Note the exact **cointegrating equation** in the Stata output once you run this model.

ii. **Write** that equation out.

iii. **Interpret** the coefficient of the independent variable here.

iv. Recall that unless a variable is explicitly logged, you default to a lin-lin interpretation, in whatever units the variables are measured.

**2.** Open the dataset “airtravel_monthly”, which gives a long-run time series of *monthly* domestic airline passengers in the United States from October 2002 through July 2023.

a. **Generate** the clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. **Name** this clock variable “time”. Then, **declare** the dataset as a time series using this clock variable.

b. **Run the regression** with the current number of passengers as the dependent variable and, as independent variables, a linear trend plus X lags of the number of passengers, where X should be **identified correctly using the Akaike Information Criterion (AIC).**

c. Then, **generate** the **predicted (fitted)** values of the number of passengers, your dependent variable.

d. **Place** both the actual and predicted series on the **same** time series plot.

e. Perform a **Chow structural break test with an unknown break date** using the model you estimated in part (b). Note the estimated break date (given as a value of the clock variable). What month and year does it correspond to? Explain why the test finds what it does.
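
To translate an estimated break date from the clock variable back to a calendar month, one can count forward from the first observation. The Python sketch below assumes the series has no gaps, so that time = 1 is October 2002 and each step of the clock is one month. The function name is hypothetical.

```python
def clock_to_year_month(time, start_year=2002, start_month=10):
    """Map a 1-based monthly clock value to (year, month)."""
    if time < 1:
        raise ValueError("clock values start at 1")
    # Months elapsed since January of the starting year.
    offset = (start_month - 1) + (time - 1)
    year = start_year + offset // 12
    month = offset % 12 + 1
    return year, month


print(clock_to_year_month(1))    # (2002, 10)
print(clock_to_year_month(250))  # (2023, 7)
```

The second call is a consistency check: October 2002 through July 2023 spans 250 months, so the last row should map to July 2023 if the no-gaps assumption holds.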

**3.** Open the dataset “Airlines”, which contains the daily number of airline passengers and the *daily* number of new COVID-19 cases in the United States between March 1 and October 31, 2020. Use these data to answer the following:

a. **Generate** the clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. **Name** this clock variable “time”. Then, **declare** the dataset as a time series using this clock variable.

b. **Create** the time series plot of airline passengers.

c. Perform the **Granger causality test** between new COVID cases and the number of airline passengers. **Use the optimal number of lags**, as **determined by AIC for both variables**, in the vector autoregression (VAR) model to set up the Granger test. Recall that in a VAR, it doesn’t matter what you initially select as dependent and independent variables, since you’re estimating a system where they swap places anyway.

i. **Describe** every step that you are taking for this test procedure in your .do file.

ii. **State** the **null hypothesis** of the Granger test. **State** the **alternative hypothesis**.

iii. Using *p* = 0.10 as the threshold for statistical significance (as is sometimes done), what do you **conclude based on the findings of the Granger test**? Which variable “causes” which, if any? Which doesn’t cause which, if any?

**4.** Carefully address each of the sub-problems, as successful completion of one part requires the preceding part to be correct as well. Use the dataset **“Employment_06_07.dta”.**

a. **Generate a binary dummy variable called “white”** that equals 1 if a person is white, and 0 otherwise. Rely on the non-binary indicator “race” to generate “white”.

b. **Generate log of earnings** by taking the natural logarithm of “earnwke”. Call it “log_earnings”.

c. Run a regression with log_earnings as the dependent variable, and “white”, “union”, “age”, “unemployed”, and “female” as independent variables. Use robust standard errors. **Interpret the magnitude (size) of each effect** (coefficient) for each independent variable.

d. Now, run the same regression, but with an added interaction term between “white” and “female”. Don’t generate the interaction term manually. **Use the hash symbols instead.**

e. From the regression in (d), give the actual magnitude (size) of the effect **of being white** on earnings when female = 0. **What is this effect relative to?** **Be sure to interpret coefficients correctly**. Is this a log-lin, lin-log… regression? Use the table in Notes Set 8 to help you with interpretation.

f. From the regression in (d), give the actual magnitude (size) of the effect **of being white** on earnings when female = 1. **What is this effect relative to? Be sure to interpret coefficients correctly**. Is this a log-lin, lin-log… regression? Use the table in Notes Set 8 to help you with interpretation.

h. Give the actual magnitude (size) of the effect on earnings **of being a white female** relative to **a white male**.
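
Parts (e), (f) and (h) rest on two pieces of arithmetic. With an interaction term, the effect of one dummy depends on the value of the other, and in a log-lin model a coefficient b on a dummy corresponds to an exact change of 100 × (exp(b) − 1) percent in the dependent variable. The Python sketch below shows that arithmetic with invented coefficients; they are placeholders, not estimates from “Employment_06_07.dta”, and the variable and function names are hypothetical.

```python
import math


def pct_effect(log_points):
    """Exact percent change implied by a log-lin coefficient (or a sum of them)."""
    return 100.0 * (math.exp(log_points) - 1.0)


# Placeholder coefficients (NOT real estimates).
b_white = 0.10
b_female = -0.20
b_white_x_female = -0.05

# Effect of white when female = 0: the main effect alone.
white_when_male = pct_effect(b_white)

# Effect of white when female = 1: main effect plus interaction.
white_when_female = pct_effect(b_white + b_white_x_female)

# White women relative to white men: female main effect plus interaction.
white_female_vs_white_male = pct_effect(b_female + b_white_x_female)

print(f"{white_when_male:.2f}% {white_when_female:.2f}% "
      f"{white_female_vs_white_male:.2f}%")
```

Each comparison holds the other regressors fixed. For small coefficients the exact percentage is close to 100 × b, which is the approximation often used when reading log-lin output.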

i. Use the command that **starts with** margins, dydx(… to confirm your findings in (e) and (f) on the impact of “white” on earnings when the person is, separately, female and male.

j. Use the command marginsplot immediately after the preceding command to **generate the plot** of these conditional marginal effects. **Which** is statistically significant? **Which** is insignificant? **Explain** how you know.
