Please answer all of the questions below in R (using RStudio). Thank you!
#======= Question 1 (1 Point) =======
# Q1-1. Spread the data out to multiple columns, with the shop type (Starbucks vs. Dunkin Donuts) being the key column and the number of shops being the value column.
# Q1-2. Delete observations that contain at least one missing value or invalid value (e.g., negative income).
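A minimal sketch of the reshaping and cleaning steps in Question 1. The actual data set is not shown here, so the data frame `shops_long` and the column names `zip`, `house_price`, `shop_type`, and `n_shops` are assumptions made up for illustration; only `median_income` is named in the questions. The sketch uses `tidyr::pivot_wider()`, which supersedes the older `spread()` that the question wording refers to.

```r
library(tidyr)
library(dplyr)

# Toy long-format data. All column names and values are hypothetical.
shops_long <- data.frame(
  zip           = c("A", "A", "B", "B", "C", "C"),
  median_income = c(60000, 60000, -1, -1, 75000, 75000),
  house_price   = c(300000, 300000, 250000, 250000, NA, NA),
  shop_type     = rep(c("starbucks", "dunkin_donuts"), times = 3),
  n_shops       = c(3, 5, 1, 2, 4, 0)
)

# Q1-1: one column per shop type, filled with the number of shops.
shops_wide <- pivot_wider(
  shops_long,
  names_from  = shop_type,
  values_from = n_shops
)

# Q1-2: drop rows with any missing value, then rows with invalid
# (negative) values. Which columns need a validity check depends on
# the real data.
shops_clean <- shops_wide %>%
  drop_na() %>%
  filter(median_income >= 0, house_price >= 0)
```

With the older interface, the equivalent of the reshaping step would be `spread(shops_long, key = shop_type, value = n_shops)`.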
# Q2-1. Create a scatter plot to examine the relationship between house prices and Starbucks.
# Q2-2. Repeat Q2-1 for Dunkin Donuts.
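The two scatter plots could be sketched with ggplot2 as follows. The data frame `shops_clean` and the columns `house_price`, `starbucks`, and `dunkin_donuts` are placeholder names with made-up values, not the real data.

```r
library(ggplot2)

# Hypothetical cleaned, wide-format data. Values are placeholders.
shops_clean <- data.frame(
  house_price   = c(300, 420, 510, 380, 610),
  starbucks     = c(1, 3, 5, 2, 6),
  dunkin_donuts = c(4, 2, 1, 3, 0)
)

# Q2-1: house prices against the number of Starbucks.
p_starbucks <- ggplot(shops_clean, aes(x = starbucks, y = house_price)) +
  geom_point() +
  labs(x = "Number of Starbucks", y = "House price")

# Q2-2: the same plot for Dunkin Donuts.
p_dunkin <- ggplot(shops_clean, aes(x = dunkin_donuts, y = house_price)) +
  geom_point() +
  labs(x = "Number of Dunkin Donuts", y = "House price")

print(p_starbucks)
print(p_dunkin)
```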
# Q3. Build a linear regression model to predict house prices based on the number of Starbucks and Dunkin Donuts.
# Consider both predictors at the same time.
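A model with both predictors at once can be fitted with base R's `lm()`. This is only a sketch on made-up numbers; `shops_clean` and its column names are assumptions standing in for the real data.

```r
# Hypothetical data. Values are placeholders for illustration.
shops_clean <- data.frame(
  house_price   = c(300, 420, 510, 380, 610, 290, 450, 530),
  starbucks     = c(1, 3, 5, 2, 6, 0, 4, 5),
  dunkin_donuts = c(4, 2, 1, 3, 0, 5, 2, 3)
)

# Q3: both shop counts enter the model as predictors at the same time.
fit_shops <- lm(house_price ~ starbucks + dunkin_donuts, data = shops_clean)

# Coefficients, standard errors, p-values, and R-squared.
summary(fit_shops)
```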
# One might argue that neighborhoods where Starbucks are located are relatively rich.
# We want to examine whether Starbucks still has predictive power for house prices, even after controlling for household incomes and population.
# Q4-1. Create new variables by taking the logarithm of median_income and population.
# Q4-2. Build a linear regression model to predict house prices based on the number of Starbucks and Dunkin Donuts as well as (the logarithm of) household incomes and population.
# Q4-3. Do you think considering median income and population improves the linear regression model?
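One way to approach Question 4 is to add the log-transformed controls and compare the two nested models. The sketch below runs on simulated data, so the numbers carry no meaning; `shops_clean`, `house_price`, `starbucks`, `dunkin_donuts`, `log_income`, and `log_population` are assumed names, while `median_income` and `population` come from the question text.

```r
set.seed(42)
n <- 40

# Simulated stand-in for the cleaned data. The relationship below is arbitrary.
shops_clean <- data.frame(
  starbucks     = rpois(n, lambda = 3),
  dunkin_donuts = rpois(n, lambda = 4),
  median_income = runif(n, min = 40000, max = 120000),
  population    = runif(n, min = 5000, max = 50000)
)
shops_clean$house_price <- 100000 + 15000 * shops_clean$starbucks +
  3 * shops_clean$median_income + rnorm(n, sd = 20000)

# Q4-1: log-transformed versions of the two control variables.
shops_clean$log_income     <- log(shops_clean$median_income)
shops_clean$log_population <- log(shops_clean$population)

# Q4-2: the shop-count model, then the same model with the controls added.
fit_base     <- lm(house_price ~ starbucks + dunkin_donuts, data = shops_clean)
fit_controls <- lm(house_price ~ starbucks + dunkin_donuts +
                     log_income + log_population, data = shops_clean)

# Q4-3: compare fit. Adjusted R-squared penalizes extra predictors, and
# anova() runs an F-test on the nested models.
summary(fit_base)$adj.r.squared
summary(fit_controls)$adj.r.squared
anova(fit_base, fit_controls)
```

A higher adjusted R-squared together with a small p-value in the `anova()` output would suggest that the controls add explanatory power. Whether the Starbucks coefficient stays significant can be read from `summary(fit_controls)`.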
#======= Question 5 (2 Points) =======
# The dynamics of house prices might vary across counties.
# Q5-1. Split (facet) the plot resulting from Question 2 by county.
# Q5-2. Add the county variable to the previous linear regression model (from Question 4).
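Faceting and the county term could look like the sketch below. The data are simulated, and the county labels and every column name except `median_income` and `population` are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
n <- 30

# Simulated stand-in data with three hypothetical counties.
shops_clean <- data.frame(
  county        = rep(c("County A", "County B", "County C"), each = 10),
  starbucks     = rpois(n, lambda = 3),
  dunkin_donuts = rpois(n, lambda = 4),
  median_income = runif(n, min = 40000, max = 120000),
  population    = runif(n, min = 5000, max = 50000)
)
shops_clean$house_price <- 200000 + 10000 * shops_clean$starbucks +
  rnorm(n, sd = 30000)

# Q5-1: the Starbucks scatter plot, split into one panel per county.
p_facet <- ggplot(shops_clean, aes(x = starbucks, y = house_price)) +
  geom_point() +
  facet_wrap(~ county)
print(p_facet)

# Q5-2: county enters as a categorical predictor. With the default
# treatment contrasts, the first level is the reference category and
# gets no coefficient of its own.
fit_county <- lm(house_price ~ starbucks + dunkin_donuts +
                   log(median_income) + log(population) + factor(county),
                 data = shops_clean)
summary(fit_county)
```

The Dunkin Donuts plot can be faceted the same way by swapping the `x` aesthetic.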
#======= Question 6 (2 Points) =======
# Note that answers without any explanation will be penalized.
# There is no single correct answer to this question. Any "reasonable" answers based on your analyses are acceptable.
# Q6-1. Do you think the coefficients of Starbucks and Dunkin Donuts remain significant after considering the county information?
# Write your opinion briefly by commenting (#).
# Q6-2. According to the linear regression model in Question 5, which county has the highest average house prices?
# Write your opinion briefly by commenting (#).
# Q6-3. Calculate average house prices by county. Which county has the highest average house prices?
# If the result seems different from the linear regression, how would you interpret it?
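For Q6-2 and Q6-3, it may help to see how a raw county average can differ from a county coefficient in a regression. The sketch below uses a tiny made-up data set (`toy`, with hypothetical columns `county`, `starbucks`, and `house_price`) rather than the real data, and a simpler model than the one in Question 5.

```r
# Made-up data: county B has more Starbucks and higher prices overall.
toy <- data.frame(
  county      = c("A", "A", "A", "B", "B", "B"),
  starbucks   = c(1, 2, 3, 5, 6, 7),
  house_price = c(100, 200, 300, 350, 400, 450)
)

# Q6-3: raw average house price per county.
avg_by_county <- aggregate(house_price ~ county, data = toy, FUN = mean)
top_raw <- avg_by_county$county[which.max(avg_by_county$house_price)]

# Q6-2: county coefficients from a regression. Each one is the difference
# from the reference county (here "A") with the other predictors held fixed.
fit_toy <- lm(house_price ~ starbucks + factor(county), data = toy)
county_effects <- coef(fit_toy)[grepl("county", names(coef(fit_toy)))]

avg_by_county
county_effects
```

In this toy data, county B has the higher raw average, yet its regression coefficient comes out negative once the number of Starbucks is held fixed. The raw average mixes the county effect with differences in the predictors, while the coefficient is the adjusted difference from the reference county, so the two rankings do not have to agree.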