# Project Work Answers of Linear Regression Method Business Analysis Case Study
Business Analysis Case Study
Understand the dataset
The dataset contains several attributes of houses in Melbourne along with their
prices. Since the focus of this dataset is price, it makes sense to get an overview
of the Price column first.
Applying the describe function to the Price column gives an overview in terms of basic
statistics. The average house price is approximately 5.6 million.
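A minimal sketch of this step, assuming the data is held in a pandas DataFrame (the case study's "describe function" suggests pandas, but the CSV file name is not given, so a few hypothetical rows stand in for the real 2,000-row sample):

```python
import pandas as pd

# Hypothetical rows standing in for the real sample; in practice the frame would be
# loaded with pd.read_csv(...) from the project's CSV file (name not given).
df = pd.DataFrame({
    "Rooms": [2, 2, 3, 3, 4, 2],
    "Price": [1_480_000, 1_035_000, 1_465_000, 850_000, 1_600_000, 941_000],
})

# count, mean, std, min, 25%, 50% (median), 75% and max of the target column
price_stats = df["Price"].describe()
print(price_stats)
print(f"Average price: {price_stats['mean']:,.0f}")
```

The same call on the real data yields the average quoted above; comparing the mean with the 50% row is a quick first hint of skew or outliers.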
– Shape of the dataset:
(2000, 22), i.e. 2,000 rows and 22 features.
– Features included in the dataset:
‘ID’, ‘Suburb’, ‘Address’, ‘Rooms’, ‘Type’, ‘Price’, ‘Method’, ‘SellerG’, ‘Date’,
‘Distance’, ‘Postcode’, ‘Bedroom2’, ‘Bathroom’, ‘Car’, ‘Landsize’, ‘BuildingArea’,
‘YearBuilt’, ‘CouncilArea’, ‘Lattitude’, ‘Longtitude’, ‘Regionname’, ‘Propertycount’
– Numerical and categorical features:
– Categorical variables:
– ‘Suburb’, ‘Address’, ‘Type’, ‘Method’, ‘SellerG’, ‘CouncilArea’, ‘Regionname’
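The shape and the numerical/categorical split can be reproduced with pandas, as sketched below on hypothetical rows. Treating every non-numeric column as categorical is an assumption; it happens to match the categorical list above for these column names.

```python
import pandas as pd

# Hypothetical rows using a few of the column names listed above.
df = pd.DataFrame({
    "Suburb": ["Abbotsford", "Richmond", "Brunswick"],
    "Type": ["h", "u", "t"],
    "Rooms": [2, 1, 3],
    "Price": [1_035_000.0, 520_000.0, 880_000.0],
    "Distance": [2.5, 2.6, 5.2],
})

print(df.shape)  # (rows, columns); the real dataset gives (2000, 22)

# Numeric dtypes are treated as numerical; everything else as categorical.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

Using `exclude="number"` rather than `include="object"` keeps the split working whether pandas stores text columns as `object` or as its newer string dtype.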
Identifying the null values:
– The BuildingArea, CouncilArea and Age columns have the most null values, so we
decided to drop the rows with missing values in these columns.
– Age of the property:
– The maximum Age value is 192.
– Price distribution:
– Detecting the outliers present in the Price column.
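The cleaning steps above can be sketched on hypothetical rows. Two assumptions are labeled: Age is derived as a reference year minus YearBuilt (the case study does not say how Age was computed, and the reference year 2018 is hypothetical), and outliers are flagged with the 1.5 × IQR rule that box-plot whiskers use, since the exact detection method is not stated.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with some missing values.
df = pd.DataFrame({
    "Price": [900_000, 1_100_000, 1_050_000, 980_000, 1_020_000, 4_500_000, 1_000_000],
    "BuildingArea": [120.0, np.nan, 150.0, 110.0, np.nan, 400.0, 130.0],
    "CouncilArea": ["Yarra", "Yarra", None, "Moreland", "Moreland", "Boroondara", "Yarra"],
    "YearBuilt": [1900, 1960, np.nan, 2005, 1930, 1890, 1995],
})

# Assumption: Age = reference year - YearBuilt; 2018 is a hypothetical reference year.
REFERENCE_YEAR = 2018
df["Age"] = REFERENCE_YEAR - df["YearBuilt"]

# Null count per column, largest first.
null_counts = df.isnull().sum().sort_values(ascending=False)
print(null_counts)

# Drop rows that are missing any of the three columns named above.
clean = df.dropna(subset=["BuildingArea", "CouncilArea", "Age"])
print("Maximum Age:", clean["Age"].max())

# Outliers by the 1.5 * IQR rule (the same bounds a default box plot's whiskers use).
q1, q3 = clean["Price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = clean.loc[(clean["Price"] < lower) | (clean["Price"] > upper), "Price"]
print("Outlier prices:", outliers.tolist())
```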
Observation:
Median prices for houses are over $1M, townhomes are $800k–$900k, and units are
approximately $500k.
Home prices under different selling methods are relatively similar across the
board.
Median prices in the Metropolitan region are higher than those in the Victoria
region, with Southern Metro having the highest median home price (~$1.3M).
With an average price of $1M, historic homes (older than 50 years) are valued much
higher than newer homes in the area, but show more variation in price.
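The medians in these observations are group-wise aggregates. A pandas sketch on hypothetical rows (the Type codes, Method codes, region names and prices are illustrative, not taken from the real data), with the "historic" flag using the 50-year threshold stated above:

```python
import pandas as pd

# Hypothetical rows; values are illustrative only.
df = pd.DataFrame({
    "Type": ["h", "h", "t", "t", "u", "u"],
    "Method": ["S", "PI", "S", "SP", "S", "VB"],
    "Regionname": ["Southern Metropolitan", "Northern Metropolitan", "Southern Metropolitan",
                   "Western Victoria", "Northern Metropolitan", "Western Victoria"],
    "Price": [1_400_000, 1_100_000, 880_000, 820_000, 520_000, 480_000],
    "Age": [95, 20, 60, 10, 45, 5],
})

# Median price per property type, selling method and region.
median_by_type = df.groupby("Type")["Price"].median()
median_by_method = df.groupby("Method")["Price"].median()
median_by_region = df.groupby("Regionname")["Price"].median().sort_values(ascending=False)

# "Historic" = older than 50 years; compare average price and spread of the two groups.
df["Historic"] = df["Age"] > 50
price_by_age_group = df.groupby("Historic")["Price"].agg(["mean", "std"])

print(median_by_type, median_by_method, median_by_region, price_by_age_group, sep="\n\n")
```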
Task 2: Discovering relationships among features
– Observe the relationships between pairs of features in the dataset.
– We performed bivariate analysis to examine how each feature relates to Price:
– Bivariate analysis:
Bivariate analysis is one of the simplest forms of quantitative analysis. It involves
the analysis of two variables, for the purpose of determining the empirical
relationship between them. Bivariate analysis can be helpful in testing simple
hypotheses of association.
– Relation between Rooms and Price (x = ‘Rooms’, y = ‘Price’):
Houses with 4 rooms show the greatest variation in price, and there is an outlier at
Rooms = 6.
– Distance and Price
– They are positively correlated with each other.
– Bathroom and Price
– Car and Price
– Landsize and Price
– BuildingArea and Price
– Age and Price
– Propertycount and Price
– Variable correlation using the correlation coefficient:
– The correlation chart shows that Price is positively correlated with Age,
Longitude, BuildingArea, Car, Bathroom and Rooms.
– These features are likely to be the leading inputs for predicting price.
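The numeric side of this bivariate analysis can be sketched on hypothetical rows: the spread of Price within each Rooms value (what a Rooms vs Price box plot shows) and the correlation of each numerical feature with Price. The case study does not say which correlation coefficient it used; Pearson, the pandas default, is an assumption here.

```python
import pandas as pd

# Hypothetical rows; the real analysis would use the cleaned 2,000-row sample.
df = pd.DataFrame({
    "Rooms":    [2, 2, 3, 3, 4, 4, 4, 6],
    "Bathroom": [1, 1, 1, 2, 2, 2, 3, 3],
    "Car":      [0, 1, 1, 2, 2, 1, 2, 4],
    "Distance": [2.5, 4.0, 6.1, 8.0, 9.2, 11.0, 12.5, 14.0],
    "Price":    [700_000, 760_000, 900_000, 980_000, 1_000_000, 1_900_000, 1_250_000, 1_500_000],
})

# Spread of Price within each Rooms value (std is NaN for a single-house group).
price_spread = df.groupby("Rooms")["Price"].agg(["count", "median", "std"])
print(price_spread)
print("Rooms value with the widest price spread:", price_spread["std"].idxmax())

# Pearson correlation of every numerical feature with Price, strongest first.
corr_with_price = df.corr(numeric_only=True)["Price"].drop("Price").sort_values(ascending=False)
print(corr_with_price)
```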
The housing market refers to the supply and demand of houses, including the
mortgage market and house prices. This case study addressed many essential
questions about housing prices, mortgages, interest rates, speculative demand, housing
supply, affordability and economic growth. The plots and graphs presented in this
project give stakeholders valuable insight into housing prices.
– After the initial analysis and understanding of the dataset, the next step is to
predict house prices based on the input attributes.
– To predict house prices from the available input features, we implemented the
linear regression method.
– Linear Regression:
In statistics, linear regression is a linear approach for modeling the relationship
between a scalar response and one or more explanatory variables (also known
as dependent and independent variables).
Linear regression is a linear model, i.e. a model that assumes a linear
relationship between the input variables (x) and the single output variable (y).
More specifically, y can be calculated from a linear combination of the input
variables (x).
Linear regression is defined as the process of determining the straight line that
best fits a set of dispersed data points.
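With several features, the "straight line" generalizes to a linear combination of the inputs. Assuming the usual ordinary least squares criterion (the case study does not state its fitting criterion), the model and the quantity its coefficients minimize can be written as:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

(\hat{\beta}_0, \dots, \hat{\beta}_p)
  = \arg\min_{\beta_0, \dots, \beta_p}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```

Here $x_{ij}$ is the value of feature $j$ for house $i$, $y_i$ is that house's price, $\beta_0$ is the intercept, and each $\beta_j$ is the coefficient that multiplies feature $j$.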
– Linear regression is used to develop the model, and that model is then used to
predict new data (i.e. the unknown price corresponding to a given set of input
features).
– To measure the accuracy of the model, we used several metrics:
1. MAE: Mean Absolute Error is the average of the absolute differences
between the actual and predicted values in the dataset. It measures the
average magnitude of the residuals.
2. MSE: Mean Squared Error is the average of the squared differences
between the actual and predicted values in the dataset. It measures the
variance of the residuals (when the residuals average zero).
3. RMSE: Root Mean Squared Error is the square root of the Mean Squared
Error. It measures the standard deviation of the residuals.
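The fit, the prediction on unseen rows, and the three metrics can be sketched end to end. The case study does not say which library it used; this sketch solves ordinary least squares with NumPy's `np.linalg.lstsq` on synthetic data, where the feature choice and the "true" coefficients used to generate prices are hypothetical. scikit-learn's `LinearRegression` together with `mean_absolute_error` and `mean_squared_error` from `sklearn.metrics` is a common alternative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 houses with Rooms, Bathroom and BuildingArea.
n = 200
rooms = rng.integers(1, 6, size=n)
bathroom = rng.integers(1, 4, size=n)
building_area = rng.uniform(60, 300, size=n)
X = np.column_stack([rooms, bathroom, building_area]).astype(float)
# Assumed "true" relationship, used only to generate toy prices with noise.
y = 150_000 + 120_000 * rooms + 80_000 * bathroom + 2_500 * building_area \
    + rng.normal(0, 50_000, size=n)

# Hold out the last 50 rows as "new data" the model has not seen.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def add_intercept(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: minimise the sum of squared residuals on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
y_pred = add_intercept(X_test) @ coef

# The three metrics, computed on the held-out rows.
residuals = y_test - y_pred
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
print("Intercept and coefficients:", coef.round(1))
print(f"MAE={mae:,.0f}  MSE={mse:,.0f}  RMSE={rmse:,.0f}")
```

RMSE is in dollars like the price itself, so it is usually easier to interpret than MSE, whose unit is dollars squared; MAE is never larger than RMSE, and a wide gap between them points to a few large errors.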
– Coefficients of the linear regression model:
In linear regression, coefficients are the values that multiply the predictor values.
For the model above, the coefficients of the attributes are:
– Model graph:
A residual plot is a graph that shows the residuals on the vertical axis and the
independent variable on the horizontal axis. If the points in a residual plot are
randomly dispersed around the horizontal axis, a linear regression model is
appropriate for the data; otherwise, a nonlinear model is more appropriate.
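A residual plot for a single feature can be drawn as sketched below, assuming matplotlib is available. The BuildingArea values and prices are synthetic, and a one-feature line fitted with `np.polyfit(..., deg=1)` stands in for the case study's model.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(60, 300, size=100)  # hypothetical BuildingArea values
y = 300_000 + 2_500 * x + rng.normal(0, 40_000, size=100)  # synthetic prices

# Least-squares line; np.polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

fig, ax = plt.subplots()
ax.scatter(x, residuals, s=12)
ax.axhline(0, color="grey", linewidth=1)  # points should scatter randomly around this line
ax.set_xlabel("BuildingArea (independent variable)")
ax.set_ylabel("Residual (actual - predicted price)")
ax.set_title("Residual plot")
fig.savefig("residual_plot.png")
```

A funnel shape or a curve in this scatter, rather than a random band around zero, is the sign that a nonlinear model or a transformed target would fit better.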
– Residual Error Graph: The residual errors are normally distributed. An error
distribution is a probability distribution around a point prediction that tells us how
likely each error delta is. The error distribution can be every bit as important as the
point prediction.
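One way to check the normality claim numerically, sketched on synthetic residuals: residuals from a least-squares fit with an intercept should average about zero, have low skewness, and roughly 68% of them should fall within one standard deviation. These thresholds are rules of thumb, not a formal test; a Q-Q plot or a Shapiro-Wilk test would be more rigorous. The sketch also illustrates the RMSE description in the metrics list: when the residual mean is zero, RMSE equals the residual standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(60, 300, size=500)  # hypothetical feature values
y = 300_000 + 2_500 * x + rng.normal(0, 40_000, size=500)  # synthetic prices

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

rmse = np.sqrt(np.mean(residuals ** 2))
std = residuals.std()  # population standard deviation (ddof=0)
# With an intercept, OLS residuals average ~0, so std and RMSE coincide.
skewness = np.mean(((residuals - residuals.mean()) / std) ** 3)
# For normally distributed errors, about 68% lie within one std of zero.
within_one_std = np.mean(np.abs(residuals) <= std)

# Bin counts for a histogram of the error distribution.
counts, bin_edges = np.histogram(residuals, bins=20)
print(f"RMSE={rmse:,.0f}  std={std:,.0f}  skew={skewness:.2f}  within 1 std={within_one_std:.0%}")
```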