Business Analysis Case Study

Task 1.

Understand the dataset

The dataset contains several attributes of houses in Melbourne along with their prices. Since the focus of this dataset is the price, it is better to get an overview of the price column first.

We used the describe function on the price column to get an overview in terms of basic statistics. The average house price is approximately 5.6 million.
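As a rough illustration of what a describe-style summary reports, here is a minimal sketch using only the Python standard library. The source does not show its code, so the `summarize` helper and the sample prices are hypothetical and are not values from the case-study dataset.

```python
import statistics

# Hypothetical sample prices for illustration only (not from the dataset).
prices = [480_000, 650_000, 820_000, 1_050_000, 1_300_000, 2_400_000]


def summarize(values):
    """Return the basic statistics that a describe-style summary reports."""
    # "inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


summary = summarize(prices)
for name, value in summary.items():
    print(f"{name:>5}: {value:,.1f}")
```

Comparing the mean with the median (the 50% row) is a quick way to see whether a few very expensive properties are pulling the average upward.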

– Shape of the dataset:

(2000, 22), i.e. 2000 rows and 22 features.

– Features included in the dataset:

‘ID’, ‘Suburb’, ‘Address’, ‘Rooms’, ‘Type’, ‘Price’, ‘Method’, ‘SellerG’, ‘Date’, ‘Distance’, ‘Postcode’, ‘Bedroom2’, ‘Bathroom’, ‘Car’, ‘Landsize’, ‘BuildingArea’, ‘YearBuilt’, ‘CouncilArea’, ‘Lattitude’, ‘Longtitude’, ‘Regionname’, ‘Propertycount’

– Numerical and categorical features:

– Categorical variables:

– ‘Suburb’, ‘Address’, ‘Type’, ‘Method’, ‘SellerG’, ‘CouncilArea’, ‘Regionname’

Identifying the null values:

– The BuildingArea, CouncilArea and Age columns have the largest number of null values; we decided to drop the rows that contain them.
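The null-value step can be sketched as follows. This is an illustrative, standard-library version under stated assumptions: the rows, the `null_counts` and `drop_rows_with_nulls` helpers, and the use of `None` as the missing-value marker are all hypothetical, since the source does not show how the rows were dropped.

```python
# Hypothetical rows for illustration; None marks a missing value.
rows = [
    {"Price": 900_000, "BuildingArea": 120.0, "CouncilArea": "Yarra"},
    {"Price": 750_000, "BuildingArea": None, "CouncilArea": "Yarra"},
    {"Price": 1_200_000, "BuildingArea": 210.0, "CouncilArea": None},
    {"Price": 640_000, "BuildingArea": 95.0, "CouncilArea": "Moreland"},
]


def null_counts(records):
    """Count the missing values in each column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def drop_rows_with_nulls(records, columns):
    """Keep only the rows that have a value in every listed column."""
    return [r for r in records if all(r[c] is not None for c in columns)]


counts = null_counts(rows)
cleaned = drop_rows_with_nulls(rows, ["BuildingArea", "CouncilArea"])
print(counts)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Dropping rows is the simplest option, but it shrinks the dataset; it is worth comparing the row count before and after to see how much data the step costs.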

– Observation of the Age value of the property

– Maximum value of the age is 192.

– Price distribution:

– Detecting the outliers present in the column.
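The source does not say which outlier rule was used. One common choice, assumed here purely for illustration, is the box-plot rule that flags values more than 1.5 times the interquartile range beyond the quartiles. The `iqr_outliers` helper and the sample prices are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical prices for illustration; the last one is deliberately extreme.
prices = [480_000, 650_000, 820_000, 1_050_000, 1_300_000, 9_000_000]
outliers = iqr_outliers(prices)
print(outliers)
```

The multiplier `k` is a convention rather than a law; a larger value flags fewer points.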

Observation:

– Median prices for houses are over $1M, townhomes are $800k – $900k and units are approx. $500k.

– Home prices with different selling methods are relatively the same across the board.

– Median prices in the Metropolitan Region are higher than those of the Victoria Region, with Southern Metro being the area with the highest median home price (~$1.3M).

– With an average price of $1M, historic homes (older than 50 years) are valued much higher than newer homes in the area, but have more variation in price.
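Group-wise medians like the ones quoted above can be computed with a small grouping step. The sketch below is illustrative only: the `median_by_group` helper, the type labels and the prices are hypothetical and do not reproduce the case-study figures.

```python
import statistics
from collections import defaultdict

# Hypothetical (property type, price) pairs for illustration only.
sales = [
    ("house", 1_100_000), ("house", 1_250_000), ("house", 950_000),
    ("townhouse", 850_000), ("townhouse", 880_000),
    ("unit", 480_000), ("unit", 520_000), ("unit", 510_000),
]


def median_by_group(pairs):
    """Group the values by key and return the median of each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: statistics.median(vals) for key, vals in groups.items()}


medians = median_by_group(sales)
for kind, value in medians.items():
    print(f"{kind}: {value:,.0f}")
```

The median is used rather than the mean because it is less sensitive to the few very expensive properties in each group.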

Task 2: Relationships discovery among features

– From the dataset, we observe the relationships between pairs of features.

– We performed bivariate analysis to observe how each feature relates to the price variable:

– Bivariate analysis:

Bivariate analysis is one of the simplest forms of quantitative analysis. It involves

the analysis of two variables, for the purpose of determining the empirical

relationship between them. Bivariate analysis can be helpful in testing simple

hypotheses of association.

– Relation between rooms and price, x = ‘Rooms’, y = ‘Price’

Properties with 4 rooms show the greatest variation in price. There is also an outlier present in the rooms variable, with a value of 6.

– Distance and Price

– They are positively correlated with each other.

– Relation between bathroom and price

– Relation between car and price

– Landsize and price

– Building area and price

– Age and price

– Propertycount and price

– Variable correlation using the correlation coefficient

– From the above chart we can say that price is positively correlated with Age, Longitude, BuildingArea, Car, Bathroom, and Rooms.

– These factors can be the leading input features for predicting the price.
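The source does not say which correlation coefficient was used; the sketch below assumes the Pearson coefficient, which is the usual default. The `pearson` helper and the sample values are hypothetical and only illustrate how a positive correlation between a feature and price shows up numerically.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not from the dataset).
rooms = [2, 3, 3, 4, 5]
price = [600_000, 800_000, 850_000, 1_100_000, 1_400_000]

r = pearson(rooms, price)
print(f"correlation between rooms and price: {r:.3f}")
```

A coefficient near +1 means the two variables rise together, near -1 means one falls as the other rises, and near 0 means there is no linear relationship. It says nothing about causation.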

Task 3.

Business analysis task:

The housing market refers to the supply of and demand for houses, including the mortgage market and house prices. This case study answered many essential questions about housing prices, mortgages, interest rates, speculative demand, supply of housing, affordability, and economic growth. The stakeholders will gain a valuable understanding of housing prices from the plots and graphs presented in this project.

– After performing the initial analysis and gaining an understanding of the dataset, we need to predict the house prices based on the input attributes.

– To predict the house prices from the available input features, we implemented the linear regression method.

– Linear Regression:

In statistics, linear regression is a linear approach for modeling the relationship

between a scalar response and one or more explanatory variables (also known

as dependent and independent variables).

Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, y can be calculated from a linear combination of the input variables (x).

Linear regression is defined as the process of determining the straight line that best fits a set of dispersed data points.
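For the single-feature case, the best-fitting line has a closed-form least-squares solution, sketched below. This is not the case study's model, which presumably uses several input features; the `fit_line` and `predict` helpers and the data are hypothetical. The sample prices are constructed to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical data: price = 5000 * building_area + 200000, with no noise.
building_area = [80, 100, 120, 150, 200]
price = [600_000, 700_000, 800_000, 950_000, 1_200_000]

slope, intercept = fit_line(building_area, price)
estimate = predict(130, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, estimate={estimate:,.0f}")
```

With real data the points do not sit exactly on the line, and the slope is read as the average change in price per unit change in the feature.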

– Linear regression is used to develop the model, and that model is then used to make predictions on new data (i.e. the unknown price corresponding to the input features).

– To measure the correctness of the model, we used several metrics.

1. MAE: The Mean Absolute Error represents the average of the absolute difference between the actual and predicted values in the dataset. It measures the average magnitude of the residuals in the dataset.

2. MSE: Mean Squared Error represents the average of the squared difference between the original and predicted values in the data set. It measures the variance of the residuals.

3. RMSE: Root Mean Squared Error is the square root of Mean Squared Error. It measures the standard deviation of the residuals.
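The three metrics above follow directly from their definitions. The sketch below is a minimal standard-library version; the `regression_metrics` helper and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE and RMSE from actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical values for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 400]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MAE and RMSE are in the same units as the price, which makes them easier to interpret than MSE. RMSE penalises large errors more heavily than MAE because the errors are squared before averaging.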

– Coefficients in the linear regression model:

In linear regression, coefficients are the values that multiply the predictor values.

For the above model, the coefficients of the attributes are:

– Model graph:

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.

– Residual Error Graph: The residual error is normally distributed. An error distribution is a probability distribution about a point prediction that tells us how likely each error delta is. The error distribution can be every bit as important as the point prediction.
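The residuals behind both graphs are simply the actual values minus the fitted values. The sketch below, for a single-feature fit, computes them and bins them into a crude text histogram. The `fit_line`, `residuals` and `histogram` helpers and the data are hypothetical, and the bin width is an arbitrary choice.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Actual value minus fitted value for every observation."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    return Counter(math.floor(v / bin_width) for v in values)


# Hypothetical noisy data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

res = residuals(xs, ys)
hist = histogram(res, bin_width=0.1)
for bin_index in sorted(hist):
    print(f"{bin_index * 0.1:+.1f}: {'#' * hist[bin_index]}")
```

For a least-squares fit that includes an intercept, the residuals always sum to zero, so a mean of zero alone does not confirm the model. What matters is whether the residuals show a pattern against the independent variable and whether their histogram looks roughly bell-shaped.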
