Machine Learning – IT Assignment Help
Assignment Task

Overview of the Data set

The data set is called "Smartphone-Based Recognition of Human Activities and Postural Transitions" and was taken from the UCI Machine Learning Repository.

It is an activity recognition data set built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors. The features are composed of parameters calculated from the signals obtained through the inertial sensors, and the activity is recorded as the output variable to be predicted.

The data set consists of 561 features (independent variables), 1 dependent variable, and 7767 observations. The dependent variable takes integer-coded categorical values from 1 to 12, so this is a classification problem; because there are more than two classes, it is a multi-class classification problem. Before loading the data file into R, labels were assigned to the variables and the file was saved as a csv file. The label "activity" was assigned to the class variable, and the independent variables were given numerical labels from 1 to 561, which makes the large number of features easier to reference and manipulate.
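As a small illustration of why integer codes from 1 to 12 make this a multi-class problem, the sketch below treats a handful of made-up labels as a factor and counts the possible classes. The vector activity and its values are hypothetical stand-ins, not rows from the real data set.

```r
# Hypothetical class labels; the real column holds integer codes from 1 to 12.
activity <- c(1L, 5L, 12L, 5L, 3L, 1L)

# Treat the integer codes as categories rather than as quantities.
activity_f <- factor(activity, levels = 1:12)

nlevels(activity_f)       # number of possible classes
nlevels(activity_f) > 2   # more than two classes means multi-class, not binary
table(activity_f)         # number of observations per class
```

Converting the codes to a factor keeps a classifier from interpreting them as an ordered numeric quantity.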

Loading the Data

The data set, containing all the variables and labels, was saved as a csv file named 'train_num.csv' and loaded into a variable called data.

> loadData <- function(csvfile) { read.csv(csvfile, header = TRUE, sep = ",", stringsAsFactors = FALSE) } # Function to load the data

> data <- loadData("train_num.csv") # Load the data

We then perform a few steps to inspect the data and make sure that it was loaded properly. The dimensions of the data frame were found to be 7767×562, which corresponds to 7767 rows (observations) and 562 columns (variables). The column count comprises the 561 features plus 1 column for the class variable. We also check that the labels were assigned correctly using the names(data) command. Because we labeled the independent variables with numerical values, R automatically prefixes each label with an "X" (for example, the label 1 becomes X1), since a name that starts with a digit is not a syntactically valid variable name in R. The output also confirms that the last column is the class variable and that its name is "activity". All the feature (independent variable) columns hold continuous numerical values, so they are non-categorical.
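The inspection steps described above can be sketched on a toy file. This is a minimal sketch, not the original session: the temporary csv, the object toy, and its three feature columns are assumptions standing in for train_num.csv, which has 561 features.

```r
# Toy stand-in for train_num.csv: 3 numerically labeled features plus the class column.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,activity",
             "0.12,-0.50,0.33,1",
             "0.08,-0.47,0.29,5",
             "0.15,-0.52,0.31,12"), tmp)

toy <- read.csv(tmp, header = TRUE, sep = ",", stringsAsFactors = FALSE)

dim(toy)              # rows x columns (observations x variables)
names(toy)            # numeric labels come back with an "X" prefix
tail(names(toy), 1)   # the last column should be the class variable

# Check that every feature column (all but the last) is numeric.
all(sapply(toy[, -ncol(toy)], is.numeric))
```

On the real data frame the same calls would be dim(data) and names(data). The "X" prefix comes from read.csv's default check.names = TRUE, which passes the column names through make.names.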

