COSC2670 Practical Data Science Assignment Help

Data Preparation

To analyse data correctly, we need to make sure that the data provided contains no errors. So we need to check for inconsistencies in the data and resolve them using appropriate techniques. Issues can include empty values, stray whitespace, inconsistent letter case, etc.

We therefore need to identify and clean these issues before the analysis can be accurate.

Reading Data

1. The first task was to read the data from the file. The CSV file has a header row at the top followed by the actual data. Hence, we used the pandas read_csv function to read it, treating the first row as headers rather than as data.

2. On checking the datatypes for all columns, we found the following:

minority        object
age            float64
gender          object
credits         object
beauty         float64
eval           float64
division        object
native          object
tenure          object
students         int64
allstudents      int64
prof             int64
dtype: object
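
The reading and type-checking steps above can be sketched as follows. This is a minimal sketch, not the original assignment code: the inline sample rows are hypothetical and stand in for the real CSV file (whose name is not given here), and only a few of the columns are included.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file (its path is not given).
sample_csv = io.StringIO(
    "minority,age,gender,beauty,eval,students\n"
    "yes,36,female,0.2015666,4.3,24\n"
    "no,59,male,-0.8260813,4.5,17\n"
    "no,,male,-0.6603327,3.7,55\n"
)

# header=0 (the default) uses the first row as column names, not as data.
data = pd.read_csv(sample_csv, header=0)

# Inspect the inferred type of each column.
# A numeric column with a missing value (age here) is read as float64.
print(data.dtypes)
```

With a real file, the `io.StringIO` object would be replaced by the file path.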

Filling Incorrect/Missing Data

1. For columns containing strings, we found some typos in the provided data, e.g. 'yesd' for 'yes' in the minority column. Hence, we used the pandas replace() function to replace them with the correct values.

2. For numerical columns, we filled the NA values with the column mean.

e.g.: data['eval'] = data['eval'].fillna(data['eval'].mean())
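
The two cleaning steps above can be sketched together. The rows below are hypothetical; 'yesd' is the only typo actually reported in the text, and any other typos would need their own entries in the mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only 'yesd' is a typo reported in the text.
data = pd.DataFrame({
    "minority": ["yes", "yesd", "no", "no"],
    "eval": [4.0, np.nan, 3.0, 5.0],
})

# Map each known typo to its correct value.
data["minority"] = data["minority"].replace({"yesd": "yes"})

# Fill missing numeric values with the column mean (mean() skips NaN).
data["eval"] = data["eval"].fillna(data["eval"].mean())

print(data)
```

Assigning the result back to the column is used here instead of `inplace=True` on a selected column, since the latter may not update the original DataFrame in recent pandas versions.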

Data Exploration

Note: For the 'Beauty' column, the values can be rounded to an approximation before plotting, as the data is recorded to 7 decimal places.
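
As a small illustration of this approximation, the values can be rounded before plotting. The scores below are hypothetical, and the choice of 2 decimal places is an assumption, since the text does not state the precision used.

```python
import pandas as pd

# Hypothetical beauty scores recorded to 7 decimal places.
beauty = pd.Series([0.2015666, -0.8260813, -0.6603327, 1.5097940])

# Round to 2 decimal places (an assumed precision) before plotting.
beauty_rounded = beauty.round(2)

print(beauty_rounded.tolist())
```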

Data for Individual Columns

1. For the 'Age' field, we used a histogram to show the age distribution, as it is a numeric field.

2. For the 'Eval' field, we used a histogram to show the eval distribution, as it is a numeric field.

3. For the 'Students' field, we used a histogram to show the distribution of the number of students, as it is a numeric field.

4. For the 'All Students' field, we used a histogram to show the distribution of the number of all students, as it is a numeric field.

5. For the 'Gender' field, we used a histogram to show the count of each gender, as it is a choice field.

6. For the 'Minority' field, we used a histogram to show the count of each minority value, as it is a choice field.

7. For the 'Credits' field, we used a histogram to show the count of each credits value, as it is a choice field.

8. For the 'Division' field, we used a histogram to show the count of each division, as it is a choice field.

9. For the 'Native' field, we used a histogram to show the count of each native value, as it is a choice field.

10. For the 'Tenure' field, we used a histogram to show the count of each tenure value, as it is a choice field.
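
The per-column plots listed above follow two patterns, one for numeric fields and one for choice fields. The sketch below shows one of each with matplotlib; the rows are hypothetical, the plotting library is an assumption (the text does not name it), and the choice field is drawn as one bar per category count.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows using column names from the dtype listing.
data = pd.DataFrame({
    "age": [36.0, 59.0, 51.0, 40.0, 31.0, 62.0],
    "gender": ["female", "male", "male", "female", "female", "male"],
})

fig, (ax_num, ax_cat) = plt.subplots(1, 2, figsize=(8, 3))

# Numeric field: bin the values and plot the count in each bin.
counts, bin_edges, _ = ax_num.hist(data["age"], bins=5)
ax_num.set_xlabel("age")
ax_num.set_ylabel("count")

# Choice field: count each category and draw one bar per category.
gender_counts = data["gender"].value_counts()
ax_cat.bar(list(gender_counts.index), gender_counts.values)
ax_cat.set_xlabel("gender")
ax_cat.set_ylabel("count")

fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```

The same two patterns apply to the remaining numeric fields (eval, students, allstudents) and choice fields (minority, credits, division, native, tenure).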

Individual Plots

1. Age vs Eval

This plot shows how age relates to the eval score of a teacher. We use a line plot for this.

2. This plot shows whether the beauty score of a teacher has an impact on students.

3. This plot is a comparative analysis of students, all students and prof against eval.
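
As an illustration of the first plot in this list, a line plot of age against eval can be sketched as below. The rows are hypothetical, and averaging eval per age is an assumption made here so that each age gives a single point on the line; the text does not say how repeated ages were handled.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; some ages appear more than once.
data = pd.DataFrame({
    "age": [36.0, 59.0, 36.0, 40.0, 59.0],
    "eval": [4.0, 3.0, 5.0, 4.2, 3.4],
})

# Average eval per age (an assumption); groupby also sorts the result by age.
mean_eval = data.groupby("age")["eval"].mean()

fig, ax = plt.subplots()
ax.plot(mean_eval.index, mean_eval.values, marker="o")
ax.set_xlabel("age")
ax.set_ylabel("mean eval")
ax.set_title("Age vs Eval")
```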

Scatter Plot

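
A scatter plot of this kind can be sketched as below. Which columns the original scatter plot used is not stated, so plotting students and allstudents against eval is an assumption, and the rows are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows using column names from the dtype listing.
data = pd.DataFrame({
    "students": [24, 17, 55, 40],
    "allstudents": [43, 20, 55, 46],
    "eval": [4.3, 4.5, 3.7, 4.3],
})

fig, ax = plt.subplots()

# One set of points per column, sharing the eval axis for comparison.
ax.scatter(data["students"], data["eval"], label="students")
ax.scatter(data["allstudents"], data["eval"], label="allstudents")
ax.set_xlabel("number of students")
ax.set_ylabel("eval")
ax.legend()
```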