MGT202 Organizational Behavior Assignment
The sinking of the Titanic is one of the most famous events in history. Many passengers were travelling on the ship; some survived while others lost their lives. This paper analyzes the survival rate of the Titanic's passengers, identifies the factors that mattered most in determining survival, and develops a predictive model that estimates a passenger's chance of survival from specific variables.
The paper uses passenger data covering age, sex, passenger class (pclass), number of siblings aboard (sibsp), number of parents or children aboard (parch), body identification number and passenger fare. The Rattle data mining tool is used to analyze the data with techniques such as correlation, decision trees and principal component analysis. These techniques help to identify the variables associated with a higher survival rate. Predictive modeling is then carried out with a decision tree, so that the model can predict whether a particular passenger would have survived under given circumstances.
This section identifies the key elements that determine the passengers' survival rate. The table below gives summary statistics for the variables included in the analysis.
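| Parameter | Age | Sibsp | Parch | Body | Fare | Pclass |
|---|---|---|---|---|---|---|
| Min | 0.33 | 0.0 | 0.0 | 1 | 0.0 | 1 |
| 1st quartile | 21 | 0.0 | 0.0 | 80 | 7.89 | 2 |
| Median | 28 | 0.0 | 0.0 | 169 | 14.45 | 3 |
| Mean | 29.81 | 0.49 | 0.37 | 168 | 32.07 | 2.3 |
| 3rd quartile | 38 | 1.00 | 0.0 | 261 | 30.50 | 3 |
| Max | 80 | 8.0 | 9.0 | 328 | 512 | 3 |
| NA | 186 | 0.0 | 0.0 | 827 | 1 | 0.0 |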
Table 1: Showing summary statistics for the key variables
As the table above shows, summary statistics were computed for six variables: age, sibsp, parch, body, fare and pclass. For age, 25% of the passengers were younger than 21 years and 25% were older than 38 years. There were three passenger classes on the ship: 1, 2 and 3. The remaining variables are self-explanatory.
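Count of Sr. no, by passenger class and survival:

| Pclass | Survived = 0 | Survived = 1 | Grand Total |
|---|---|---|---|
| 1 | 123 | 200 | 323 |
| 2 | 158 | 119 | 277 |
| 3 | 528 | 181 | 709 |
| Grand Total | 809 | 500 | 1309 |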
Table 2: Showing cross tab for survival rate of the passenger against the passenger class
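As the table above and the chart below show, class 1 passengers had the highest survival rate (200 of 323, about 62%), followed by class 2 (119 of 277, about 43%) and class 3 (181 of 709, about 26%). Class 3 had more survivors than class 2 in absolute terms (181 against 119), but it also carried far more passengers. In terms of probability, a class 1 passenger was the most likely to survive.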
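The comparison between classes can be checked directly from the counts in Table 2. The following is a minimal sketch, not part of the original Rattle analysis; the dictionary layout and the function name `survival_rates` are illustrative assumptions.

```python
# Counts copied from Table 2: class -> (did not survive, survived)
counts = {1: (123, 200), 2: (158, 119), 3: (528, 181)}


def survival_rates(table):
    """Return {class: survivors / total passengers} for each class."""
    return {
        pclass: survived / (died + survived)
        for pclass, (died, survived) in table.items()
    }


rates = survival_rates(counts)

# Print the classes from highest to lowest survival rate.
for pclass, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"class {pclass}: {rate:.1%}")
```

Dividing by the class total matters here: ranking by the raw number of survivors would place class 3 above class 2, while ranking by rate does not.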
Figure 1: Showing cross tab for survival rate of passengers against passenger class
Figure 2: Showing principal component analysis for variables for Titanic sinking
Figure 2 and Table 3 show that five principal components account for the variance in the data. PC1 explains 28.3% of the variance, PC2 25.5%, PC3 20.7%, PC4 15.0% and PC5 only 10.2%.
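| PC | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| SD | 1.190 | 1.130 | 1.019 | 0.867 | 0.717 |
| Proportion of variance | 0.283 | 0.255 | 0.207 | 0.150 | 0.102 |
| Cumulative proportion | 0.283 | 0.538 | 0.746 | 0.897 | 1.000 |

| Factor | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Age | 0.455 | 0.180 | 0.579 | 0.641 | 0.115 |
| Sibsp | 0.572 | 0.222 | 0.532 | 0.115 | 0.570 |
| Parch | 0.051 | 0.761 | 0.268 | 0.112 | 0.576 |
| Fare | 0.555 | 0.037 | 0.383 | 0.735 | 0.034 |
| Body | 0.392 | 0.579 | 0.401 | 0.143 | 0.571 |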
Table 3: Showing variance in data for the five principal components
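The variance proportions in Table 3 follow from the standard deviations listed there: each component's share is its squared standard deviation divided by the sum of the squared standard deviations. The sketch below reproduces that arithmetic; the function name `variance_shares` is an illustrative assumption, and the results match the table only up to rounding of the published SD values.

```python
# Standard deviations of PC1..PC5 as listed in Table 3
sds = [1.190, 1.130, 1.019, 0.867, 0.717]


def variance_shares(std_devs):
    """Share of total variance per component: sd^2 / sum of all sd^2."""
    variances = [sd ** 2 for sd in std_devs]
    total = sum(variances)
    return [v / total for v in variances]


shares = variance_shares(sds)

# Running total of the shares gives the cumulative proportion row.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

for i, (s, c) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{i}: share={s:.3f} cumulative={c:.3f}")
```

The cumulative row shows that the first three components together cover roughly three quarters of the variance, so dropping PC4 and PC5 would discard about a quarter of it.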
The five variables that enter these principal components, as listed in Table 3, are the age of the passenger, the number of siblings aboard, the number of parents or children aboard, the fare paid and the body identification number. The priority of each principal component follows from its share of the variance in the table above, in decreasing order from PC1 to PC5.
Figure 3: Showing correlation among the principal components
| Factor | Age | Sibsp | Parch | Fare | Survived | Body |
|---|---|---|---|---|---|---|
| Age | 1.000 | 0.267 | 0.147 | 0.199 | 0.052 | 0.125 |
| Sibsp | 0.267 | 1.000 | 0.360 | 0.151 | 0.016 | 0.129 |
| Parch | 0.147 | 0.360 | 1.000 | 0.176 | 0.078 | 0.067 |
| Fare | 0.199 | 0.151 | 0.176 | 1.000 | 0.233 | 0.044 |
| Survived | 0.052 | 0.016 | 0.078 | 0.233 | 1.000 | NA |
| Body | 0.125 | 0.129 | 0.067 | 0.044 | NA | 1.000 |
Table 4: Showing correlation data among the principal components identified
Table 4 and Figure 3 show that parch and sibsp have the strongest correlation among the variables (0.360), a moderate positive relationship which suggests that passengers with siblings aboard often travelled with parents or children as well, that is, as a family. Fare and survival are also positively correlated (0.233), meaning that passengers who paid a higher fare had a better chance of survival. Age and sibsp are negatively correlated (Table 4 lists a magnitude of 0.267), meaning that passengers with more siblings aboard tended to be younger.
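The strongest relationships can be read off Table 4 by ranking its off-diagonal entries. The following sketch does this with the values as printed in the table; the dictionary layout and the function name `ranked_pairs` are illustrative assumptions, and the NA entry is simply skipped.

```python
# Upper triangle of Table 4; None marks the NA entry (Survived vs Body).
corr = {
    ("Age", "Sibsp"): 0.267, ("Age", "Parch"): 0.147, ("Age", "Fare"): 0.199,
    ("Age", "Survived"): 0.052, ("Age", "Body"): 0.125,
    ("Sibsp", "Parch"): 0.360, ("Sibsp", "Fare"): 0.151,
    ("Sibsp", "Survived"): 0.016, ("Sibsp", "Body"): 0.129,
    ("Parch", "Fare"): 0.176, ("Parch", "Survived"): 0.078,
    ("Parch", "Body"): 0.067,
    ("Fare", "Survived"): 0.233, ("Fare", "Body"): 0.044,
    ("Survived", "Body"): None,
}


def ranked_pairs(table):
    """Variable pairs with a known value, strongest correlation first."""
    known = [(pair, value) for pair, value in table.items() if value is not None]
    return sorted(known, key=lambda item: item[1], reverse=True)


top = ranked_pairs(corr)
for (a, b), value in top[:3]:
    print(f"{a} vs {b}: {value:.3f}")

# Variable most strongly correlated with survival, by the tabulated values.
with_survival = [(pair, v) for pair, v in top if "Survived" in pair]
best_for_survival = with_survival[0][0]
```

On these values, fare is the variable most strongly associated with survival, while age, sibsp and parch show only weak associations with it.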
Figure 4: Showing correlation cluster for the different variables
The correlation cluster figure suggests a strong association between body identification number and survival, although Table 4 reports the correlation for this pair as NA. After that, parch and sibsp show the next strongest correlation, which is consistent with the correlation table above.
To decide which variables contribute most to determining passenger survival, a decision tree can be built; it identifies the variables of importance.
As the figure above shows, the Rattle data mining tool can plot a conditional decision tree that gives the survival rate of Titanic passengers based on several important variables: the fare paid by the passenger, the number of parents or children aboard, the number of siblings aboard and the port of embarkation. All of these variables are of prime importance in deciding whether a passenger survived, which is also supported by the principal component analysis. The tree has one root node, several intermediate nodes and, finally, leaf nodes that show the passengers' chances of survival. Each leaf node represents a set of passengers with similar characteristics. In total, nine nodes are formed by conditions on the important variables (Kantardzic, 2003).
The root node splits on the fare paid by each passenger, dividing the passengers into two branches at a fare of 25.467. Node 2 contains the passengers who paid 25.467 or less, while node 5 contains those who paid more than 25.467. The sub-nodes under nodes 2 and 5 inherit the same fare condition.
The second level of splitting under node 2 is based on the port of embarkation, which has three levels: C, S and Q. Passengers who embarked at C or Q fall under node 3, while passengers who embarked at S fall under node 4. Node 5 is split on sibsp: passengers with sibsp greater than 2 fall under node 9, while passengers with sibsp less than or equal to 2 fall under node 6, which is split further on parch. Passengers with parch less than or equal to 1 fall under node 7, and passengers with parch greater than 1 fall under node 8. The actual survival rate can be read only from the five leaf nodes; the intermediate nodes merely split passengers on variable values and do not predict survival themselves.
For each leaf node, the survival rate is shown on a scale of 1 to 3, where a higher value indicates a higher probability of survival and vice versa. Among the five leaf nodes (3, 4, 7, 8 and 9), node 8 shows the highest survival rate, with a total of 15 passengers sharing its characteristics. The predictive model therefore indicates that passengers with fare > 25.467, sibsp <= 2 and parch > 1 had the highest survival rate of all passengers on the ship.
After node 8, node 7 shows the next highest survival rate; its passengers have fare > 25.467, sibsp <= 2 and parch <= 1. Node 7 represents a set of 65 passengers whose survival rate is lower than that of node 8 but much higher than that of the remaining passengers. Node 4 follows, with a lower survival rate than nodes 8 and 7; its passengers have fare <= 25.467 and embarked at S. Node 4 represents a set of 100 passengers and still shows a fairly high survival rate.
Nodes 9 and 3 contain 8 and 40 passengers respectively, and the passengers in both nodes have a very low survival rate given their characteristics. Each node therefore involves a decision that can be traced through the conditional decision tree, and the conditions leading to any particular node can be examined through the four important variables used in the tree.
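The splits described above can be written down as a small rule function. This is only a sketch of the tree as described in the text, not the model fitted in Rattle; the function name `leaf_node`, the constant names and the argument names are illustrative assumptions.

```python
FARE_SPLIT = 25.467  # root-node threshold quoted in the text

# Number of passengers reported for each leaf node in the text.
LEAF_PASSENGERS = {8: 15, 7: 65, 4: 100, 9: 8, 3: 40}


def leaf_node(fare, embarked, sibsp, parch):
    """Return the leaf node (3, 4, 7, 8 or 9) that a passenger falls into."""
    if fare <= FARE_SPLIT:
        # Node 2: split on port of embarkation.
        if embarked in ("C", "Q"):
            return 3
        if embarked == "S":
            return 4
        raise ValueError(f"unknown port of embarkation: {embarked!r}")
    # Node 5: split on number of siblings aboard.
    if sibsp > 2:
        return 9
    # Node 6: split on number of parents or children aboard.
    return 7 if parch <= 1 else 8


# Example: a high fare, one sibling and two parents or children lead to
# node 8, the leaf the text describes as having the highest survival rate.
example = leaf_node(fare=120.0, embarked="S", sibsp=1, parch=2)
print(example, LEAF_PASSENGERS[example])
```

Note that the port of embarkation is only consulted on the low-fare branch, and sibsp and parch only on the high-fare branch, exactly as in the description of nodes 2 and 5.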
Data quality is of immense importance to a data warehouse architecture and largely determines the success of the data warehouse developed in an organization. It is therefore important to ensure that data quality meets the standards and requirements of the data warehouse architecture (Ye, 2003). Good data quality enhances the efficiency of the data warehouse, avoids duplication of work and effort, saves cost and speeds up the day-to-day decisions that managers make from data warehouse reports.
The key characteristics that a quality data set must have in order to support the data warehouse architecture include usefulness, validation, believability, accessibility and interpretability. These key qualities of data can be explained as given below:
The role of data quality in the data warehouse shows in the benefits that quality data brings: no duplication of work, faster decisions, better understanding, no missing or garbage values and greater efficiency for the organization. Quality data avoids duplication of work because data captured once can be used to generate several reports and to support several interpretations (Ralf and Markus, 2011).
Further, with high-quality data it becomes easier for the decision maker to decide quickly, because continuous data with no missing or garbage values gives a better understanding. Efficiency enhancement is the major role that quality data plays: implementing high-quality data in the data warehouse architecture saves the organization time, effort and cost.
The pivot table analysis of store cost, store sales and unit sales by country and quarter, presented below, shows the following trends. Overall, all three measures were highest in quarter 3. Mexico follows this pattern, with its highest store cost, store sales and unit sales in quarter 3. Canada departs from the general trend: its store cost and store sales peaked in quarter 2, and only its unit sales peaked, narrowly, in quarter 3. For the USA, all three measures were highest in quarter 1.
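| Period | Country | Store Cost | Store Sales | Unit Sales |
|---|---|---|---|---|
| 1998 | All | 432565.7289 | 1079147.47 | 509987 |
| Quarter 1 | All | 116512.6905 | 290873.18 | 137078 |
| Quarter 1 | Canada | 9576.6446 | 23881.13 | 11160 |
| Quarter 1 | Mexico | 47502.2264 | 118589.41 | 56133 |
| Quarter 1 | USA | 59433.8195 | 148402.64 | 69785 |
| Quarter 2 | All | 115080.3318 | 287009.99 | 135745 |
| Quarter 2 | Canada | 11072.1808 | 27685 | 12885 |
| Quarter 2 | Mexico | 45683.9482 | 113830.59 | 54005 |
| Quarter 2 | USA | 58324.2028 | 145494.4 | 68855 |
| Quarter 3 | All | 118322.14 | 295040.55 | 139412 |
| Quarter 3 | Canada | 10915.5866 | 27176.3 | 12966 |
| Quarter 3 | Mexico | 49267.9496 | 122706.05 | 57872 |
| Quarter 3 | USA | 58138.6038 | 145158.2 | 68574 |
| Quarter 4 | All | 82650.5666 | 206223.75 | 97752 |
| Quarter 4 | Canada | 7768.1585 | 19303.03 | 9146 |
| Quarter 4 | Mexico | 30133.9206 | 75167.54 | 35904 |
| Quarter 4 | USA | 44748.4875 | 111753.18 | 52702 |
| Grand Total | All | 432565.7289 | 1079147.47 | 509987 |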
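The peak quarter for each country and measure can be found mechanically from the pivot table above. The sketch below uses the per-country figures as printed; the names `data`, `MEASURES` and `peak_quarters` are illustrative assumptions rather than anything from the original workbook.

```python
# (store cost, store sales, unit sales) per quarter, copied from the pivot table
data = {
    "Canada": {1: (9576.6446, 23881.13, 11160), 2: (11072.1808, 27685.00, 12885),
               3: (10915.5866, 27176.30, 12966), 4: (7768.1585, 19303.03, 9146)},
    "Mexico": {1: (47502.2264, 118589.41, 56133), 2: (45683.9482, 113830.59, 54005),
               3: (49267.9496, 122706.05, 57872), 4: (30133.9206, 75167.54, 35904)},
    "USA": {1: (59433.8195, 148402.64, 69785), 2: (58324.2028, 145494.40, 68855),
            3: (58138.6038, 145158.20, 68574), 4: (44748.4875, 111753.18, 52702)},
}
MEASURES = ("store_cost", "store_sales", "unit_sales")


def peak_quarters(by_quarter):
    """Map each measure name to the quarter in which it is largest."""
    return {
        name: max(by_quarter, key=lambda q: by_quarter[q][i])
        for i, name in enumerate(MEASURES)
    }


peaks = {country: peak_quarters(quarters) for country, quarters in data.items()}
for country, result in peaks.items():
    print(country, result)
```

Checking each measure separately is what exposes the split in Canada, where the peak quarter for unit sales differs from the peak quarter for store cost and store sales.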
| Unit Sales | Canada | Mexico | USA | Grand Total |
|---|---|---|---|---|
| Breakfast Foods | 1453 | 6594 | 8502 | 16549 |
| Cereal | 556 | 2585 | 3499 | 6640 |
| Pancake Mix | 112 | 701 | 844 | 1657 |
| Pancakes | 156 | 603 | 865 | 1624 |
| Waffles | 629 | 2705 | 3294 | 6628 |
| Grand Total | 1453 | 6594 | 8502 | 16549 |
The table above shows that, for every product (cereal, pancake mix, pancakes and waffles), the USA has the highest unit sales of the three countries, so overall consumption is highest in the USA. Across all countries combined, cereal is the best-selling category (6,640 units), followed closely by waffles (6,628), then pancake mix (1,657) and pancakes (1,624). The broad pattern is similar in every country, but the order differs in detail: waffles outsell cereal in Canada and Mexico, and pancakes outsell pancake mix in Canada and the USA.
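The product ranking, overall and per country, can be derived from the unit sales in the table above. This is a minimal sketch using the printed figures; the names `units` and `ranking` are illustrative assumptions.

```python
# Unit sales by product and country, copied from the table above
units = {
    "Cereal": {"Canada": 556, "Mexico": 2585, "USA": 3499},
    "Pancake Mix": {"Canada": 112, "Mexico": 701, "USA": 844},
    "Pancakes": {"Canada": 156, "Mexico": 603, "USA": 865},
    "Waffles": {"Canada": 629, "Mexico": 2705, "USA": 3294},
}


def ranking(country=None):
    """Products sorted by unit sales, for one country or all combined."""
    def sold(product):
        row = units[product]
        return sum(row.values()) if country is None else row[country]

    return sorted(units, key=sold, reverse=True)


print("All countries:", ranking())
for name in ("Canada", "Mexico", "USA"):
    print(f"{name}:", ranking(name))
```

Comparing the combined ranking with the per-country rankings makes it easy to see where an individual country departs from the overall order.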
| Row Labels | Unit Sales | Store Sales |
|---|---|---|
| USA | 186899 | 396294.93 |
| OR | 60612 | 128598.5 |
| WA | 126287 | 267696.43 |
| Grand Total | 186899 | 396294.93 |
The table above shows that unit sales are higher in Washington than in Oregon. Unit sales and store sales follow the same pattern because the average sales value per unit is nearly the same in both states (about 2.12), so store sales scale almost linearly with unit sales. Washington's unit sales are slightly more than double Oregon's (126,287 against 60,612), and its store sales are slightly more than double as well.
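The proportionality between unit sales and store sales can be checked with a few divisions. The sketch below is illustrative only: `sales_per_unit` is a hypothetical helper, and the quotient is an average sales value per unit across all products, not the price of any single item.

```python
# (unit sales, store sales) per state, copied from the table above
states = {"OR": (60612, 128598.50), "WA": (126287, 267696.43)}


def sales_per_unit(unit_sales, store_sales):
    """Average store sales value per unit sold."""
    return store_sales / unit_sales


per_unit = {state: sales_per_unit(*values) for state, values in states.items()}

# How many times larger Washington is than Oregon on each measure.
unit_ratio = states["WA"][0] / states["OR"][0]
sales_ratio = states["WA"][1] / states["OR"][1]

for state, value in per_unit.items():
    print(f"{state}: {value:.3f} per unit")
print(f"WA/OR unit sales: {unit_ratio:.2f}, store sales: {sales_ratio:.2f}")
```

Because the per-unit values are almost identical, the two ratios nearly coincide, which is why both measures show the same pattern across the two states.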
The table below gives unit sales of beer and wine by brand. Unit sales are much higher for wine (9,710) than for beer (3,359). Among the beer brands, Good sells the most, followed by Pearl, Portsmouth, Walrus and Top Measure. Wine shows a similar pattern: Good is again the best-selling brand, followed by Pearl, Portsmouth, Top Measure and Walrus.
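| Category | Brand | Unit Sales |
|---|---|---|
| Beer and Wine | All | 13069 |
| Beer | All | 3359 |
| Beer | Good | 767 |
| Beer | Pearl | 725 |
| Beer | Portsmouth | 713 |
| Beer | Top Measure | 546 |
| Beer | Walrus | 608 |
| Wine | All | 9710 |
| Wine | Good | 2097 |
| Wine | Pearl | 2028 |
| Wine | Portsmouth | 1942 |
| Wine | Top Measure | 1883 |
| Wine | Walrus | 1760 |
| Grand Total | All | 13069 |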
This study examined the survival chances of passengers on the Titanic and identified significant variables such as age, parch, sibsp and fare. Using principal component analysis and these significant variables, a predictive model was developed that helps in understanding passengers' chances of survival.
The Rattle data mining tool was used throughout. Correlation coefficients between the variables were computed to help analyze the survival rates, and a decision tree was built on the significant variables; its leaf nodes correspond to different survival chances for passengers on the Titanic.