HI6008 Data Collection and Analysis Assignments Solution

Overview Of Data Collection

This research will rely on two sources of data. The first source is previous work on the deterioration and condition assessment of railway infrastructure. This work will provide a point of comparison for the eventual business model developed in this paper.

The information from previous work will also indicate how to develop the most reliable model for predicting the deterioration of railway infrastructure and for assessing its condition.

Previous studies will also serve as a benchmark for the accuracy of the data and analysis in this paper: the error in the analysis can be evaluated against the analyses and conclusions of those studies.

The data collected in previous work also forms a reliable pool of data for the present research.

The second source is data on the railway infrastructure in Australia, specifically data on its metrics. The railway infrastructure parameters are: Track Geometry, Ballast, Sleepers, Rails, Speeds, Load Capacity and Weather Conditions.

These parameters will give an understanding of the critical factors of the railway infrastructure that are subject to deterioration or that cause deterioration. The data on these parameters will be collected from records kept by the relevant agencies, authorities and organizations.

Resources Used For The Research

Resources will mainly be needed to collect data from one of the two data sources. The exception is the collection of data from previous work on the deterioration of railway infrastructure.

Resources will be required to obtain data on the railway infrastructure parameters. The resources required for each parameter are as follows:

Track Geometry: Track Geometry Cars will be used to collect information on the track geometry. These are vehicles that run on the rails and record the track geometry (Department of Planning, Transport and Infrastructure, 2008). The cars travel at high speed and operate without interfering with normal operations on the rail network (Transportation Technology Center, 2009). They will provide this research with information on the condition of the track geometry of the railways.

Ballast: Information on the ballast will be collected using three methods: visual inspection, digital inspection and LIDAR technology. Visual and digital inspection work in much the same way: the condition of the ballast is observed either in person or from recorded videos and images of the rails. LIDAR (Light Detection and Ranging) technology, on the other hand, operates by shooting laser light into the ballast and measuring the light that is reflected back (Heritage & Large, 2009; Shan & Toth, 2008; Vosselman & Maas, 2010). This allows the LIDAR to capture information on the quality and condition of the ballast.

Sleepers: Information on the sleepers is also collected using three methods: visual inspection, digital inspection and ultrasonic technology. Visual and digital inspection work in much the same way: the condition of the sleepers is observed either in person or from recorded videos and images of the rails. The ultrasonic technology operates by directing ultrasonic energy onto the sleepers and measuring the energy reflected back (Middleton et al., 2007). These measurements provide information on the type and condition of the sleepers.

Rails: As with the sleepers, information on the rails is collected using three methods: visual inspection, digital inspection and ultrasonic technology. Visual and digital inspection work in much the same way: the condition of the rails is observed either in person or from recorded videos and images of the rails. The ultrasonic technology operates by directing ultrasonic energy onto the rails and measuring the energy reflected back (Middleton et al., 2007). These measurements provide information on the type and condition of the rails.

Speeds: The average speed of the trains will be considered for this parameter. Information on the average speed of the trains that use a particular rail network will be collected from the agencies operating the railway systems. This will indicate the speeds to which the rail is regularly subjected.

Load Capacity: The average load capacity of the trains will be considered for this parameter. Information on the average load capacity of the trains that use a particular rail network will be collected from the agencies operating the railway systems. Both passenger and freight trains will be considered in computing the average load capacity. This will indicate the load to which the rail is regularly subjected.

Weather Condition: The dominant weather condition along a rail network will be considered for this parameter. This information will be collected from the relevant weather agencies throughout Australia. It will indicate the type of weather and climatic conditions to which a rail network is exposed (Krzysztof, 2015).

Development of models

Classification of parameters

The diagram below shows the parameters that will be considered in developing the railway infrastructure deterioration model in this research.

Figure 1: Broad Classification of Railway Infrastructure Parameters

The parameters for the railway infrastructure can be grouped into two main groups, as shown in Figure 1 above. The intrinsic parameters are those that are inherent to the rail network itself; they can be termed the internal factors of the rail system. These parameters are treated as static and are therefore not expected to vary during the analysis for the deterioration model. They can, however, change as a result of maintenance activities.

The other group of railway infrastructure parameters is the extrinsic parameters. These are the external factors that influence the deterioration of the rail network. These factors can be either controllable or uncontrollable. The controllable factors are adjustable and therefore manageable for the purpose of rail network maintenance. The uncontrollable factors are neither adjustable nor manageable.

Figure 2: Breakdown of Intrinsic Parameters of Railway Infrastructure

Figure 3: Breakdown of Extrinsic Parameters of Railway Infrastructure

Figure 2 and Figure 3 above show the breakdown of each of the categories of the parameters of railway infrastructure.

Grading Scale For Defects (Likert Scale Technique)

This research will use the Likert Scale to grade the various parameters of the railway. The Likert Scale is a grading technique for questions designed to evaluate the strength of an attribute (Naomi & Heiberger, 2011; Norman, 2010). The Likert Scale presents an ordinal measure of an attribute (Reips & Frederik, 2008).

This research will apply the Likert Scale by posing a question and then using the available information to answer it in terms of the strength of an attribute.

The Likert Scale will first be applied in determining the weights of the various data sources. The question posed will be on the level of importance of the data source to development of the deterioration model for the railway infrastructure. This will need the use of a questionnaire for a focus group. The focus group may be made up of five peers or five experts in the field of deterioration modelling and preferably railway infrastructure deterioration modelling.

The questionnaire will be of the format below:

Focus Group Questionnaire (Tick Where Applicable)

1. How would you describe the importance of rail design on the development of a deterioration model for the Australian railway infrastructure?

Extremely Important _

Important _

Averagely Important _

Relatively Important _

Not Important _

2. How would you describe the importance of external factors on the development of a deterioration model for the Australian railway infrastructure?

Extremely Important _

Important _

Averagely Important _

Relatively Important _

Not Important _

The importance of the intrinsic parameters and of the extrinsic parameters is evaluated in questions one and two, respectively, of the Focus Group Questionnaire.

The values assigned for the 5-point Likert Scale used for the Focus Group Questionnaire above are as follows:

| RESPONSE | VALUE |
|---|---|
| Not Important | 0 |
| Relatively Important | 1 |
| Averagely Important | 2 |
| Important | 3 |
| Extremely Important | 4 |

Table 1: Likert Scale Values for Focus Group Questionnaire

The average for the responses to the Focus Group Questionnaire will then be used as the weights for the data sources.
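The averaging step can be sketched in a few lines of Python. This is a minimal illustration only: the response lists are hypothetical answers from a five-member focus group, and the names `IMPORTANCE_VALUES` and `source_weight` are introduced here for the example.

```python
from statistics import mean

# Table 1 values for the 5-point importance scale.
IMPORTANCE_VALUES = {
    "Not Important": 0,
    "Relatively Important": 1,
    "Averagely Important": 2,
    "Important": 3,
    "Extremely Important": 4,
}


def source_weight(responses):
    """Average the Likert values of the focus group's answers to one question."""
    return mean(IMPORTANCE_VALUES[r] for r in responses)


# Hypothetical answers (illustrative only, not collected data).
q1_intrinsic = ["Extremely Important", "Important", "Extremely Important",
                "Important", "Averagely Important"]
q2_extrinsic = ["Important", "Important", "Averagely Important",
                "Extremely Important", "Relatively Important"]

w_i = source_weight(q1_intrinsic)  # weight WI for the intrinsic parameters
w_e = source_weight(q2_extrinsic)  # weight WE for the extrinsic parameters
print(w_i, w_e)
```

Because each response maps to a value between 0 and 4, the resulting weights also lie between 0 and 4.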

We will then apply the Likert Scale in determining the state or condition of the rail by considering the data collected on the intrinsic parameters. All the four parameters will be subject to the same Likert Scale. The aim will be to answer the question below:

How would you describe the condition of “parameter x” on “the given” Australian railway network?

Perfect 

Good 

Average 

Poor 

Deplorable 

The “parameter x” would represent the various intrinsic parameters while “the given” represents the specific Australian rail network being observed.

The values assigned for the 5-point Likert Scale used for the intrinsic parameters above are as follows:

| RESPONSE | VALUE |
|---|---|
| Deplorable | 0 |
| Poor | 1 |
| Average | 2 |
| Good | 3 |
| Perfect | 4 |

Table 2: Likert Scale Values for Intrinsic Parameters

Each of the three extrinsic parameters will have a separate evaluation and Likert Scale, as follows:

For the Speed, the aim will be to answer the question below:

How would you describe the average speed on “the given” Australian rail network?

Very Fast 

Fast 

Average 

Slow 

Very Slow 

Here “the given” represents the specific Australian rail network being observed. The range for the speed will be divided into five parts to accommodate the Likert Scale. The values assigned for the 5-point Likert Scale used for the speed parameter above are as follows:

| RESPONSE | VALUE |
|---|---|
| Very Fast | 0 |
| Fast | 1 |
| Average | 2 |
| Slow | 3 |
| Very Slow | 4 |

Table 3: Likert Scale Values for Speed Parameter
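Dividing the speed range into five parts could be done as in the sketch below. It assumes five equal-width bands between a lowest and a highest speed; the text does not specify the band boundaries, so the equal widths, the function name `speed_to_likert` and the 0-200 km/h range in the usage lines are all assumptions made for illustration.

```python
def speed_to_likert(avg_speed, lowest, highest):
    """Map an average speed to the Table 3 value (Very Fast = 0 ... Very Slow = 4).

    Assumes five equal-width bands between `lowest` and `highest`.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= avg_speed <= highest:
        raise ValueError("speed outside the stated range")
    band_width = (highest - lowest) / 5
    # Band 0 is the slowest fifth of the range, band 4 the fastest.
    band = min(int((avg_speed - lowest) / band_width), 4)
    # Table 3 gives faster traffic a lower value, so invert the band index.
    return 4 - band


# Index in this list equals the Likert value from Table 3.
SPEED_LABELS = ["Very Fast", "Fast", "Average", "Slow", "Very Slow"]

# Hypothetical network with average speeds between 0 and 200 km/h.
value = speed_to_likert(150, 0, 200)
print(value, SPEED_LABELS[value])
```

The same banding approach would apply to the load capacity parameter, with its own range.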

For the Load Capacity, the aim will be to answer the question below:

How would you describe the average load capacity on “the given” Australian rail network?

Very High 

High 

Average 

Low

Very Low 

Here “the given” represents the specific Australian rail network being observed. The range for the load capacity will be divided into five parts to accommodate the Likert Scale. The values assigned for the 5-point Likert Scale used for the load capacity parameter above are as follows:

| RESPONSE | VALUE |
|---|---|
| Very High | 0 |
| High | 1 |
| Average | 2 |
| Low | 3 |
| Very Low | 4 |

Table 4: Likert Scale Values for Load Capacity Parameter

For the Weather Condition, the aim will be to answer the question below:

How would you describe the general weather condition on “the given” Australian rail network?

Balanced 

Cold and Dry 

Hot and Dry 

Cold and Wet 

Hot and Wet 

Here “the given” represents the specific Australian rail network being observed. The values assigned for the 5-point Likert Scale used for the weather condition parameter above are as follows:

| RESPONSE | VALUE |
|---|---|
| Hot and Wet | 0 |
| Cold and Wet | 1 |
| Hot and Dry | 2 |
| Cold and Dry | 3 |
| Balanced | 4 |

Table 5: Likert Scale Values for Weather Condition Parameter

Models and Critique

Weighted Sum Model Using Likert Scale

We can assign the weights for the various data sources as follows:

| DATA SOURCE | WEIGHT |
|---|---|
| Intrinsic Parameters | WI |
| Extrinsic Parameters | WE |

Table 6: Weights for Data Sources

For the intrinsic parameters, let the Likert Scale results be denoted as follows:

IR for the rail condition.

IS for the sleepers’ condition.

IB for the ballast quality and condition.

IT for the track geometry condition.

The intrinsic parameters will then be computed as:

$$I = W_I \,(I_R + I_S + I_B + I_T) \qquad \text{(Equation 1)}$$

For the extrinsic parameters, let the Likert Scale results be denoted as follows:

ES for the average speed.

EL for the average load capacity.

EW for the weather condition.

The extrinsic parameters will then be computed as:

$$E = W_E \,(E_S + E_L + E_W) \qquad \text{(Equation 2)}$$

Hence the aggregate score S for the condition of a specific rail network is computed by summing the two equations above:

$$S = I + E = W_I \,(I_R + I_S + I_B + I_T) + W_E \,(E_S + E_L + E_W)$$

The lowest possible score for the model above occurs when the weights and parameters all register the lowest score on the 5-point Likert Scale, which is 0. Thus, the lowest possible score in the model is:

$$S_{\min} = 0\,(0 + 0 + 0 + 0) + 0\,(0 + 0 + 0) = 0$$

This value represents the highest level of deterioration of a rail network.

The highest possible score for the model above occurs when the weights and parameters all register the highest score on the 5-point Likert Scale, which is 4. Thus, the highest possible score in the model is:

$$S_{\max} = 4\,(4 + 4 + 4 + 4) + 4\,(4 + 4 + 4) = 64 + 48 = 112$$

This value represents the lowest level of deterioration of a rail network.

Therefore the resultant grading scale for the model above will be:

| SCORE | CONDITION |
|---|---|
| 0-22.4 | Highly Risky |
| 22.5-44.8 | Risky |
| 44.9-67.2 | Relatively Safe |
| 67.3-89.6 | Safe |
| 89.7-112 | Very Safe |

Table 7: Grading Scale for Model
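The weighted sum model and its grading scale can be sketched as follows. The sketch assumes the aggregate score is WI times the sum of the four intrinsic scores plus WE times the sum of the three extrinsic scores, which is the form consistent with the 0-112 range of Table 7. It also assumes that a score falling between two printed bands (for example 22.45) belongs to the lower band. The function names and the input values are hypothetical.

```python
# Upper bound of each band in Table 7, paired with its condition label.
GRADES = [
    (22.4, "Highly Risky"),
    (44.8, "Risky"),
    (67.2, "Relatively Safe"),
    (89.6, "Safe"),
    (112.0, "Very Safe"),
]


def aggregate_score(w_i, w_e, intrinsic, extrinsic):
    """Weighted sum of Likert scores: w_i * sum(intrinsic) + w_e * sum(extrinsic)."""
    if len(intrinsic) != 4 or len(extrinsic) != 3:
        raise ValueError("expected 4 intrinsic and 3 extrinsic scores")
    return w_i * sum(intrinsic) + w_e * sum(extrinsic)


def grade(score):
    """Return the Table 7 condition for a score (upper bounds treated as inclusive)."""
    if not 0 <= score <= 112:
        raise ValueError("score outside the 0-112 range of the model")
    for upper, label in GRADES:
        if score <= upper:
            return label
    return GRADES[-1][1]


# Hypothetical scores: rail, sleepers, ballast, track geometry / speed, load, weather.
score = aggregate_score(3.2, 2.6, [3, 2, 3, 4], [1, 2, 3])
print(round(score, 1), grade(score))
```

With all weights and scores at 4 the function returns 112, and with all at 0 it returns 0, matching the two extremes of the grading scale.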

Artificial Neural Network Technique

The Artificial Neural Network (ANN) technique is a mathematical method that mimics the neural networks in the human brain (Galit et al., 2018). The mathematical process borrows from this biological process in order to produce a better output, or response, from a set of input variables.

The Artificial Neural Network is developed as a learning system. The input and response values are known, and the network is trained to produce the response from the input so that it becomes robust enough to predict the unknown responses for future inputs.

The technique assumes the existence of a hidden layer between the input and response variables (Shaffer, 2011). The variables are passed through this hidden layer and eventually produce the output or response. Below is a sample of an artificial neural network (Natarajan, 2012).

Figure 4: Artificial Neural Network Sample (Input Layer, Hidden Layer, Response Layer)

To calculate the output, arbitrary values (weights) are assigned to the nodes H3, H4, H5, R6 and R7. The arrows are also assigned arbitrary values as weights (Aij, for the arrow from node i in one layer to node j in the next layer) (Howitt & Cramer, 2010; O'Neil & Schutt, 2013). These values are usually small to start with, so that the system can learn and adjust them accordingly.

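The resultant output of a node j is given using the formula below:

$$\text{Output}_j = \frac{1}{1 + e^{-g_j}}, \qquad g_j = \theta_j + \sum_i A_{ij}\, h_i$$

In the equation, g represents the input for the present layer while h represents the output from the previous layer. Here θj denotes the weight assigned to node j and Aij the weight on the arrow from node i to node j.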
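A single node's response can be sketched in Python as below. The sketch assumes the logistic form used for the node responses written out later in this section: the node's own weight plus the weighted sum of the previous layer's outputs, passed through 1 / (1 + e^(-g)). The function name `node_response` and the numeric values are hypothetical.

```python
import math


def node_response(node_weight, inputs, arrow_weights):
    """Logistic response of one node.

    g is the node's own weight plus the sum of each previous-layer output
    multiplied by the weight on its incoming arrow.
    """
    if len(inputs) != len(arrow_weights):
        raise ValueError("one arrow weight is needed per input")
    g = node_weight + sum(h * a for h, a in zip(inputs, arrow_weights))
    return 1.0 / (1.0 + math.exp(-g))


# Hypothetical small starting weights and two inputs from the previous layer.
print(node_response(0.05, [3, 2], [0.01, -0.02]))
```

When g is 0 the response is exactly 0.5, and the response always lies strictly between 0 and 1.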

For the condition assessment of the railway infrastructure in Australia, we have the Artificial Neural Networks below:

Figure 5: Artificial Neural Network for the Intrinsic Parameters

The key for the above Artificial Neural Network is given in table below:

| INITIAL | MEANING |
|---|---|
| TG | Track Geometry |
| RB | Rail Ballasts |
| RS | Rail Sleepers |
| R | Rail |
| H1 | Weight for node H1 |
| H2 | Weight for node H2 |
| H3 | Weight for node H3 |
| H4 | Weight for node H4 |
| H5 | Weight for node H5 |
| R1 | Weight for node R1 |

Table 8: Key for Artificial Neural Network for the Intrinsic Parameters

The computations for the Artificial Neural Network for the Intrinsic Parameters are given below. For each hidden node Hk (k = 1, ..., 5), the response Rhk is:

$$R_{hk} = \frac{1}{1 + e^{-\left(H_k + TG \cdot A_{tg,Hk} + RB \cdot A_{rb,Hk} + RS \cdot A_{rs,Hk} + R \cdot A_{r,Hk}\right)}}$$

For the R1,

$$R_{r1} = \frac{1}{1 + e^{-\left(R_1 + R_{h1} \cdot A_{h1,R1} + R_{h2} \cdot A_{h2,R1} + R_{h3} \cdot A_{h3,R1} + R_{h4} \cdot A_{h4,R1} + R_{h5} \cdot A_{h5,R1}\right)}}$$

Figure 6: Artificial Neural Network for the Extrinsic Parameters

The key for the above Artificial Neural Network is given in table below:

| INITIAL | MEANING |
|---|---|
| S | Speeds |
| LC | Load Capacity |
| WC | Weather Condition |
| H1 | Weight for node H1 |
| H2 | Weight for node H2 |
| H3 | Weight for node H3 |
| H4 | Weight for node H4 |
| R2 | Weight for node R2 |

Table 9: Key for Artificial Neural Network for Extrinsic Parameters

The computations for the Artificial Neural Network for the Extrinsic Parameters are given below. For each hidden node Hk (k = 1, ..., 4), the response Rhk is:

$$R_{hk} = \frac{1}{1 + e^{-\left(H_k + S \cdot A_{s,Hk} + LC \cdot A_{lc,Hk} + WC \cdot A_{wc,Hk}\right)}}$$

For the R2,

$$R_{r2} = \frac{1}{1 + e^{-\left(R_2 + R_{h1} \cdot A_{h1,R2} + R_{h2} \cdot A_{h2,R2} + R_{h3} \cdot A_{h3,R2} + R_{h4} \cdot A_{h4,R2}\right)}}$$

Figure 7: Joint Artificial Neural Network for Intrinsic and Extrinsic Parameters

For each hidden node Hk (k = 1, 2, 3) of the joint network, the response Rhhk is:

$$R_{hhk} = \frac{1}{1 + e^{-\left(H_k + R_{r1} \cdot A_{rh1,Hk} + R_{r2} \cdot A_{rh2,Hk}\right)}}$$

For the overall response node Or,

$$O_{rr} = \frac{1}{1 + e^{-\left(O_r + R_{hh1} \cdot A_{h1,Or} + R_{hh2} \cdot A_{h2,Or} + R_{hh3} \cdot A_{h3,Or}\right)}}$$
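The three networks of Figures 5 to 7 can be chained in a short forward pass, sketched below. This is illustrative only. The weights are random small values, since the text says only that starting weights are small, and no training is performed. The inputs are assumed to be the Likert scores of the intrinsic and extrinsic parameters. The helper names `make_network` and `forward` are hypothetical. Because a logistic output lies between 0 and 1, the sketch also multiplies Orr by 100 to place it on a 0-100 scale for comparison with the condition assessment table that follows; that rescaling is an assumption, not something the text states.

```python
import math
import random


def node_response(node_weight, inputs, arrow_weights):
    """Logistic response: 1 / (1 + e^-(node weight + sum of input * arrow weight))."""
    g = node_weight + sum(h * a for h, a in zip(inputs, arrow_weights))
    return 1.0 / (1.0 + math.exp(-g))


def make_network(n_inputs, n_hidden, rng, scale=0.05):
    """Create one hidden layer and one response node with small random weights."""
    def small():
        return rng.uniform(-scale, scale)
    return {
        "hidden_nodes": [small() for _ in range(n_hidden)],
        "hidden_arrows": [[small() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "response_node": small(),
        "response_arrows": [small() for _ in range(n_hidden)],
    }


def forward(net, inputs):
    """Pass the inputs through the hidden layer, then through the response node."""
    hidden = [node_response(w, inputs, arrows)
              for w, arrows in zip(net["hidden_nodes"], net["hidden_arrows"])]
    return node_response(net["response_node"], hidden, net["response_arrows"])


rng = random.Random(0)                   # fixed seed so the run is repeatable
intrinsic_net = make_network(4, 5, rng)  # TG, RB, RS, R -> H1..H5 -> R1
extrinsic_net = make_network(3, 4, rng)  # S, LC, WC -> H1..H4 -> R2
joint_net = make_network(2, 3, rng)      # Rr1, Rr2 -> H1..H3 -> Or

rr1 = forward(intrinsic_net, [3, 2, 3, 4])  # hypothetical intrinsic Likert scores
rr2 = forward(extrinsic_net, [1, 2, 3])     # hypothetical extrinsic Likert scores
orr = forward(joint_net, [rr1, rr2])
index = 100 * orr                           # assumed rescaling to a 0-100 index
print(rr1, rr2, orr, index)
```

In practice the weights would be adjusted during training against known responses before the output is interpreted.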

The Orr value can then be compared to the index in the condition assessment table below:

| Category | Index | Description of the Condition |
|---|---|---|
| Excellent | 100-85 | Very few defects. Track function is not impaired. No immediate work action is required, but routine or preventive maintenance could be scheduled. |
| Very Good | 85-70 | Minor deterioration. Track function is not impaired. No immediate work action is required, but routine or preventive maintenance could be scheduled. |
| Good | 70-55 | Moderate deterioration. Track function may be somewhat impaired. Routine maintenance or minor repair may be required. |
| Fair | 55-40 | Significant deterioration. Track function is impaired, but not seriously. Routine maintenance or minor repair is required. |
| Poor | 40-25 | Severe deterioration over a small percentage of the track. Less severe deterioration may be present in other parts of the track. Track function is seriously impaired. Major repair is required. |
| Very Poor | 25-10 | Critical deterioration has occurred over a large percentage or portion of the track. Less severe deterioration may be present in other portions of the track. Track is barely functional. Major repair or less than total reconstruction is required. |
| Failed | 10-0 | Extreme deterioration has occurred throughout nearly all or the whole of the track. Track is no longer functional. Major repair, complete restoration or total reconstruction is required. |

Table 10: Condition Assessment (Uzarski, 1993)
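Looking up the category for a given index can be sketched as below. The bands in Table 10 share their boundary values (85 appears in both Excellent and Very Good), so the sketch assumes a boundary value belongs to the better category; that tie-breaking rule and the function name `condition_category` are assumptions made for illustration.

```python
# Lower bound of each Table 10 band, from best to worst category.
CONDITION_BANDS = [
    (85, "Excellent"),
    (70, "Very Good"),
    (55, "Good"),
    (40, "Fair"),
    (25, "Poor"),
    (10, "Very Poor"),
    (0, "Failed"),
]


def condition_category(index):
    """Return the Table 10 category for an index between 0 and 100."""
    if not 0 <= index <= 100:
        raise ValueError("index must lie between 0 and 100")
    for lower, name in CONDITION_BANDS:
        if index >= lower:  # shared boundaries go to the better category
            return name


print(condition_category(62.5))
```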

 

Chapter 4: Defect Based Models For Condition Assessment

| CASE STUDY | RESEARCH WORK | MODEL USED | DESCRIPTION |
|---|---|---|---|
| Case Study of Montreal Rail Network | (Laith, 2017) | Weighted Sum Model (By use of Analytic Network Process Technique) | The research in this case study applies the ANP (Analytic Network Process) Technique. This technique is applied in order to generate the weights of the various components of the railway infrastructure. The purpose of using the ANP Technique, other than to generate the weights, was to account for the inter-relationships that may exist between the components (Saaty & Cillo, 2009). The generated weights were then applied to develop the eventual model for the deterioration of railway infrastructure in the case of the Montreal Rail Network. |
| Case Study of Lorestan Railway | (Zakeri & Shahriari, 2012) | Deterioration Probabilistic Model | For the case study of the Lorestan Railway, the research applies a statistical approach. This approach applied the Markov and Semi-Markov models together with the hazard function to develop the final Deterioration Probabilistic Model. The aim of the research was to produce a model with a binary output, that is, 0 for good condition railway infrastructure and 1 for poor condition infrastructure. |

Table 11: Summary of Defect Based Condition Assessment Models

| MODEL | GRADING | DRAWBACKS |
|---|---|---|
| Weighted Sum Model (By use of Analytic Network Process Technique) | E1 and E2 for Emergency. P1, P2, P3 for Priority. N for Normal. | A lot of emphasis is placed on the internal factors, that is, the components of the rail itself. Little to no attention is given to the external factors that contribute to the deterioration of rail networks. No data is collected on the consumer end. |
| Deterioration Probabilistic Model | 0 for poor condition. 1 for good condition. | The model is too statistical and lacks a component of logical approach. The model output or response is far too narrow: the output is either good or bad. This does not give room for other conclusions, such as the extent to which a rail network is good or bad. No data is collected on the consumer end. |

Table 12: Grading and Drawbacks of Models

| MODEL | CHANGES PROPOSED | IMPLEMENTATION |
|---|---|---|
| Weighted Sum Model (By use of Analytic Network Process Technique) | More attention to be given to external factors that contribute to deterioration of rail networks. Consumer input also considered in the model development. | Collection of data on the external (non-intrinsic) factors contributing to deterioration of rail networks. Collection of data from commuters that use the rail network system. |
| Deterioration Probabilistic Model | Provision of more options for the response variable to enable a non-binary evaluation. Consumer input also considered in the model development. | Re-development of the model to ensure that the response is a categorical output as opposed to a binary output. This will give more options for evaluation of the rail network. Collection of data from commuters that use the rail network system. |

Table 13: Changes Proposed and Implementation

| PROS | CONS |
|---|---|
| Takes into account the external factors that contribute to the deterioration of rail networks. | All the data collected is transformed into ordinal data entries; this may make other analyses, especially parametric statistical analyses, impossible. |
| Collects and factors in data from the consumers, that is, the commuters that use the rail network systems. | |
| Applies a mixed logical and statistical approach by using the Weighted Sum and Likert Scale together. | |

Table 14: Pros and Cons of Weighted Sum Model Using Likert Scale

Chapter 5: Improvements To Existing Condition Assessment Approaches

There are two main improvements that can be made to the existing condition assessment approaches. These improvements are:

Inclusion of a broader range of factors to be assessed. This gives the model reliability and efficiency, because it ensures that all aspects of the process are covered by the model. Leaving factors out of the modelling process leaves a gap in the prediction, which in turn leads to an inaccurate model.

Inclusion of the consumer aspect in the model. Incorporating the consumer aspect of the process also increases the accuracy of the model. The consumer aspect acts as a control, especially when the model is being used to compare several processes.

Chapter 6: Recommendations And Conclusions

From the research in this paper we can conclude that:

Many of the existing models for the deterioration of railway infrastructure do not take consumer opinion into consideration.

Most of the existing models place emphasis on the intrinsic factors rather than the entire range of factors.

The recommendations are therefore to take consumer opinion into account during the modelling process and to consider both the extrinsic and the intrinsic factors that contribute to railway infrastructure deterioration.

References

1. Department of Planning, Transport and Infrastructure - Government of South Australia, 2008. Part 1025 Track Geometry. Track Geometry, 15(2), pp. 1-5.
2. Galit, S. et al., 2018. Data Mining for Business Analytics, pp. 365-375. New Delhi: John Wiley & Sons, Inc.
3. Heritage, G. & Large, A., 2009. Laser Scanning for the Environmental Sciences. 2nd ed. London: John Wiley & Sons.
4. Howitt, D. & Cramer, D., 2010. Introduction to Descriptive Statistics in Psychology. 5th ed. New York: Prentice Hall.
5. Shan, J. & Toth, C. K., 2008. Topographic Laser Ranging and Scanning. 1st ed. New York: CRC Press.
6. Krzysztof, L., 2015. New Coefficients of Rail Transport Usage. International Journal of Engineering and Innovative Technology, 5(6), pp. 89-91.
7. Laith, E., 2017. Defect-Based Condition Assessment Model of Railway Infrastructure. Montreal, Quebec, Canada: Concordia University.
8. Middleton, W., Smerk, G. & Delhi, R., 2007. "Track Inspection"; Encyclopedia of North American Railroads. 1st ed. Bloomington, IN: Indiana University Press.
9. Naomi, B. R. & Heiberger, M. R., 2011. Plotting Likert and Other Rating Scales. JSM 2011, 2011(1), pp. 1058-1066.
10. Natarajan, G., 2012. Analysis of Queues: Methods and Applications. 2nd ed. New York: CRC Press.
11. Norman, G., 2010. Likert Scales, Levels of Measurement and the Laws of Statistics. Advances in Health Science Education, 15(5), pp. 625-632.
12. O'Neil, C. & Schutt, R., 2013. Doing Data Science. 3rd ed. London: O'Reilly.
13. Reips, U. D. & Frederik, F., 2008. Interval Level Measurement with Visual Analogue Scales in Internet-based Research: VAS Generator. Behaviour Research Methods, 40(3), pp. 699-704.
14. Saaty, T. L. & Cillo, B., 2009. The Encyclicon: A Dictionary of Complex Decision Making Using Analytic Network Process. 2nd ed. Pittsburgh, Pennsylvania: RWS Publications.
15. Shaffer, C. A., 2011. Data Structures and Algorithms Business Analysis. Mineola: Dover.
16. Transportation Technology Center, 2009. Performance Based Track Geometry. 1st ed. New York: Wayback Machine.
17. Uzarski, D. A., 1993. Development of Condition Indexes for Low Volume Railroad. Technical Report No. FM-93/14 USCASER. Washington DC, USA: US Army Corps of Engineers.
18. Vosselman, G. & Maas, H. G., 2010. Airborne and Terrestrial Laser Scanning. 3rd ed. Chicago: Whittle Publishing.
19. Zakeri, J. A. & Shahriari, S., 2012. Developing a Deterioration Probabilistic Model for Rail Wear. International Journal of Traffic and Transportation Engineering, 1(2), pp. 13-18.