MN691 Research Methods and Project Design Assignments


  1. Introduction

 

Today the world relies heavily on data. Every process, project, and individual generates and consumes specific data for specific purposes. People constantly exchange information through social media and email services, producing enormous volumes of data every day, and that data contains all kinds of information, both public and private.

Alongside this openness and social sharing, people remain deeply concerned about the privacy and security of their personal data. To handle such large volumes of data and analyse its privacy level, researchers and developers apply Machine Learning and Deep Learning, drawing on AI, neural networks, and data mining algorithms.

Various techniques exist for removing sensitive data from datasets before training machine learning models. This report mainly highlights strategies for identifying and protecting sensitive information, together with the security measures needed when handling machine learning data. Sensitive information is data that a user, or the user's legal counsel, needs to protect with additional measures such as restricted access or encryption. Examples include field names such as an email address or billing information, which a cloud platform may expose to a data engineer and from which sensitive data can be deduced.
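As an illustration of the kind of identification discussed above, a minimal sketch in Python (the two regex patterns, for e-mail addresses and card numbers, are assumptions made for this sketch; a real system would use trained classifiers and far more categories):

```python
import re

# Illustrative patterns for two kinds of sensitive data. A real system
# would use trained classifiers and far more categories; these two
# regexes are assumptions made for the sketch.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_sensitive(text):
    """Return (category, matched_text) pairs found in the text."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((category, match))
    return hits

record = "Contact jane.doe@example.com, card 4111 1111 1111 1111"
print(find_sensitive(record))
```

Pattern-based detection of this kind is only a first pass; the machine learning approaches described later in this report are what make the classification scale to unstructured documents.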


 

 

  2. Problem domain and research questions

Nowadays everyone is connected through the internet and shares personal data over shared spaces, intentionally or unknowingly. For the owner of a shared space it is very difficult to distinguish public from private data and apply the appropriate privacy protections, because large amounts of data are updated every day. The client needs to identify all data kept on the server as private data and then provide special security features for it.

2.1 Problem domain

Statistical Predicate Invention

Predicate invention in ILP and hidden-variable discovery in statistical learning are two faces of the same problem, and researchers generally accept it as a central issue in machine learning. Once predicates have been invented, subsequent learning becomes simpler.

Generalizing across domains

Machine learning is usually defined as generalizing across tasks from similar domains, and over the past few decades this has been handled with relative ease. The main difference between people and machine learners is that people can readily generalize across domains.

 

Studying maximum structuring levels

So far, algorithms for statistical relational learning have been developed for structured inputs and outputs, but they are not used for learning internal representations. In statistical learning and ILP, models typically consist of just two structural levels; in ILP, for instance, the levels are clauses and the conjunctions within them. Because two levels are enough to represent the functions of interest, this is an efficient way of representing most functions.

 

Learning combination and inference

Inference is the most crucial factor in structured learning, and it leads to an ironic situation: one can spend more data learning a model that is then too expensive to reason with. Learners must be biased toward models for which inference is efficient, so they should be designed from scratch to learn models that are powerful yet still support efficient inference. Important problems here include entity resolution, schema matching, and concept alignment.

 

2.2 Research questions

Q-1: How can machine learning algorithms be used to identify sensitive data?

Machine learning practice draws on higher-level standards such as PCI-DSS and HIPAA, which offer best practices for sensitive-data protection and inform clients and customers about ways of handling sensitive data. Such certifications also allow clients to make informed decisions about data security.

 

Q-2: How to remove and mask the sensitive data in machine learning?

In some cases, removing sensitive data reduces the value of the dataset; whenever this occurs, the sensitive data must instead be masked using one or more processes. Depending on the dataset's structure, different approaches can be used to remove the data.

Whenever sensitive information cannot be removed, it must be masked. Machine learning practice offers various techniques for masking data.
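One common masking technique is pseudonymization, replacing each sensitive value with a one-way token. A minimal sketch (the e-mail pattern and token format are illustrative assumptions, not the project's actual method):

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def _mask(match):
    # One-way hash token: the same address always maps to the same
    # pseudonym, so joins across records still work after masking.
    # (Unsalted hashes remain open to dictionary attacks; a production
    # system would add a secret salt or use format-preserving encryption.)
    digest = hashlib.sha256(match.group().encode()).hexdigest()[:8]
    return f"<email:{digest}>"

def mask_record(text):
    """Replace e-mail addresses with pseudonymous tokens."""
    return EMAIL.sub(_mask, text)

print(mask_record("Ticket raised by jane.doe@example.com on Monday"))
```

Masking of this kind preserves the structure of the dataset for model training while keeping the raw values out of it.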

 

Q-3: Is it difficult to handle sensitive data in large data sets?

The concept of ownership becomes complicated for machine learning datasets that contain data from many users, since data engineers must be granted access to the complete dataset in order to use it. Reducing the resolution of, or encrypting, individual data fields is a common deterrent measure, but it is not sufficient for machine learning datasets.

 

  3. Background and Project Objective

 

Machine Learning and Deep Learning are branches of Artificial Intelligence (AI) widely used by researchers and developers to solve real-life problems. These methods extract information from data through learning strategies instead of lengthy hand-written code. Machine learning for data science is one of today's trending research topics, and extracting information about sensitive data from huge databases is a key benefit of this research.

 

3.1 Summary of Literature Review

 

Machine learning is used in a wide variety of computing tasks where designing well-performing algorithms by hand is difficult or infeasible; example applications include email filtering, network intrusion detection, and computer vision. A purely perimeter-based security approach no longer works: an enterprise must embed security controls in the data itself or ensure they travel with it. Since resources for addressing cybersecurity threats are not unlimited, it is essential to understand what is sensitive, and that requires classification tools. We have developed a strategy that handles scalable data classification using machine learning. By automating processes, classifying documents and other kinds of unstructured data, and establishing restrictions, for example what is for internal use and what can be shared with the rest of the world, it is possible to approach cybersecurity more effectively.

To automate the classification procedure and make it more scalable, the software classifier must be given samples of sensitive data and samples of non-sensitive data. Machine learning enables the classifier to learn from these examples and extract rules about what makes a document sensitive or not. Once the classifier has built models for sensitive and highly confidential documents, it can handle the classification process independently. The next step is to set up the appropriate security controls. This includes layering additional security over existing controls, for example dynamic authentication, which uses cognitive fingerprinting and other behavioural techniques to recognize anomalies. The classification system can also be used to automate other tasks: for instance, the framework can encrypt appropriate data through an algorithm or add extra verification requirements, such as multi-factor authentication, for particular kinds of documents.

3.2 Objectives of the Project

The main methodologies that will be involved in machine learning identification of sensitive private data are:

  • Diagnosis System

  • Reliability of Data

  • Supervised Learning

  • Classification of Data

  • Optimization of Data

  4. Project Requirements Analysis and Specification

 

The software and hardware requirements necessary to find sensitive data in a large database are listed here [15, 26]. To handle data from the backend we need to call some SOAP APIs. To verify these APIs we will use popular API tools such as Postman and SoapUI, exercising each API with different methods such as POST, GET, and DELETE. Each call returns JSON in object and array form, through which we will verify the data.

 

4.1 Hardware Requirements

 

The following hardware will be used in this project to prepare the test system, run the ML scanner, test its performance, and install all related tools.

 

 

 

Table-1: Hardware Specifications

| Name of Hardware | Quantity | Project Requirement | Cost Estimate |
|---|---|---|---|
| Test System (RAM: 8GB or above; HDD: 500GB or above; Processor: i3 or above) | 1 | Test system for data analyzer and ML design | 500 USD |
| Wireless Router | 1 | To prepare a single wireless network for testing, secured with a WPA password | 100 USD |
| Android Mobile | 1 | To generate runtime sensitive private data to test its identification by the ML system | 200 USD |


 


 

4.2 Software Specifications

The following software tools and programming languages will be used in this project to prepare the test system, run the ML scanner, test its performance, and install all related tools.

 

 

Table-2: Software Specification

| Category | Software |
|---|---|
| Operating System | Linux 64-bit |
| Programming Language | Python and R |
| Documents | MS Word, Adobe Reader |
| Analysis | ML algorithms, R Studio |
| Server | Apache Server |
| API Tool | Postman, SoapUI |

5. Project plan and preliminary design

5.1 Weekly Activity

The project flow and its time durations are listed here. The flow has 11 levels of work divided across 12 weeks. Most of the work has already been completed according to this schedule.

Table-3: Weekly Activities

 

Week 1: Project Plan and Analysis Phase (Machine Learning to Identify Sensitive Private Data)

  • Did research work on machine learning

  • Analysed and discussed each aspect of machine learning in the group

Week 2: Background research

  • GDPR (EU General Data Protection Regulation) comprises the rules applied to companies to protect people's private data; under these rules, companies must keep users' personal information safe and hidden from other people.

  • API tools help hide people's personal data: they expose only the information that is necessary and hide all of the user's sensitive information.

Week 3: Designing Phase (Interface)

  • Selected five different machine learning API tools

  • Each of the five members prepared one API tool as a design approach for the project

  • One tool was then to be selected for the final project

Week 4: Implement Phase according to the paper (Application Design)

  • Installed and ran each tool of the design approach: Postman, Karate, NLTK, Swagger, REST

  • After comparing all the tools, the members found NLTK to be the best for identifying sensitive data, so NLTK was selected as the final project tool

  • Designed the simulation of different sections in machine learning and regression techniques

  • Prepared the final design template on the basis of NLTK and started the final work on NLTK

Week 5: Design of simulation of different sections in Machine Learning and Regression Techniques

Week 6: Implementation of Supervised Learning and Prediction Model

Week 7: Calculate and Analyse Results

Week 8: Evaluate the scenario for data mining and identification

Week 9: Compare results using graphs and tables

Week 10: Testing

Week 11: Documentation Work

Week 12: Project Presentation

 

 

5.2 Roles and Responsibilities

 

Table-4: Roles and Responsibilities of each team member

 

Project Plan and Analysis Phase (Machine Learning to Identify Sensitive Private Data)

  • Kiran: review of privacy-preserving machine learning
  • Mandeep: review of sensitive information acquisition
  • Kusum: review of intelligent assistance for the data mining process
  • Sandeep: review of automating anomaly detection
  • Rajbeer: review of risk analysis of protocols for prediction

Designing Phase (Interface)

  • Each member discussed the summary of their literature review

Implement Phase according to the paper (Application Design)

  • Discussed and written by all team members

Design of simulation of different sections in Machine Learning and Regression Techniques

  • Kiran: preparing the system to install the ML tool and Apache Server
  • Mandeep: preparing the background section, including GDPR
  • Kusum: preparing the approach to create the NLTK framework for the machine learning work
  • Sandeep: working on the input data in the form of sensitive-data output
  • Rajbeer: working on the Postman API tool

Implementation of Supervised Learning and Prediction Model

  • Kiran and Mandeep: managing the server hosting the private sensitive data
  • Kusum: applying approaches from the reference papers to implement the prediction model
  • Sandeep: working on different approaches to choose the best one
  • Rajbeer: describing different methodologies regarding the prediction model

Calculate and Analyse Results

  • Kiran: following the Swagger API testing procedure, in which the developer describes the structure of the API so that the machine can read it
  • Mandeep: using BI approaches to analyse the unstructured data with Hadoop and the NoSQL database Cassandra, which is well suited to streaming data
  • Kusum: preparing a flow chart for the algorithm that generates runtime private data to test its identification by the ML system
  • Sandeep: working on the Karate tool, released as open source by Intuit
  • Rajbeer: running the Postman tool to test the REST and SOAP APIs prepared to test data integrity

Evaluate the scenario for data mining and identification

  • All team members work together on the evaluation of data mining and identification, preparing the business case study for the proposed methodologies of the ML-based identification system and a plan to get data from the R-based data mining system

Compare results using graphs and tables

  • All team members work together on comparing results using graphs and tables, describing all implementation steps in a detailed design diagram and preparing the result graphs and tables for comparison with the base model


5.3 Gantt Chart

This Gantt chart characterizes the work we have completed so far. We arranged a 3-month (12-week) plan for the assignment, and all completed tasks are shown alongside the project Gantt chart.

 

 

Figure-1: Gantt Chart

 

5.4 High level Project Design with Diagram

 

The project is structured around understanding the Australian Privacy Act and other international data privacy requirements, identifying the detailed compliance needs, and developing a customized AI-based solution for those needs.

 

The project outcomes:

Project outcomes: team members work on challenging platforms with various innovative features. CBA's AI solutions are used by leading businesses in Australia, so members get the chance to work with practitioners based in Australia and to live and breathe the product, design, and technology; the work will have real impact. Members also learn technologies that are fundamentally changing IT and business markets. Hands-on training and guidance for self-learning will be given, there is potential for job placement and career growth with the company, and each team member will receive a referral and an experience letter on project completion.

 

When we have completed this project, we will understand how to:

Build a custom machine learning model using the graphical tools from the IBM Watson stack, and use Natural Language Understanding (NLU) in the model to identify personal and private data.

 

Make use of regular expressions to augment the NLU for identifying metadata. Configure which personal data needs to be identified, and assign a weight to each kind of personal data so that a score can be computed. View the score and the identified personal data in a tree structure for better visualization, and let other applications consume the output. The Cognitive Compliance Processor solution addresses the requirement of identifying personal data in various unstructured documents. It relies on advanced natural language processing, machine learning, and text analytics to precisely identify sensitive and personal or private data in large volumes of unstructured data, and then alerts other systems. The solution will enable organizations to automatically detect private and sensitive data at a scale and speed that would be difficult to match even with a large workforce.
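The weighting-and-scoring step described above can be sketched in a few lines (the categories and weights below are hypothetical placeholders; the real configuration would follow the applicable compliance requirements):

```python
# Hypothetical category weights; the real configuration would follow the
# applicable compliance requirements (e.g. GDPR, Australian Privacy Act).
WEIGHTS = {"name": 1, "email": 2, "medical": 5, "credit_card": 5}

def document_score(found_categories):
    """Aggregate a privacy-risk score for one document from the
    personal-data categories detected in it."""
    return sum(WEIGHTS.get(cat, 0) for cat in found_categories)

print(document_score(["name", "email", "credit_card"]))  # → 8
```

Documents whose score exceeds a configured threshold would then be flagged for the stricter security controls described earlier.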

 

The overall objective of the project is to develop an MVP application that demonstrates the application of AI capabilities to processing large volumes of unstructured information at scale. The project uses "boiler-plate" components for fast development. The Cognitive Compliance Processor will employ a combination of logical rules, text-pattern recognition, machine learning, and language processing to identify private and sensitive data in unstructured sources, such as documents, enterprise applications like ERP and CRM systems, service call logs, and customer emails, that might violate compliance regimes such as privacy acts or the GDPR.

 

 

 

 

 

Figure-2: Supervised learning model

 

 

Figure-3: Three approaches to fairness-aware machine learning without holding sensitive characteristics.

 

 

5.5 Individual design approach 1

NLTK

 

It is easy to forget how much data is stored in our daily conversations; with advances in the digital world, Natural Language Processing has become an expanding field within machine learning and artificial intelligence. Text comes in many forms, from individual word lists to paragraphs and sentences containing special characters. Transforming this text into a form an algorithm can use is a difficult process comprising several parts: cleaning, followed by annotation, normalization, and then analysis. Several pre-processing methods exist, including:

 

Capitalization: text contains various capitalizations that mark the start of a sentence or emphasize proper nouns. The common approach is to reduce everything to lower case for simplicity but, at the same time, some cases must be kept in mind: if the word "US" is changed to "us", the meaning of the whole sentence changes.

 

Tokenization: breaking paragraphs up into sentences and then sentences into single words. For language-specific algorithms, the Punkt models from NLTK can be used.

 

Even though preparing text is a complicated process that requires choosing the optimal tools, most pre-built libraries and services are used to map words and terms manually. Once the dataset is prepared, machine learning techniques can be applied.
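The two preprocessing steps above can be sketched in a few lines of Python (a simplified stand-in: the actual project would use NLTK's tokenizers and Punkt models rather than this regex):

```python
import re

def preprocess(text):
    """Lower-case and tokenize one sentence.
    Note the caveat from the text: lower-casing turns "US" into "us",
    which can change the meaning of the sentence."""
    # Words, numbers, or single punctuation marks become tokens.
    return re.findall(r"[a-z]+|\d+|[^\w\s]", text.lower())

print(preprocess("The US team met in Paris."))
# → ['the', 'us', 'team', 'met', 'in', 'paris', '.']
```

With NLTK installed, `nltk.word_tokenize` would replace the regex and handle contractions and sentence boundaries properly.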

 

 

 

 

 

Figure-4: Processing of NLTK with Machine Learning

 

 

 

5.6 Individual design approach 2

POSTMAN

 

Four request methods are used most frequently:

POST Request - For Creating or Updating data,

PUT Request - For Updating data,

GET Request - For Retrieving/Fetching data and

DELETE Request - For Deleting data.

Request URL - the address to which the HTTP request is made.

Request Headers - the request headers contain key-value pairs for the application. I have mainly used two keys:

Content-Type - A content-type describes the format of object data. Content-type, which I have used the most for the requests and responses, is application/json.

Authorization - An authorization token, included with requests, is used to identify the requester.

Request Body - contains the data, if any (depending on the type of request method), to be sent with the request. I have used the raw form of data for sending requests.
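The same request structure that Postman builds interactively can be sketched with Python's standard library (the endpoint, payload, and token below are hypothetical):

```python
import json
import urllib.request

# Hypothetical endpoint, payload, and token; Postman builds the same
# request interactively through its UI.
url = "https://api.example.com/records"
payload = {"field": "email", "value": "user@example.com"}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),          # Request Body (raw JSON)
    headers={
        "Content-Type": "application/json",     # format of the body
        "Authorization": "Bearer <token>",      # identifies the requester
    },
    method="POST",                              # create/update data
)

# The request is only constructed here, not sent.
print(req.method, req.full_url)
```

Swapping `method` for "GET", "PUT", or "DELETE" covers the other three request types listed above.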

Figure-5: Procedure of POSTMAN API Testing

 

5.7 Individual design approach 3

SWAGGER

 

Swagger is an approach that allows a developer to describe the structure of an API so that a machine can read it. Different APIs and tools exist to apply the Swagger approach. The steps to be followed are:

1. Open the swagger editor

2. Write swagger definitions according to the API in the swagger editor.

3. The Swagger definition generates the code for the API along with the user interface (UI). The generated UI and the Swagger Codegen output are then processed by the system automatically. The compiled definition is deployed to the API so that the machine can perform operations by learning, on its own, about the definition written in Swagger, using different libraries and APIs. Swagger follows a top-down, or stepwise, approach.

Figure-6: Procedure of SWAGGER API Testing

 

5.8 Individual design approach 4

KARATE

Karate is a REST API testing tool used here to test the machine learning algorithms. It was released as an open-source tool by Intuit, and it automates API testing, providing all the required features and dependencies to test an API. It uses the Java language to process and test the ML API. An important feature of Karate is that you do not need to write step definitions: Karate creates them automatically, because it is a DSL that works on Cucumber-JVM. After testing of the API is completed, Karate provides a report of the generated test results.

 

Figure-7: Procedure of KARATE API Testing

 

5.9 Individual design approach 5

BI

 

To analyse unstructured data, the first step is to save the data in a proper database. For storage we build algorithms to process the data on platforms such as Hadoop (the Hadoop Distributed File System) and the NoSQL database Cassandra, which is well suited to streaming data.

The second step is to fetch and analyse the sensitive or important data from the datasets stored in the databases; for that, a powerful business intelligence tool is needed to process the data and provide values or output as graphs and GUI interfaces.

The generated output would be in the form of graphs and images, which could then be deployed on a dashboard or given to the client according to the requirements.

 

Figure-8: Procedure of BI Approach

5.10 Budget and References:

We have included some costs for the project.

Important note: the Linux operating system is open source, and we can use a free version. The costs below cover the hardware for the simulation.

Table-5: Hardware Cost

| Name of Hardware | Quantity | Project Requirement | Cost Estimate |
|---|---|---|---|
| Test System (RAM: 8GB or above; HDD: 500GB or above; Processor: i3 or above) | 1 | Test system for data analyser and ML design | 500 AUD |
| Wireless Router | 1 | To prepare a single wireless network for testing, secured with a WPA password | 100 AUD |
| Android Mobile | 1 | To generate runtime sensitive private data to test its identification by the ML system | 200 AUD |

 

Total Cost: 500 + 100 + 200 = 800 AUD

 

Research methods to be used for the next stage of the project

 

Data Collection Process:

 

There are two main types of data collection process: the qualitative approach and the quantitative approach. In the qualitative approach, the research is based on evaluating aspects of the topic related to the subject of the research. It draws on the points of view of the people associated with the project, enabling the analyst to conduct an in-depth analysis of the field in which the project is being developed; the main instruments are interviews and questionnaires. Quantitative analysis, by contrast, depends on collecting data that is generally used for processing in a formal manner. In this study, the data will be analysed using the qualitative data collection process, in order to understand the importance of machine learning techniques and artificial intelligence.

 

Data Collection Technique:

 

Two techniques can be considered for data collection: the primary data collection method and the secondary data collection technique. Primary data collection is done through interviews and questionnaires, capturing the points of view of a group of people associated with the project; to some extent the targeted respondents are experts in the field of research. In the secondary data collection technique, the data collected has already been established and published by other researchers. For this research in the field of machine learning and artificial intelligence, secondary data will be used.

 

Structure of Secondary Data:

 

Secondary data will be used for the development of the research. The collected data is valuable to the researcher because the underlying research has already been performed by someone else, so research time can instead be spent on further analysis of additional resources. The research can thereby be refined to a greater degree, and the results drawn from these resources can be approximated more accurately. Hence secondary data will be used in the analysis for this research.

Conclusion, limitations and future work

 

In conclusion, machine learning tools can be used to develop a process that removes very sensitive data from datasets. The identification and protection of such data have been highlighted in this research. This project allows the protection of sensitive data to be handled efficiently, and further research will increase the possibilities of improved security through measures such as encryption, enhancing procedures such as billing and data protection plans. The NLTK tool has been identified as the final project tool, so the final design work will be prepared on its template, and future improvements to NLTK will be utilized in the project as well. There has been huge progress in the fields of machine learning and artificial intelligence. Natural language processing is used for cleaning the data, and the steps used in the NLTK implementation are described below:

 

Capitalization: text contains various capitalizations that mark the start of a sentence or emphasize proper nouns. The common approach is to reduce everything to lower case for simplicity but, at the same time, some cases must be kept in mind: if the word "US" is changed to "us", the meaning of the whole sentence changes.

 

Tokenization: breaking paragraphs up into sentences and then sentences into single words. For language-specific algorithms, the Punkt models from NLTK can be used.

 

Although preparing the text is a complicated process that requires optimal tools, a large number of pre-built libraries support it. The machine learning process can be applied once the dataset is prepared.

References

[1] J. D. Procaccino, J. M. Verner, K. M. Shelfer, and D. Gefen, "What do software practitioners really think about project success: an exploratory study" Journal of Systems and Software, Vol. 78, no. 2, pp. 194-203, 2005.

[2] K. Schwalbe, Information Technology Project Management, 3rd ed. Boston: Course Technology, 2004.

[3] J. D. Procaccino and J. M. Verner, "Software project managers and project success: An exploratory study" Journal of Systems and Software, Vol. 79, no. 11, pp. 1541-1551, 2006.

[4] A.-P. Bröhl and W. Dröschel, the V-Model. Oldenbourg-Verlag, 1995.

[5] T. Abdel-Hamid and S. Madnick, Software Project Dynamics. Prentice Hall, 1991.

[6] M. Deininger and K. Schneider, "Teaching Software Project Management by Simulation - Experiences with a Comprehensive Model" In Conference on Software Engineering Education (CSEE), ser. Lecture Notes in Computer Science 750, Austin, Texas, 1994, pp. 227-242.

[7] A. Jain and B. Boehm, "SimVBSE: Developing a Game for Value-Based Software Engineering" In Software Engineering Education and Training, 2006. Proceedings. 19th Conference on, 2006, pp. 103-114.

[8] K. Schneider, AusführbareModelle der Software-Entwicklung. Struktur und RealisierungeinesSimulationssystemes. vdf, 1994.

[9] M. Deininger and K. Schneider, "Teaching Software Project Management by Simulation - Experiences with a Comprehensive Model" In Conference on Software Engineering Education (CSEE), ser. Lecture Notes in Computer Science 750, Austin, Texas, 1994, pp. 227-242.

[10] D. Rodriguez, M. Satpathy, and D. Pfahl, "Effective software project management education through simulation models. An externally replicated experiment" In International Conference on Product Focused Software Process Improvement (PROFES), ser. Lecture Notes in Computer Science 3009. Kansai Science City, Japan: Bomarius, F., 2004.

[11] K. Schwalbe, Information Technology Project Management, 3rd ed. Boston: Course Technology, 2004.

[12] K. Schneider, "A Descriptive Model of Software Development to Guide Process Improvement" In Conquest. NÃ/rnberg, Germany: ASQF, 2004.

[13] JilKlünder, Kurt Schneider, Fabian Kortum, Julia Straube, Lisa Handke, Simone Kauffeld, Human-Centered and Error-Resilient Systems Development, vol. 9856, pp. 111, 2016.

[14] W. D. Scott & Co, Information Technology in Australia: Capacities and opportunities: A report to the Department of Science and Technology. [Microform]. W. D. Scott & Company Pty. Ltd. in association with Arthur D. Little Inc. Canberra: Department of Science and Technology, 1984.

[15] "Functional Organizational Structure Advantages", Smallbusiness.chron.com, 2018. [Online]. Available: https://smallbusiness.chron.com/functional-organizational-structure-advantages-3721.html. [Accessed: 11- Sep- 2018].

[16] R. Zhang and Q. Zhu, "Consensus-based transfer linear support vector machines for decentralized multi-task multi-agent learning", 2018 52nd Annual Conference on Information Sciences and Systems (CISS), 2018, pp. 1-6.

[17] Q. Jia, L. Guo, Z. Jin and Y. Fang, "Preserving Model Privacy for Machine Learning in Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 8, pp. 1808-1822, 2018.

[18] R. A. Kulkarni, "Scrutinizing action performed by user on mobile app through network using machine learning techniques: A survey", 2018 2nd International Conference on Inventive Systems and Control (ICISC), 2018, pp. 860-863.

[19] S. Yeom, I. Giacomelli, M. Fredrikson and S. Jha, "Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting", 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 2018, pp. 268-282.

[20] I. Hegedus and M. Jelasity, "Distributed Differentially Private Stochastic Gradient Descent: An Empirical Study", 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016, pp. 566-573.

[21] S. Samet and A. Miri, "Privacy-preserving protocols for perceptron learning algorithm in neural networks", 2008, pp. 10-65 - 10-70.

[22] K. Lin and M. Chen, "On the Design and Analysis of the Privacy-Preserving SVM Classifier", IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1704-1717, 2011.

[23] N. Chen, B. Ribeiro, A. Vieira, J. Duarte and J. Neves, "Extension of Learning Vector Quantization to Cost-sensitive Learning", International Journal of Computer Theory and Engineering, pp. 352-359, 2011.

[24] A. Sarwate and K. Chaudhuri, "Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data", IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 86-94, 2013.

[25] C. Guivarch and S. Hallegatte, "2C or Not 2C?", SSRN Electronic Journal, 2012.

[26] M. Senekane and B. Taele, "Prediction of Solar Irradiation Using Quantum Support Vector Machine Learning Algorithm", Smart Grid and Renewable Energy, vol. 07, no. 12, pp. 293-301, 2016.

[27] M. Ohsaki, P. Wang, K. Matsuda, S. Katagiri, H. Watanabe and A. Ralescu, "Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification".

[28] "Learning privately: Privacy-preserving canonical correlation analysis for cross-media retrieval".

[29] C. Tan and Q. Ji, "An approach to identifying cryptographic algorithm from ciphertext", 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN), 2016, pp. 19-23.

[30] K. Alzhrani, E. M. Rudd, C. E. Chow and T. E. Boult, "Automated big security text pruning and classification", 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 3629-3637.

[31] S. Postavaru and I.-M. Plesea, "Censoring Sensitive Data from Images", 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016, pp. 443-448.

[32] M. S. Sarma, Y. Srinivas, M. Abhiram, L. Ullala, M. S. Prasanthi and J. R. Rao, "Insider Threat Detection with Face Recognition and KNN User Classification", 2017 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), 2017, pp. 39-44.

[33] K. Olejnik, I. Dacosta, J. S. Machado, K. Huguenin, M. E. Khan and J.-P. Hubaux, "SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices", 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 1058-1076.

[34] S. T. Peddinti, A. Korolova, E. Bursztein and G. Sampemane, "Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity", 2014 IEEE Symposium on Security and Privacy, 2014, pp. 493-508.

[35] D. Hu, F. Chen, X. Wu and Z. Zhao, "A Framework of Privacy Decision Recommendation for Image Sharing in Online Social Networks", 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), 2016, pp. 243-251.

[36] A. Bernstein, F. Provost and S. Hill, "Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 503-518, 2005.

[37] L. Song, B. Yang, T. Ke, X. Zhao and L. Jing, "Biased locality-sensitive support vector machine based on density for positive and unlabeled examples learning", 11th International Symposium on Operations Research and its Applications in Engineering, Technology and Management (ISORA 2013), 2013, pp. 1-6.

[38] M. Chowdhury, A. Rahman and R. Islam, "Protecting data from malware threats using machine learning technique", 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2017, pp. 1691-1694.

[39] W. Shang, H. Liu and R. Lv, "Sensitive Information Acquisition Based on Machine Learning", 2012 International Conference on Industrial Control and Electronics Engineering, 2012, pp. 1117-1119.

[40] F. Ö. Çatak, A. F. Mustaçoğlu and A. E. Topçu, "Privacy preserving extreme learning machine classification model for distributed systems", 2016 24th Signal Processing and Communication Application Conference (SIU), 2016, pp. 313-316.

[41] P. K. Fong and J. H. Weber-Jahnke, "Privacy Preserving Decision Tree Learning Using Unrealized Data Sets", IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 2, 2012.

[42] T. Zhu, P. Xiong, G. Li, W. Zhou and P. S. Yu, "Differentially private query learning: From data publishing to model publishing", 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 1117-1122.

[43] S. A. Osia, A. S. Shamsabadi, A. Taheri, H. R. Rabiee and H. Haddadi, "Private and Scalable Personal Data Analytics Using Hybrid Edge-to-Cloud Deep Learning", Computer, vol. 51, no. 5, pp. 42-49, 2018.

[44] Q. Jia, L. Guo, Z. Jin and Y. Fang, "Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems", 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), 2016, pp. 690-699.

 


 

Glossary and Abbreviations

Appendices

 

Appendix I: Industry Client Details

 

Client Details

Group Number: cognitive compliance 1

Student ID and Names:

 

KUSUM RANI (MIT172198)

KIRAN BALA (MIT172481)

MANDEEP KAUR (MIT172825)

RAJBIR SINGH (MIT172501)

SANDEEP KAUR (MIT171974)

 

Project Title: COGNITIVE COMPLIANCE PROCESSOR

 

Name of Industry Placement Agent (if any):

Agent Name: OUTCOME.LIFE

Contact Name: GERARD HOLLAND

Contact number: 0402329476

E-mail id: gerard@outcome.life

 

Industry Client Details (Compulsory for all proposals):

 

Company Name: CBA STRATEGIC IT, AUSTRALIA AND NEW ZEALAND

ABN: 74605724141

Company address: Level 2, Riverside Quay, 1 Southbank Boulevard, Melbourne VIC 3006

Company Profile: CBA Strategic IT is a leading provider of cognitive artificial intelligence solutions.

Website: http://cbastrategicit.com

Industry Professional (Contact) Name:

 

Contact person email id: dheeren.velu@gmail.com

Contact Number: +61399824461

Brief Bio of the industry professional: Dheeren Velu

1. Highly experienced, with deep technical expertise.

2. Committed to mentoring and supporting future growth.

Signature of the industry professional:

Date:

 

 

 

 

 

 

 

 

 

 

Project Definition

Background and Rationale for the project

The client for this project is CBA Strategic IT, a leading IT company that delivers cognitive solutions to the global market. The company regularly stores data from a large number of customers, including both personal and public information. Customers are increasingly asking for safeguards on their private data, so our client wants us to build a solution that provides them.

Nowadays, organisations are very concerned about information safety and about complying with data regulations. The GDPR is a significant development in this respect: since it became law in the European Union this year, companies must improve the security and safety procedures they apply to customer data. The GDPR's central concern is "personal data" that relates to an individual either directly or indirectly, and it grants individuals three key rights:

  • Non-discrimination Right,

  • Right to Explanation, and

  • Right to be Forgotten

In our project we use the BigML platform, which supports these GDPR requirements when working with data and complies with the relevant data regulations, including the Australian Privacy Act.

Project goals and Objectives

The company needs a secure data centre that can identify different types of data and then apply an appropriate security level to each. The client's main objective is an algorithm or tool, built with machine learning, that can identify sensitive private data in the data centre and then offer options to act on it, such as securing and protecting it.

Desired Outcomes/Deliverables

The project should deliver the following:

  • Read large datasets.

  • Identify sensitive and private data.

  • Secure this data without introducing vulnerabilities.
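Since the deliverables above hinge on identifying sensitive data automatically, the following is a minimal rule-based sketch of the idea. The pattern names and regular expressions are our own illustrative choices, not the client's specification; the actual tool is intended to use machine learning rather than fixed rules.

```python
import re

# Hypothetical patterns for two common kinds of sensitive data.
# A production system would use many more rules or a trained classifier.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_sensitive(text):
    """Return (kind, matched_value) pairs for every pattern hit in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((kind, match))
    return hits

def redact(text):
    """Replace each detected sensitive value with a [KIND] placeholder."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub("[%s]" % kind.upper(), text)
    return text
```

In a machine learning version, `find_sensitive` would be replaced by a classifier scoring each field or document, but the identify-then-protect flow stays the same.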

 

Appendix II: Individual Literature review

Literature Review by KIRAN BALA (MIT172481) 1-6

1. Privacy-preserving quantum machine learning using differential privacy

This paper discusses how progress in artificial intelligence in general, and machine learning in particular, has created a need to pay careful attention to the privacy of the data being analysed. A typical example of sensitive data analysis is the analysis of individuals' medical records: there may be a need to draw insights from the data while at the same time maintaining the privacy of the participants. Such cases have given rise to privacy-preserving data analysis, in which privacy is typically guaranteed by a differentially private mechanism [15].
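As a concrete illustration of the differentially private mechanisms discussed above, the sketch below implements the classic Laplace mechanism for a counting query (sensitivity 1, so noise scale 1/ε). This is our own toy example, not code from the paper.

```python
import math
import random

def dp_count(values, predicate, epsilon):
    """Differentially private count: true count plus Laplace(1/epsilon) noise.

    A counting query changes by at most 1 when one record changes
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Sample Laplace(0, scale) via inverse-CDF from a uniform in (-0.5, 0.5).
    u = random.random() - 0.5
    scale = 1.0 / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Smaller ε means stronger privacy and noisier answers; larger ε gives answers close to the true count.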

 

 

2. Consensus-based transfer linear support vector machines for decentralized multi-task multi-agent learning

 

This paper discusses recent developments in transfer learning, which has been created to improve performance across different but related tasks in machine learning. Such techniques become less efficient as the size of the training data and the number of tasks grow. Moreover, privacy can be violated, since some tasks may contain sensitive and private data that is shared among nodes and tasks. The authors propose a consensus-based distributed transfer learning framework in which several tasks aim to find the best linear support vector machine (SVM) classifiers in a distributed network. Using the alternating direction method of multipliers, tasks can achieve better classification accuracy more efficiently and privately, as every node and task trains with its own data and only decision variables are transferred between tasks and nodes [16].

 

 

3. Preserving Model Privacy for Machine Learning in Distributed Systems

 

This paper considers machine-learning-based data classification, a widely used data mining technique. By learning from massive data collected from the real world, data classification helps learners discover hidden data patterns, which are represented by the learned model in different machine learning schemes. Given such a model, a user can classify whether new incoming data belongs to an existing class, or multiple entities may test the similarity of their datasets. However, because of data locality and privacy concerns, it is infeasible for large-scale distributed systems to share all their datasets for classification or testing. On the one hand, the learned model is an entity's private asset and can leak information, so it should be carefully protected from all non-collaborating entities [17].

 

4. Scrutinizing action performed by user on mobile app through network using machine learning techniques: A survey

 

This paper examines how mobile phones are used at large scale for activities such as bank transactions, shopping, and browsing, so nearly everyone carries a device that stores sensitive or private information about them. As the user interacts with mobile applications, a great deal of network traffic is generated by sending and receiving requests. This traffic can be analysed by an attacker, so there is a possibility that private information can be observed. The paper explains how an attacker can track a user in this way, meaning the user's data is not secure [18].

 

 

5. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

 

This paper discusses how machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work shows that the output produced by such algorithms may leak specific private information from the training data to an attacker, either through the models' structure or their observable behaviour. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts suggesting that overfitting and influence may play a role [19].

 

 

6. Distributed Differentially Private Stochastic Gradient Descent: An Empirical Study

 

This paper considers large-scale distributed environments, where stochastic gradient descent (SGD) is a popular way to implement machine learning algorithms. Data privacy is a key concern in such environments, and it is usually addressed within the framework of differential privacy. The output quality of differentially private SGD implementations, as a function of design choices, has not yet been thoroughly evaluated, and the authors examine this question experimentally. They assume each data record is stored by an independent node, which is a typical setup in networks of mobile phones or Internet of Things (IoT) applications. In this model they identify a set of possible distributed differentially private SGD implementations in which all sensitive computations are strictly local and any public information is protected by differentially private mechanisms [20].
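The clip-then-add-noise recipe behind differentially private SGD can be sketched in a few lines. This is our own toy version of the standard technique (per-example gradient clipping plus noise on the summed gradient) for a one-parameter least-squares model, not the authors' implementation.

```python
import random

def dp_sgd_step(w, examples, clip, noise_std, lr):
    """One differentially private SGD step for the 1-D model y ~ w * x.

    Each per-example gradient is clipped to [-clip, clip] so that no single
    record can dominate, then Gaussian noise is added to the summed gradient
    before the parameter update.
    """
    grads = []
    for x, y in examples:
        g = 2 * (w * x - y) * x              # gradient of (w*x - y)**2
        g = max(-clip, min(clip, g))         # clip the per-example gradient
        grads.append(g)
    noisy_sum = sum(grads) + random.gauss(0.0, noise_std * clip)
    return w - lr * noisy_sum / len(examples)
```

With small noise the iterates still converge near the least-squares solution; increasing `noise_std` trades accuracy for privacy, which is exactly the design space this kind of empirical study explores.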

 

 

Literature Review by KUSUM RANI (MIT172198) 7-12

 

7. Privacy-preserving protocols for perceptron learning algorithm in neural networks

 

This paper discusses how neural networks have become increasingly important in areas such as medical diagnosis, bioinformatics, intrusion detection, and homeland security. In most of these applications, a major issue is preserving the privacy of individuals' private and sensitive data. The authors propose two secure protocols for the perceptron learning algorithm when the input data is horizontally and vertically partitioned among the parties. These protocols can be applied to both linearly separable and non-separable datasets: not only does the data belonging to each party remain private, but the final learning model is also securely shared among those parties [21].

 

8. On the Design and Analysis of the Privacy-Preserving SVM Classifier

 

This paper examines the support vector machine (SVM), a widely used tool for classification problems. The SVM trains a classifier by solving an optimisation problem that decides which instances of the training dataset become support vectors, the critical training examples that form the SVM classifier. Since support vectors are drawn directly from the training data, releasing the SVM classifier for public use, or shipping it to clients, will disclose the private content of those support vectors. This violates privacy-preservation requirements for legal or commercial reasons: the classifier learned by the SVM inherently breaches privacy, and this privacy-violation problem can restrict the applicability of the SVM [22].

 

 

9. Hybrid Genetic Algorithm and Learning Vector Quantization Modelling for Cost-Sensitive Bankruptcy Prediction

 

This paper discusses cost-sensitive classification algorithms, which enable effective prediction when the costs of misclassification can be significantly different, and which are crucial to creditors and auditors in credit risk analysis. Learning vector quantization (LVQ) is a powerful tool for solving the bankruptcy prediction problem as a classification task. The genetic algorithm (GA) is widely applied in conjunction with artificial intelligence techniques, and the hybridisation of genetic algorithms with existing classification algorithms is well established in the field of bankruptcy prediction [23].

 

 

 

10. Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data

 

This paper discusses how private companies, government entities, and institutions such as hospitals routinely gather vast amounts of digitised personal information about the individuals who are their customers, clients, or patients. Much of this information is private or sensitive, and a key technological challenge for the future is how to design systems and processing techniques for drawing inferences from this large-scale data while maintaining the privacy and security of the data and of individual identities [24].

 

 

11. A new model for privacy-preserving sensitive data mining

 

This paper discusses data mining and knowledge discovery, a fundamental technology for business and research in fields such as statistics, machine learning, pattern recognition, databases, and high-performance computing. Privacy-preserving data mining can potentially increase the reach and benefits of data mining technology, since it allows publishing a modified dataset without revealing private information. Publishing data about individuals without disclosing sensitive information about them is an essential problem [25].

 

12. Automating anomaly detection for exploratory data analytics

 

This paper presents a design to automate the process of exploratory data analysis, with an emphasis on outlier and anomaly detection. The paper examines the domain of exploratory data analysis, the complexity involved in automating it, and a solution that uses the latest advances in computing. The solution describes a framework that can accept data, understand the structure and type of variables, extract important variables, and identify outliers or anomalies to reveal process bottlenecks [26].

 

 

 

Literature Review by MANDEEP KAUR (MIT172825) 13-18

 

13. Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification

 

This paper discusses machine learning methodologies for data classification and focuses on the many attempts to classify imbalanced data, since this kind of classification is essential in a wide variety of applications related to the detection of anomalies, failures, and threats. Many conventional methods, which can be categorised as sampling, cost-sensitive, or ensemble approaches, rely on heuristic and task-dependent procedures. To achieve better classification performance by design, without heuristics or task dependence, the authors propose confusion-matrix-based kernel logistic regression (CM-KLOGR) [27].
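The confusion matrix at the heart of CM-KLOGR is simple to compute; the sketch below builds the 2x2 matrix for a binary classifier. This is our own illustration of the underlying concept, not the paper's full method, which optimises a criterion derived from this matrix.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels 0/1.

    Rows index the actual class, columns the predicted class, so
    m[1][0] counts false negatives and m[0][1] counts false positives.
    """
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m
```

On imbalanced data, accuracy alone hides the minority-class errors that the off-diagonal cells of this matrix expose.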

 

14. Learning privately: Privacy-preserving canonical correlation analysis for cross-media retrieval

 

This paper discusses the enormous explosion of various kinds of data that has been triggered in the "Big Data" era. In big data systems, machine learning plays an essential role because of its effectiveness in discovering hidden information and valuable knowledge. Data privacy, however, becomes an unavoidable concern, since big data usually involves multiple organisations, e.g., different healthcare systems and hospitals, that are not in the same trust domain and may be reluctant to share their data openly [28].

 

 

15. A Method for identifying cryptographic algorithm from ciphertext

 

This paper discusses how the cryptographic algorithm plays a critical role in a cryptosystem, protecting sensitive and private data from being obtained by malicious attackers. In practice, however, the cryptographic algorithm used in a cryptosystem is often unknown to a cryptanalyst, and when a cryptanalyst carries out cryptanalysis, he will have considerable trouble if he has no idea which cryptographic algorithm was used [29].

 

 

16. Automated big security text pruning and classification

 

This paper discusses how numerous security-related big data problems, including document, traffic, and system log analysis, require the examination of unstructured text. Consider the task of analysing company documents for secure storage: some may be too sensitive to put on a public cloud and need private storage, some may be safe on the cloud in encrypted form, and some may be sufficiently non-sensitive to be stored on the cloud in plain text without encryption and decryption overhead [30].

 

 

17. Censoring Sensitive Data from Images

 

This paper discusses how, in recent years, the vast volume of digital images available has enabled a large range of learning methods to be applied, making manual human effort obsolete for some tasks. The authors address the problem of removing private information from images. When confronted with a relatively large number of images to be made public, one may find the task of manually editing out sensitive regions infeasible; ideally, a machine learning approach should automate this task [31].

 

18. Insider Threat Detection with Face Recognition and KNN User Classification

 

This paper discusses how data security in cloud storage is a key concern with respect to degree of trust and cloud penetration. The cloud user community needs assured performance and security via QoS, and various models have been proposed to handle security concerns. The detection and prevention of insider threats also need to be tackled: since such an attacker is already aware of sensitive information, threats from cloud insiders are a grave concern. The authors propose an authentication framework that performs authentication by verifying the facial features of the cloud user, in addition to username and password, thereby acting as two-factor authentication [32].
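The KNN user-classification step in such a system can be illustrated with a tiny majority-vote implementation. The feature vectors below are hypothetical stand-ins for the face-feature vectors the system would extract; this is our own sketch, not the paper's code.

```python
import math
from collections import Counter

def knn_classify(samples, query, k):
    """Classify `query` by majority vote among its k nearest samples.

    `samples` is a list of (feature_vector, label) pairs; distance is
    plain Euclidean distance between feature vectors.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A real deployment would classify high-dimensional face embeddings rather than these toy 2-D points, but the nearest-neighbour vote is the same.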

 

 

Literature review by RAJBIR SINGH (MIT172501) 19-24

 

19. SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices

 

This paper discusses permission systems and how they are the main defence that mobile platforms such as Android and iOS provide to let users protect their personal data from applications. Nonetheless, because of the tension between usability and control, such systems have several limitations that often force users to overshare sensitive data. The authors address some of these limitations with SmarPer, an advanced permission mechanism for Android. To overcome the rigidity of current permission systems and their poor matching of users' privacy preferences, SmarPer relies on contextual information and machine learning techniques to predict permission decisions at runtime [33].


20. Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity


This paper observes that most of what we understand about data sensitivity comes from user self-report (e.g., surveys); it is the first to use behavioural data to determine content sensitivity, via the clues that users give about what information they consider private or sensitive through their use of privacy-enhancing product features. The authors perform a large-scale analysis of user anonymity choices during activity on Quora, a popular question-and-answer site. They identify categories of questions for which users are more likely to exercise anonymity and investigate several machine learning approaches to predicting whether a particular answer will be written anonymously [34].

21. A Framework of Privacy Decision Recommendation for Image Sharing in Online Social Networks

 

This paper discusses how image sharing in Online Social Networks (OSNs) carries the potential risk of exposing users' private or sensitive information to others. The authors develop a framework for computing the privacy level of a digital image based on perceptual hashing and semantic privacy rules. Specifically, they design two privacy-preserving perceptual hashing methods: the first is based on SIFT features, which focus on the representation of sensitive objects in the image, and the second is based on LBP features, which focus on the description of faces in the image. Both methods use a secret key to protect the hashes from adversaries or untrusted servers [35].
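Perceptual hashing maps similar images to similar bit strings so they can be compared without revealing the images themselves. The sketch below is a deliberately simple average hash over raw grayscale pixels, a much cruder relative of the SIFT- and LBP-based hashes in the paper; all names here are our own.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel value
    is above the image mean, so near-duplicate images collide."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))
```

Two visually similar images yield hashes at small Hamming distance, which is what lets such a framework match an image against its privacy rules without inspecting the image directly.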

 

 

22. Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification

 

This paper discusses the data mining process and the many stages it involves. A simple but typical process may include pre-processing data, applying a data mining algorithm, and post-processing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large design space and the nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM (data mining) processes [36].

 

 

23. Biased locality-sensitive support vector machine based on density for positive and unlabelled examples learning

 

This paper discusses learning from positive and unlabelled examples (PU learning), which has been a hot topic for classification in machine learning. The key feature of this problem is that there is no labelled negative training data, which makes conventional classification methods inapplicable. In response, the authors propose an algorithm called biased locality-sensitive support vector machine based on density (BLSBD-SVM) for PU learning, which treats unlabelled examples as negative examples with noise [37].

 

 

24. Protecting data from malware threats using machine learning technique

 

This paper discusses how cyber attacks against sensitive data have become serious threats all over the world because of the growing use of computers and information technology. New malware, or malicious programs, are released every day by cyber criminals through the Internet in attempts to steal or destroy important data. Consequently, research on protecting data attracts enormous interest in the cyber security community. To cope with new variants of malicious software, machine learning techniques can be used for accurate classification and detection [38].

 

 

Literature review by SANDEEP KAUR (MIT171974) 25-30

 

25. Sensitive Information Acquisition Based on Machine Learning

This paper discusses how the volume of data on the Internet has grown enormously, becoming a huge treasury of information. However, it is also flooded with harmful content, for example viruses, Trojans, violence, pornography, gambling, and so on, and hostile forces and criminal elements use online information to engage in criminal activities that endanger national security. How to identify this information, locate the corresponding sites, and carry out the required supervision has therefore become an urgent problem [39].

 

 

26. Privacy preserving extreme learning machine classification model for distributed systems

 

This paper discusses how machine-learning-based classification methods are widely used to analyse large-scale datasets in this era of big data. The extreme learning machine (ELM) classification algorithm is a relatively new method based on a generalised single-layer feed-forward network structure. The conventional ELM learning algorithm implicitly assumes complete access to the whole dataset, which is a significant privacy concern in most cases; sharing of private data (e.g., medical records) is prevented because of privacy concerns [40].

 

 

27. Privacy Preserving Decision Tree Learning Using Unrealized Data Sets

 

This paper discusses the various aspects of privacy preservation and how it is essential for machine learning and data mining, though measures designed to protect private information often result in a trade-off: reduced utility of the training samples. The paper presents a privacy-preserving approach that can be applied to decision tree learning without an accompanying loss of accuracy. It describes an approach to preserving the privacy of collected data samples in situations where information from the sample database has been partially lost [41].

 

28. Differentially private query learning: From data publishing to model publishing

 

This paper discusses machine learning strategies and how differential privacy, one of the most influential privacy definitions, provides a rigorous and provable privacy guarantee for data publishing. However, in the Big Data era the curator needs to release a large number of queries in a batch, or a synthetic dataset. Two challenges need to be tackled: one is how to reduce the correlation among large sets of queries, and the other is how to make predictions on new queries. This paper transfers the data publishing problem to a machine learning problem, in which queries are treated as training samples and a prediction model is released instead of query results or synthetic datasets [42].

 

 

29. Private and Scalable Personal Data Analytics Using Hybrid Edge-to-Cloud Deep Learning

 

This paper discusses how, although the ability to collect, classify, and analyse the vast amount of data generated by cyber-physical systems and Internet of Things devices can be beneficial to both users and industry, this process has raised numerous challenges, including privacy and scalability issues. The authors present a hybrid framework in which user-centric edge devices and resources can complement the cloud to provide privacy-aware, accurate, and efficient analytics [43].

 

 

30. Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems

 

This paper discusses data classification, a widely used data mining technique for big data analytics. By training on massive data collected from the real world, data classification enables learners to discover hidden data patterns. Beyond training, given a model built from collected data, a user can classify whether new incoming data belongs to an existing class, or several distributed entities may collaborate to test the similarity of their trained results. However, due to data locality and privacy concerns, it is impractical for large-scale distributed systems to share each individual's datasets with one another for data similarity evaluation [44].
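A simple way to see how similarity can be evaluated without exchanging raw records (a naive sketch only; the protocol in [44] is more sophisticated, and salted hashing alone is vulnerable to dictionary attacks on low-entropy records) is for the parties to compare Jaccard similarity over salted hashes of their records. The names and the shared-salt assumption below are ours:

```python
import hashlib

def blind(record, shared_salt):
    """Hash a record with a salt agreed between the collaborating
    parties, so records can be compared without exchanging plaintext."""
    return hashlib.sha256((shared_salt + record).encode()).hexdigest()

def hashed_jaccard(records_a, records_b, shared_salt):
    """Estimate set similarity from the blinded values only."""
    a = {blind(r, shared_salt) for r in records_a}
    b = {blind(r, shared_salt) for r in records_b}
    return len(a & b) / len(a | b)
```

Each party can publish only its blinded set, so the similarity score is computable while the underlying records stay private from anyone who cannot guess them.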

 

Appendix III: Assignment 1 updates summary

 

Comments Updates

The tutor and supervisor gave several comments on assignment 1 that were to be addressed in assignment 2. The comments and the corresponding updates are listed below.

 

First Comment: Need to follow the report template.

Updated: In assignment 2 all work follows the final report template. Font size, colour, spacing, and design meet the requirements.

 

Second Comment: Text in the report needs to be justified and text style follow the template.

Updated: The introduction has been updated; its text is justified and set in Garamond, size 12, as the template requires.

 

Third Comment: Overview of the report not included.

Updated: An overview of the report has been added according to the supervisor's requirement.

 

Fourth Comment: Once a new section is started some text should be there, should not move directly from 2 to 2.2. This applies to the whole report.

Updated: In the problem domain and research questions section, introductory text now precedes the first subsection. This has been applied throughout the assignment.

 

Fifth Comment: The problem domain could be more specific to the project; it seems very general.

Updated: The problem domain has been updated and now provides detailed, project-specific information.

 

Sixth Comment: The question needs to be rephrased as it is not correct grammatically.

Updated: All research questions have been rephrased and are now grammatically correct.

 

Seventh Comment: Where is all the research work you have completed? Very short literature review.

Updated: The literature review summary has been expanded into a more detailed literature review.

 

Eighth Comment: Is this the summary of 30 papers?

Updated: The summary of the literature review has been updated with more information, covering all 30 papers.

 

Ninth Comment: Figure number missing and if the figure is taken from any source, citation and reference needs to be provided.

Updated: Figure numbers, citations, and references have been provided throughout the assignment.

 

Tenth Comment: Costs should be in AUD

Updated: In section 4.1, hardware requirement costs are now given in Australian dollars.

 

Eleventh Comment: Table caption comes at the top

Updated: This has been applied throughout the assignment. Table numbers are provided and each table caption appears at the top of its table.

 

Twelfth Comment: Avoid using I in a group project.

Updated: This has been applied throughout the assignment; first-person singular wording has been removed.

 

Thirteenth Comment: References not in IEEE Style

Updated: The references have been reformatted in IEEE style.

 

Fourteenth Comment: Reference style needs to be improved in Assignment 2

Updated: The reference style has been improved in assignment 2.

 

Fifteenth Comment: Client's signature missing.

Not Updated: We could not address this comment. We asked the client to provide a signature, but he said he cannot sign any document without his company's permission, as he works for a New Zealand company.

 

Sixteenth Comment: References not cited.

Updated: References are now cited throughout the literature review.

 

Appendix IV: Assignment 2 updates summary

 

Assessment 2 updates made in assessment 3

 

 

Appendix V: ACS Core Body of Knowledge Mapping

 

ACS CBOK Elements

 

Appendix VI: Meeting log book

 

Meeting log.