##### Data Analysis and Decision Modelling Oz Assignments


Trading storage for computation refers to the practice of spending computation in order to save storage when processing numerical calculations. Instead of keeping the results of the calculations, the system saves the inputs, the business processes (formulations) that produce the results, and the underlying data, and recomputes a result whenever it is needed again. The approach is most useful for systems that take small volumes of input data but generate large volumes of output that is needed only temporarily and is unlikely to be accessed later. Such output occupies a considerable amount of storage without any future use. By keeping the input data and formulations intact and discarding the outcomes, an organization saves a large amount of storage while retaining everything required to regenerate the results for future reference. Combined with cloud computing, the approach has been gaining ground and finding wide application in business and other operational organizations.
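A minimal sketch of this idea in Python, assuming a result can be regenerated exactly from its stored inputs by a deterministic formula. The `RecomputableResult` class and the `expand_series` formula are hypothetical illustrations, not part of any system cited in this report: only the three inputs are kept, and the million-element output is produced only when `value()` is called.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class RecomputableResult:
    """Hypothetical record that keeps the inputs and the formula, not the output."""
    formula: Callable[..., Any]   # deterministic function that produces the result
    inputs: Tuple[Any, ...]       # the small input data that is actually stored
    recomputations: int = 0       # how many times the output has been regenerated

    def value(self) -> Any:
        # The output is never persisted; it is regenerated on every request.
        self.recomputations += 1
        return self.formula(*self.inputs)


def expand_series(start: float, step: float, count: int) -> list:
    # Hypothetical formula: three small inputs produce a large output.
    return [start + i * step for i in range(count)]


record = RecomputableResult(expand_series, (0.0, 0.5, 1_000_000))
series = record.value()  # the output exists only because it was requested
print(len(series), series[3], record.recomputations)
```

Each further call to `value()` pays the computation cost again, which is the price paid for not storing the output.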

This final project report summarizes the literature review, the analysis results and the findings gathered over the course of the project. Particular emphasis has been placed on answering the research questions below using the research findings and the data collected.

**The research questions are summarized as follows.**

1. How is trading storage for computation useful for business organizations?

2. How is cloud computing technology advantageous in mitigating the issues created by the standard programming paradigm?

3. What steps must business organizations follow to implement trading storage for computation?

According to Joshi, Liu and Soljanin (2014), combining cloud computing with storage trade-offs has benefited transcoding in real-life scenarios, as well as video-layer encoding and decoding. The authors based their work on wireless technologies such as WCDMA and LTE and set out to evaluate the requirements for applying the trade-off to efficient and effective computation.

According to McEvoy and Correll (2015), the trading storage for computation process has benefited significantly from integration with cloud computing, because cloud computing is easily accessible and has low complexity. They reported that the holistic view offered by cloud computing is particularly useful for selecting the specific results that are to be stored for future access and for removing all other results, which lose relevance after some time.
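A sketch of that selection step, assuming each result is tagged when it is created as either needed for future access (`keep=True`) or temporary, and that temporary results lose relevance after a fixed time-to-live. `ResultStore`, `keep` and `ttl_seconds` are hypothetical names chosen for illustration; the clock is injectable so the expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Dict, Optional, Tuple


class ResultStore:
    """Hypothetical store that keeps selected results and expires the rest."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable clock so expiry can be tested
        self._items: Dict[str, Tuple[Any, bool, float]] = {}

    def put(self, key: str, value: Any, keep: bool = False) -> None:
        # keep=True marks a result selected for long-term storage.
        self._items[key] = (value, keep, self.clock())

    def get(self, key: str) -> Optional[Any]:
        self.purge()
        item = self._items.get(key)
        return None if item is None else item[0]

    def purge(self) -> None:
        # Drop every non-selected result whose time-to-live has elapsed.
        now = self.clock()
        expired = [k for k, (_, keep, created) in self._items.items()
                   if not keep and now - created > self.ttl_seconds]
        for k in expired:
            del self._items[k]


# Usage with a fake clock: one result is selected, one is temporary.
fake_now = [0.0]
store = ResultStore(ttl_seconds=60, clock=lambda: fake_now[0])
store.put("monthly_report", {"total": 42}, keep=True)
store.put("scratch_forecast", [0.1, 0.2])
fake_now[0] = 120.0  # two minutes later
print(store.get("monthly_report"), store.get("scratch_forecast"))
```

Only the selected report survives; the temporary forecast is discarded once its time-to-live has passed.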

Chen (2015) noted that, in addition to saving a significant amount of storage space, the storage trade-off system reduces the cost and time of operations and data processing, and that computational efficiency increases significantly once trading storage for computation is implemented.

Zhao et al. (2015) developed programming paradigms for managing the storage trade-off so that it follows a defined framework for calculating, storing and retrieving data. However, they also reported that finding the right balance between storage and computation is extremely difficult, even when storage is traded off against computation. The authors further found that the standard programming paradigm for storing data creates numerous challenges and issues, despite the benefits the system provides.
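One way to make that balance concrete, assuming the cost model of the storage algorithm reproduced later in this report: let $S$ be the size of a result, $r$ the number of times it is reused, $t$ its computation time, and $p_{io}$, $p_s$, $p_c$ the unit I/O, storage and computation prices (0.12, 0.10 and 0.085 in that script). The result is kept whenever recomputing it for every reuse costs at least as much as storing it:

$$
C_s = S\,(p_{io} + p_s), \qquad C_r = r\,(S\,p_{io} + t\,p_c)
$$

$$
\text{store} \iff C_r \ge C_s \iff r \ge r^{*} = \frac{S\,(p_{io} + p_s)}{S\,p_{io} + t\,p_c}
$$

For node 2 of the case example ($C_s = 33.0$, single-use $C_r = 18.085$), $r^{*} = 33.0 / 18.085 \approx 1.82$: the node is recomputed when it is used once and stored when it is reused twice, which is the switch seen between the two reported runs. This break-even threshold is a reading of that script's decision rule, not a formula taken from Zhao et al. (2015).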

Sasidharan, Senthoor and Kumar (2014) developed conceptual frameworks for trading storage for computation and published them in their papers. They reported that in a simple model, the final computation state is stored, and when the results are accessed again for reference, they are read directly from that stored state.
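A minimal sketch of this simple model, using Python's standard `functools.lru_cache` as a stand-in for persisted final state: the first request computes and keeps the result, and later requests read it back without recomputation. The function `final_state` and its workload are hypothetical.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive computation actually runs


@lru_cache(maxsize=None)  # unbounded cache: every final state is kept
def final_state(n: int) -> int:
    # Hypothetical expensive computation: sum of squares from 0 to n.
    calls["count"] += 1
    return sum(i * i for i in range(n + 1))


first = final_state(1000)   # computed once and stored
second = final_state(1000)  # read directly from the stored state
print(first, second, calls["count"])
```

This sits at the opposite end of the trade-off from recomputation: storage is spent so that repeated reads cost no computation.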

Liu et al. (2014) added a recomputation process that uses cloud architecture within the holistic computing model. They stated that performing recomputation on cloud infrastructure is much more effective and less time-consuming. The authors also used storage algorithms that they implemented in Python; part of one such algorithm is shown below.

```python
# Storage_Algorithm.py: uses DAG_Reader.py to parse the sample XML file containing a DAG
# and returns the cost of store-all and the storage and re-computation costs of the
# nodes selected by the storage algorithm
from DAG_Reader import DAGParser

DAG_FILE = "dag2.xml"

# storage/re-computation costs from Amazon Cloud
computation_cost = .085
io_cost = .12
storage_cost = .1


# formula to calculate storage costs (I/O cost plus storage cost per unit of data)
def calc_storage(storage, store_cost, io_cost):
    total_storage_cost = (storage * io_cost) + (storage * store_cost)
    return total_storage_cost


# formula to calculate re-computation costs (per reuse: I/O cost of the data plus
# computation time; store_cost is accepted but not used)
def calc_recomp(storage, reuse, store_cost, io_cost, comp_cost, comp_time):
    recomp_cost = reuse * ((storage * io_cost) + (comp_time * comp_cost))
    return recomp_cost


# create DAGParser object
dagGraph = DAGParser(DAG_FILE)

# find the storage cost for 'store-all'
print('calculating costs for store-all\n')
total_scost = 0
for ID in dagGraph.DAG:
    node = dagGraph.DAG[ID]
    storage = float(node['storage'])
    scost_per_node = calc_storage(storage, storage_cost, io_cost)
    total_scost = total_scost + scost_per_node
    print('node ID: ', ID, 'node storage cost: ', scost_per_node)
print('store-all total storage cost (in $): ', total_scost, '\n')

# storage algorithm: determines the cost of storage and the cost of re-computation
# for all nodes in the DAG and stores based on the following procedure:
#   if node re-computation cost > node storage cost ---> STORE
#   if node re-computation cost < node storage cost ---> RE-COMPUTE (or NOT STORE)
print('calculating costs for storage algorithm\n')
total_rcost = 0
total_scost = 0
for ID in dagGraph.DAG:
    node = dagGraph.DAG[ID]
    reuse = float(node['reuse'])
    storage = float(node['storage'])
    comp_time = float(node['compUnits']) / float(node['compPower'])
    node_scost = calc_storage(storage, storage_cost, io_cost)
    node_rcost = calc_recomp(storage, reuse, storage_cost, io_cost,
                             computation_cost, comp_time)
    print('node ID: ', ID, 'node recomp cost: ', node_rcost)
    # node 1 is always stored, whatever its costs
    if node_scost > node_rcost and int(ID) != 1:
        # node should NOT be stored, so add node_rcost to total_rcost
        print('node ID: ', ID, 'node storage cost: ', node_scost, '---> NOT STORED\n')
        total_rcost = total_rcost + node_rcost
    else:
        print('node ID: ', ID, 'node storage cost: ', node_scost, '---> STORED\n')
        total_scost = total_scost + node_scost
print('storage algorithm total re-computation cost (in $): ', total_rcost)
print('storage algorithm total storage cost (in $): ', total_scost)
print('storage algorithm total cost (in $): ', total_scost + total_rcost)
```
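Because `DAG_Reader.py` and `dag2.xml` are not reproduced in this report, the script above cannot run on its own. The sketch below is a self-contained variant that applies the same store/re-compute rule with the DAG inlined as a dictionary. The node sizes and computation times are assumptions back-calculated from the case-example output reported below (for example, node 1's storage cost of 225.28 at $0.22 per unit implies a size of 1024); they are not the authors' data, and the `run` helper is a hypothetical name.

```python
COMPUTATION_COST = 0.085  # $ per unit of computation time
IO_COST = 0.12            # $ per unit of data transferred
STORAGE_COST = 0.10       # $ per unit of data stored

# Assumed node data, back-calculated from the reported output:
# ID -> (storage, comp_time). Order follows the printed output.
NODES = {1: (1024.0, 1.0), 3: (350.0, 2.0), 2: (150.0, 1.0),
         5: (228.8, 1.5), 4: (500.0, 3.0)}


def calc_storage(storage: float) -> float:
    return storage * IO_COST + storage * STORAGE_COST


def calc_recomp(storage: float, reuse: float, comp_time: float) -> float:
    return reuse * (storage * IO_COST + comp_time * COMPUTATION_COST)


def run(reuse_by_node: dict) -> dict:
    """Apply the store / re-compute rule and return the cost totals."""
    store_all = sum(calc_storage(s) for s, _ in NODES.values())
    stored, total_scost, total_rcost = [], 0.0, 0.0
    for node_id, (storage, comp_time) in NODES.items():
        scost = calc_storage(storage)
        rcost = calc_recomp(storage, reuse_by_node.get(node_id, 1.0), comp_time)
        # As in the original script, node 1 is always stored.
        if scost > rcost and node_id != 1:
            total_rcost += rcost  # cheaper to re-compute: do not store
        else:
            stored.append(node_id)
            total_scost += scost  # cheaper (or required) to store
    return {"store_all": store_all, "stored": stored, "storage": total_scost,
            "recomputation": total_rcost, "total": total_scost + total_rcost}


uniform = run({})               # every node used once
mixed = run({2: 2.0, 4: 2.0})   # nodes 2 and 4 reused twice
print(uniform["stored"], round(uniform["total"], 4))
print(mixed["stored"], round(mixed["total"], 4))
```

With these assumed inputs the sketch reproduces the totals of both reported runs: only node 1 is stored when every node is used once, while nodes 2 and 4 switch to stored once they are reused twice.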

**Xue et al. (2016) also implemented algorithms for calculating storage costs during the recomputation process using a case example; the output for two reuse settings is shown below.**

```text
calculating costs for store-all (reuse = 1)
node ID: 1 node storage cost: 225.28
node ID: 3 node storage cost: 77.0
node ID: 2 node storage cost: 33.0
node ID: 5 node storage cost: 50.336
node ID: 4 node storage cost: 110.0
store-all total storage cost (in $): 495.616

calculating costs for storage algorithm
node ID: 1 node recomp cost: 122.965
node ID: 1 node storage cost: 225.28 ---> STORED
node ID: 3 node recomp cost: 42.17
node ID: 3 node storage cost: 77.0 ---> NOT STORED
node ID: 2 node recomp cost: 18.085
node ID: 2 node storage cost: 33.0 ---> NOT STORED
node ID: 5 node recomp cost: 27.5835
node ID: 5 node storage cost: 50.336 ---> NOT STORED
node ID: 4 node recomp cost: 60.255
node ID: 4 node storage cost: 110.0 ---> NOT STORED
storage algorithm total re-computation cost (in $): 148.0935
storage algorithm total storage cost (in $): 225.28
storage algorithm total cost (in $): 373.3735
```

```text
calculating costs for store-all (mixed reuse)
node ID: 1 node storage cost: 225.28
node ID: 3 node storage cost: 77.0
node ID: 2 node storage cost: 33.0
node ID: 5 node storage cost: 50.336
node ID: 4 node storage cost: 110.0
store-all total storage cost (in $): 495.616

calculating costs for storage algorithm
node ID: 1 node recomp cost: 122.965
node ID: 1 node storage cost: 225.28 ---> STORED
node ID: 3 node recomp cost: 42.17
node ID: 3 node storage cost: 77.0 ---> NOT STORED
node ID: 2 node recomp cost: 36.17
node ID: 2 node storage cost: 33.0 ---> STORED
node ID: 5 node recomp cost: 27.5835
node ID: 5 node storage cost: 50.336 ---> NOT STORED
node ID: 4 node recomp cost: 120.51
node ID: 4 node storage cost: 110.0 ---> STORED
storage algorithm total re-computation cost (in $): 69.7535
storage algorithm total storage cost (in $): 368.28
storage algorithm total cost (in $): 438.0335
```
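The arithmetic behind the two runs above can be checked quickly. This short script takes the store-all and storage-algorithm totals exactly as printed and computes how much the selective strategy saves; it is only a worked check of the reported figures, not part of the cited work.

```python
# Totals copied from the two case-example runs (in $).
runs = {
    "reuse = 1": {"store_all": 495.616, "algorithm": 373.3735},
    "mixed reuse": {"store_all": 495.616, "algorithm": 438.0335},
}

savings = {}
for name, totals in runs.items():
    saving = totals["store_all"] - totals["algorithm"]
    share = 100 * saving / totals["store_all"]  # saving as % of store-all
    savings[name] = (saving, share)
    print(f"{name}: saves ${saving:.4f} ({share:.1f}% of store-all)")
```

Selective storage saves about 24.7% against store-all when every node is used once, but only about 11.6% with mixed reuse, because the reused nodes 2 and 4 become cheaper to store than to recompute.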

Cloud computing is a fast-evolving technology that is now used in almost every field of commercial operations. Beyond its basic features, it has been integrated with a large number of systems to enhance their operations and to increase the usability of cloud services. Likewise, cloud computing is now used with the storage trading process to enhance its functionality and usability in business organizations.

However, using cloud computing in this process brings its own limitations and risks. Case studies show that cloud computing is often vulnerable to external threats such as cyber attacks and malware injection. When cloud computing is integrated with another process, these risks carry over to that process as well. It is therefore important to design and scale the integration so that the likelihood of these risks is minimized.

The storage trading process also has limitations, as the works of various authors show. The main one is that it requires a complex system architecture and carefully executed algorithms, which are mainly implemented in programming languages such as Python. Maintaining such a system therefore requires appropriate technical expertise.

Over the course of this project, the process of trading storage for computation has been analyzed and its applications in today's computing landscape identified. The combination of storage trading and cloud computing services has proved to be a very effective technology in operational sectors that perform vast numbers of calculations every day and generate correspondingly large volumes of output. Some organizations need to keep these outputs for future analysis and reference; in others, the outputs are no longer needed once they have been analyzed. Nevertheless, such outputs remain in the database and occupy a significant amount of storage space, which raises system maintenance costs without any real benefit. For instance, a climate computing office regularly computes climate parameters such as expected temperature or rainfall; once the period in question has passed, these results are no longer required, yet they remain stored in the database. Storage trading for computation can change this significantly. Results that are not worth keeping are not stored at all; instead, the data used for the computation and the formulations applied to it are saved. If the results are needed in the future, they are recomputed from the stored input data and formulations.
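A sketch of that recovery path for a chain of dependent results, assuming each non-stored result records which parent results it was derived from and the formula applied to them. When a result is requested, it is regenerated recursively from its parents, stopping at data that is still stored. The node names, readings and formulas are hypothetical, loosely modelled on the climate example.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stored data: only the raw inputs are kept (hypothetical temperature readings).
stored: Dict[str, Any] = {"daily_temps": [21.0, 23.5, 19.0, 24.5]}

# Provenance of results that are NOT stored: (parent names, formula).
recipes: Dict[str, Tuple[List[str], Callable[..., Any]]] = {
    "mean_temp": (["daily_temps"], lambda t: sum(t) / len(t)),
    "anomalies": (["daily_temps", "mean_temp"],
                  lambda t, m: [round(x - m, 2) for x in t]),
}

recompute_log: List[str] = []  # records which results had to be regenerated


def fetch(name: str) -> Any:
    """Return a result, re-computing it from its parents if it was not stored."""
    if name in stored:
        return stored[name]
    parents, formula = recipes[name]
    recompute_log.append(name)
    # Nothing is written back to `stored`, so every request pays again.
    return formula(*(fetch(p) for p in parents))


print(fetch("anomalies"))
```

Requesting `anomalies` regenerates `mean_temp` along the way, because only the raw readings were kept.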

This project was based on research into and analysis of the storage trading for computation process, which has become a popular technology in operational sectors where large volumes of data are computed every day. Cloud computing has been integrated with the storage trade-off to increase the overall efficiency of computation. The basic concept of the storage trade-off is to discard computed results after use and, if they are needed again, to recompute them. Despite its benefits, the system has limitations, including a complex architecture and security threats. Nevertheless, the technology is very promising and continues to evolve. With continuing technical development, it can safely be assumed that it will become more popular in the near future. Cloud computing is also improving at a fast pace, which will further benefit the storage trading process and make it more efficient.

**References**

1. Chen, X. (2015). Decentralized computation offloading game for mobile cloud computing. IEEE Transactions on Parallel and Distributed Systems, 26(4), 974-983.

2. Chen, X., Jiao, L., Li, W., & Fu, X. (2016). Efficient multi-user computation offloading for mobile-edge cloud computing. IEEE/ACM Transactions on Networking, (5), 2795-2808.

3. Gangwar, H., Date, H., & Ramaswamy, R. (2015). Understanding determinants of cloud computing adoption using an integrated TAM-TOE model. Journal of Enterprise Information Management, 28(1), 107-130.

4. Joshi, G., Liu, Y., & Soljanin, E. (2014). On the delay-storage trade-off in content download from coded distributed storage systems. IEEE Journal on Selected Areas in Communications, 32(5), 989-997.

5. Liu, C., Chen, J., Yang, L. T., Zhang, X., Yang, C., Ranjan, R., & Kotagiri, R. (2014). Authorized public auditing of dynamic big data storage on cloud with efficient verifiable fine-grained updates. IEEE Transactions on Parallel and Distributed Systems, 25(9), 2234-2244.

6. McEvoy, M. A., & Correll, N. (2015). Materials that couple sensing, actuation, computation, and communication. Science, 347(6228), 1261689.

7. Sasidharan, B., Senthoor, K., & Kumar, P. V. (2014, June). An improved outer bound on the storage-repair-bandwidth tradeoff of exact-repair regenerating codes. In Information Theory (ISIT), 2014 IEEE International Symposium on (pp. 2430-2434). IEEE.

8. Xue, B., Zhang, M., Browne, W. N., & Yao, X. (2016). A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation, 20(4), 606-626.

9. Yi, S., Li, C., & Li, Q. (2015, June). A survey of fog computing: concepts, applications and issues. In Proceedings of the 2015 workshop on mobile big data (pp. 37-42). ACM.

10. Zhang, Q., Yang, L. T., & Chen, Z. (2016). Deep computation model for unsupervised feature learning on big data. IEEE Transactions on Services Computing, 9(1), 161-171.

11. Zhao, H., Zheng, Q., Zhang, W., Du, B., & Chen, Y. (2015, July). A version-aware computation and storage trade-off strategy for multi-version VoD systems in the cloud. In Computers and Communication (ISCC), 2015 IEEE Symposium on (pp. 943-948). IEEE.

12. Zhao, H., Zheng, Q., Zhang, W., Du, B., & Li, H. (2017). A segment-based storage and transcoding trade-off strategy for multi-version VoD systems in the cloud. IEEE Transactions on Multimedia, 19(1), 149-159.