Knowledge-based checkpointing strategy for spot instances in cloud computing

Authors

  • Sumit Tomar, Computer Science and Engineering Department, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, U.P., India 211004
  • Ashish Kumar Mishra, Computer Science and Engineering Department, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, U.P., India 211004
  • Dharmendra K Yadav, Computer Science and Engineering Department, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, U.P., India 211004

DOI:

https://doi.org/10.59796/jcst.V13N2.2023.1754

Keywords:

checkpointing, cloud computing, fault tolerance, spot instances

Abstract

Amazon EC2 offers spot-priced virtual machines (VMs) at a lower price than on-demand and reserved VMs. However, Amazon EC2 can terminate these VMs at any time because of fluctuations in spot price and demand. As a result, jobs running on spot VMs can take longer to complete, and service availability can be disrupted. Users can apply fault-tolerance techniques such as checkpointing, migration, and job duplication to mitigate the unreliability of spot VMs. This paper proposes a knowledge-based checkpointing strategy that minimizes the overall checkpointing overhead during job execution. The proposed scheme uses real-time price history to decide when to take a checkpoint. Results show that the proposed approach reduces turnaround time by 18% compared to the Hourly Checkpointing Strategy and by 9% compared to the Rising-Edge Checkpointing Strategy. For the workload used, the approach also achieves 54% to 78% reliability with a cost saving of 78%.
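To make the compared strategies concrete, the following is a minimal Python sketch of how checkpoint times might be chosen from a spot price trace. The abstract does not give the authors' algorithm, so this is an illustration only. The hourly and rising-edge functions follow the usual meaning of those names (checkpoint at each hour boundary; checkpoint whenever the price rises). The function `history_aware_checkpoints`, its `bid` and `margin` parameters, and the sample `trace` are hypothetical and are not taken from the paper.

```python
from typing import List


def hourly_checkpoints(prices: List[float], samples_per_hour: int) -> List[int]:
    """Return sample indices that fall on an hour boundary (index 0 excluded)."""
    return [i for i in range(1, len(prices)) if i % samples_per_hour == 0]


def rising_edge_checkpoints(prices: List[float]) -> List[int]:
    """Return sample indices where the price is higher than the previous sample."""
    return [i for i in range(1, len(prices)) if prices[i] > prices[i - 1]]


def history_aware_checkpoints(
    prices: List[float], bid: float, margin: float
) -> List[int]:
    """Hypothetical rule, NOT the paper's method: checkpoint on a rising
    edge only when the price has come within `margin` of the user's bid."""
    return [
        i
        for i in range(1, len(prices))
        if prices[i] > prices[i - 1] and bid - prices[i] <= margin
    ]


# Made-up price trace in USD per hour, sampled four times per hour.
trace = [0.030, 0.031, 0.031, 0.030, 0.034, 0.036, 0.035, 0.039]

hourly = hourly_checkpoints(trace, samples_per_hour=4)
rising = rising_edge_checkpoints(trace)
aware = history_aware_checkpoints(trace, bid=0.040, margin=0.005)

print("hourly:", hourly)
print("rising edge:", rising)
print("history-aware (hypothetical):", aware)
```

On this made-up trace, the rising-edge rule checkpoints more often than the hourly rule, while the hypothetical history-aware rule skips rising edges that are still far below the bid. Fewer checkpoints mean less overhead, which is the quantity the abstract says the proposed strategy aims to minimize.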

Published

2023-07-15

How to Cite

Sumit Tomar, Ashish Kumar Mishra, & Dharmendra K Yadav. (2023). Knowledge-based checkpointing strategy for spot instances in cloud computing. Journal of Current Science and Technology, 13(2), 412–427. https://doi.org/10.59796/jcst.V13N2.2023.1754

Section

Research Article