Clustering and Exploring of Gene Functional Modules from Cassava Root Gene Expression Data

Authors

  • Porntip Dechpichai, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
  • Fareeda Puengpien, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
  • Sirilak Sittipoonprachaya, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
  • Chunchom Salikupata, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
  • Treenut Saithong, Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
  • Saowalak Kalapanulak, Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand

Keywords:

Gene Clustering Analysis, Gene Expression, Cassava

Abstract

Cassava is an important economic crop, both in Thailand and internationally. Advances in sequencing technology have allowed the cassava genome to be deciphered. However, identifying the functions of all genes in the cassava genome through plant molecular biology experiments is tedious and resource-intensive. The present research therefore aimed to predict gene functions from their expression profiles using the K-means clustering method and to propose functions for unknown genes using Gene Set Enrichment Analysis (GSEA). Three cassava root tissues were used in the study: storage root, fibrous root and root apical meristem. The gene expression data were divided into two subsets: SET1 (fibrous root and root apical meristem) and SET2 (storage root, fibrous root and root apical meristem). Cassava genes were divided into 21 groups in SET1 and 20 groups in SET2; however, only 14 groups could be assigned significant functions in both subsets. Functions could be assigned to 8,561 unknown genes in SET1 and 8,727 unknown genes in SET2. In total, putative related functions could be assigned to 8,736 cassava genes, or 26.45 percent of all genes in the cassava genome. The results allow 75.38 percent of the genes in the genome to be assigned related functions.
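
The workflow summarized in the abstract (cluster genes by expression profile, then look for functions that are over-represented in each cluster and propose them for the unannotated members) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The gene names, expression values and the single annotation term are hypothetical toy data; the per-gene standardization step is an assumption; and a simple hypergeometric over-representation test stands in for GSEA, which is a different and more elaborate procedure.

```python
import math
import random


def zscore(profile):
    """Standardize one gene's expression profile across tissues (assumed preprocessing)."""
    mean = sum(profile) / len(profile)
    var = sum((x - mean) ** 2 for x in profile) / len(profile)
    sd = math.sqrt(var) or 1.0  # a flat profile keeps all-zero scores
    return [(x - mean) / sd for x in profile]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def enrichment_p(cluster_genes, term_genes, n_total):
    """Hypergeometric upper-tail p-value (over-representation test, not GSEA)."""
    n_cluster, n_term = len(cluster_genes), len(term_genes)
    overlap = len(cluster_genes & term_genes)
    tail = sum(
        math.comb(n_term, i) * math.comb(n_total - n_term, n_cluster - i)
        for i in range(overlap, min(n_cluster, n_term) + 1)
    )
    return tail / math.comb(n_total, n_cluster)


# Hypothetical toy data: expression in (storage root, fibrous root, root apical meristem).
expression = {
    "g1": [9.0, 1.0, 1.0],
    "g2": [8.0, 2.0, 1.5],
    "g3": [10.0, 1.5, 2.0],  # treated as an unknown gene
    "g4": [1.0, 1.2, 9.0],
    "g5": [2.0, 1.0, 8.0],
    "g6": [1.5, 2.0, 10.0],
}
annotation = {"starch biosynthesis": {"g1", "g2"}}  # hypothetical annotation term

genes = list(expression)
labels = kmeans([zscore(expression[g]) for g in genes], k=2)
label_of = dict(zip(genes, labels))

# Test the term in the cluster that contains the unknown gene g3.
cluster = {g for g in genes if label_of[g] == label_of["g3"]}
p_value = enrichment_p(cluster, annotation["starch biosynthesis"], len(genes))
print(sorted(cluster), p_value)
```

In a real analysis the number of clusters would be chosen from the data, and the p-values would be corrected for multiple testing across all clusters and terms before any function is proposed for an unknown gene.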

Published

2021-09-30

How to Cite

Dechpichai, P., Puengpien, F., Sittipoonprachaya, S., Salikupata, C., Saithong, T., & Kalapanulak, S. (2021). Clustering and Exploring of Gene Functional Modules from Cassava Root Gene Expression Data. Science and Engineering Connect, 44(3), 485–500. Retrieved from https://ph04.tci-thaijo.org/index.php/SEC/article/view/10445

Section

Research Article