Clustering and Exploring of Gene Functional Modules from Cassava Root Gene Expression Data
Keywords:
Gene Clustering Analysis, Gene Expression, Cassava
Abstract
Cassava is an important economic crop, both in Thailand and internationally. Advances in sequencing technology have allowed the cassava genome to be deciphered. However, identifying the functions of all genes in the cassava genome through plant molecular biology laboratory experiments is tedious and resource-intensive. The present research therefore aimed to group genes by their expression profiles using the K-means clustering method and to propose functions for unknown genes using Gene Set Enrichment Analysis (GSEA). Three cassava root tissues were used in the study: storage root, fibrous root and root apical meristem. The gene expression data were divided into two subsets: SET1 (fibrous root and root apical meristem) and SET2 (storage root, fibrous root and root apical meristem). Cassava genes were divided into 21 groups in SET1 and 20 groups in SET2; however, in both subsets only 14 groups could be assigned significant functions. Functions could be assigned to 8,561 unknown genes in SET1 and 8,727 unknown genes in SET2. In total, putative related functions could be assigned to 8,736 cassava genes, or 26.45 percent of all the genes in the cassava genome. With these results, 75.38 percent of the genes in the genome can be assigned related functions.
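The two-step idea in the abstract (cluster genes by expression profile, then pass enriched functions from annotated genes to the unknown genes in the same cluster) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene IDs, expression values and term labels are invented toy data, the K-means uses a deterministic farthest-first initialization for reproducibility, and a simple hypergeometric over-representation test stands in for GSEA, with no multiple-testing correction.

```python
import math

# Hypothetical toy data: expression in (fibrous root, root apical meristem, storage root).
PROFILES = {
    "g1": [2, 1, 9], "g2": [3, 2, 10], "g3": [2, 2, 8],
    "g4": [1, 2, 9], "g5": [3, 1, 10], "g6": [2, 3, 9],
    "g7": [1, 9, 2], "g8": [2, 10, 3], "g9": [2, 8, 2],
    "g10": [2, 9, 1], "g11": [1, 10, 3], "g12": [3, 9, 2],
}
# Hypothetical annotations; g5, g6, g11 and g12 play the role of unknown genes.
ANNOTATIONS = {
    "g1": "starch biosynthesis", "g2": "starch biosynthesis",
    "g3": "starch biosynthesis", "g4": "starch biosynthesis",
    "g7": "cell division", "g8": "cell division",
    "g9": "cell division", "g10": "cell division",
}

def zscore(profile):
    """Standardize one gene's profile so clustering compares shape, not magnitude."""
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / len(profile)) or 1.0
    return [(x - mean) / sd for x in profile]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with farthest-first initialization (a simplification)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each gene joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def enrichment_p(cluster_genes, term_genes, universe_size):
    """Hypergeometric P(overlap >= observed); an over-representation test, not GSEA."""
    n, big_k = len(cluster_genes), len(term_genes)
    overlap = len(set(cluster_genes) & set(term_genes))
    total = math.comb(universe_size, n)
    return sum(
        math.comb(big_k, i) * math.comb(universe_size - big_k, n - i)
        for i in range(overlap, min(big_k, n) + 1)
    ) / total

def propose_functions(genes, labels, annotations, alpha=0.05):
    """Give each unannotated gene the terms enriched in its cluster."""
    terms = {}
    for gene, term in annotations.items():
        terms.setdefault(term, set()).add(gene)
    proposals = {}
    for c in set(labels):
        cluster = {g for g, l in zip(genes, labels) if l == c}
        hits = [t for t, members in terms.items()
                if enrichment_p(cluster, members, len(genes)) < alpha]
        for g in cluster - set(annotations):
            proposals[g] = hits
    return proposals

GENES = list(PROFILES)
labels = kmeans([zscore(PROFILES[g]) for g in GENES], k=2)
proposals = propose_functions(GENES, labels, ANNOTATIONS)
for gene in sorted(proposals):
    print(gene, proposals[gene])
```

On this toy input the storage-root-high genes and the meristem-high genes fall into separate clusters, so the unannotated genes g5 and g6 inherit the hypothetical "starch biosynthesis" label and g11 and g12 inherit "cell division". A real analysis would work on normalized RNA-seq expression values, choose the number of clusters from the data, and correct the enrichment p-values for multiple testing.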
License
Copyright (c) 2021 King Mongkut's University of Technology Thonburi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All content in an article published in Science and Engineering Connect, including text, equations, formulas, tables, figures and other forms of illustration, is the copyright of King Mongkut's University of Technology Thonburi. Reproduction of this content in any format for commercial purposes requires the prior written consent of the Editor of the Journal.