An improved note segmentation and normalization for Query-by-Humming


  • Nattha Phiwma, Faculty of Information Technology, Rangsit University, Patumthani 12000, Thailand
  • Parinya Sanguansat, Faculty of Engineering and Technology, Panyapiwat Institute of Management, Nonthaburi 11120, Thailand


Query-by-Humming, melody contour, Dynamic Time Warping, pitch, Subharmonic-to-Harmonic Ratio, note segmentation


To improve Query-by-Humming, this paper proposes a note segmentation method for humming sounds, a melody contour extraction technique, and new normalization methods. Noise interference from both the environment and the acquisition instrument is a critical issue for humming input. Variations in pitch and timing also make queries difficult, because most users are not professional singers. The advantage of the proposed note segmentation method is that it separates the sound and silent parts of the signal. Melody contour extraction reduces noise, resulting in a smoother pitch sequence. Our approach starts with pre-processing, in which features are used for note segmentation of the humming sound. The process then consists of three steps. First, the pitch is extracted from the humming sound using the Subharmonic-to-Harmonic Ratio (SHR). Next, we apply several new normalization methods, including melody contour extraction, for scaling and noise robustness. Finally, Dynamic Time Warping (DTW) is applied to the melody contour to measure the similarity between the humming sound and the melody sequence. Compared with the traditional method, the results show that the proposed techniques perform more effectively.
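The final matching step can be illustrated with a minimal Python sketch of textbook DTW applied to two pitch contours. This is not the authors' implementation: the abstract does not specify the normalization methods, so the mean subtraction used here is an assumed stand-in, and the contours (in MIDI semitones), the absolute-difference local cost, and the names `mean_normalize` and `dtw_distance` are all hypothetical. SHR pitch extraction and note segmentation are not shown.

```python
import math


def mean_normalize(contour):
    """Shift a pitch contour to zero mean (assumed, illustrative normalization)."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def dtw_distance(a, b):
    """Accumulated cost of the best DTW alignment between sequences a and b.

    The local cost is the absolute difference between two pitch values
    (an assumption; other local costs are possible).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical data: a reference melody, a hummed query that is transposed
# down three semitones and sung more slowly, and an unrelated melody.
reference = [60, 60, 62, 64, 64, 62, 60]
query = [57, 57, 57, 59, 61, 61, 61, 59, 57]
unrelated = [60, 67, 60, 67, 60, 67, 60]

ref_n = mean_normalize(reference)
d_query = dtw_distance(ref_n, mean_normalize(query))
d_unrelated = dtw_distance(ref_n, mean_normalize(unrelated))
print(f"query vs reference:     {d_query:.3f}")
print(f"unrelated vs reference: {d_unrelated:.3f}")
```

In this toy example the normalization removes most of the key difference between the query and the reference, while DTW absorbs the difference in tempo, so the transposed, slower query ends up closer to the reference than the unrelated melody does.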






How to Cite

Phiwma, N., & Sanguansat, P. (2023). An improved note segmentation and normalization for Query-by-Humming. Journal of Current Science and Technology, 1(2), 139–148. Retrieved from



Research Article