Relative Functional Comparison of Neural and Non-Neural Approaches for Syllable Segmentation in Devnagari TTS System
This paper presents methods for automatic segmentation of speech signals using neural networks. The speech signal is segmented into syllables, a common unit in concatenative TTS systems. Because a concatenative TTS system uses segments of recorded speech, its output sounds more natural than that of formant or articulatory TTS systems. Such a system stores small segments of speech and joins them to form new words, which allows a large number of words to be generated from a very small database. Manual segmentation is very time-consuming and limits the naturalness that can be achieved, so neural network models are used to improve the naturalness of the resulting segments in speech synthesis. The work explains how neural network approaches such as Maxnet and K-means outperform traditional non-neural approaches such as slope detection and simulated annealing. The neural network models achieve more than 90% accuracy in syllable segmentation, which improves the naturalness of the Marathi TTS system.
Keywords: Neural Network Approach, Non-Neural Approach, Text to Speech (TTS) System, Speech Segmentation.
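The abstract names slope detection as one of the non-neural baselines but does not describe it. As a rough illustration only, the sketch below assumes one plausible reading: candidate syllable boundaries are placed where the short-time energy stops falling and drops below a fraction of its peak. The function names, frame size, and `floor_ratio` threshold are hypothetical and are not taken from the paper.

```python
import math


def short_time_energy(signal, frame_len, hop):
    """Return the mean squared amplitude of each frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def slope_boundaries(energies, floor_ratio=0.1):
    """Return frame indices where the energy slope changes from falling
    to flat or rising while the energy is below floor_ratio * peak.

    Assumption: this is an illustrative rule, not the authors' method.
    """
    if not energies:
        return []
    floor = floor_ratio * max(energies)
    boundaries = []
    for i in range(1, len(energies) - 1):
        left_slope = energies[i] - energies[i - 1]
        right_slope = energies[i + 1] - energies[i]
        if left_slope < 0 and right_slope >= 0 and energies[i] <= floor:
            boundaries.append(i)
    return boundaries


# Synthetic test signal: two 200 Hz bursts separated by silence,
# sampled at 8000 Hz (all values chosen for illustration).
signal = [
    math.sin(2 * math.pi * 200 * n / 8000) if (n < 1600 or n >= 2400) else 0.0
    for n in range(4000)
]
energies = short_time_energy(signal, frame_len=200, hop=200)
boundaries = slope_boundaries(energies)
print(boundaries)
```

On this synthetic signal the single detected boundary is frame 8, which corresponds to sample 1600, the point where the first burst ends. Real speech would need smoothing and a tuned threshold, which is part of why such hand-crafted rules are a weaker baseline than learned models.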
ABOUT THE AUTHORS
Smita P. Kawachale
Eight years' experience at M.I.T. College of Engineering, Pune; ISTE membership; 10 international and national conference papers.
Janardan S. Chitode
More than 18 years of teaching experience at Bharati Vidyapeeth College of Engineering; university topper; best M.E. and research guide.