Text to Speech Servers are well suited to deployment on a Cloud Computing Platform, because high quality, natural voice needs considerable computing power on demand. Building a Text to Speech Server is not that easy, and we will definitely want to remain with Open Source, so an article on Text to Speech and the Text to Speech Server itself is useful. Here is a reference article on Text to Speech, Text to Speech Server, the mechanism of human speech, the mechanism of Text to Speech, deploying a Text to Speech Server on the Cloud and, most importantly, the practical usage of a Text to Speech Server. For this article, we will take The Rackspace Cloud as the standard Cloud Platform; API references will be to Rackspace or OpenStack, and the content management software is WordPress.
Text to Speech : More of Cognitive Science than Simple Coding
Most current Text to Speech software has never taken cognitive science into consideration, because of a lack of knowledge. The speech generation is fully ‘mechanical’ and depends on spaces, punctuation and so on. The ideal Text to Speech Server software should read ahead before the audio (or voice) generation, i.e. with a buffer, pass the text through filters from Brain Simulation Software and only then produce it as natural voice.
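The read-ahead idea can be illustrated with a small sketch. This is only an assumption of how such a buffer might look, not how any existing Text to Speech engine works: the names `read_ahead` and `punctuation_filter` are hypothetical, and the single filter is a crude stand-in for the far richer analysis described above. Each sentence is held back until a few following sentences are available as context, then annotated with hints before it would be handed to a synthesizer.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterator, List, Tuple

# A "filter" (hypothetical) sees the current sentence plus the sentences
# buffered after it, and returns hints for the synthesizer.
Filter = Callable[[str, List[str]], Dict[str, str]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def punctuation_filter(sentence: str, upcoming: List[str]) -> Dict[str, str]:
    """Crude stand-in filter: derive a tone hint from the closing mark only."""
    if sentence.endswith("?"):
        return {"tone": "question"}
    if sentence.endswith("!"):
        return {"tone": "emphatic"}
    return {"tone": "neutral"}


def _annotate(buffer: deque, filters: List[Filter]) -> Tuple[str, Dict[str, str]]:
    """Pop the oldest sentence and run every filter with the rest as context."""
    current = buffer.popleft()
    hints: Dict[str, str] = {}
    for f in filters:
        hints.update(f(current, list(buffer)))
    return current, hints


def read_ahead(text: str, filters: List[Filter],
               lookahead: int = 2) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (sentence, hints), releasing a sentence only once `lookahead`
    later sentences are buffered, or the input has run out."""
    buffer: deque = deque()
    for sentence in split_sentences(text):
        buffer.append(sentence)
        if len(buffer) > lookahead:
            yield _annotate(buffer, filters)
    while buffer:  # flush what is left at the end of the text
        yield _annotate(buffer, filters)
```

A caller would iterate over `read_ahead(text, [punctuation_filter])` and pass each sentence with its hints to the voice generator. A filter that actually uses the `upcoming` sentences is where any real contextual analysis would go.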
Speech is a far more difficult thing than text, in both encoding and decoding. I am talking about the human brain. We write English in almost the same way across the globe, but we speak, pronounce and understand it quite differently. The difference is so large that speakers from the UK, the US and India can easily be told apart.
Other creatures, usually warm-blooded pets and birds, can understand human speech. If you make an odd sound or say something to tell a street dog to go away, it will go away. Creatures that can reproduce human speech partially or fully understand the meaning. Understanding depends on training and on avoiding negative reinforcement. An example of a speech production model that relied on negative reinforcement was the talking parrot (was, because this practice has been legally stopped in most advanced, or rather sane, countries). The living model was like a tape recorder. Birds are quite efficient at speech production. Pet parakeets often give orders to the other pets in the house and joke with them verbally, making fools of them.
It is well known that dogs get sad when you tell them a sad thing. They understand quite nicely. Humans, oddly, have no such default understanding function. You will not understand French if you do not know French. And forget about English: by pronunciation, it is arguably the most complex language. Most sadly, humans can forget how to speak, in certain diseases and with aging. Diseases such as Alzheimer’s disease are probably linked to a gene that is associated with aging.
By now you should be convinced that speaking is not that easy. You are probably getting scared to speak.
Text to Speech Server : Importance and Applications
People with visual impairments use Text to Speech software to have text read aloud. For blind users, a screen reader is software that takes the text content and announces it. Of particular interest is software that creates MP3 files for simple podcasts or audio blogs. Experience has shown that producing podcasts or audio blogs can be very time consuming and can consume a large amount of computing power.
Text to Speech Server on Cloud Platform
Forget about the smart, ideal Text to Speech Server software for a moment; we are talking about existing software.
The above image shows the mechanism of the most advanced current Text to Speech Server software. This part alone can consume a huge amount of computing resources. Now add Text to Speech software that takes cognitive science into consideration. That is the ideal model. Consider a given text :
Anna – “I love you so much”.
John – “I do not love you ! You know…”
Anna – “I will kill you!”
A usual Text to Speech Server software will read this quite badly. It will sound as if Anna is some sort of criminal taking out a gun. The reason is that the analytical power is non-existent. The most advanced Text to Speech Server software can make a single sentence feel a bit more ‘human’, for example: I’ll killllllll youuu. The existing software (open source) that can work as a Text to Speech Server software bundle is :
For example, you can fetch the content with a WordPress function, convert it to MP3 and podcast it. This is possible, but it is not possible to make it sound realistic.
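As a rough illustration of the "fetch the content" step, the sketch below turns one post into plain text that a speech engine could read. It assumes the post arrives as a dictionary shaped like a WordPress REST API post object, with `title.rendered` and `content.rendered` holding HTML; whether a given site exposes that API is an assumption, and the helper names `html_to_text` and `post_to_speech_text` are made up for this example. Fetching over the network and the MP3 conversion itself are left out.

```python
from html.parser import HTMLParser

# Tags after which a word break is needed so paragraphs do not run together.
BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Strip markup and decode entities, returning speakable plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def post_to_speech_text(post):
    """Join title and body of a WordPress-style post dict into one string."""
    title = html_to_text(post["title"]["rendered"])
    body = html_to_text(post["content"]["rendered"])
    return f"{title}. {body}"
```

The resulting string is what would be handed to whichever Text to Speech engine produces the MP3. Stripping markup first matters because an engine that reads tags or entities aloud sounds even less realistic.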
The ideal Text to Speech Server on Cloud Platform should be able to :
- Have a separate database server, built programmatically, as the data is huge
- Pool the title and content as text and upload them to the Cloud Files CDN, as they are static content
- Analyze these text files, read them to artificially understand the content, and convert them to its own specific file type with metadata
- After conversion to its own specific file type with metadata, convert to MP3 or Ogg, upload to Cloud Files and get the streaming URL
- Provide a Text to Speech server client on WordPress that pulls these streaming URLs and adds them to the proper posts
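The steps in this list can be sketched end to end as follows. Everything here is a stand-in under stated assumptions: `FakeObjectStore` is an in-memory substitute for a cloud object store and does not use any Rackspace or OpenStack API, the `cdn.example.invalid` address is a placeholder, the "own specific file type" is assumed to be JSON, and `synthesize` returns dummy bytes where a real engine would return MP3 or Ogg audio.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeObjectStore:
    """In-memory stand-in for a cloud object store that returns CDN URLs."""
    base_url: str = "https://cdn.example.invalid"  # placeholder address
    objects: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> str:
        self.objects[name] = data
        return f"{self.base_url}/{name}"


def pool_text(title: str, content: str) -> bytes:
    """Pool title and content into one static text file."""
    return f"{title}\n\n{content}".encode("utf-8")


def analyze(raw: bytes) -> bytes:
    """Build the intermediate file: the text plus metadata, as JSON.
    A real analyzer would add far more than a length and a checksum."""
    text = raw.decode("utf-8")
    doc = {
        "text": text,
        "metadata": {
            "characters": len(text),
            "sha256": hashlib.sha256(raw).hexdigest(),
        },
    }
    return json.dumps(doc).encode("utf-8")


def synthesize(intermediate: bytes) -> bytes:
    """Placeholder for the speech engine; returns dummy bytes, not audio."""
    doc = json.loads(intermediate)
    return b"FAKE-AUDIO:" + doc["text"].encode("utf-8")


def process_post(store: FakeObjectStore, post_id: int,
                 title: str, content: str) -> str:
    """Run one post through the pipeline and return its streaming URL."""
    raw = pool_text(title, content)
    store.upload(f"text/{post_id}.txt", raw)
    intermediate = analyze(raw)
    store.upload(f"meta/{post_id}.json", intermediate)
    audio = synthesize(intermediate)
    return store.upload(f"audio/{post_id}.mp3", audio)
```

The URL returned by `process_post` is what a client on the WordPress side would store against the post ID and attach to the post. Keeping the intermediate file as a separate stored object means the costly analysis would not have to be repeated if only the audio format changes.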