"Hey, it's on that this is a pool of. So with you regarding the place cos so whenever you get a chance, please give me a call about all I would take a look at it next often, although maybe water, so give me a call back at 2(wrongly transcribed phonenumber with more than usual number of digits). Thank you"
This probably doesn't make sense to you. It doesn't make sense to me either, and this was the output from my Google Voice account (by invite only, if you must know): the transcribed voicemail. Oh where, oh where do I begin to explain how very wrong the above message is in comparison to the actual voicemail? The transcription (though valiantly attempted) matched less than 10% of the actual message. What is 'on that'? There is no 'pool of' and no 'the place' regarding which I could expect a message, 'next often' is never seen together, and what exactly is 'maybe water'?
Text-to-speech is a feature I have used in my eBook reader software for the purpose of laughing at toneless renderings of very high-action or emotional passages in books. It provided me brief hours of merriment until my vacillating nature took over and I wanted other sources of irreverent humor for my personal amusement.
What passages, you ask? The proposal scene from Gone with the Wind was fun, and so were the bombing chapters of Patriot Games (Jack Ryan rocks!). In fact, I urge you to try anything at all as long as it's in the .lit eBook format. Oooh wait. Exception: the text-to-speech feature literally hara-kiried itself over The Fellowship of the Ring. That was not fun, and in hindsight I should not have attempted to have Elvish read out to me. :(
Ah...Microsoft Sam, you are so cold and alien that you're forever associated with my nightmare visions of a Skynet-like rise of the machines...they could have just called you a 'Dalek' and not taken the pains to name you.
[irrelevant train of thought: Did anyone else love the Doctor Who scene where the Daleks face off with the Cybermen, shouting 'Exterminate' to their chants of 'Delete'? Common programming language syntax, get it? No? Never mind. A shout-out to the TARDIS, the most funky-looking space/time ship ever! Wooo hooo! :)]
Anyone would prefer a homicidally logic-driven yet human-sounding HAL...or is it just me? Also...it seems to me that ships' AIs that have female voices rarely try to kill off the human crew. If I am wrong, please feel free to quote the example. Say I am right (yaay me), then the philosophical question arises (à la the sound made by a tree that falls with no witnesses in an uninhabited forest): is it the gender of the voice of the hypothetical AI that determines its penchant for extermination of the humans?
Not only can’t systems speak like us, they don’t get our speech. Anyone who has struggled with voice-activated dialing on a supposedly hands-free mobile, or used an automated voice service in the customer service department of any telephone provider, bank, etc. (random institution/corporation), knows how frustrating and steadfastly unhelpful it is to deal with an entity that needs coding to 'listen & understand' your speech.
Familiar scenario: commands of 'Call Mom' [fervently repeated] result in 'Dialing Ron' (aka your boss, who thinks you've been hospitalized for the last week, largely due to the email you had sent him earlier) and end in enraged epithets that further urge the phone to Dial Tuck, Fitch and Lestrade. Or consider this: how very often do you come across people walking with their Bluetooth headsets on, screaming 'NO!' when a calm voice on the other side says 'You've selected to check in 8 pieces of luggage. Please confirm by saying Yes or No', when all that the customer wanted to do was track the schedule of the flight they were to take?
My current source of mirth is the speech-to-text or Automatic Transcribe feature and, going by Google Voice, I would say it’s not very successful. Speech is very individual (like finger, toe and nose prints). That is why de-individualized people are often shown as robot-like in speech (not going to loop back to sci-fi references, I promise, mainly because there are far too many for my exploding brain to rationally pick from). You can have a bunch of people who speak similarly but never exactly the same. Intonations, accents and physical irregularities of the speaker can cause the same words to sound different. It would be very unsettling to have uniform speech because that would entail standardization of tone, verbiage and other parameters like speed, pitch and pauses, which usually give depth and meaning to the actual content as harbingers of the non-verbal part of the communication. But I digress; the central point here is that the current system cannot even correctly identify the verbiage of what is being said, let alone comprehend the meaning or information in the words.
This is probably why voice-activated security systems are limited to very few words: any more, and the speaker cannot render them the same way every time. Imagine being locked out of your own secure place by a system that doesn’t think your current repeat of Mark Antony’s 'I come to bury Caesar, not to praise him' speech from Julius Caesar matches the recording you made when you set up the security lock in the first place. Conversely, mimics can definitely say a few words to match a voice-printed password...so this means of security is not good enough (yet).
One could argue that writing is just as individual, but it adapts better to a world dominated by the internet because it doesn't involve translation by a soulless entity (not talking about non-English languages here because that warrants an entire post), and increasingly more so because written language is shrinking rapidly due to the unchallenged invasion of pre-pubescents/teens on the internet and in mobile communications. They hate long drawn-out sentences, grammar and any semblance of actual spelling. After all, they are so very busy that it’s not reasonable to expect them not to brutally mutate the English language. BTW (and notwithstanding acronyms), the youth have yet to corrupt the spoken language nearly as much. A teenaged relative may have her fbk status as 'waz siked bffs cud cum 2 da party!' but on the phone she verbalizes the content with the same sounds associated with the words 'was', 'psyched', 'the', 'could', 'to' and 'come' [soft sigh of relief].
Let’s switch back from teens to machines, because I would rather deal with Cylons any day than the erratic, hormone-powered roller coasters spawned by humans in their intermediate growth states. In humans, associative memory helps to interpret speech correctly when we are talking only about words (that is, simplifying the scenario by taking the non-content parameters out of it). So the best way to build a system that works that way would be neural networks with artificial intelligence programming constructs that 'learn' each word from all possible variations of how that word can be spoken. This database will be nearly infinite and will add to its rosters on a daily basis, but the system itself will have to 'grow' to be able to transcribe and will still be susceptible to breaks.
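For the curious, here is a toy Python sketch of that 'learn every variation of every word' idea. It is nowhere near a neural network: it assumes the audio has already been boiled down to a rough spelling of what was heard, and the variant lists and the `transcribe_word` and `learn` helpers are entirely made up for illustration. It only shows the shape of the problem, a memory of variants that has to keep growing.

```python
from difflib import get_close_matches

# Hypothetical "associative memory": each known word with a few
# made-up spellings of how it might be heard. Not real speech data.
KNOWN_VARIANTS = {
    "call": ["call", "cawl", "kol"],
    "mom": ["mom", "mum", "mahm"],
    "water": ["water", "wadder", "wotah"],
}

# Reverse index: heard variant -> the word it stands for.
VARIANT_TO_WORD = {
    variant: word
    for word, variants in KNOWN_VARIANTS.items()
    for variant in variants
}


def transcribe_word(heard, cutoff=0.6):
    """Return the known word whose stored variant is closest to `heard`,
    or None when nothing in memory is similar enough."""
    matches = get_close_matches(
        heard.lower(), list(VARIANT_TO_WORD), n=1, cutoff=cutoff
    )
    return VARIANT_TO_WORD[matches[0]] if matches else None


def learn(word, heard):
    """Grow the memory with a newly confirmed variant of `word`."""
    KNOWN_VARIANTS.setdefault(word, []).append(heard.lower())
    VARIANT_TO_WORD[heard.lower()] = word
```

Calling `transcribe_word("wader")` lands on 'water' because a stored variant is close enough, while something the memory has never met returns `None` until `learn` adds it. Multiply that by every word and every speaker and you can see why the roster never stops growing.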
You can't make a machine version of the human ear + the brain's audio processing + memories/learning...but you can strive to make something close, and the current stages of this feature are below even the most basic, infantile standards that could be set by the world's kindest judge (which I am not even close to, by an infinitely long shot).
Piece of gyan related to non-human systems & human voices: don't argue with the GPS Lady when you are driving. 1. She is programmed to always be right, and 2. you look like (and probably are) a crazy person.
We speak therefore we exist,
Rain