Hacker News
Show HN: Unreal Speech – Text-to-Speech API (unrealspeech.com)
91 points by jazz3020 on June 7, 2022 | 152 comments



The AWS example is $4 per million characters. The starter plan of this service is $7.92 per million. The cheapest option is $2 per million, which is 2x cheaper, not 8x. Yes, the AWS (and Google) neural voices are $16, but the voice sampled on the front page is not neural AWS Polly but standard (Matthew, IIRC).

Shady, really shady. It's a shame, I would not mind a good competitor to Polly/Google.


This is incorrect. The AWS example is Polly Neural Matthew voice which costs $16, not $4. Neural Matthew: https://unreal-tts-live-demo.s3.us-west-1.amazonaws.com/aws/... Standard Matthew: https://unreal-tts-live-demo.s3.us-west-1.amazonaws.com/aws/...


As an AMZN stock holder, I love that this "competition" to AWS is hosted on AWS. Amazon still wins!


Well, not exactly! We have clusters on GCP actually, sorry to break the news. But we did train models on AWS.


Ah, ok. I saw the S3 URLs and assumed that's the platform you're using.


The stuff on GCP is a long story, which makes us an edge case. "Amazon still wins" is generally the case. Hold on to the stock.


Well anyway Amazon will win because they have the full source-code and training set for the model (the result is sounding great btw)


Intellectual property rights don’t exist here?


Only if you have better lawyers than Amazon (lawops?)


Plus this service starts at $249/month for the first non-"evaluation" plan compared to AWS & GCP which are fully Pay as you go. That's a complete deal breaker for anyone who wants to use it for a side project.


Yes, that's true. We're not targeting side projects at the moment, as we do not have the infrastructure to do so. I hope to be able to support smaller developers (like myself) soon. Step-by-step!


Who are you targeting at the moment then?


I’d say businesses that already spend $$$ on a 3rd party TTS service and know how expensive it is or have content but haven’t pulled the trigger due to the prohibitive cost. I wrote this on the Product Hunt page: > “… read aloud apps (e.g. Pocket, Speechify, etc), UGC platforms (e.g. Medium, SubStack, etc), publications (e.g. Bloomberg, NY Times, etc), and e-learning platforms (e.g. Duolingo, Pearson, etc). We’re also interested in providing discounts to non-profits (e.g. Wikipedia, etc)”


I'm going to argue yours is not better. I preferred the AWS example, which was no doubt cherry picked, over all of your voices. Especially during the transition between sentences there is an audible glitch in the audio, whereas AWS transitioned smoothly.


I'm gonna agree with you.

I get lots of weird random artifacts in Unreal. Sometimes it's a weird "vibrato" on some words, some words are unrealistically raspy. Plus it's a bit too sibilant in general compared to AWS. Clicking the "Redo" button fixes it but then other artifacts crop up in other places, which leads me to believe this is completely fixable.

And what do you mean by cherry-picked? I don't think the Amazon example is cherry picked, you can type your own custom text there! They're just using the AWS API themselves. Maybe I misunderstood you.

Anyway, it's a bit unfair because Amazon probably spent millions in their product, but I wouldn't exactly call it better...


Sure, that's why I put "arguably" as I honestly know that AWS is really good. But I did pick the best AWS voice which is the most popular one amongst many read aloud apps, though! Anyways, I'll try to implement speaker selection for AWS so that the comparison is more fair. I barely finished our multi-speaker feature before launching on PH.


You should probably edit the title - some amount of puffery is reasonable when showcasing one's own work but if you're so wide of the mark, the only feedback you'll get is about how your representations are inaccurate. Which is what's happening.


The title has been edited (I think by a yc moderator) which I think is a good move.


I think both are pretty good, but to my ears the UnrealSpeech ones all have very sharp, grating "s" sounds. The AWS voice is much smoother in this regard. Is this something that can be configured to a degree or would you have to post-process it with a de-esser? Because I can't imagine listening to those voices for anything longer than the example text.


Yup, trying to fix this


I like the character of the voices, especially Male B. Sounds a little like Brokaw, maybe.

But yeah sometimes I'd have to regenerate before I got non-glitchy audio. Still though, very cool.


Just copy-pasted a piece from the other HN thread (https://news.ycombinator.com/item?id=31630193) and it's not even close to AWS:

https://unrealspeech.com/k6g8a


Especially considering other languages, AWS is pretty good at both Spain and Latin American Spanish.


The unreal guy sounds really nervous to me. Or like constipated or something. His random energy across sentences makes him seem uncomfortable.


The demo and copy are great. Not everyone will agree it is "better" because that's subjective but your website does a good job of telling people it is better and the demo is good enough that some people will accept what they see/read/hear. :clap:

Your heading makes it seem like you want the direct comparison to AWS polly so maybe add a table to directly compare different aspects of your product vs aws that make it better. Sound quality is just one attribute to compare. What about SDKs, limits, code samples, use cases, more nitty gritty sound comparison details, etc.

Pricing should also be more transparent if you want to compare aws to yours - what maths did you do to get 8x cheaper because at first glance that is misleading.


I greatly appreciate the feedback!

I agree on the table idea to more candidly and clearly compare other aspects. The premise of this launch/experiment was that quality and cost would be the main factors, and we’d go from there, iterating and customizing it to work for early customers.

The 8x math is per 1M characters. I do see that since we’re charging a subscription, it may not be a fair comparison. But the minimum commitment is so small that I thought it wouldn’t matter for the customers I’m targeting right now. I do think it can be misleading because people might expect pay-as-you-go.

We aren’t able to provide pay-as-you-go right now, so I’ll look into updating the copy or how we communicate the subscription model!
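
For anyone checking the per-1M-character math, here is a rough Python sketch that uses only the list prices quoted in this thread ($16 neural Polly, $4 standard Polly, $7.92 starter, $2 cheapest tier). The dictionary keys and the function name are made up for illustration, and the prices may not match any current pricing page.

```python
# Per-1M-character prices in USD, as quoted in this thread (may be outdated).
PRICES_PER_MILLION = {
    "polly_neural": 16.00,
    "polly_standard": 4.00,
    "unreal_starter": 7.92,
    "unreal_cheapest": 2.00,
}


def times_cheaper(reference: str, candidate: str) -> float:
    """How many times cheaper `candidate` is than `reference`, per 1M characters."""
    return PRICES_PER_MILLION[reference] / PRICES_PER_MILLION[candidate]


if __name__ == "__main__":
    for ref in ("polly_neural", "polly_standard"):
        for cand in ("unreal_cheapest", "unreal_starter"):
            print(f"{cand} vs {ref}: {times_cheaper(ref, cand):.2f}x")
```

On these numbers, the 8x figure only appears when the cheapest tier is compared against the neural Polly price; the same tier against standard Polly gives 2x, and the starter tier against neural Polly gives roughly 2x.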


I definitely recognized the voice of real people alive today in your example set. I assume it's some kind of a trained ai you feed hours of content to for them to determine the speech pattern, question is, are you paying royalties to those individuals for their contributions?


I'm not. I'll try to reach out and figure out a license of some sort. I suspect the royalties we could pay out are probably negligible. I think it might be safest for me to get rid of the "recognizable" voices and simply create new synthetic voices that are not of real people.


I'm pretty sure tech like this is similar to self-driving tech like 6-10 years ago in the sense that there are no laws addressing it. Like no one wrote a law saying "a driver must be in the driver's seat of a car" ahead of time. Youtube has already reinstated a Jay-Z audio deepfake that was originally taken down.

https://www.theverge.com/2020/4/28/21240488/jay-z-deepfakes-...

has more details


I agree, I think entirely synthetic voices will be the way a lot of services like this can operate in the future. Unfortunately I haven't seen much research in this area. Guess it's outside the typical "take a dataset and optimize the hell out of it" realm of a lot of ML research since the synthetic voice will not exist in any dataset ahead of time.

Been thinking a lot about how to accomplish this myself for a similar product I'm building, glad to hear someone else is thinking about it too!


Would it be possible to mix multiple people/voices in a training set? Or does that confuse the AI? Could be interesting to create a real but kind of not real model. If that makes sense…


Plenty of the internet lawyers came out to rabidly defend the right of Github to pirate data to feed into Copilot, so I wouldn't be that worried about IP. I would be more worried about picking the wrong voices, such as those with strong political connotations.


That's a great insight. I'll look into the Github/Copilot more. I can't code without Copilot anymore and that piques my interest. But yeah, we def need politically neutral voices.


on the flip side, I found the "professor" voice endearing. I'm not necessarily a fan (although not a hater either), but I thought it was:

A. Impressive display of capability

B. A very clever choice as it's recognizable but not as universal as say Obama's voice or Joe Rogan's would be

C. Brilliant marketing

I probably wouldn't offer it as a "real" voice for use in bulk through the API due to the legal concerns, but on the marketing page it's really cool and I would hang on to it. Plus if you get sued it would be great publicity :-)


Haha, I really appreciate this feedback. Everything I wanted to hear.

Frankly, trying to fight against the Goliath, with 0 marketing budget, and I’m desperately hoping to create noise. Breaking a rule or two is something AWS can’t do at their scale.

On the bulk offering, I do have a clear path forward. In short, I can create new synthetic voices. It’s like those this-person-does-not-exist images but for voice. “unreal speech”


This is a typical brute behavior. It's not about breaking a rule or two for the sake of success, it's about doing it while riding on people's backs. You think that stealing someone's voice is ok for a small startup company because you don't have the budget to know better while this is not the issue at hand. You obviously know this is wrong and are trying to get away with it riding on people's good intentions.

This is blunt and clear immoral behavior and you are fully aware of it. You just think that you have the right to reach success not by your own skills, but by stepping on other people, and until they complain you'll keep doing it.

PS: Even if Amazon does the same thing it's no excuse for you. Look up tu quoque fallacy.


Alright, Mr. Noble. You're either naive, narrow-minded, unable to see the big picture, or all of the above.

Bringing others down will never be an antidote to not building or achieving anything in your life. Your time will be better spent if you focus on bringing yourself up and others around you.

I think you want to get a rise out of someone over the Internet, with your identity hidden, to get attention. Hence, this will be my last comment-- but if you want to turn it around, I'd be down to help.


Hilarious. Get off your high horse and stop ripping off other people's voices.


The "Professor" voice is very clearly Jordan Peterson, which is likely what you're getting at.


That name might cause you some grief, as you're shortening it to "[logo] UNREAL".

I personally think it's fine, but I'd expect some people finding it searching for speech synthesis for the unreal engine and then getting outraged about getting "tricked".

Not sure what to do about it though. It just jumped to the top of my mind while looking at the landing page.


Got it, I appreciate the feedback. I didn't really think about people searching for speech synthesis for unreal engine. I'll try to address this somehow!


Outrage seems a bit silly, but my first reaction was that it's a plugin for Unreal as well.


Tried a few samples, but in all of them, I preferred the AWS versions sound/vocalization/tempo


Yeah, ours is obviously an early version developed with far fewer resources in less time. But I'm hoping that the 6-8x lower price point is a selling point (considering the quality is relatively comparable, and it'll get better)


How long before we get something like this quality (SOTA?) at the prices you're offering?

https://speechresearch.github.io/naturalspeech/


The only sample I found arguably better than AWS was Female B. The others were too close to an uncanny valley (Professor, Entrepreneur), or the cadence was all over the place (Male A and Male B).

I spend a lot of time listening to AWS's text-to-speech. It can be distracting at times. But with Unreal Speech's text-to-speech, I'd lose focus incredibly quickly and focus on the issues with cadence or general weirdness (Professor's pitch change is too abrupt, and weirdly gets caught on the word used which throws off the cadence of natural speech).


Honestly, I'm super happy that you found one voice arguably better than AWS.

I might argue, though, you might get used to our voices quickly once you start using it frequently. I've had this feedback before from someone who was very used to AWS's monotonous voice, and he actually changed his opinion after listening to a couple of articles. Previously, I built https://audioread.com which got me to talk to a bunch of users.


I think the random generator might be broken.... It spit out some offensive stuff that maybe you don't want on a business site, though it did give me a laugh lol...

> What the fk did you just fking say about me, you little btch? I'll have you know I graduated top of my class in the Navy Seals, and I've been involved in numerous secret raids on Al-Qaeda, and I have over 300 confirmed kills.


dammit to hell (in a disappointed but congratulatory voice), I'm building basically the exact same product as audioread.com right now. I thought it was a great idea and was shocked I couldn't find any implementations yet :-(

kudos to you on having a wonderful idea :-D Guess I'll move on to my next idea


Yo! Let’s at least chat and brainstorm about it. Would you want to ping me on twitter? @automationism


I'm not a big fan of NLP, personally, as I think it's kind of creepy how easy it is to deepfake a voice these days given a very moderate amount of content that can be scraped online, but I like the 'disrupt the giant' attitude you have.

> Yo! Let’s at least chat and brainstorm about it. Would you want to ping me on twitter? @automationism

But honestly, THIS is why I keep coming back to HN: rather than taking an adversarial route and needlessly bickering, entrepreneurs should be collaborating and utilizing each other's strengths.

Well done, I now want to see your progress as your product matures! Who knows, I (an AI/ML student) might have a need for your services yet.


For every voice sample that Unreal offers on the front page comparison, I found that the Unreal voice mashed up words (or tried to speak syllables too quickly), most noticeably on "synthesis." I did notice some more natural-like pauses and cadence, but the clarity of speech wasn't as good as the AWS example. For any AI-narrated content I can think of wanting, I would prefer clarity over natural-sounding nuances.


This is fantastic feedback! The "mashed up words" are something I'm trying to fix. And I totally agree that the AWS voice is much more clear. Maybe we can get to clear & natural before AWS?


The "Professor" voice sounds like Jordan Peterson. Is this related to https://notjordanpeterson.com? That one was taken down: https://www.vice.com/en/article/43kwgb/not-jordan-peterson-v...


No, this is not related to the site you mentioned, but it seems like a bad sign.


> Text-to-Speech API Better & 8x Cheaper than AWS

Having "Better" on the title of your landing page probably isn't the best idea. This is such a subjective territory that it's very difficult to say that one voice sounds better than another unless you have a blind poll with at least a few hundred people claiming they preferred your product over AWS.

It doesn't matter if you can argue it's better, it only matters if you can show that the vast majority of your target demographic prefers your product over AWS, which I couldn't find any evidence of.


I respectfully disagree. I think it's great to have on the landing page. "Better" is a pretty clearly subjective term so I immediately know it's somebody's opinion. The fact that you can directly compare them also makes it so you can decide for yourself within seconds.

The "8x cheaper than AWS" feels deceptive though. The pricing is not apples to apples so it's only 8x cheaper at the most favorable point in the graph (when spending $1,000 a month and using exactly the number of characters offered). For me that's outrageously more expensive than AWS. It's more like 8x more expensive than AWS rather than 8x cheaper, which is such a dramatic swing I wondered if I was misreading the pricing somehow. My usage on Polly last month was about $75 but the month before was $5 and this month will be closer to $20.

It's a real shame because after hearing the output I was ready to move everything over. I should have looked at pricing before getting excited, but I took the 8x claim at face value.

I use a lot of AWS Polly neural and have listened to a hundred or so hours of Polly output (building an MVP/prototype). The cost of AWS is high (for me as an independent dev) but I've tried several other services and none have been as good at making natural speech (the kind one could tolerate for an audiobook for example).

If this was cheaper or if I could buy a small time license and run it on my own hardware (which takes the heavy costs away from Unreal) I would totally do that. Alternatively, I'd be willing to pay close to AWS pricing for a pay as you go, with a one-at-a-time rate-limit in place (to avoid the scaling/provisioning challenges on the ops side). I know it's not likely to happen, but just wanted to throw it out there.
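
To make the subscription versus pay-as-you-go comparison concrete, here is a minimal Python sketch. It assumes the figures quoted in this thread ($249/month for the first non-evaluation plan, $16 per 1M characters for neural Polly) and ignores overage fees, free tiers, and included-character limits, which the thread does not spell out. The function names are hypothetical.

```python
SUBSCRIPTION_USD = 249.00         # first non-evaluation plan, as quoted in this thread
POLLY_NEURAL_PER_MILLION = 16.00  # USD per 1M characters, as quoted in this thread


def break_even_chars(subscription_usd: float, payg_per_million: float) -> float:
    """Monthly characters at which pay-as-you-go spend equals the flat subscription fee."""
    return subscription_usd / payg_per_million * 1_000_000


def subscription_multiple(payg_spend_usd: float, subscription_usd: float) -> float:
    """How many times larger the subscription fee is than a given pay-as-you-go bill."""
    return subscription_usd / payg_spend_usd


if __name__ == "__main__":
    chars = break_even_chars(SUBSCRIPTION_USD, POLLY_NEURAL_PER_MILLION)
    print(f"Break-even vs neural Polly: {chars:,.0f} characters/month")
    # Monthly Polly bills mentioned in the comment above.
    for bill in (75, 5, 20):
        multiple = subscription_multiple(bill, SUBSCRIPTION_USD)
        print(f"${bill}/month on Polly -> subscription is {multiple:.1f}x the cost")
```

Under these assumptions, anyone synthesizing fewer than about 15.6M characters a month pays less on neural Polly than on the flat plan, which is why the same pricing can look cheaper to a large customer and far more expensive to a small one.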


Is this from Unreal as in Unreal Engine?


Nope, I’m sorry for the confusion. We did use title-cased letters in our logos to reduce similarity (their logo is all-cap: UNREAL ENGINE)


Interesting. I think AWS wins hands down for the prompts:

> Sarah Connor?

and

> Come with me if you want to live.

But the professor edges out on top for:

> Never harm a human or through inaction allow a human to come to harm.


Right off the bat, the third sentence looks much more like what my data looks like, and the kind of text I wanted people to use it for. Where is the line from?


I'm guessing it's from Asimov's Three Laws of Robotics.


Yes, it's a paraphrase (from memory) of the first law of robotics. It's good that that's more in line with how the AI should be used, I suppose ;)

Ed: according to Wikipedia I got it pretty close:

> First Law

> A robot may not injure a human being or, through inaction, allow a human being to come to harm.

> Second Law

> A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

> Third Law

> A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

https://en.m.wikipedia.org/wiki/Three_Laws_of_Robotics

Just remember, Asimov himself pointed out through his works that the three laws are not enough! :)


The AWS example is much better, hands down. All the voices in Unreal have lots of artifacts, and the diction is subtly but noticeably stilted.


Yeah, 100% working on the artifacts. I also agree on the stiltedness, not sure how to go about fixing it yet.


It seems like a very bad idea to use two famous public speakers' voices as training data when you clearly do not have permission to do so (no credit to be found, you simply renamed Jordan Peterson as "Professor"). It's hard to imagine that working out well as you try to sell API access. Seems like a legal nightmare waiting to happen.


Not directly related to this API, but I’ve begun wondering if the next revolution in speech synthesis is to integrate with a natural language model like gpt-3 in order to gain semantic awareness, and use that context to produce emotional expressiveness and inflection that is attuned to the meaning and tone of the text.


Imho, the next revolution in speech synthesis will come from using guided diffusion models, leveraging the recent breakthrough in image synthesis (Dall-e), to generate spectrograms (spectrograms are images).

This slower generative approach would make it possible to produce large, high-quality, enriched audio datasets with parametric text, timings, and emotion.

Then you use these datasets to bootstrap the existing traditional architectures in a supervised fashion, to make generation faster.

The usual problem of text-to-speech is that you have to go from a low-information space (text) to a high-information space (sound). Training is therefore ill-defined, because one input text can have several correct sounds. But once you have an enriched text with inflections, parameters, and a speaker embedding, the mapping becomes one enriched text to one exact audio, and training becomes well-defined and easy.
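
To illustrate the "spectrograms are images" point, here is a toy magnitude spectrogram computed with a naive short-time DFT in pure Python. It is only a sketch for intuition: production TTS systems typically use mel spectrograms computed with an FFT library, and nothing here reflects how any particular product works. The frame size, hop length, and test tone are arbitrary choices.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram: one row per time frame, one column per frequency bin."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Apply a Hann window to reduce spectral leakage at the frame edges.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        row = []
        # Naive DFT over the non-negative frequency bins only.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # A pure tone with 8 cycles per 64-sample frame.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(512)]
    spec = spectrogram(tone)
    print(f"{len(spec)} time frames x {len(spec[0])} frequency bins")
    print("strongest bin in first frame:", spec[0].index(max(spec[0])))
```

The result is a 2-D grid of non-negative numbers, which is the sense in which an image model could in principle be trained to generate it.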


Everyone's working on emotional expressiveness right now. Lots of research is being published left and right!


Cool - I probably shouldn’t be surprised that the smart minds in machine learning and natural language processing are well ahead of me, an interested lay person!

Any good links or HN comment threads that you’d recommend?


It seems to get this wrong:

"$99/mo (non-commercial use) too rich for my blood."


Hey, the developer Eric here. Thanks for checking out this super early version of the project. How can I make it right?


Put that in the demo and listen to it.

https://unrealspeech.com/Ry4K_


Oh, got it. You mean it doesn't pronounce "$99/mo" correctly? Yeah, there are quite a few cases like this where we don't translate symbols, abbreviations, etc quite right, yet. But unfortunately, in this case, I spelled out "$99 per month" but it still sounds awkward.
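
A common way to handle inputs like this is a text-normalization pass that runs before synthesis and rewrites symbols into words. The sketch below is a hypothetical illustration, not how Unreal Speech does it: the regex and the unit table are assumptions and only cover a few price patterns.

```python
import re

# Hypothetical abbreviation table; a real normalizer would need many more entries.
_UNITS = {"mo": "month", "yr": "year", "wk": "week"}

# Matches "$99", "$7.92", and an optional "/mo", "/yr", or "/wk" suffix.
_PRICE = re.compile(r"\$(\d+(?:\.\d+)?)(?:/(mo|yr|wk))?\b")


def normalize_prices(text: str) -> str:
    """Rewrite dollar amounts into words a TTS front end can read aloud."""

    def expand(match: re.Match) -> str:
        amount, unit = match.group(1), match.group(2)
        spoken = f"{amount} dollars"
        if unit:
            spoken += f" per {_UNITS[unit]}"
        return spoken

    return _PRICE.sub(expand, text)


if __name__ == "__main__":
    print(normalize_prices("$99/mo (non-commercial use) too rich for my blood."))
```

This only addresses how the symbols are read. As noted above, the spelled-out version can still sound awkward, so pacing and emphasis would be a separate problem.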


I could be wrong but I feel like you've missed the sarcasm....


There was a technical issue with how it sounded, but yes, the underlying point was that you need some kind of pay-as-you-go model or something if you can't offer a free tier.

Don't get me wrong it's pretty cool tech if you trained your own model.

I'm also having a hard time coming up with a use case. Outside of spam and games, I'm having a hard time seeing how generative AI is useful at all anyway.


The more real it sounds, the more dangerous it can be. Personally, I want to know I'm talking to a computer the second it "opens its mouth".


Title seems disingenuous. AWS is pay-as-you-go, and this, apparently, only offers subscriptions.

So one would need to consume the maximum every month, for several months, for the price comparison to be true.


The service is for businesses. For them, tens of millions of characters per month is not much at all. And they'd integrate the API and provide services to end-users/consumers. I can see how one could find it disingenuous, though, although I don't 100% agree.


I have to say that this is the most insane thing I've seen on the internet this week.

And I'm not talking about the quality of your product. I mean, to use Jordan Peterson without consent or reference is not bold, as others are saying, but is verging on criminal.

How can you even calmly post this on HN?


I imagine they used him as a majority in training because so many of his lectures and samples are available for free...

But that "professor" voice is extremely recognizable as him.

The entrepreneur voice is familiar as well but I can't place it. It might be a better blend of more people.


Sounds like Gary Vee - I had similar thoughts, seems a bit sketchy and no reply from OP makes it seem like they know they are probably in the wrong.


They did.

Found this from comments on ProductHunt

> thanks for a great question. Candidly speaking, I’d say we’re taking a “move fast” and “seek forgiveness later” approach. The plan is a) to try to apologize and get a license if there are demands and if that doesn’t work b) create new synthetic voices that are not of real people.

This infuriates me to no end. People with that kind of mentality fuck up the entire startup ecosystem for everyone.


Apparently the creator has been down this road before and got told off. Apparently a firm "stop using people's voices without permission" was not sufficient.

So that makes this intentional and the excuse of "I didn't know any better" no longer plays.


Same thing. I immediately recognized Gary Vee. And I imagine that using generated audio that sounds exactly like him without his permission is a gray area at best.


Imitation is the sincerest form of flattery.


Identity theft is not flattery


Are voices really protected? Could you cite the statute?

I really don't see the outrage. If I do an impression of Jordan Peterson am I also in violation of a social code?


You're intentionally missing the point. This is wrong and you know it. Impressions are different than building a model that can say anything using someone's voice, especially if they're recognizable and/or relative public figures.

It's like someone selling stolen credit cards and then claiming they're doing nothing wrong because people who are buying them are the ones committing the crimes.


Cherry picking one of the AWS voices is a bit fishy to say the least and Azure is running away with the quality of their voices anyway.


You were not kidding! Azure is extremely impressive. Try the demo here: https://azure.microsoft.com/en-us/services/cognitive-service...


That's insane. It also has SSML and voice types (angry sad etc). This is hands down the winner for me.


Holy cow. That's outrageously good. The female voices are way better than Unreal/AWS


Hmm, I've been in the space for a bit, and I think it's not unsafe to say I picked the best voice AWS provides. I could've implemented a multi-speaker feature for AWS, but I just didn't get a chance. I did try IBM, but it sounds worse than AWS?


Azure is Microsoft.


Oh, I meant to say Azure. Not sure why I typed IBM.


If you build a product, no matter what, you have to be honest with yourself, and imho most of the neural voices from Azure sound better than your example. They may miss some of the timbre of your voices, but the timbre comes from the examples you fed it... tbh it's not much better than doing it yourself with something like https://github.com/neonbjb/tortoise-tts


Well, sure, I mean MS is a 2T company with 180K employees. So I wouldn't be too surprised if theirs sounds better than mine. The tortoise tts repo seems pretty random though. Are you trying to promote something of your own or something? haha


No, I am telling you that your implementation is subpar even compared to open source ones that need only few-shot training.


The first syllable is always trimmed off for me - is this like a "free tier" restriction?


No it’s not. I think it’s how the mp3 is somehow played in the browser maybe. It happens to me too.


While unrealspeech voices sounded more human, I found the AWS voice is easier to understand.


Yeah, another person mentioned this as well, and I get it. Will definitely work to improve on that.


That the "professor" voice sounds like Jordan Peterson potentially unintentionally elevates this into the realm of cringe art.


Azure is the best in industry, also you only have English? What about SSML?


English only, no SSML. This was probably the least minimal MVP I've built so far, though. Gotta still iterate. SSML is prob more important than non-English for now.


Heads up, your "random" can get pretty vulgar, LMAO!: https://unrealspeech.com/GG15L

I was having fun making Jordan Peterson say silly things when I clicked random. First Russian came up and failed, then x-rated sir-mix-a-lot struck!


Yeah it looks like they're including users' inputs in the random list. I've seen some pretty vulgar ones


Using Jordan Peterson as your example professor, and the default demo on your landing page, is a bold choice lol.


Professor = Jordan Peterson

Entrepreneur = Gary Vaynerchuk

Honestly not a good look.

Otherwise cool demo!


I actually like the Gary V one. I think using voices that people are very familiar with is helpful for showing the ability of the product. It allows me to A-B the voice with how I know Gary's voice is in my head.


Thank you. What do you suggest I do? Remove the professor and entrepreneur voices?


Just pick somebody completely neutral with no political slant whatsoever. There will unfortunately always be people - on each side - who are rather sensitive.


It looks like you haven't been following the guy on Twitter.


I'm not sure. But it sure did catch me off guard. Perhaps Attenborough?


Professor = Jordan Peterson

A very good look for many, many people, so I'll assume you are talking about the other guy who I've never heard of.


Gary Vaynerchuk (GaryVee) wrote Crush It and is pretty synonymous with hustle culture. He loves the grind and promotes that as the way to move up in the world, if people so choose.

I guess people get upset when someone on the internet (that they could easily ignore, just as you have) tells them to work harder.

He also loves promoting arbitrage plays... so I kind of blame him for the insane secondhand markets on a ton of normal stuff. Everyone is buying up all the stock and trying to flip it for a profit so they can be like GaryVee at garage sales. This does bother me. I went to go buy a new pair of shoes I bought 3 years ago for $90, but no one has them. I looked on eBay and they are going for $600. That's madness. It's not just Gary's fault though, it's the companies that opt for "drops" and hype over scale and actually meeting demand. But that's a whole different rabbit hole.

Both people have a lot of fans, but also get a lot of hate from a particular demographic. I assume the majority is indifferent.


I don't think the point was that the voices are controversial (that can be debated) but rather the questionable legality of using essentially a deep fake of a celebrity's voice and selling it.


I was talking about both really.


I was hoping to like this but the pricing is a total deal-breaker for me.

Personally, AWS's and Google's pay-as-you-go plans win every time regardless of the "better" and "cheaper" claims IMO. You must introduce pay-as-you-go plans if you really want some good traction. I really like the Entrepreneur voice (the professor sounds like Jordan Peterson)

One more thing: I think in the comparison you have sampled the AWS non-neural voice while using the price of the neural voice for the price comparison. This does not sound like a fair comparison to me (but please correct me if I'm wrong and I'll edit my comment).


I agree and I'm saddened that it's a deal breaker for you. Unfortunately, we do not have infrastructure to offer pay-as-you-go (actually pretty challenging ops). We're forced to target a smaller niche today and expand in the future (i.e. offer pay-as-you-go).

But for those who are already spending, let's say, $250+ a month on TTS, this is a sweet deal. They are my initial target customers.

We're sampling the Neural AWS Polly Matthew voice. FYI: Neural Matthew ($16/1M chars): https://unreal-tts-live-demo.s3.us-west-1.amazonaws.com/aws/... Standard Matthew ($4/1M chars): https://unreal-tts-live-demo.s3.us-west-1.amazonaws.com/aws/...

Well, our $250/mo is at $2.6 per 1M chars which is even cheaper than Standard Matthew, and I think ours clearly sounds better?


The Professor sounds like Jordan Peterson, is that on purpose?


Is it only available in English?


Currently, yes.


Non-American accents too please as an option for future English voices.


Which ones do you have in mind?


English though I guess each English speaking nation will obviously have their local preferences.


I need a Gary Busey voice.


How is using Jordan Peterson's voice legal?


How is it illegal?


Someone's voice seems to me like unique artwork. Especially when it's obvious who the speaker is.

It's like you have a Tupac hologram advertising cheese sticks on TV, no permission, you use his voice and body, but he is alive.

You are using someone's 'likeness' for profit.

Voice might be under protected biometric data as well.


It seems more straightforward to me in cases of imagery, where there is less room for ambiguity. How different does a voice have to sound to not "be" another person's voice? I believe the laws on the books today are intended to deal explicitly with the actual person's voice recorded by some microphone. I think once you move into the realm of "this is a voice, generated from a model trained on audio samples of some person's voice" then it becomes unclear whether existing laws apply. If you further add into the equation that some models are trained on several speakers then it gets even muddier, I think. If you advertise "this is person X" then I think it becomes problematic, but for different reasons, since at that point you're using the person's name to advertise your product.

IDK what you mean about biometric data being protected. I'm pretty sure there's no law stopping me from pulling your fingerprint off of a coffee mug you left at Starbucks, creating a high resolution scan of it, and posting it at godmode2019-fingerprint.com

EDIT: after some quick Googling it looks like there are some biometric privacy laws on the books in certain U.S. states that would prevent something like godmode2019-fingerprint.com, but coverage does not appear to be comprehensive across the US. Not sure about other countries.


All I know is I would be pretty annoyed if a company made a clone of my voice and made me say things for money.


Yeah, that I totally get. I just think tech like this falls squarely into the gray zone for existing laws.


Cheaper is good, but the quality is much worse and your first demo is very obviously Jordan Peterson, who has a famously annoying voice and some would say personality.


Hmm, when I demoed around, people pretty much unanimously picked the voice as the best-sounding one.


Jordan Peterson was right to be afraid of compelled speech:

https://unrealspeech.com/HyhoR


Is the professor meant to sound like Jordan Peterson?


Your synthetic voice sounds like Jordan Peterson.


EDIT: OP clarified the Random feature uses previously submitted inputs. So, this was unintentional. It's just unlucky I hit that particular sample lol

The voices sound fine (some are worse than AWS, but some are indeed better). However, as a queer person, putting in sample texts like these[1] kinda put me off from your product no matter how good they are. IMO, that's absolutely uncalled for.

On a professional note, it's very immature (and silly even?) to use a product page to voice hostile idiosyncratic political opinions in general (regardless of whether I agree with them or not).

You're of course entitled to your opinions, and welcome to market your product however you want, though; I'm not trying to encroach on that.

[1] https://imgur.com/a/9zyZxtT


Hey, thanks for the comment. The "sample" must've been pulled from the "random" feature. It basically shows texts other people have tried. This feature might have been a bad idea. I'm sorry if it offended you in any way. Not sure if I have the ability to moderate the content. Do you think it's best if I remove the "random" feature altogether?


> This feature might have been a bad idea

You need to make it pull from Wikipedia or some other semi-moderated source, else your Random button is going to turn into Microsoft's Tay real quick. I also notice your Professor voice is pretty clearly Jordan Peterson, which some people may have a problem with.


I’d definitely take out the random selection of previous inputs… have a set list of random quotes you have set up yourself.

Also - if you are storing previous inputs (for any purpose) I’d let people know upfront!


Ah yes, it was from the Random feature! Sorry if I jumped to conclusions.

I would maybe have a list of paragraphs from a few select public domain works in there, instead of using unfiltered user input.
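
Something like the following would do it. This is only a minimal Python sketch of the idea, not how the site actually works: `CURATED_SAMPLES` and `random_sample` are hypothetical names, and the lines are opening sentences from public domain novels picked purely as examples.

```python
import random
from typing import Optional

# Hand-picked, public domain sample texts (illustrative choices only).
# Nothing typed by visitors is ever added to this tuple.
CURATED_SAMPLES = (
    "Call me Ishmael.",
    "It was the best of times, it was the worst of times.",
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife.",
)


def random_sample(rng: Optional[random.Random] = None) -> str:
    """Return one curated sample; pass a seeded Random for repeatable picks."""
    chooser = rng if rng is not None else random
    return chooser.choice(CURATED_SAMPLES)


if __name__ == "__main__":
    print(random_sample())
```

Since the Random button can then only ever show text from that fixed list, there is nothing to moderate.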


This feature is definitely a bad idea. People have been putting stupid stuff into text-to-speech since the late 90s with iMacs G3.

If you're looking to position yourself as an inclusive company, don't regurgitate text put in by previous users. Because idiots on the internet will idiot. And that idiocy now is cosigned with your company name and logo.


> People have been putting stupid stuff into text-to-speech since the late 90s with iMacs G3.

People have been putting stupid stuff into text-to-speech since the early 80s after it was popularised by the SP0256 chip.


If you're trying to sell this product and showing text that random people from the internet have inputted, then yes this is a terrible idea. It's only a matter of time before "Hitler was right" is displayed on your site. Do you want the trollings of dorks associated with your brand?


You should remove it if you can't control it.


Can someone explain what the hell "Vichy" means in that context? My own brain and search engines seem to immediately jump to Vichy France, but that's just nonsensical unless there's a really contrived analogy.


I'm a homosexual person that doesn't give a flip about pride month or anything that's "gay culture", but that's overtly hostile. That person needs mental help.


How did you get that? It sorta echoes their choice of Peterson for the professor voice?

EDIT: they have responded that it was text people tried. Sorry for the link with Peterson in that case.


I also received a terrible "random" input. I'd remove the feature.

https://imgur.com/a/xVxdUF9



