Yahoo, Bucking Industry, Scans Emails for Data to Sell Advertisers (wsj.com)
239 points by hemancuso on Aug 28, 2018 | 109 comments



From the article - Rob Griffin, chief technology officer of digital-ad consultancy Tovo Labs, “is how do you monetize it without the icky factor?”

Uhhhhh how about don't read emails? If I have a fantasy football team, please sell me football stuff. If I click on an article relating to health concerns, sell me organic whole foods stuff.

Can you imagine someone 20 years ago throwing all your mail into a scanner and saying, "don't worry I'm only checking for keywords and receipts"?


This is the thing I find really weird about online advertising. Every other medium has ads based on the content or demographic that most likely reads that content. Those ads are always contextually relevant, sufficiently diverse (rarely will you see the same ad repeated everywhere) and few people find them invasive. Online, however, ads are tailored to you as a person. They chase you around the web, even for products you already decided you didn’t want, and are always vaguely creepy. Why is it that way? Is it really more profitable, or is personalized advertising google’s long con?


I worked on a team that did both direct-sold and programmatic ads for a digital publisher and direct-sold was way more profitable. Less headache, too. I'm in the "long con" camp.

It seems to me that programmatic ads are popular mainly because it's easier to measure engagement, not because it's more profitable for publishers, then at some point Google and Facebook got the advantage of network effects.


> even for products you already decided you didn’t want

Or even more weird, for products you already bought and are unlikely to buy again.


That's the biggest downside. Showing me ads for products that I have already made a decision about.

I don't fully understand why they continued after seeing clickthrough numbers go down across the board. I guess it's easier to sell the concept; only show this ad to people who have already seen this product but they are usually late with the message. If I'm looking at something today chances are by next week/month I have already decided to buy or move on. How many engagement rings can one buy?


Once I was researching the specs for an item I was selling on EBay.

An ad for it followed me for weeks.

I didn’t have one too few, I had one too many.

I was the worst possible customer for that ad.


After buying a car I was getting actual mail ads to buy another car.


I wish there was a button on Google ads where I could just tell them, “Thanks - I already bought a car... no need to show me more car ads for a while”


I agree. In fact, I would go one better: Give me a button to click when I am actually shopping and interested in ads. I mentioned this before and someone said that Google had a store, but apparently it's pretty risky to try to buy stuff there.

I also thought this is something Duck Duck Go could implement to distinguish itself. It will result in a better experience for advertisers as clicks would be pre-qualified. It's the counter-move to Google's "quality score" which, when you think about it (or actually do advertising), is anti-pre-qualification.

What I mean is this: If people are just looking for information (free) and not wanting to buy, then if I put something in my ad that discourages people from clicking (by putting in a price, for example), then I get fewer clicks-per-display and my quality score goes down. However, only displaying these ads to people who are looking to buy would negate that.
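To put made-up numbers on that trade-off, here is a rough sketch. Every figure below (impressions, click-through rates, conversion rates, cost per click) is a hypothetical chosen for illustration, and `campaign_stats` is an invented helper, not anything from Google's actual quality-score formula.

```python
def campaign_stats(impressions, ctr, conversion_rate, cost_per_click):
    """Return clicks, sales, spend and cost per sale for one ad variant."""
    clicks = impressions * ctr
    sales = clicks * conversion_rate
    spend = clicks * cost_per_click
    return {
        "clicks": clicks,
        "sales": sales,
        "spend": spend,
        "cost_per_sale": spend / sales,
    }

# Hypothetical ad with no price: lots of curious clicks, few buyers.
vague = campaign_stats(10_000, ctr=0.05, conversion_rate=0.01, cost_per_click=0.50)

# Hypothetical ad that shows the price: fewer clicks, but from people ready to buy.
priced = campaign_stats(10_000, ctr=0.02, conversion_rate=0.04, cost_per_click=0.50)

for name, stats in (("vague", vague), ("priced", priced)):
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

With these assumed inputs the priced ad has the lower click-through rate (the thing a CTR-driven score would penalize) yet the lower cost per sale, which is the tension described above.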


Did you know that most Google ads have a button you can click in order to tell Google that you don't want to see the ad again? Even advertising videos on YouTube have this option. You can tell them when an advert is not relevant or you've seen it too many times. Oddly, Google doesn't publicise this much even though it's probably a feature which improves their profitability.


Do you realize that is tangential to the discussion? They shouldn't be invading privacy in the first place!

And what do you mean not publicizing it? It is quite visible on the ads they show it for.


> Why is it that way?

If the other methods could target you technically, they would


> Is it really more profitable, or is personalized advertising google’s long con?

Yes, it helps. Contextual ads help me find new customers to market to, hopefully through relevant content that overlaps with my product, but what do I do with all those visitors that stop by and don't convert? Retargeting lets me capture that interest and incrementally improve my media spend - and on tens of millions a year in media spend, this can be significant.

I wish I had a better alternative but unfortunately these are the tools we have to work with, and creepy ads that follow you around the internet are going to last until we figure out better ways of spending investor money without feeling like we're burning cash to the Advertising Gods.


I guess it comes down to this: having fewer visitors click on an ad means fewer dollars spent. The few who do click are closer to being customers, compared to contextual clicks, which get more interest but more of the browsing type.

It feels like we are wasting so much page space with ineffective ads and too few clickthroughs.


Which is why targeted advertising became so popular because it increases the chances that you will see an ad that is relevant to you.


I cannot recall an instance of ever discovering something I wanted, through any advertising, except in these exceptions:

- Game recommendations on Steam and other platforms.

- Highlighted apps on the App Store.

- Before the internet, games and apps in magazines.

Everything else that I have ever spent money on, I have discovered through word of mouth (including places like Reddit or HN), searched for it myself, seen it incidentally in the wild (e.g. hearing a song outside or seeing someone using something) or when browsing shops and stores in person.

How often do other people here remember buying something they saw in an ad (that they didn't already know of)?


If you include Amazon book recommendations, I have to confess I have found those to be eerily accurate. Now I buy almost all my books through work, so I don't actually purchase from amazon much, but I still get useful recommendations by adding titles to my shopping cart that I end up buying through another vendor.

Regarding ads on the internet, I really see very few anymore. The ad-blocking and privacy options for browsers seem to be exceedingly good at their job, and I automatically switch to reader view for any page that enables it. These days I am surprised to see an ad online.


> rarely will you see the same ad repeated everywhere

I rarely ever watch cable but I swear cable ads would repeat the same exact video ad multiple times per show but maybe it has changed nowadays.

> Why is it that way? Is it really more profitable,

Probably depends on the product. For google search where your search query is generally highly relevant to what you are looking for and therefore advertisers, probably not much. For facebook when I am scrolling through the news feed and I'm not looking for any product in particular, probably a lot because that is the difference between showing me a generic ad with low click through rates versus one that is tailored towards things I probably want and therefore much more likely that I will click on it.


Companies seem willing to pay more for more targeted advertising and if other mediums could switch over to it they would (the issue is that something like print can't). I'm not sure what other motivation they would have besides profit incentive for it.


> Is it really more profitable

It is. Online, you can use statistics and probability theory to actually test what's more profitable instead of relying on ad-hoc metrics about what "feels good".
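For what it's worth, the simplest version of that kind of test looks something like the sketch below: a two-proportion z-test comparing conversion rates of two ad variants. The counts are invented for illustration, and conversion rate is only a proxy for profit (a real comparison would also need revenue and cost per conversion).

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 10,000 impressions each for a contextual and a retargeted ad.
z, p_value = two_proportion_z_test(conv_a=120, n_a=10_000, conv_b=165, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise under these assumptions; it says nothing about whether the people who converted would have bought anyway.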

> They chase you around the web, even for products you already decided you didn’t want

These are a small minority of online ads. They're displayed in places where nothing more profitable or contextual can be displayed. (I.e., most websites, since most websites are junk.)


The only ads that are ever actually relevant to me are the ones that are there because they're relevant to the page/app content.

If I'm looking at x, selling me something that makes x easier or otherwise complements x makes a lot more sense than selling me something that I looked at 2 months ago and didn't intend to buy.


I searched Wayfair for nightstands once. They're seven months deep into a campaign to close that sale by any means necessary. I'm half expecting to find a salesperson inside my home someday.


> Can you imagine someone 20 years ago throwing all your mail into a scanner and saying, "don't worry I'm only checking for keywords and receipts"?

Yes, and actually Acxiom is a lot more powerful and comprehensive than your postulation, if you care to look into them.

Unregulated data mining is, in my opinion, going to lead to the next collapse of culture. The cow is already out of the barn and it's on the way to Chicago. I don't think it's an unfair analogy considering the experiments thus far in totalitarian environments.


I got my report from them, it's a mix of disturbingly detailed and laughably inaccurate.


Yes? Gmail was promoted on the grounds of "hey it's just ads based on automated scans of the emails, no human actually looks at any of it" and that was a big part of people accepting the tradeoff.

Kind of refutes the claim that it's just like showing specific emails with date/times to the general public.


Grocery stores were using loyalty cards to track purchases and profile shoppers 20 years ago weren't they?


USPS Mail Isolation Control and Tracking program, which was put into place following the post-9/11 anthrax mailings.

https://consumerist.com/2013/07/03/forget-the-nsas-hi-tech-s...


This is a big business for hedge funds too, not just advertisers. Every Uber receipt, every Amazon purchase, every receipt that hits your email inbox. Aggregated and tracked at the product SKU level. That is pure gold for investment research. Few sources of data can match what email inboxes provide.

E.g. Superfly Insights [1], Return Path [2]

[1] https://alternativedata.org/data_provider/superfly-insights/

[2] https://returnpath.com/downloads/know-email-data/


This is why Amazon no longer lists all of your purchases in your email receipt. It just says "your order of XXX and N more items".


Huh. I guess Facebook’s move away from including someone’s update or message in their notification emails had to do with more than getting you to go to their site...


Facebook still does notification emails?

I've been filtering them straight into the trash for the best part of a decade. It seems so backwards to send them when most people check their Facebook more often than their email.


> I've been filtering them straight into the trash for the best part of a decade.

Why not just turn them off? I haven't received a single notification email from them in years.


Maybe I have, I'm not sure. Whatever I did to stop my inbox being flooded with emails from Facebook, I did almost a decade ago and I haven't touched it since.


Probably filtering them. Before I deleted my FB I remember it being a struggle to not have my inbox flooded. I would disable all the notifications and every few months when they updated their policies or changed the site those notifications would helpfully get turned back on.


And their support for OpenPGP encrypted email.


Likewise with Twitter


Try deleting your Yahoo account. It doesn't work. They say they delete your account but then it will still be there getting emails. It's impossible to delete your account.


Honest question: is it better to delete the account or leave it as a placeholder to prevent the name being taken by someone else? I have an ancient yahoo address that gets a lot of spam and a few newsletters, but I'm 99% sure that there are some website credentials out there that still use the address and dozens of contacts who still have it in their address book.

I started to (try to) delete the account several times and always hesitated out of fear. And when I ask myself "what's the harm in leaving it there?" I can't really come up with a good answer.

Also interesting aside, Yahoo/Oath nag me every couple weeks to agree to the new Verizon draconian privacy forfeiture terms. I wonder if they'll eventually block logins and cutoff email forwarding if I never click through. My cynical hunch is that they'll equate logins with consent after updating the TOS to explain this.


It is risk averse to keep it and functionally useless. Yahoo recycles unused handles after a timeout of a few years, and scammers find these and steal all accounts associated with them.

Use case: friend dies, years later some turkey kid stole his Facebook because his login was Yahoo email based and BOOM has access to a lot of social media data.


I predict this will have changed recently due to GDPR.


They violate multiple GDPR restrictions already by having all kinds of options already pre-checked for you when you click "allow", and hidden no less.

https://twitter.com/ow/status/999575371556294656?lang=en


Does anyone still use yahoo for email? It's a place where spam goes to die.


Oath (Yahoo) doesn't even use Yahoo Mail. We're all on GApps.


What exactly is Verizon's plan with your company if I may ask? Are they just looking to make money off of existing customers and the few good products they have, or are they going to try and make a move to become a serious silicon valley competitor?


I can't say what Verizon's plans are for sure, they are pretty hands off from what I can tell.

Oath's goal is to be the premier "AdTech" company. According to leadership we're only behind Google and FB. But the division I work in doesn't touch ads so I'm not the best person to talk to.


Hmm that's pretty interesting. Makes sense that they are trying to get in on the ad game, but I would expect them to rebrand to get the revenue on the search/mail ads and such since Yahoo has the infrastructure already for it.

I assume that if Verizon is being hands off Yahoo must be doing decently well. Otherwise I'd imagine they would be trying to move it in a new direction.


Verizon was fairly hands off with aol as well when I worked there. I think they just don’t know what the fuck to do with those companies now that they bought them, to be honest.


When the company I work for was bought out by Verizon they gave it to AOL for that reason.


Are you guys internally talking about bid caching/etc? Curious if it's on your radar/in discussion or just a non-point.


I couldn't tell you, luckily I'm on a different team.


Me.


Doesn't gmail scan your mail for data to target ads?


In the past. We (I worked on gmail ads during this time) stopped doing it about a year ago.

https://blog.google/products/gmail/g-suite-gains-traction-in...


That's good, but something tells me I still shouldn't fully trust Google/Gmail with things like this.


Honestly, that's fair. In my experience Google does a lot toward being trustworthy since it's in their long-term best interest. But at the end of the day I'm just some random guy on HN comments, so you're right to be skeptical.


You shouldn't trust anyone. There's nothing stopping them from doing it again, or continuing to do it and publishing that nice writeup. Of course when your data is on a 3rd party server, that 3rd party has access to it. Why do we keep pretending like anything else is the case?


> Of course when your data is on a 3rd party server, that 3rd party has access to it.

You realize that oblivious third-party storage can exist, right? For example, Tarsnap. It's the durable-storage equivalent of end-to-end encrypted messaging. The client does all the encrypting; and the client is open-source; so you don't have to trust the third-party at all.

This is a bit more challenging when you want clients to be able to operate over an index of the data, rather than just the raw data (i.e. you want them to be able to search through their stored email)—but it's fully possible to architect a service such that all indexing happens on the client (perhaps distributed between many connected clients in the case of a shared data-source like a Slack workspace), where the backend is just obliviously storing and returning E2E-encrypted copies of generated index chunks, in the same way it obliviously stores and returns E2E-encrypted copies of the data itself.
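As a toy illustration of the "index on the client, store obliviously on the server" idea, here is a minimal sketch. To be clear about what it is not: this is not Tarsnap's design and not the encrypted-index-chunk scheme described above. It swaps in a simpler relative, a keyed "blind index" where the client stores HMAC tokens instead of keywords. All names (`ObliviousStore`, `keyword_token`, `index_message`, `search`) are hypothetical, the message bodies themselves are left out, and a real system would also have to encrypt message ids and hide token frequency and access patterns, which this does not.

```python
import hashlib
import hmac
import secrets

class ObliviousStore:
    """Stand-in for the backend: it only ever sees opaque tokens and ids."""

    def __init__(self):
        self.index = {}  # hex token -> set of message ids

    def put(self, token, message_id):
        self.index.setdefault(token, set()).add(message_id)

    def get(self, token):
        return set(self.index.get(token, ()))

def keyword_token(key, word):
    """Client-side: derive an opaque token from a keyword with a secret key."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def index_message(store, key, message_id, text):
    """Client-side: tokenize locally, upload only keyed tokens."""
    for word in set(text.split()):
        store.put(keyword_token(key, word), message_id)

def search(store, key, word):
    """Client-side: recompute the token and ask the server for matches."""
    return store.get(keyword_token(key, word))

key = secrets.token_bytes(32)  # never leaves the client
store = ObliviousStore()
index_message(store, key, "m1", "Your invoice from the barbershop")
index_message(store, key, "m2", "Fantasy football draft tonight")
index_message(store, key, "m3", "Invoice overdue")
print(search(store, key, "invoice"))  # the ids m1 and m3, in some order
```

The server can answer the lookup without ever learning the word "invoice", but it can still see that two messages share a token, which is exactly the kind of leakage a production design has to deal with.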


I mean instead they're just training their 'auto-reply' AI systems which they probably expect to be more valuable than advertising anyway.

Every interaction you have through google gives their neural nets a little more data to predict everyone's next reply.


Now I get ads in gmail relevant to what I search on google.com


What is the monetisation scheme of Gmail now? Just a free enticement to buy the G-suite?


Going by the uncanny-valley accuracy of their suggested auto-replies, they're training their natural language neural nets with every message sent over their wires.


They still have ads for the free version, just not based on your emails.


Is any of the data used from that collection/analysis still around?


No.

I mean, maybe high-level aggregate data like "X% of users got emails about shopping" might still be in some spreadsheet somewhere, or in people's brains. But no, I can't think of a way any personalized data would still be around.


Yeah, we believe you. wink wink nudge nudge.


A couple of years ago a friend sent me a barbershop quartet video and I spent a few days getting hairdresser ads ...


Solution: use an adblocker.


No great surprise. The only people still using Yahoo aren't very tech savvy and don't know about the data breach, plaintext passwords, etc. My guess is Yahoo can get away with this and they know it.


It's just a matter of time until all companies that sit on a ton of data start using it to make money. There is just no way around it unless they are being forced by law not to do it.


It's not just a matter of time. As stated in the article, Google actually reverted its policy on the issue and has since stopped collecting data from user email content.

There's nothing inevitable about this. Even apart from law, employees can steer the direction of companies, especially in tech where businesses are so dependent on a relatively small number of them. If you think privacy practises at your place of work are bad, bring it up.


Are you suggesting that Google took a stand on principle and voluntarily gave up gathering more data on users of its free services out of respect for privacy?

Or is it perhaps more likely that Google gets enough info from users via others means that it simply wasn't getting enough value out of email scanning? Or maybe starting a trend against email scanning hurts competitors more than it hurts themselves?


It is probably all of those things at the same time, and nothing stops employees from wielding considerable power when it comes to influencing decision making on any of those issues.

Google was, only a few months ago, forced by its employees to turn down lucrative defence contracts, and has since committed to not participate in them.

I think if employees manage to accomplish that we can all manage to improve our privacy practises.


I tend to agree, fwiw. Scanning emails prevented Google from taking a stand on privacy without subjecting itself to easy potshots of hypocrisy, regardless of how many times they explained the methodology and use of keywords instead of profile building.


See this announcement[0]. While I can't say for sure, it's reasonable to conclude that email monetization was confusing business clients. (I work for Google).

[0]: https://www.theguardian.com/technology/2017/jun/26/google-wi...


I think you are right on. And nothing prevents them from using E-mails in the future again if it makes money.


GP clearly said monetizing, parent clearly disagreed with the monetizing part, and you clearly changed the subject to gathering. That's disingenuous.


Who said they were gathering less data? Do you understand how an email mailbox works? Those emails are still there, just like they are with any other email service.


The parent comment that I replied to:

> Google actually reverted its policy on the issue and has since stopped collecting data from user email content

So the 'who' would be the parent commenter.

> Do you understand how an email mailbox works?

That type of comment feels so out of place here. I've clearly deeply annoyed you, but it's hard to tell from your reply what exactly your issue is.


The issue is people not understanding the difference between data collection and data use for one or more purposes. This sloppy thinking (which to my mind is so obviously wrong that I can't understand how anyone would think it) shows up repeatedly on this forum, and you're just the latest example. My apologies.

Once the data is collected, it can be used at some later date for any purpose whatsoever. It's up to the user to decide if they like the purposes it is being used for today and if they either think the data will be used for purposes they like in the future or will be able to migrate their data off the service.


> Once the data is collected, it can be used at some later date for any purpose whatsoever.

IANAL, but I'm fairly confident you cannot unless you want to get a lawsuit. Granted, the terms that practically nobody looks at give companies pretty far reaching grants on how they can use the data, but that does not mean they can change the terms on old data that was collected under a previous contract.

I work with personally identifiable information and I have to get approval from a privacy team and a lawyer team to utilize data in any new way because of said contracts.


So Verizon bought Yahoo, a company where every single email account had been known to have been compromised[1], and now they are seeking to monetize those same accounts by scanning their contents and selling that data to third parties? No wonder the Yahoo hacks and very late disclosure didn't derail the deal. These two basement brands deserve each other.

[1] https://www.oath.com/press/yahoo-provides-notice-to-addition...


Because they know everyone that would have ever left them has already done so. Nothing to lose...


Not all users are clued up on what's happening in the tech world; some users are just users who want an email account.


Sure, and Google is happy to do that, for the layman or the techie.

There's just no need, or space, for Yahoo anymore.


I think the OP's point is that a lot of "legacy" users on Yahoo -- or on Hotmail[1], or on the free email they got from their ISP when they signed up fifteen years ago -- aren't inclined to move to Gmail unless they're tech-savvy enough to understand the case as to why they should. On the surface, free webmail services seem to be pretty interchangeable.

[1]: While Hotmail doesn't technically exist anymore -- it redirects to Outlook.com now -- I know more than one person who got their email account there back when it was Hotmail, still use it, and still refer to it that way. And at least one of those people is someone younger than I am and who's tech-savvy enough to be putting together his own PCs. Anecdata, of course, but I think a good illustration of why services like Yahoo Mail are likely to exist way, way longer than we might think.


Yeah, tech in general and especially email is annoying to switch from. My parents still use a yahoo email account because they have had it for so long and don't want to get other people to update their contacts.


The more traditional email interface on Hotmail/outlook.com isn't bad either. Gmail doesn't even show me the subject line by default when I'm replying.


So are you saying everyone should just get with the program and use Gmail? If so, how would this monoculture make the world a better place? To me, having multiple competing providers to choose from sounds preferable.


Isn't Yahoo gone? Shouldn't headline be "Verizon-Oath"?


In the same sense that "Google" is gone and we should now be saying Alphabet. That is the holding company, but the brand remains.


That's different - Yahoo has completely different owners now whose main business is in a different sector.

Headline should read:

"Verizon after acquiring Yahoo disregards normal tech-company conventions and snoops on everyone's emails like a scummy ISP."


what's the point of finely tuning semantics when it's functionally identical?


Getting people upset with Verizon is a good goal.


Do brands decide to violate user privacy? Or do people make that choice?


Google LLC still exists.


They can read all the spam they want ... it's what yahoo email was built for ... one of your trash email addresses!


Today the lost hours maintaining my personal mailserver suddenly don't feel so lost!


Didn't Yahoo also promise a few years ago it wouldn't do that (only after they saw a backlash against scanning the emails, I believe)?


I hate yahoo and all these evil companies with their personal data collection fetishes.


It's because you're not paying for their services.


Most companies sell your data even if you pay.


Ha, unfortunately I am.


I'm personally shocked Yahoo is still a company.

I personally don't know anyone daft enough to use their services up here in Canada - even in the US (where I have personally seen the 'average Joe's' tech knowledge - especially in central and southern states - shockingly much less than that in Canada) it seemed folks avoided it, except the lowest-of-the-low in terms of tech knowledge.

They have, for the years since AOL was known universally as the worst tech company, seemingly fought for that title.

EDIT: Surprised at the downvotes. I'd love to know why.


> since AOL was known universally as the worst tech company

When it started to fail, it was lamented, but technically? It was exploitive and full of dumb subscribers, but the tech was fine.

> except the lowest-of-the-low in terms of tech knowledge.

Used yahoo for decades, since I was around when it started. Still do, for one email address.


> but the tech was fine.

Yep, AOLserver was one of the first “app servers” and it was well regarded. The other PG built his company around it.


Plenty of folks use Yahoo, especially for secondary accounts. Also, when the iPhone launched, the only outside email provider that had 'push' privileges was Yahoo. I remember having a Google > Yahoo email forwarding setup just so I could get my gmail forwarded to me in real-time.


This is a bit off topic from usual discussion but a few years ago a game I play launched, probably around early 2015 or 2016. The dev hosted the game on Yahoo servers, and their incompetency almost killed the game. Their services that dealt with game hosting were constantly going down, grinding to a halt, or just being unwilling to resolve issues. At the time Yahoo was on its way out of that sector and planning on axing that service. I'm not sure if it's closed down yet or not, but thankfully the dev moved over to aws, no problems since then.


What "Yahoo servers" are you referring to? As far as I'm aware, Yahoo never provided cloud compute services -- the only Yahoo game hosting I'm aware of was for their own in-browser games.


I don't know, I just have a vague memory of it. All I know is he used Yahoo in some form to help host his game. Whether it was the code itself or just some services, they were awful.



