
Looks really awesome, I'll await its arrival.


What I've learned in the months since is that in order to overcome the panic and anxiety, you have to do something very simple yet seemingly incredibly difficult: you have to let it happen. Let the panic wash over you.

I must not fear.

Fear is the mind-killer.

Fear is the little-death that brings total obliteration.

I will face my fear.

I will permit it to pass over me and through me.

And when it has gone past I will turn the inner eye to see its path.

Where the fear has gone there will be nothing.

Only I will remain.


Thanks for the tip. I don't think I can change it now (correct me if I'm wrong), but I'll keep this in mind for future submissions.


To adopt the same aggressive tone you're taking towards the article: you probably should have read the publication before criticizing the experiment.

Quick lesson on pharmaceutical studies: they are extremely expensive and often can only have a limited sample size.

Quick lesson on statistics: it was developed to deal with small sample sizes and allow you to generalize to larger populations.
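
To make that concrete, here's a toy sketch (invented numbers, nothing from the study) of the kind of small-sample inference such trials rely on:

    # A two-arm comparison with n=15 per group; all numbers are invented.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    treatment = rng.normal(loc=5.0, scale=2.0, size=15)  # hypothetical symptom scores
    control = rng.normal(loc=6.5, scale=2.0, size=15)

    # Even with only 30 subjects, the t-test quantifies how confidently the
    # observed difference generalizes to the wider population.
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")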

As such, pharmaceutical research has very strict statistical standards for how experiments are set up, designed, and analyzed. I'm a statistician, and I was impressed by how well designed this experiment was.

Moreover, the study only aimed to find a more precise cause of the issues. And even with the solid analysis they had, the authors are simply saying, "Hey, we should question this anti-gluten thing; it might not be as big an issue as we think."

Raise more questions; that's good science.


This is not a good generalization. I've usually only seen this issue with optimization problems when:

1) You haven't played with the parameters (see the sketch below), or
2) The implementation is not correct (usually the case with genetic algos, since they require a reasonable amount of domain expertise vs., say, GD).
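
For what it's worth, here's a minimal toy GA (parameters illustrative, not a reference implementation) that shows how much point 1 matters:

    # Minimize x^2 with a bare-bones genetic algorithm.
    import random

    def fitness(x):
        return -(x ** 2)  # maximizing -x^2 == minimizing x^2

    def evolve(pop_size=50, generations=100, mutation_scale=0.5):
        population = [random.uniform(-10, 10) for _ in range(pop_size)]
        for _ in range(generations):
            # selection: keep the fitter half of the population
            population.sort(key=fitness, reverse=True)
            survivors = population[: pop_size // 2]
            # mutation: children are perturbed copies of random survivors
            children = [random.choice(survivors) + random.gauss(0, mutation_scale)
                        for _ in range(pop_size - len(survivors))]
            population = survivors + children
        return max(population, key=fitness)

    print(evolve())  # near 0 with sane parameters; try mutation_scale=10 to watch it wander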


Copped out when called out. The deleted comment said GAs were bad search algos that tend to get stuck in local minima.


+1. I'd also add that genetic algorithms are for optimization and can't really be compared with most of the algorithms in that chart. They'd sit at a sub-level where, for each type of approach (classification, clustering, etc.), the different optimization techniques for finding model weights are compared.


Most (all?) of the algorithms on the chart iteratively optimize an objective. However, most of the objectives are convex or otherwise admit an optimization strategy that performs better than a genetic algorithm.
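
A toy illustration (my sketch, nothing from the chart): on a convex objective, plain gradient descent walks straight to the global minimum, with no populations or mutations required.

    # Gradient descent on the convex f(x) = x^2.
    x, lr = 10.0, 0.1
    for _ in range(100):
        x -= lr * 2 * x  # f'(x) = 2x
    print(x)  # ~0 (roughly 2e-9 after 100 steps): the global minimum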


I believe you are repeating what I said (?). All of the algorithms have different methods of arriving at an objective function and leveraging its results. Yet most share the same problem in terms of optimizing it, and yes, most choose other routes.


You're correct. It isn't different. Simpson's paradox is actually a key indicator of a confounding variable.
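
The classic kidney-stone data (Charig et al., 1986) illustrates it: treatment A wins within each stone-size group, yet B looks better overall, because stone size confounds both treatment choice and outcome. A quick sketch:

    # Simpson's paradox on the classic kidney-stone data: (successes, total).
    groups = {
        "small stones": {"A": (81, 87), "B": (234, 270)},
        "large stones": {"A": (192, 263), "B": (55, 80)},
    }

    totals = {"A": [0, 0], "B": [0, 0]}
    for group, arms in groups.items():
        for arm, (s, n) in arms.items():
            print(f"{group}, treatment {arm}: {s / n:.0%}")  # A wins in both groups
            totals[arm][0] += s
            totals[arm][1] += n

    for arm, (s, n) in totals.items():
        print(f"overall, treatment {arm}: {s / n:.0%}")  # ...yet B wins overall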


"Collect them all where it makes the most sense. If that’s iCloud, Google Contacts, Outlook.com or heck, even FullContact, great"

Completely discredit your post, why don't yah?


That's quoted out of context - you left off the next sentence:

"Obviously we’d prefer that you use us,  but the main thing is to make sure that you get them in a place where the data belongs to you and not someone who wants to keep it under lockdown."

Sure, there's a marketing message there, but it's buried at the bottom of a long, quite involved post, and fully disclosed. The author being an interested party hardly discredits the article, IMO.


The post was hardly worth writing in the first place. This is extremely standard social network (or internet company) practice. The privacy concerns of just giving a massive dump of all that data are very complex.

Given that, it's incredibly obvious the post was written as a self-plug, and burying that was disingenuous. If the author had begun with, "this is an issue and we have a solution for you", it would have been much more honest.


My thoughts exactly. He's completely blind.


It's dangerous to assume an even spread. Numbers are reported like this because the distribution of site visits is usually confidential information. From experience, I'd guess it's something like 1 million sites holding ~70% of the visitors, with the rest spread along the tail.


The distribution of links pointing to a Web site follows a power-law distribution (few sites get most of the links) [1].

It is very likely that the number of visits follows a similar power-law distribution.

[1] http://www.sciencemag.org/content/287/5461/2115.full
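
As a rough back-of-the-envelope (the exponent is my assumption, not a figure from the cited paper), a Zipf-like distribution concentrates traffic dramatically:

    # Visits to the site of rank r taken proportional to r^(-1.1).
    import numpy as np

    n_sites = 1_000_000
    visits = np.arange(1, n_sites + 1) ** -1.1
    visits /= visits.sum()

    print(f"top 1,000 sites: {visits[:1000].sum():.0%} of all visits")
    print(f"top 100,000 sites: {visits[:100_000].sum():.0%} of all visits")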


I'm pretty sure this is exactly what I just said.


Thanks. I didn't assume they were evenly spread, but even if we make it 175M x 70% / 1M, we still have only ~120 visitors per site per month.


Remember that the majority of those sites are probably either dead, not public, or made while people were learning how to build sites. You can probably put the number of active/public sites down to 10%. And if the visitor number is 175M, it might be unique visitors to the platform, i.e. one visitor might visit multiple sites.


Yes, but in that case it would be much better to cite the number of "active sites" rather than an almost meaningless 20M figure. Still, it's hard to define "active site".


Then the number would be lower... It's all about marketing. "Unlimited", "hundreds of millions", and "probably most of" sound better than "5% of resources", "20 million active", and "there is no data but we think a lot".


This


You're still assuming an even spread amongst the ones in that 1M. My point is that it's usually hard to tell, and you should assume very few sites hold most of the visitors. Just keep re-applying the concept (of the 1M, 1,000 sites account for 70%, etc.), as in the rough sketch below.
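
Putting numbers on it with the thread's own illustrative figures (not real platform data):

    # Re-applying the ~70% concentration heuristic.
    visitors = 175_000_000

    top_1m = 0.70 * visitors  # ~70% of visits go to the top 1M sites
    print(f"top 1M sites average {top_1m / 1_000_000:.0f} visits/site/month")

    top_1k = 0.70 * top_1m    # re-apply: ~70% of those go to the top 1,000
    print(f"top 1,000 sites average {top_1k / 1_000:,.0f} visits/site/month")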

