Hacker News

That's nothing compared to the work in China right now. In ten or 20 years, the West could be permanently behind.

From https://www.independent.ie/business/technology/data-sharing-...

> DeepMind, the AI lab of Google's Alphabet, has laboured for nearly two years to access medical records... Last month, the top UK privacy watchdog declared the trial violates British data-protection laws, throwing its future into question.

> Contrast that with how officials handled a project in Fuzhou... The summit involved a vast handover of data. At the press conference, city officials shared 80 exabytes worth of heart ultrasound videos, according to one company that participated. With the massive data set, some of the companies were tasked with building an AI tool that could identify heart disease, ideally at rates above medical experts. They were asked to turn it around by the autumn.




I don't think the issue here is so much the data (which is anonymized) as the conflict of interest. I know conflicts of interest are par for the course in China, but for the individuals who decided this startup should be given exclusive access to the data to then receive stakes in the startup is clearly a conflict. Compounding it is the fact that they work for a non-profit, which has tax-preferred status and also received a stake in the startup.

Let me make an analogy here: I advise several startups and am given shares in those companies in return for my time. A few of them make software or sell solutions that the company I work for could use. At my company I can influence whether or not we use these startups as vendors. It is clearly unethical for me to be involved in any decision that pertains to a company where I hold an ownership stake. It would also be unethical for me to take an ownership stake after I was involved in a negotiation with one of those companies. I would disclose and recuse myself from any such decision. That's without even adding the extra (legal) wrinkle around non-profit status, which doesn't apply in my case.


> I would disclose and recuse myself from any such decision.

Disclosure is the key. As long as the people making the decision and cutting the check (investors, board, and officers) are aware of the conflict of interest, they can choose what your involvement should be.

For example, if you hire an AI expert as a consultant on evaluating AI companies, it may be with the explicit intent to buy the company for which he works as well as evaluate other companies in the space. This is one way to check people out before making an even bigger commitment.

In certain fields, there are so few "experts" that there may be no possibility of avoiding a conflict of interest because everybody is interconnected.


Where your analogy breaks down is that the company is a spin-out of a hospital lab. This is usually encouraged by academic institutions. For example, at my university, if I develop an algorithm that I want to commercialize then they might encourage me to found a startup. Then, they could take equity in the startup or give an exclusive license to the patent (where I am the inventor) to my startup. Many academic institutions encourage these activities to get research out of academic papers to create products with real world benefit.


> which is anonymized

How reliable is the anonymization? Based on very little reading, data can at least sometimes be de-anonymized. If that becomes possible 5 years from now, what happens then to the data already made public? How will individuals' privacy be protected?
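The classic failure mode here is a linkage attack: quasi-identifiers left in a "de-identified" dataset (ZIP code, birth date, sex) are joined against a public dataset that carries names. A toy sketch in Python, with entirely made-up records, just to illustrate the mechanism:

```python
# Toy linkage (re-identification) attack: "anonymized" medical records
# that retain quasi-identifiers are joined against a public voter roll.
# All data below is fabricated for illustration.

medical = [  # names removed, quasi-identifiers retained
    {"zip": "02138", "dob": "1945-07-22", "sex": "F", "dx": "heart disease"},
    {"zip": "60614", "dob": "1980-01-01", "sex": "M", "dx": "asthma"},
]

voter_roll = [  # public record that includes names
    {"name": "Alice Smith", "zip": "02138", "dob": "1945-07-22", "sex": "F"},
    {"name": "Bob Jones",   "zip": "60614", "dob": "1990-05-05", "sex": "M"},
]

def reidentify(medical, voter_roll):
    keys = ("zip", "dob", "sex")
    matches = []
    for m in medical:
        hits = [v for v in voter_roll if all(v[k] == m[k] for k in keys)]
        if len(hits) == 1:  # a unique join means the record is re-identified
            matches.append((hits[0]["name"], m["dx"]))
    return matches

print(reidentify(medical, voter_roll))  # [('Alice Smith', 'heart disease')]
```

The point isn't that any one field is identifying; it's that the combination often is, which is why simply dropping names doesn't settle the question.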


Unless they are in the business of selling advertisements or life insurance, there is probably very little incentive for them to de-anonymize the data. The bigger risk with that data is that they don't secure it properly.


My understanding is that these are pathology samples. I don't know what metadata may be associated with them, of course. Just the samples and the diagnoses would give very little for re-identification to work on.


I actually think that the data is the major issue here. As @chriskanan mentioned, this is pretty standard for a company that was spun out of a lab. The individuals involved are given equity in the company, as is the institution. The conflict of interest is then recorded, and these individuals won't be involved in vendor-related decisions for these companies. But, given that it was a lab spin-out, this startup probably provides some worthwhile information back to the lab. The individuals' conflicts of interest are pretty cut and dried (even given the recent MSK COI issues).

But, from the NYT piece, three issues related to the data are in question: 1) the dataset was generated over many decades by many pathologists who were not similarly compensated; 2) the company has an exclusive license to the data; 3) it is unclear whether the patients properly consented to their data being used for commercial purposes.

The first issue is a question of money. MSK owns the data, but what about the people who generated it? When a lab spins out a company, it's quite common for the creators of the IP to get a chunk of equity (or proceeds, or licensing, depending on the institutional IP policy). In this case, dozens of pathologists were key to building up that dataset, and they were left out of the company. But this is just a money question... and could probably be solved by throwing more money at the problem. It matters, though: you want to encourage other doctors to contribute their findings to these types of databases to allow for future studies. Without their buy-in, these data would be lost. If you forget to include these people when the data is commercialized, you start to lose that buy-in.

The other two questions are more interesting, in my opinion. Of course the company would want an exclusive license to any MSK dataset vital to its survival, and I'm sure that MSK gets a royalty each time its data is used. But that doesn't prevent the company from getting its data from another institution and not using the MSK data at all. In that case, what would MSK's recourse be? It's hard to say, because MSK also owns a piece of the company... which is where the conflicts really start to raise questions. The MSK data should probably be available to anyone who wants access and can pay the licensing fee. That seems like the course most non-profits would take. But because MSK also owns part of the company, it wants the company to have exclusive access. What's the proper course for a non-profit to take? What if this company could provide a truly valuable service to patients? What if the company were economically non-viable without the exclusive license? These are legitimately tough questions. I think the better solution would have been a F/RAND license on the data, which might have lessened some of the equity issues as well. However, I'm not sure the course MSK took is particularly bad. The article states that they had difficulty getting the company funded, which underscores how difficult a project this is.

The other side of this coin is that in most spin-out situations where the IP was created by the lab, granting the company an exclusive license to that IP (patent, etc) is extremely common. That's not entirely what happened in this case. (I don't know anything about the specifics of this case, just speculating here...) If the lab generated IP (software) that generated a model based on historical MSK data, then it's not just the lab's IP that was used, but institutional IP (generated by many other people). This makes the exclusivity a bit tougher to explain. (And is the cause of the first issue).

The biggest question to me is whether the patients properly consented to this type of data sharing. Given that the data was collected over decades, it's really difficult to know what the consent process was across the board. What they really want to avoid is the perception that patient data is being used to fuel for-profit companies. Once that starts, you could see a slow erosion of patients' trust... patients who have multiple choices of physicians in the NY area. Even if the patients in question properly consented, MSK will want to avoid this type of PR to keep current and future patients comfortable.

So, in this case... the data is really the issue. Of the three issues I mentioned, I think the biggest question is that of the data access. If the license to the data wasn't exclusive, I don't know if this would have been as large of an issue.


Regarding consent, these are for image analysis, not bioinformatics (you can't identify the person from the information). As such, this is a standard tissue-repository use case. The courts have held in several cases (UCLA, WashU, and Florida come to mind) that the archive is the property of the institution and the patients have no claim to "remnant tissue". As DNA is considered identifiable, that would be a separate issue, but they're not going after genomics.


Regardless of whether the data is identifiable, the patient would need to consent to their data being used for research purposes. This is a pretty standard clause, and I'd be shocked if MSK didn't have it as part of their standard consent, but it may not have been obtained for all patients over the years. The patients would not have a right to reclaim the remnant tissue, but they may still have a say in how the data derived from their treatment is used, de-identified or not. Normally this is an IRB issue. I have no idea how you'd go about licensing these data commercially. It's an ethical minefield.

For example, if patients knew their data would be licensed commercially, they may not have agreed to their treatment.

And even if it were legal to do this without consent, you'd still want patient consent just to avoid these issues entirely. If they didn't have consent, it's a huge PR problem, even if not a legal one.


An IRB can waive the requirement for consent when the data is de-identified and the barrier to gain consent would be onerous. In this case, with 25 million slides, and quite a few of the patients are dead or lost to follow up, I'm fairly confident the IRB would waive the consent requirement.


That describes most professors in life sciences at top US universities, and they all seem fine with it...


The US has specific rules in HIPAA that allow for research, especially when using de-identified datasets. The company I'm working at (based in Berkeley, CA) has almost 400 TB of radiology studies, and that's not even counting the data we've found in India. At Google Next just a few weeks ago, there were representatives from NIH talking about the datasets they are building and making public as well.

Point being, I don't see us running behind on this.


There's going to be a Sputnik moment in the next decade or two where The West wakes up to how far it's fallen behind in this area.


Could the West not take data from China in the same way they've taken data from the West to get ahead (arguing the capability, not the morality)?


Funny, I've heard AlphaGo referred to as the Sputnik moment for China.


Any chance you have a source for that? AlphaGo seemed like a monumental achievement but I didn't hear anything specifically China related.



A company I interviewed with in Shenzhen mentioned that they had a contract with a central hospital there allowing them to acquire patient data in real time, in service of developing DL algorithms for treatment and prognosis. The amount of data they are exposed to is amazing and certainly puts them at a huge advantage over even their biggest competitors in the West.


One person’s notion of “ahead” may not match another’s. You could also say that China’s lack of data protection laws and norms puts them way behind the West in terms of user privacy. Sure, “the West” is inconsistent in its protection of people’s data. There is no Gold Standard country or regulation to point to. But we are still unarguably way ahead of China in this regard.


He's clearly not talking about user privacy.

Chinese companies and government will have access to billions of medical records to train their algorithms on while their western counterparts are squabbling with individual hospitals over access to a few thousand records.

China has the talent, money and data to be the leader in this space.


I guess my issue is with the unstated, but negative, connotation of "being behind". If western companies being unable to get their hands on my medical data makes us "behind" then I think it's a good thing to be behind. As part of the West, I am happy to not be a "leader in the space" of AI if being such a leader necessarily requires careless sharing of vast amounts of private data.


We get to get spied on by Google and Facebook instead. Now those companies are using the money they made by spying on us to get exclusive access to patient data so they can expand their monopolies to healthcare.

https://www.healthcaredive.com/news/facebook-healthcare/5207...

https://healthitanalytics.com/news/facebook-nyu-will-use-art...

https://techcrunch.com/2018/06/15/uk-report-warns-deepmind-h...


> That's nothing compared to the work in China right now. In ten or 20 years, the West could be permanently behind.

That's a completely acceptable trade-off to me. China and the West have different perspectives on how users' privacy should be handled.


In the west we have total surveillance of personal communications, but we can't datamine anonymized health information.

Under what ideology does that make sense?


Under the ideology that medical data in any form is the property of the patient, not an institution or government. I get it, it's a catch-22: without data there are no meaningful statistics. But are those whose data is used to arrive at diagnostics and cures provided any benefit? I don't think so. Change that and you may see an uptick in data sharing.


There's a difference between having total surveillance and that surveillance being legal and generally approved of by the masses.


So the West would rather millions die from cancer than make health information more accessible to researchers.


Yes, because throwing away patient privacy in the name of hopefully maybe someday being able to get better treatment is not a trade-off that we are willing to make.


The trouble with Chinese science is that it's so politicized and politics is so corrupted and face-saving that junk science gets officially elevated. "The West" has problems too, but there are still some reputable independent institutions.


Or it's a matter of time until a high-enough-ranking cadre gets his medical records leaked and the Chinese government enacts its own form of HIPAA.



It's not a free country, and it has no free press. In what universe could a lack of regulation on how health data is used in China wind up a net positive for the average Chinese citizen?

For example, what if they augmented their social scoring system (a profoundly bullshit, evil enterprise) with this health data, and decide that you shouldn’t get whatever because your health data says so? What if the health data they choose to harass you with is just strongly correlated with being Uighur?

Also, it’s pretty presumptive that this data is actually valuable for the diagnostic task it’s claimed for. Biotech companies die all the time due to someone’s presumptions.


80 exabytes of heart ultrasounds? I don't believe that.
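The skepticism checks out on a back-of-envelope basis. Assuming a generous ~500 MB per echocardiogram study (my assumption, not a figure from the article):

```python
# Sanity check of the "80 exabytes of heart ultrasound videos" claim.
# 500 MB per ultrasound study is an assumed, deliberately generous figure.

EXABYTE = 10**18
claimed_bytes = 80 * EXABYTE
study_size = 500 * 10**6            # ~500 MB per study (assumption)

studies = claimed_bytes // study_size
print(f"{studies:,} studies")       # 160,000,000,000 -> 160 billion studies

china_population = 1.4 * 10**9
per_person = claimed_bytes / study_size / china_population
print(f"{per_person:.0f} studies per person in China")  # ~114
```

That implies over a hundred echo studies for every person in China; 80 terabytes or petabytes would be far more plausible, so the unit was likely garbled somewhere.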


Nothing is permanent. Nothing stopping western companies from buying Chinese data. Also Sloan Kettering could be stuck if they already signed contracts promising the data.

Google DeepMind should just have hospitals bid on data prices... kind of like cities bidding for Google Fiber or the Amazon headquarters.


Mr. President, we must not allow... a privacy violation gap!



