Why Scientists Need to Learn How to Share

kastnerkyle · on March 22, 2014

Open data and data sharing are key components of the reproducible science movement. Some teams have even gone so far as to share their entire virtual image with data, code, and associated libraries (when data size is small). Docker would be a great choice for this, and some teams have moved to that as well from what I hear.

This is definitely NOT the majority, but I think the people who take the extra effort to help other researchers "stand on their shoulders" should be commended. However, in the era of "big data research", it is not always possible to share the data openly due to public file server size limitations.

Grieverheart · on March 22, 2014

The problem stems mostly from the way research funding works, which promotes a competitive stance instead of a collaborative one. Scientists try to publish in the highest impact factor journals possible and often hide important details so that other research groups do not publish any new findings before them. I'm not sure what a good solution to this problem would be, but I hope someone comes up with one soon.

gjuggler · on March 22, 2014

While this problem may be true of certain poor actors in some fields of science, I would argue that it's far more the exception than the rule.

My experience (in genomics / evolutionary genetics) was overwhelmingly positive, with huge amounts of collaboration all around. Scientists did try to publish at the highest-impact journal because they care about their career prospects, and there were often competitive labs racing to the finish line with a breakthrough publication.

But I never saw researchers withhold important details in order to accelerate their own publication or to hamper the efforts of others. By and large, I saw huge teams collaborating on large projects with a deep sense of purpose to move the field forward and improve understanding.

In fact research funding in genomics became so collaborative — with increasing portions of the research budget being put towards huge consortium-based projects — that smaller labs began criticizing funding agencies because their smaller projects weren't being funded. These projects are typically more competitive and higher risk, yielding potentially high-impact articles with a far smaller number of authors.

What's the answer in the end? As with many things, it's balance. Cooperation is great, but only up to a point. Sometimes having small labs working in relative isolation, even competing against other small labs, can yield great innovation and progress. Other times you need a huge collaborative effort to do something big, expensive and important for the future of the field.

wslh · on March 22, 2014

I think ego and the patent system take a big part of it. Imagine a country funding research related to cancer. The best outcome will be to cure it. It doesn't matter if you share some of your research and scientists from another country or group solve "the last mile".

The issue in this scenario is how the patent system currently works. Instead of giving ownership only to the scientists who make the final breakthrough, the discovery should be shared between the different research groups.

stargazer-3 · on March 22, 2014

Not sure how important the role of the patent system is here. In theoretical physics and astronomy, despite there being no need for patenting, the problem still persists. I would say that the standard university publish-or-perish model seems to be the main obstacle on the road to open research.

svalorzen · on March 22, 2014

Do you remember the name of the third astronaut aboard Apollo 11, aside from Neil Armstrong and Buzz Aldrin? If not, then you can imagine why it may matter to somebody who solves "the last mile". Not saying it's not a problem, just that oversimplifying is probably not the best approach.

return0 · on March 22, 2014

I agree that forced sharing of data is an absurd requirement. It would make sense if the data itself were attributable and citable, so that scientists could get recognition, citations and attribution when their data is used by other studies, and if these citations had the same kind of impact on their careers as article citations do. Journals try to enforce data sharing because they want to maintain their position as arbiters of academic affairs.

gjuggler · on March 22, 2014

It's important to recognize that PLOS' new data sharing policy is only about data directly tied to the publication of a traditional journal article. So while PLOS authors are being forced to share their data, it's only done in the context of a published article, which will accrue citations, recognition, and attribution in the standard way.

So any data shared by this policy WILL absolutely be tied to the traditional methods of scientific credit, via the linked journal article. To me, requiring that reasonable data is published and archived alongside scientific literature doesn't seem absurd at all.

> Journals try to enforce data sharing because they want to maintain their position as arbiters of academic affairs.

Is there any evidence behind this claim? PLOS seems to behave in exactly the opposite way. It's true that SOME journals use extreme selectivity or control over copyright to maintain their position as arbiters of science. But PLOS, whose largest journal PLOS One is both open access and makes no judgment on the impact of the science it publishes, seems to be actively reducing the amount of control it exerts over academic activities.

eshvk · on March 22, 2014

Except that publishing an article has way way more influence than someone acknowledging the work you did. Because it shows you are getting new shit done.

_delirium · on March 22, 2014

One problem in CS is that a lot of interesting datasets are encumbered. When work is done in collaboration with a company (which is common), the company may often not agree to publish the dataset, and retains a veto right on what gets made public.

Among many areas, this is getting quite normal in natural-language processing and machine translation, where a lot of the good datasets are owned by companies. A lot of research lately has come either from, or in collaboration with, Google in particular, because they have a treasure-trove of data that powers Google Translate. They are not likely to agree to release that, because it's one of the keys to their competitive advantage in that area.

Even for in-house data, there are increasingly explicit demands from universities that scientists work with the university technology-transfer office to commercialize their work. Everyone wants to be the next Stanford, with spinoff startups and a stream of licensing revenue. So even if you collected a nice dataset in-house, there are financial pressures that rather than just giving it away, you should instead talk to your local venture capitalist about your unique, hard-to-replicate dataset with promising commercial value...

kleiba · on March 22, 2014

While this may be true in some areas, I'd like to point out that a lot of data is already being shared amongst scientists. I know, for instance, that in the field of natural language processing there are a large number of data collections (corpora) available -- although not all of them free of charge. But they are there and people use them. Competing research is performed using the same corpora to compare the different outcomes, and you want comparability so that the impact of your work becomes evident to reviewers.

Some of these corpora came out of research projects that explicitly stated the creation of such a shared resource as one of their goals. So sharing is a topic that's well on the community's agenda. While it's certainly possible to improve sharing even more, it's not true that scientists are not already doing it.

Of course, there might be different attitudes in different fields.

Grieverheart · on March 22, 2014

I think what you're saying might be true for "newer" sciences, but other fields, like physics, are stuck in older research models and only recently have been starting to warm up to the idea of sharing (but still very cautiously).

return0 · on March 22, 2014

What you're referring to is closer to materials than to the results of research.

kleiba · on March 22, 2014

That's certainly true in a number of cases, but I also think that distinction can sometimes be hard to make. And the mere sharing of results (in the form of papers) is basically the bread and butter of research anyway.

untilHellbanned · on March 22, 2014

Science publishing is not different from Hollywood movie making. The same self-serving human nature is the driver in both.

I work in the OP's field and appreciate his view. That being said, PLOS has been a considerable disappointment. Noble-minded efforts like this without real rewards won't work. Just like nobody cares about some famous actor's or actress's pet charity cause.

What will work in science publishing is a system that incentivizes people BOTH economically and in terms of their reputation. That system doesn't exist right now.