
Linked data and SPARQL are definitely a viable solution, and the infrastructure can be decentralized. Part of the community resists these technologies, because people are not used to them and, compared to some other data stores, they tend to be a bit slower, but that's another discussion. I do not have anything against these technologies. What I currently don't like is that a lot of resources are technically open source and free, but they are buried somewhere on the internet, sometimes hard to find, and it takes quite a lot of time to review all the existing ones. What I wanted to recommend is one central umbrella organization that would be (1) a platform for collaboration in the biomedical field; (2) a central entry point to all major existing projects, possibly with an internal review that assigns each project a maturity level, so it is relatively easy to judge how much you can "trust" a given project or dataset; (3) a central repository for open source NLP, data curation and semantic web tools; and (4) a body that could propose and work on standards for data curation that take into account all field-specific needs.



You have seriously underestimated: 1) the effort needed to develop and maintain such resources (your best hope is to work with government-funded institutes); 2) resistance from the conventions of a particular research field (you can rarely bend how people in a field work); 3) cultural differences between biologists/doctors and programmers (biologists/doctors think very differently, which programmers frequently overlook); 4) bureaucracy (everyone thinks he/she is the best; when you work with top groups to make things happen, you will find out how problematic that is); 5) technical challenges: since you care about phenotype data, note that there are no good ways to integrate phenotypes from multiple sources.

Everyone in biomedical research dreams about integrated resources. I have heard multiple people advocating SPARQL as well. If it had been that easy, it would have happened years ago. In the real world, no one is even close. If you want to attract collaborators, learn from Linus: show a working prototype and demonstrate how wonderful it is. Ideas are cheap. The difficult part is a clear roadmap to make it happen.


I agree strongly with this. I started out in biomedicine many years ago with the same aims as the OP, but after a lot of experience, I think that announcing the database resource is just the first step, and an easy one; all the hard problems are the ones listed by x1k.

What I see happening in large orgs with lots of machine learning resources is the development of new techniques to generate large amounts of homogeneous phenotypic data across many measurement modalities. These orgs employ the small number of biologists/doctors who are cross-trained well enough to move between the two fields with ease. They have gathered enough resources to compel the leading researchers to work with them, and they're starting to publish interesting papers.


Yes, I think those large companies may finally have a slim chance to revolutionize data integration, but it is too early to tell yet. We will see.


Could the largest funding organizations, in order to get a much greater return on their investments (i.e., much wider use and re-use of the results), require use of some standard data format for projects they fund?

(I know very little about the issue, but this seems to be a problem in many fields of academia.)



