More than a thousand scientists have built the most detailed picture of cancer (bbc.com)
160 points by hhs on Feb 5, 2020 | 37 comments



How can a software developer contribute to cancer research/treatment? Are there any hard problems, or manual work that could be automated?


Work for Universities (for P.I.s housed at universities) for less than you are worth on the open market. (And still be resented by profs making less)

Soylent data (it is made of people) is hard because it has all these lawyers attached to it. Figure out a trivial/foolproof way for EHRs to be anonymous enough that legal departments allow sharing, yet still useful. (I won't be holding my breath)

A concrete thing without any "hard" bits: there is an open source tool named Protege that is ... err ... the best tool of its class. Fix it or make a replacement.

https://protege.stanford.edu/


I worked for 5+ years for a PI at a prestigious institution.

Overall my experience there was great. It was a lot of fun too. I got to design and implement fairly large projects early on as a junior engineer. I could use any tech that I wanted. I had maybe two dozen users at best, but I interacted with them weekly if not daily.

That said, I was barely making 50k. Now I'm a senior engineer at a FAANG and though I am more financially stable, I am not having as much fun. I routinely push code to millions of users now, but I don't see any of them.

I suppose the FAANG is a different kind of fun, but sometimes I just miss being the guy who's coding this awesome tool that's going to further scientific research, and your post reminded me of that.

Maybe when I retire I'll go back to help some PIs with their software issues. I know they need the help.


To put you on the spot. It would be great if "maybe when I retire ..." was an actual plan you had. It sounds like you have the opportunity and ability to personally make a difference, I sincerely hope you do. :)


Thanks for the vote of confidence. I think it's early to make concrete plans though. I have a few other ideas to try out before I turn to academic research.


Your EHR comment reminded me of this previous comment from another person here: https://news.ycombinator.com/item?id=18828777

Based on his comments, I wouldn’t hold my breath either. That market sounds incredibly expensive to enter.


There is a lot of data being produced in these studies and outside of them. There are so many areas where a software developer could really benefit these projects. For example, a lot of bioinformatic software that is released is poorly designed, non-functional or hard to adopt. I wish there was a better way to integrate bioinformatics research with better software development principles. Unfortunately, there isn't a lot of motivation to make good software, as the end goal in most cases is to just get a publication and move on. There are examples of groups that really do put a lot of effort into quality software, such as Seurat [1] or scanpy [2].

Another area is trying to manage and process the datasets in large studies -- especially meta studies. Even for some of my small projects I dedicate a huge amount of time trying to curate random tables from publications, convert gene identifiers etc.

[1] https://satijalab.org/seurat/

[2] https://icb-scanpy.readthedocs-hosted.com/en/stable/
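The identifier-conversion chore mentioned above can be sketched roughly as follows. This is a minimal illustration, not a real pipeline: the two-entry lookup table, the column names, and the sample table are assumptions made up for the example, and a real project would load a complete mapping file instead.

```python
import csv
import io

# Tiny illustrative lookup (assumption); a real project would load a full mapping file.
ENSEMBL_TO_SYMBOL = {
    "ENSG00000141510": "TP53",
    "ENSG00000139618": "BRCA2",
}


def strip_version(ensembl_id):
    """Drop a trailing version suffix such as '.17' from an Ensembl gene ID."""
    return ensembl_id.strip().split(".", 1)[0]


def convert_table(text, id_column="gene_id"):
    """Return (converted_rows, unmapped_ids) for a tab-separated table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    converted, unmapped = [], []
    for row in reader:
        gene_id = strip_version(row[id_column])
        symbol = ENSEMBL_TO_SYMBOL.get(gene_id)
        if symbol is None:
            # Keep track of IDs we could not map instead of silently dropping them.
            unmapped.append(gene_id)
            continue
        converted.append({**row, id_column: gene_id, "symbol": symbol})
    return converted, unmapped


# Hypothetical supplementary table copied out of a publication.
SAMPLE = (
    "gene_id\tlog2fc\n"
    "ENSG00000141510.17\t-1.2\n"
    "ENSG00000139618\t0.4\n"
    "ENSG00000000000\t2.0\n"
)

rows, missing = convert_table(SAMPLE)
```

Reporting the unmapped identifiers separately matters in practice, since silently dropped genes are an easy way to corrupt a meta-analysis.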


Cancer therapeutics development effectively boils down to a massive-scale optimization problem over a highly multi-dimensional surface. The status quo in the industry is to say "let's find a sufficiently good local optimum by having humans largely-manually try to make transformations to known molecules" in this space. But with software, we can address this issue way more systematically. Here are some questions we can ask (and answer with software) to drive this process more rationally:

* How might you represent the molecules that define this space? (lots of graph theory, linear algebra, vector math)

* How would you model their biochemical properties? (w00t supervised machine learning, but also physics simulations)

* Once you build a model for a property, how can you deploy that model at billion-scale to start searching this space? (containerization, distributed systems, k8s)

* How do you iteratively improve your compounds? (generative machine learning, evolutionary optimization, lots of search/RL methods)

* Once you choose a compound you like and get some experimental data about it, how does that new ground truth value change your model? (data warehousing, data pipelining)

* If that data is valuable, how do you keep it safe? (network security, ACLs, virtual private cloud configurations, VPNs, etc.)

* How can we visualize what our data and predictions look like? (visualization, endless types of front-end development)
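To make the first question in the list above a little more concrete, here is a minimal sketch of a molecule treated as a graph. Everything in it is an illustrative assumption rather than anyone's production method: the atom list, the bond list, and the `featurize` helper are hypothetical, and real systems use far richer representations.

```python
from collections import Counter

# Heavy atoms of ethanol (CH3-CH2-OH); hydrogens are left implicit.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # undirected single bonds between atom indices


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric 0/1 adjacency matrix from a list of bonds."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def featurize(atoms, bonds, elements=("C", "N", "O")):
    """Toy fixed-length vector: element counts followed by the bond count."""
    counts = Counter(atoms)
    return [counts[e] for e in elements] + [len(bonds)]


adj = adjacency_matrix(len(atoms), bonds)
degrees = [sum(row) for row in adj]  # number of heavy-atom neighbours per atom
vector = featurize(atoms, bonds)     # input a property model could consume
```

The graph form (atoms as nodes, bonds as edges) is what graph algorithms operate on, while the fixed-length vector is the kind of input a supervised model for a biochemical property would take.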

So yes, if you want to attack cancer through a data-driven process, there are TONS of ways for any kind of software engineer to join a team that drives that forward.

(Disclaimer, I’m the founder/CTO of a YC-backed startup developing therapeutics in-house. We’re hiring people for all of the above roles to work on curing cancer. Check us out at https://reverielabs.com/careers)


I work at a start-up (Paige) working on creating an AI-powered software platform for improving cancer diagnosis and treatment. We have a bunch of job openings (both AI and non-AI) and have raised over $70M.

There are a handful of other companies out there working in this space, but there is a ton of opportunity for software developers to contribute.

Feel free to email me (see profile) if you would like to talk about it.


Lots of ways - just look at the job pages of Cancer institutes/hospitals, Universities and research institutes or biotech/pharma companies.

There is a huge amount of scientific and clinical data to make sense of - a lot of it in the public domain.

Understanding the mutations at the molecular level is only a start - if you want to treat, then you will likely need to understand how that mutation manifests in biology - which processes are affected - and there you get into the very complex networks that make up biology.

But if you don't have a science background, it doesn't matter as most of the actual work is data management, processing and wrangling, before you can get to the insight bit.

A lot of the data is being generated at a huge scale as well - so making some of the existing analysis fast is also needed.

A lot of the scientific software in the bioinformatics area can be found on GitHub - so it's as accessible to collaboration as any other GitHub project.


> Lots of ways - just look at the job pages of Cancer institutes/hospitals, Universities and research institutes or biotech/pharma companies.

I'd be slightly wary of putting "pharma companies" in the same list, or at least not without the following warning:

Pharma companies are still companies. That means research will often be killed because it's not in the company's financial interest to pursue it further: if your drug is better but the market for it is small, your drug won't make it. It is not unheard of for a company's plan for cancer to be "we'll wait until someone else does it and then we'll buy them".

Pharma companies still fund their own research, and they are still putting out new, more effective drugs in the market. They probably pay better, too. But if you're making a career change because you're idealistic about "making the world a better place", a pharma company might not be the best move.


> if your drug is better but the market for it is small, your drug won't make it

As written, this is not what actually happens. The only way to know if a drug is "better" is to run expensive (and risky) clinical trials on it. Big Pharma wouldn't even bother going that far if the market isn't sufficient to make a profit - given the extremely high rate of failure in clinical trials, they never would have started the program in the first place.

I realize this may seem like a minor semantic quibble but there is an awful lot of disinformation like "Big Pharma could cure cancer but it's not profitable" floating around.


There are good and bad in every category - some academics make stuff up, and academia can be incredibly competitive, backstabbing and plain nasty, driven by large egos getting on hype bandwagons - but I wouldn't define all academia like that.

Academia is competitive - it faces the same stresses and temptations to create cosy cabals to protect from the winds of competition - just like, if not more so than, say, the pharma industry.

Similarly, there are pharma companies that are science rather than sales led (ie driven by the belief that if you make a cure for cancer then the sales will take care of themselves, rather than making a me-too and winning on sales strategy).

And in both, the experience at the coal face is often the same - colleagues working together on a common goal, oblivious of the high level politics.

Also spivs that buy an off-patent drug and jack up the price because they know nobody else is making the drug - those are not 'big pharma' companies - those are the same type of cold eyed bean counters that buy companies for their assets and sack everyone.

> It is not unheard of for a company's plan for cancer to be "we'll wait until someone else does it and then we'll buy them".

Yep - but not all - otherwise there would be nothing to buy. Back to the original point - by definition - the ones recruiting scientists are the ones trying....

Also let me nail the 'curing cancer isn't profitable' misconception.

Putting on a cold 'spock' logic - the great thing about the health care industry is that if you cure a patient of one disease they will come back eventually with something else - the only patients that don't come back are dead ones.

ie the reason health care in the US is so expensive is because it's unlimited post retirement spend, and health care is much better so people don't just die of an infection or heart attack, but live much longer, developing a large number of conditions that need to be managed as they slowly fall apart.

Saving people from one thing means they will get two things later on - it is long term good business.

Note I'm not saying that's the explicit plan - just that if you have a conspiracy theory that these companies are driven purely by profit, then curing cancer isn't a commercial problem at all.


If you make a lot of money, almost certainly the best thing you can do is earning to give¹. Your impact would be far greater again if you were to donate towards any cause other than one which mostly victimises wealthy people.² Instead of cancer victims, you could help poverty victims or animals in factory farms.

¹https://80000hours.org/articles/earning-to-give/#should-you-...

²https://www.effectivealtruism.org/articles/introduction-to-e...


That's not an answer to the question that was asked.

Some people might not want to help animals or people in the third world, they might specifically want to help cancer patients.

Furthermore if someone is looking for the most important thing to work on, it almost certainly isn't animals or third world poverty, though it also probably isn't cancer either (unless they have an unusually efficient way to help), because cancer is basically the least neglected cause in the world, as well as being highly intractable.

If you want something a lot more impactful than working on either of these things you have to think outside the box, do something that few others are working on and that could have a large impact in a way that is not well-known.


> as well as being highly intractable.

It varies a lot with the type of cancer, how fast it grows, how advanced it is, and a thousand additional factors. https://en.wikipedia.org/wiki/Cancer_survival_rates

Cancer is not a single illness, but a family of related illnesses, the main problem is that we don't have a different street name for each one.

Imagine viral infections. What is the survival rate for viral infections? It can be a common cold, or Ebola, or something in between.


Cancer is a group of illnesses that share a key common trait of uncontrollably dividing cells.

But when I say intractable I mean that the amount of output in lives you get per dollar of input is small compared to other issues that you can help with.


There are many hard problems and some that can be addressed by algorithms/automation.

I think the best advice I can give is reaching out to a lab in your area that does cancer research. Most are happy to talk about what hurdles exist. We, for example, don't have the one true solution to cancer, but a whole lot of computational and algorithm needs that will improve our understanding of the disease.


Well, they are tackling the problem medically.

Why don't we try and engineer around the problem? All cancers exist for the same reason - our immune system isn't picking them up.

Ideas?

1. Nano bots to replace immune system

2. Blood monitoring kits like Theranos tried.

3. Immune system training kit that tries to encourage attacking cancer cells ( huge backfire potential )

(Disclaimer - I have no idea what I'm talking about )


Your intuition here is great.

1. Nano bots -- no idea; outside my wheelhouse.

2. Liquid Biopsy -- many companies are working on diagnostic tests to detect cancer early, monitor treatment, detect recurrence, etc. Some of these products are already available but most are in R&D phase.

3. Immuno-Oncology -- the most promising development in cancer treatment in a long time. Successful tumors evade the immune system by essentially hiding themselves. Many companies are working on treatments to assist the immune system in different ways. Some products have been on the market for a while now and have been pretty successful.


There is tremendous potential in Immuno-Oncology. I worked with umbilical cord blood [hematopoietic stem cell] transplants, and this area really has tremendous potential for tech that only started being used in the late 80s to 90s. IMO much of the immediate potential for progress in software development will be with the legal side and the UX of the tech itself, though. Apart from a few companies, the software is old and buggy, and it has not been updated or will be updated only very slowly.

One idea that comes to mind is making vastly improved LIMS and electronic patient recordkeeping software, as what is presently being used [bbcs] is from the 80s, although this could be accomplished with just a new front end written for it.


Read up on the state of the bioinformatics field and get a feel for the tooling used in typical pipelines. There will be a lot of R and high performance computing. There's no shortage of work to be done.


Take a look at BOINC; there are many cancer-related computation projects.


I would think there’s a lot, especially with molecular biology taking a computational twist.



> Scientists also developed a way of "carbon dating" mutations. They showed that more than a fifth of them occurred years or even decades before a cancer is found.

> He added: "Unlocking these patterns means it should now be possible to develop new diagnostic tests, that pick up signs of cancer much earlier."

The journalist probably pressed this last point, because it can't possibly be true in a practical sense. Even if you can detect these changes early on in cells, we can't possibly test every cell in the body for mutations on e.g. a yearly basis, right?


Correct. And even if you could, you would find everybody has some of these potentially dangerous mutations somewhere.

The problem only comes if you get the right combination in a single cell and then that cell starts to multiply.

At that stage you might then be able to pick up evidence from circulating tumor DNA ( https://ghr.nlm.nih.gov/primer/testing/circulatingtumordna ) from a blood test.


Well, hold on now. If we assume that mutations tend to cluster around cells exposed to particular carcinogens, and assume there is some number N>≈5 mutations required, in theory we could look for cases where many cells present with x% of the necessary mutations. And it becomes much easier to detect because you presumably have many cells which mutate stochastically together.


You are describing a cervical smear test - target cells more likely to be mutated - assume that if you find problem cells in your test ( after you have destroyed them during the test ) that there are similar ones still left in the body ( not because they were mutated together - but because they are related - one cell inherited a mutation from another ).

So sure. That's done already, and having a test which detects 'precancerous' cells based on genetics might be useful. One of the problems at the moment is that the treatment is often almost as bad as the disease - so you might only be able to step up the frequency of the tests.

The problem with this approach is only certain bits of the body are easily accessible in this way - that's why people like the ctDNA tests - but they have their own challenges.

Another non-destructive way would be imaging - either thermal ( cancer cells are more active and so hotter than normal ), or using some sort of labelling markers - however, I can't see a way to target arbitrary early genetic mutations with this.


Even if you could, it does not help if there isn't a good treatment.

That's why screening for some cancers (e.g. breast and prostate) is controversial.

First of all you might get false positives or negatives. False positives can harm patients since they get worried and might even get treatment for a disease they don't have.

Secondly, even if the screening gives a valid result: in some cases (as mentioned before, prostate and breast cancer) that does not mean the patients live any longer than the ones who were not screened.

There's a 5-year survival time metric which makes screening look very positive.

Let's say you and a friend have a cancer that will kill you in 10 years, no matter what. If you get screened after 4 years, you'll have a 100% 5-year survival rate (so screening is celebrated as a success: more screening!).

However, you won't live a second longer than your friend. You will be worried though and might get treatment with serious side effects.

That's why the mortality rate is IMO a much more interesting metric.
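The arithmetic behind the example above can be sketched in a few lines. The 10-year death and the screening at year 4 come from the comment; the friend's diagnosis at year 8 (from symptoms) is an assumption added purely for illustration.

```python
def five_year_survivor(diagnosis_year, death_year):
    """True if the patient is still alive five years after diagnosis."""
    return death_year - diagnosis_year >= 5


DEATH_YEAR = 10           # both patients die at year 10, no matter what
SCREENED_DIAGNOSIS = 4    # caught early by screening
SYMPTOM_DIAGNOSIS = 8     # assumption: the friend is diagnosed later, from symptoms

screened_counts = five_year_survivor(SCREENED_DIAGNOSIS, DEATH_YEAR)
unscreened_counts = five_year_survivor(SYMPTOM_DIAGNOSIS, DEATH_YEAR)

# Years actually gained by being screened: none, both die in the same year.
years_gained = DEATH_YEAR - DEATH_YEAR
```

Under these assumptions the screened patient counts as a 5-year survivor and the friend does not, even though neither lives a day longer. Moving the diagnosis earlier improves the survival statistic without changing mortality.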


I know that this is going to sound "flavor of the day," but seeing as how genetics is essentially a large data set of variables and we aren't certain about which of them are indicative of cancer, doesn't this seem to be very strongly correlated with the problem that most ML/deep learning tries to solve?


ML is fairly common in genomics, but for identifying predictive variables for cancer status, it's difficult. The training set is a matrix where rows are people (where some have cancer and some don't) and columns are genomic features (mutation, methylation, etc). You can easily have hundreds of thousands of features but getting even a thousand cancer patients enrolled in a study and sequenced is expensive and slow.

So, even though there are many "AI in biotech" companies out there, for predicting cancer status, most eventually end up hand crafting a small number of features based on extensive knowledge of cancer biology. The ML model tends to be simple and far less important than the features.
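A toy simulation hints at why so many features with so few patients is a problem. This is a sketch with purely random data, not real genomics: the function name, the sample sizes, and the seed are all assumptions chosen for illustration.

```python
import random


def best_chance_accuracy(n_patients, n_features, seed=0):
    """Best training accuracy of any single random binary feature.

    Labels and features are pure noise, so any apparent predictive
    power is a chance artifact of searching over many features.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_patients)]
        agreement = sum(f == y for f, y in zip(feature, labels)) / n_patients
        # A feature that is consistently wrong is just as "useful" when inverted.
        best = max(best, agreement, 1 - agreement)
    return best


few_features = best_chance_accuracy(n_patients=20, n_features=5)
many_features = best_chance_accuracy(n_patients=20, n_features=5000)
```

With only 20 patients, scanning thousands of noise features will almost always turn up one that appears to separate cases from controls well. That is one reason a small set of biologically motivated features can be more trustworthy than a wide automated search.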


Genetics = statistics.


Now if we can get thousands of software devs to get together across orgs, and develop a detailed picture of all possible software bugs then imagine...oh wait we already do that...and we still have a zillion bugs on bug trackers the world over.

How come?

As Gandhi said, we "attach an exaggerated importance to prolonging man's earthly existence".

The software industry has realized keeping old code alive (specifically by fixing ancient bugs in outdated systems) creates more problems than it solves. Sooner or later we will realize the same thing about Cancer.


Haven't heard that before...


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html


Nice work, very comprehensive, ‘A’ for effort, good for citing in reviews. The results are well known from other studies (eg sequencing cancer precursor lesions) and theoretical predictions over 40 years old.

They try to elevate the significance by saying the results could be useful in early detection, but the companies doing early detection through circulating tumour DNA have already moved on from looking at mutations to methylation and other epigenetic changes.



