I'm very interested in working in the genetics/biotech industry as a software en...

inetknght · on Nov 30, 2018

Specialist knowledge? I've never been to high school or college. I don't have a degree. I've been writing software for 20 years but couldn't even spell the words in DNA before I was hired.

It helps to have bio knowledge but biotech really just needs a lot more software tools. There's a lot of low-hanging fruit for automation and analysis. There's a hell of a lot of room for people smarter than me to innovate.

There's a ton of room for improvement in the privacy area. You can read what I've said about that in previous [0] comments [1]. I very strongly disagree with some types of industry-standard software [11] [12]. They're highly sensitive to input and timing and contribute to reproducibility problems in the industry.

I think that's a problem which plagues the industry right now: reproducibility. I can't speak for the UK (where your profile says you're from) but in the USA the FDA (in charge of medicine) recently had a "truth challenge" [3]. In it, participants were asked to analyse the same set of DNA data. Nobody got the same answer and very few even got consistent answers [4]. That's Very F@#$ing Scary if you ask me since the DNA data starts with just text files [5] [6]. So irreproducibility of results is solely due to poorly designed software (and I can discuss that at length if you want).
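To make the "just text files" point concrete, here is a minimal sketch of reading FASTQ [6] in Python. It assumes well-formed records of exactly four lines each (ID, bases, `+` separator, quality string) and Phred+33 quality encoding; the function name `parse_fastq` and the sample read are made up for illustration and are not taken from any particular tool.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need here
        quality_text = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality_text):
            raise ValueError(f"malformed record near {header!r}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        qualities = [ord(char) - 33 for char in quality_text]
        yield header[1:], sequence, qualities


# Hypothetical one-read input, purely for illustration.
SAMPLE = "@read1\nGATTACA\n+\nIIIIII#\n"
records = list(parse_fastq(io.StringIO(SAMPLE)))
print(records)
```

Real files can be gzipped and run to hundreds of millions of reads, but the format itself is no more complicated than this.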

If you want to stay in computer science, then tackle that. A lot of the industry is based off of BWA [7] and GATK [8]. They're nowhere near as bad as IMPUTE2 or Admixture, but they're highly sensitive to parameter changes (and every analysis company uses different parameters). There are other open-source analysis tools as well but they're nowhere near as popular. There are of course not-so-open-source tools too which I don't feel like mentioning (I work on one such tool internally for the company).
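A toy illustration of why parameter choices matter. This is not how BWA or GATK work internally; it is a deliberately simplified sketch in which a "variant" is called whenever the fraction of reads showing an alternate base meets a cutoff. The function name, the cutoff values, and the read counts are all invented for the example.

```python
def call_variants(pileup, min_alt_fraction):
    """Return positions whose alternate-allele read fraction meets the cutoff."""
    calls = []
    for position, (ref_reads, alt_reads) in sorted(pileup.items()):
        depth = ref_reads + alt_reads
        if depth and alt_reads / depth >= min_alt_fraction:
            calls.append(position)
    return calls


# Made-up counts: position -> (reads matching the reference, reads with an alternate base)
PILEUP = {101: (12, 11), 250: (18, 6), 512: (27, 3)}

strict = call_variants(PILEUP, 0.30)   # one hypothetical company's setting
lenient = call_variants(PILEUP, 0.20)  # another hypothetical company's setting
print(strict, lenient)
```

Same input data, two defensible cutoffs, two different answers for position 250. Real pipelines have dozens of such knobs.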

On a different note, one of the problems that underpins this whole technology is the way "Next Generation Sequencing" works: shear your DNA into small fragments so that sequencing machines can analyse each fragment in parallel (in contrast to reading linearly through the strand before it was cut up) [9]. Then the software analysis tools try to re-assemble all those pieces back into a single sequence.

If you were to take this comment and cut it up into words and sentence fragments, would you be able to reassemble the words back into the correct post? Of course not. So that's very much a limiting factor to analysis [10].
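The reassembly problem can be shown in a few lines of Python. This is an idealised sketch: it assumes error-free reads taken at every position, which real sequencing does not give you, and the two short "genomes" are invented for the example. They differ only in the order of the pieces between repeats of `TG`.

```python
from collections import Counter


def shred(sequence, read_length):
    """Return the multiset of every overlapping read of the given length."""
    last_start = len(sequence) - read_length
    return Counter(sequence[i:i + read_length] for i in range(last_start + 1))


# Hypothetical sequences that share the repeat "TG" three times.
genome_a = "ATGCTGATGG"
genome_b = "ATGATGCTGG"

# With 3-base reads, both genomes produce exactly the same pile of fragments,
# so no software can tell which one the fragments came from.
same_short_reads = shred(genome_a, 3) == shred(genome_b, 3)

# With 5-base reads, the fragments span the repeats and the piles differ.
same_long_reads = shred(genome_a, 5) == shred(genome_b, 5)

print(same_short_reads, same_long_reads)
```

Once a repeat is longer than the reads that cover it, the original order is simply not recoverable from the fragments alone.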

Realistically this forum is not ideal for communication. I looked at your profile but did not see a way to contact you. But feel free to contact me at the email in mine. I don't check it often though :)

[0] https://news.ycombinator.com/item?id=18196717
[1] https://news.ycombinator.com/item?id=16754393
[3] https://precision.fda.gov/challenges/truth
[4] https://precision.fda.gov/challenges/truth/results-explore
[5] https://en.wikipedia.org/wiki/FASTA_format
[6] https://en.wikipedia.org/wiki/FASTQ_format
[7] http://bio-bwa.sourceforge.net/
[8] https://software.broadinstitute.org/gatk/
[9] https://en.wikipedia.org/wiki/DNA_sequencing#High-throughput...
[10] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/
[11] https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
[12] http://software.genetics.ucla.edu/admixture/

kothar · on Dec 1, 2018

Thanks! Excellent pointers, I had no idea the sequence fragments being aligned were so short in some cases.

I can't see your email either, so I assume the profiles don't publish them. You can reach me at mike at <username>.net however.