In the age of low-cost DNA sequencing, shouldn't this be simple to figure out?

abcc8 · on Nov 29, 2018

Yes, it is very easy to identify paternal mitochondrial inheritance in the current era. The catch is that only a very small proportion of the population has had their mitochondrial genome sequenced (for now). Typically, the people who do have their mitochondrial genomes sequenced are those suspected of having mitochondrial disease, and the prevalence of mitochondrial disease is very low. Of those who are sequenced, the overwhelming majority will have a mutation that, if inherited, came from the mother. I should also add that a set of 'common' mutations causes the majority of mitochondrial disease, and, as expected in a clinical diagnostic setting, these are the most frequently observed and identified molecular diagnoses. Paternal inheritance of mitochondria harboring a disease-causing mutation represents an extraordinarily rare etiology for an already very rare disease.
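A minimal sketch of why the check itself is easy once sequence data exists. It assumes (hypothetically) that each person's mtDNA variants have already been called against a shared reference and stored as position → alternate allele; the function name and the positions/alleles below are illustrative only. It ignores heteroplasmy, de novo mutations and sequencing error, all of which a real analysis would have to rule out.

```python
from typing import Dict

# Hypothetical representation: reference position -> alternate allele
Variants = Dict[int, str]


def candidate_paternal_variants(child: Variants, mother: Variants, father: Variants) -> Variants:
    """Return child variants that match the father but are absent from the mother.

    Under strictly maternal inheritance this should normally be empty.
    Heteroplasmy, de novo mutations and sequencing errors are NOT modelled.
    """
    return {
        pos: alt
        for pos, alt in child.items()
        if father.get(pos) == alt and mother.get(pos) != alt
    }


# Illustrative, made-up trio
mother = {73: "G", 263: "G"}
father = {73: "G", 16519: "C"}
child = {73: "G", 263: "G", 16519: "C"}

print(candidate_paternal_variants(child, mother, father))  # {16519: 'C'}
```

Position 73 is shared by both parents, so it says nothing about the direction of inheritance; only 16519 points at the father.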

inetknght · on Nov 29, 2018

I work as a software developer at a DNA analysis company with an in-house lab. You might be surprised to know how much knowledge is regurgitated and how little actual research occurs.

kothar · on Nov 30, 2018

I'm very interested in working in the genetics/biotech industry as a software engineer - how much specialist knowledge do you feel is required to approach that career path? My background is computer science, so I have been looking around at biochemistry courses which might be helpful.

Would love to pick your brain if you have the time :)

inetknght · on Nov 30, 2018

Specialist knowledge? I've never been to high school or college. I don't have a degree. I've been writing software for 20 years but couldn't even spell the words in DNA before I was hired.

It helps to have bio knowledge, but biotech really just needs a lot more software tools. There's a lot of low-hanging fruit for automation and analysis. There's a hell of a lot of room for people smarter than me to innovate.

There's a ton of room for improvement in the privacy area. You can read what I've said about that in previous comments [0] [1]. I very strongly disagree with some types of industry-standard software, such as IMPUTE2 [11] and Admixture [12]. They're highly sensitive to input and timing, and they contribute to the industry's reproducibility problems.

I think that's a problem which plagues the industry right now: reproducibility. I can't speak for the UK (where your profile says you're from), but in the USA the FDA (which regulates medicine) recently ran a "truth challenge" [3]. In it, participants were asked to analyse the same set of DNA data. Nobody got the same answer, and very few even got consistent answers [4]. That's Very F@#$ing Scary if you ask me, since the DNA data starts out as just text files [5] [6]. So irreproducibility of results is solely due to poorly designed software (and I can go on at length about that if you want).
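To make the "just text files" point concrete: a FASTQ record is four plain-text lines (an `@` header, the bases, a `+` separator, and one quality character per base). The reader below is a sketch that assumes unwrapped records and Phred+33 quality encoding; `parse_fastq` and `Read` are made-up names, and production code would normally use an established parser.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from the ASCII quality line


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield one Read per four-line FASTQ record (assumes no line wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumed Phred+33 encoding: score = ASCII code minus 33
        yield Read(header[1:], sequence, [ord(c) - 33 for c in quality])


example = io.StringIO("@read1\nACGTAC\n+\nIIIII#\n@read2\nTTGCA\n+\n!!+5I\n")
for read in parse_fastq(example):
    print(read.name, read.sequence, read.qualities)
# read1 ACGTAC [40, 40, 40, 40, 40, 2]
# read2 TTGCA [0, 0, 10, 20, 40]
```

Reading the input is deterministic and trivial; any disagreement between labs has to come from what happens afterwards.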

If you want to stay in computer science, then tackle that. A lot of the industry is built on BWA [7] and GATK [8]. They're nowhere near as bad as IMPUTE2 or Admixture, but they're highly sensitive to parameter changes (and every analysis company uses different parameters). There are other open-source analysis tools as well, but they're nowhere near as popular. There are, of course, not-so-open-source tools too, which I don't feel like mentioning (I work on one such tool, used internally at my company).
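A toy illustration of the parameter-sensitivity point. This is not how BWA or GATK work: `call_variants`, `min_depth` and `min_alt_fraction` are hypothetical names for a made-up threshold-based caller, and the pileup counts are invented. Agreement is measured with a simple overlap score (the Jaccard index), chosen here only for brevity.

```python
from typing import Dict, Set, Tuple

# Hypothetical pileup: position -> (total read depth, reads supporting the alternate allele)
Pileup = Dict[int, Tuple[int, int]]


def call_variants(pileup: Pileup, min_depth: int, min_alt_fraction: float) -> Set[int]:
    """Toy caller: keep positions whose depth and alternate-allele fraction pass both thresholds."""
    return {
        pos
        for pos, (depth, alt) in pileup.items()
        if depth > 0 and depth >= min_depth and alt / depth >= min_alt_fraction
    }


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared calls divided by all calls (1.0 means the two call sets are identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented data: the same input handed to two "labs"
pileup = {
    101: (40, 20),  # well covered, half the reads support the alternate allele
    202: (8, 4),    # low coverage
    303: (50, 9),   # low alternate-allele fraction (0.18)
    404: (30, 30),  # every read supports the alternate allele
}

lab_a = call_variants(pileup, min_depth=10, min_alt_fraction=0.2)
lab_b = call_variants(pileup, min_depth=5, min_alt_fraction=0.15)
print(sorted(lab_a), sorted(lab_b), jaccard(lab_a, lab_b))
# [101, 404] [101, 202, 303, 404] 0.5
```

Both parameter sets look reasonable in isolation, yet identical input yields call sets that only half agree.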

On a different note, one of the problems that underpins this whole technology is the way "Next Generation Sequencing" works: shear your DNA into small fragments so that sequencing machines can analyse all the fragments in parallel (in contrast to reading linearly along the strand before it was cut up) [9]. Then the software analysis tools try to reassemble all those pieces back into a single continuous sequence.
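A toy simulation of the shearing step, assuming a made-up "genome" string and uniformly random cut points (real library preparation is far more involved). Each copy of the molecule is cut in different places, so fragments from different copies overlap, and those overlaps are what reassembly has to work from. The function `shear` and its parameters are hypothetical.

```python
import random
from typing import List


def shear(sequence: str, copies: int, min_len: int, max_len: int, rng: random.Random) -> List[str]:
    """Cut each copy of `sequence` at random points (toy model, not real library prep)."""
    fragments: List[str] = []
    for _ in range(copies):
        start = 0
        while start < len(sequence):
            length = rng.randint(min_len, max_len)
            # The last fragment of a copy may be shorter than min_len
            fragments.append(sequence[start:start + length])
            start += length
    rng.shuffle(fragments)  # the sequencer sees fragments in no particular order
    return fragments


genome = "ATGGCGTACGTTAGCCTAGGATCCAGT"  # made-up sequence
reads = shear(genome, copies=3, min_len=4, max_len=8, rng=random.Random(42))
print(len(reads), reads)
```

The software only ever sees the shuffled `reads` list; the original `genome` has to be inferred from it.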

If you were to take this comment and cut it up into words and sentence fragments, would you be able to reassemble the words back into the correct post? Of course not. So that's very much a limiting factor for analysis [10].
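The same thing happens with DNA whenever a stretch repeats. The sketch below assumes error-free reads of length k = 4 with one read starting at every position; it builds two different sequences around three copies of the repeat `GGC` (arbitrary, made-up data) and shows that they produce exactly the same collection of reads, so from these reads alone there is no way to tell which sequence was sequenced.

```python
from collections import Counter


def reads_of_length(sequence: str, k: int) -> Counter:
    """Every length-k substring of `sequence`, counted with multiplicity (perfect coverage)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


repeat = "GGC"  # appears three times in each sequence
first = "TA" + repeat + "AT" + repeat + "CC" + repeat + "TT"
second = "TA" + repeat + "CC" + repeat + "AT" + repeat + "TT"  # middle segments swapped

k = 4
print(first, second)  # TAGGCATGGCCCGGCTT TAGGCCCGGCATGGCTT
print(first != second)  # True
print(reads_of_length(first, k) == reads_of_length(second, k))  # True
```

Because the repeat is at least k - 1 letters long, no read spans a whole copy of it plus both neighbours. With k = 5 each read can cover the repeat plus one flanking letter on each side, and the two read collections differ, which is one reason longer reads help.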

Realistically this forum is not ideal for communication. I looked at your profile but did not see a way to contact you. But feel free to contact me at the email in mine. I don't check it often though :)

[0] https://news.ycombinator.com/item?id=18196717
[1] https://news.ycombinator.com/item?id=16754393
[3] https://precision.fda.gov/challenges/truth
[4] https://precision.fda.gov/challenges/truth/results-explore
[5] https://en.wikipedia.org/wiki/FASTA_format
[6] https://en.wikipedia.org/wiki/FASTQ_format
[7] http://bio-bwa.sourceforge.net/
[8] https://software.broadinstitute.org/gatk/
[9] https://en.wikipedia.org/wiki/DNA_sequencing#High-throughput...
[10] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/
[11] https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
[12] http://software.genetics.ucla.edu/admixture/

kothar · on Dec 1, 2018

Thanks! Excellent pointers. I had no idea the sequence fragments being aligned were so short in some cases.

I can't see your email either, so I assume the profiles don't publish them. You can reach me at mike at <username>.net however.