Should biologists study computer science? (arstechnica.com)
29 points by terpua on July 31, 2009 | 41 comments



In case you guys are wondering about current college curriculum, I just graduated with a BS in biotechnology-bioinformatics so I can provide a little insight.

My core classes consisted mainly of biology, chemistry, organic chemistry, and physics. Other classes that my degree required were advanced mathematics (with biological applications), advanced statistics, and computer science.

About the computer science portion of my degree: I was required to learn C, C++, discrete math, Perl, and data structure/algorithm design. I chose to take machine language as an elective.

What I've learned in industry: The CS foundation I built in college was critical. Although Perl is widely used where I work, languages like R and C are used more often (for my particular projects). I've also learned that my job is to bridge the gap between biologists and computer scientists.

Biologists say what they want to get -> statisticians/mathematicians think up a procedure -> I make sure the formulas make sense for the subject at hand and program it in Perl or whatever -> CS people optimize it and do their magic to make it run super fast -> everyone checks it to make sure it's okay -> the stats people analyze the results and feed back to the biologists.

My point being: I always think everyone should learn more math, but the industry has found a way to get around everyone needing to learn everything (jack of all trades, master of none) by having experts work together towards a common goal (an Ocean's 11-type set-up). Everyone has something special to offer. Personally, I think the current set-up is working fine. Although everyone should learn more advanced math (or biologists should learn more CS), not everyone is willing and/or capable.

I hope this was helpful.


Upvoted for comparing science work to Ocean's 11.


I'm conflicted about this. A little knowledge can be far more dangerous than no knowledge. I have seen things... things I can't unsee. Things done to software by biology and chemistry Ph.D.s that still give me nightmares.

But make no mistake, modern science is neck deep in serious computering. Not being computer literate is almost as bad as being just plain illiterate.

So here's what I think about this: every scientist who's not a physicist, mathematician, or computer scientist needs to study more math, more stats, and more CS.

In fact, I would go so far as to say everyone needs the equivalent of an associate degree in CS to get a Ph.D. in anything. For mathematicians and physicists this would happen almost without any extra effort; for biologists it might be quite a bit of extra effort, but well worth it.


Just a day ago I asked how HN participants learned biology

http://news.ycombinator.com/item?id=731362

in their higher education. I would think that with judicious course selection and fitting internships it would be possible for a CS major to learn quite a lot of biology, or the other way around, but perhaps I am mistaken. The book Mathematics Unlimited--2001 and Beyond

http://www.amazon.com/Mathematics-Unlimited-Bj%C3%B6rn-Engqu...

included an article by a European author suggesting that all math majors should study a lot more science than they did as of a decade ago, and all science majors (in all sciences) ought to study a lot more math. That makes sense to me.


"A little knowledge can be far more dangerous then no knowledge. ... Not being computer literate is almost as bad as being just plain illiterate."

Need examples before I can know whether I agree or strenuously disagree.


Vis-à-vis "a little knowledge":

One particular Ph.D. I worked with had created a much-lauded database. It was done in MS Access. It stored natural numbers. Each sample contained a few gigs of natural numbers. Numbers could be anywhere from 0 to 99999. So each entry contained all possible numbers in increments of 0.02, with Null in all places where the sample did not contain that number.

This is what happens when people have never heard of foreign keys or a one-to-many relationship or a many-to-many relationship.

This could have been done so it's a lot smaller and faster to search. The placeholder could have been a negative int instead of Null, so that it would evaluate to something other than Null in searches. An infinity of things could have been done better.
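
For what it's worth, here's a minimal sketch of the kind of one-to-many layout I mean -- table and column names are made up, and it uses SQLite from Python rather than Access purely for illustration. The point is just that you store only the values a sample actually contains and link them back with a foreign key:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE measurement (
            sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
            value     REAL NOT NULL,  -- only values actually observed
            PRIMARY KEY (sample_id, value)
        );
    """)
    conn.execute("INSERT INTO sample VALUES (1, 'sample_A')")
    conn.executemany("INSERT INTO measurement VALUES (1, ?)",
                     [(0.02,), (17.44,), (99998.96,)])
    # One-to-many lookup: which values does sample 1 contain?
    print(conn.execute(
        "SELECT value FROM measurement WHERE sample_id = 1 ORDER BY value"
    ).fetchall())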

But when you have a Ph.D. and you legitimately think of yourself as brilliant, AND you are used to doing a lot of HARD work, then heroics with MS Access just seem natural. Right, that's what hard science is, damn it. And this is why a little knowledge can be worse than no knowledge.

If that guy had instead hired almost anyone else to design the database, he would have been better off. It's hard to imagine anyone who could have screwed up worse AND had the tenacity to stick with it.

Ph.D.s are not just smart; they are also used to working hard, so when things get hard they don't quickly perceive that as a signal to try something else.

As to being computer illiterate:

A field biologist might get away with it, but anyone working in a lab will sooner or later measure something with an instrument entirely controlled by and accessed through a computer.

Let me say that again: a computer is the gatekeeper.

And it's not enough to simply know how to use the GUI/API/whatever. You have to also know at least something about the algorithms involved in the analysis, because a lot of analysis of things that can't be seen with the naked eye is actually statistical inference of what's there.

Let me say that again: a whole lot of stuff is not measured in the way a lay person thinks of measuring; it is inferred using fancy math.

But make your math a bit too fancy and you're just making stuff up. Use a second-order polynomial and you're good; use a third- or higher-order one and you can see anything you want. There's a reason we use cubic B-splines for computer graphics: we can fit them to any shape we want!

And oh yeah, I've seen third-order polynomials used in science; no, they were not used correctly.
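
To put a number on the fitting point, here's a toy sketch with numpy and made-up data (not from any real experiment). With eight noisy-but-basically-linear points, a degree-7 polynomial passes through every point almost exactly -- it "finds" structure that is really just noise:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 8)
    y = 2 * x + rng.normal(scale=0.1, size=x.size)  # noisy, basically linear

    for degree in (1, 2, 7):
        coeffs = np.polyfit(x, y, degree)
        max_residual = np.abs(np.polyval(coeffs, x) - y).max()
        print(f"degree {degree}: max residual {max_residual:.2e}")
    # degree 1 leaves honest residuals; degree 7 just interpolates the noise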


"So each entry contained all possible numbers in increments of 0.02, with Null in all places where the sample did not contain that number."

OK. That is heinous. It makes me cringe. And if the person who made it thought that it was an example of brilliance, that's cocksure ignorance coupled with a needy ego.

___but___

There are a lot of biologists out there who run computers at the level of how a business person deals with Microsoft Word: what the computer can do equals what paths are available through the GUI.

In the example you give, the biologist got the computer to do something it couldn't do before, something that made life in his domain easier, and presumably made science possible that would otherwise have been impossible.

Many biologists don't have the money or time to hire a consultant to do it right.

If a heinous kludge lets you do biology that you otherwise couldn't do, then by god that kludge has merit. You can make fun of the person for being proud of the kludge -- but you should admire them for their willingness and ability to make foreign tools do something new.

My work spans biology and CS, and unlike 99% of the other biologists I know, I have experienced what a well-managed project feels like -- version control, a build process, bug tracking, etc. To me, The Mythical Man-Month is not a novel concept; it's a given. The level of ignorance among biologists about how to get computers to do useful work in novel ways is stunning, and the biologists don't know how ignorant they are.

Again: ___but___

The computer scientists these biologists hire are often so averse to kludging their way forward that the result is stasis. Adherence to notions of architectural purity, reusability, the 'right' libraries or platform all results in long iteration cycles where the biologists get no feedback as to whether a given line of inquiry is promising. And the biologists don't get what's happening -- they can't say, "No, don't build a 'proper' object model and code a 'proper' solution yourself; instead, write a Perl script to munge this other tool's backend XML to achieve a similar effect, so I can find out whether that functionality will be useful; and if it is useful, maybe then we can do it 'properly'." The biologist doesn't know what Perl is, and doesn't know what XML is.

And anyway, this approach is likely to be anathema to a good computer scientist. People don't get into computer science to make ugly kludges; they get into it to make things of beauty. Reusable libraries. Infrastructure. Clean GUIs. Excellent data structures. Etc.

So there's this huge tension between the biologist PI and the computer scientist he hires to build stuff, and neither one really speaks the other's language.

If a biologist knows a bit of computer science -- say, enough to understand his own limitations and ignorance, enough to communicate effectively with computer scientist employees or collaborators -- he can be tremendously effective. If he can write a little bit of Perl and munge his own XML to find out if an approach is promising, he can show his computer scientist employee the kludge and say, "Make this better."
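
To make that concrete, the sort of throwaway munging script I have in mind is roughly this (Python rather than Perl, just to keep one language across the examples in this thread; the tag names are invented):

    import sys
    import xml.etree.ElementTree as ET

    # Hypothetical quick-and-dirty munge: pull a couple of fields out of
    # another tool's XML output and dump them as tab-separated text, just
    # to see whether a line of inquiry is worth pursuing.
    tree = ET.parse(sys.argv[1])  # e.g. other_tool_output.xml
    for rec in tree.getroot().iter("record"):
        name = rec.get("name", "")
        score = rec.findtext("score", default="NA")
        print(name + "\t" + score)

Ugly, disposable, and exactly the point.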

So, a little bit of CS can be a very good thing. I suspect I'm violently agreeing with you. As you can tell, the topic is near & dear to my current life's work. :)


Indeed I think we are in agreement.

Definitely agreed on the need for enough CS to understand your own limitations. That's actually quite a bit of CS, but then again obtaining a Ph.D. can take a while, so I'm sure a little more time for extra study can be found :)

And taking a look at your work... Are you really trying to create a high throughput electron microscopy workflow? Whoa! If you make electron microscopy even close to high throughput that would be awesome!

But how much of the brain structure are you able to preserve during sample prep? For that matter, how quickly does brain structure degrade after death? Minutes, hours, days, weeks, months? Are you imaging the sections in a light microscope first? Correlating the light and electron results of the same slice in software? Freezing the sample in liquid nitrogen so as to freeze it faster than ice crystals can form? Man, science is fun; too bad it's not a good way to make a living.


"Are you really trying to create a high throughput electron microscopy workflow?"

Yes. We've increased acquisition rates by a factor of ~15-20x over what is available using commercially available TEM systems. We now have tens of terabytes of image data in the can, and when I'm not reading HN (argh!) I'm working on collating & analyzing these data.

Light-level microscopy preceded embedding of the material for EM, the anatomy is correlated between the two modalities, and the sample preparation method (this is of mouse brain) is perfusion with an aldehyde mixture, so there is essentially no deterioration of the ultrastructure.

So, I wonder what you do -- no home page in your profile -- but I do remember what it's like to make a living. I like biology better. :)


20x? That's awesome!

I used to work in a bioinformatics startup - tons of fun with cutting edge science. But the startup tanked, and I moved on to a high paying corporate software engineering job. But I hope to be back in bioinformatics before long.


Thanks. :) It's been a long road, hope to get a paper out in the next ~6 months or so. Shoot me an email (address on my home page) if you ever want to be in touch outside of HN.


Excluding the humanities, I agree with you. On a more personal level, learning CS skills is often a fallback for people who get a Ph.D. and can't find work in their field.


This article is timely for me, having recently started a new job in bioinformatics. Specifically, building a centralized database (warehouse) for a variety of cancer research study data.

I'm coming from the opposite direction -- a computer science background to the biology. A huge challenge for me is rapidly learning enough of the biostats and process to understand how to let researchers leverage having all this data in one place, easily accessible, and with a front-end that makes "sense" to those MD/PhD types. A starting point is understanding what types of questions researchers can ask now that they have all the different data in one spot.

Fred Brooks said something like "computer scientists are toolsmiths." We build tools for user needs that simplify and strengthen the user's work. This requires the ability to somehow understand the user's needs, communicate with them effectively, and implement usable tools for them.

I sometimes feel like it is a failure on our part as builders to make it necessary for people who need software tools to build their own. I'm more than happy for other fields to add more CS type education to their required courses, but I'd rather be able to give researchers tools so that they stay on their critical path, rather than having to learn enough to hack together their own full solution.


This is a bit like the welder-and-diver question: is it easier to teach a welder how to dive or a diver how to weld?

For divers and welders the answer appears to be that it is easier to teach welders to dive than the reverse, even if both are far from trivial activities.

For biologists and computer scientists the answer is probably that it is easier to teach programmers to do biology than the reverse.

(Good) programmers have something universal about the way they apply themselves to problems, and that way generalizes to problems in a different domain.


I admit I don't have much to go on, but it seems like biology is a full load of study, whereas at least undergraduate 'computer science', i.e. programming, can be picked up by a smart person almost incidentally. I've seen that done, and I think it /is/ done more often than the other way around. As for writing /good/ code, that can come from practice and aesthetics. But knowing biology (or another science) takes real study.


I think (please correct me if I am wrong, always glad to learn) that one key difference would be that learning computer science would (mostly) need books, the internet, and the time and willingness to buckle down, while a proper study of biology would involve serious lab work, with a need for costly equipment and instruction.


Casual programming can be learned incidentally, but there is a degree of skill that cannot. If you want to build large programs that actually stand a chance of being correct, or performant, you really can't just "incidentally" pick up that skill. That's in the "10 years to mastery" class (and 12 years in, I'm still learning, honestly).

That said, does biology really need that? Maybe the incidental skill is enough for most biologists. There is definitely a core set of tasks that biologists would need computer scientists for, though. (And for once, probably actual computer scientists and not just "good programmers".)


The ease of learning biology depends on the level of abstraction you're aiming for. I had a summer job working in a proteomics lab during my computer engineering undergrad. My basic background knowledge of the relation between genes and proteins was good enough for me to contribute to published research. Of course you also need to learn the details of whatever experiments you're studying, but that's not too hard to pick up. Basically, programmers don't need to memorize all the amino acids to effectively contribute to bio research. You learn the details of whatever bio niche you're studying, and the rest is problem solving, analyzing, and programming.


Modern biology is so specialized that learning the tools for some area is often far simpler than the track people take to get a Ph.D.


That's pure ignorant bigotry, although it might be well-received in a crowd of programmers who also know nothing of what biological research is like. You've got to keep in mind that the main task in biology is inference of biological structures and processes. This is complicated by a number of factors: (1) Biological systems are typically way more complex than computer systems. (2) They are made up of components which are way too small to see in action or manipulate, so all inferences are from second-order effects presumed (often inaccurately) to arise from the subsystem under study. Designing a biology experiment to cope with these difficulties is an exceptionally difficult skill to master. Compare that to programming, where the central task is the construction of a procedure using well-understood components, the states of which can easily be queried.

Claiming that computer scientists would have an easier time learning biology than the reverse is like claiming that chess is a more sophisticated game than go because the lines of tactical analysis are often longer. It ignores the relative complexities of the systems, and the fact that you can usually say more about the simpler system, and usually say it more elegantly.

Incidentally, I am not a biologist. I am a mathematician who has moved into statistical genetics. However, my last postdoc advisor was a formally trained biologist who taught himself statistics and computer science, and has made his career in bioinformatics. But he could hold his own in both those fields.


Well, as a reformed biologist and current informatician, I certainly think that biologists should study CS. However, even more important than studying CS, they absolutely NEED to learn how to program. I've seen lab scientists use extremely convoluted and error-prone workflows to conduct their analyses and experiments -- workflows that, if they knew just a little bit of Python, would have been much simpler. I'm actually teaching a class in the fall on "utility scripting" to a mix of molecular biology PhD students and informatics master's students for just this reason.
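
To give a flavor of what I mean by "utility scripting" -- a toy sketch with an invented file layout, not from any real study -- here are a few lines that filter a tab-separated results file by a p-value cutoff, the kind of thing people otherwise do by hand in Excel:

    import csv
    import sys

    # Usage: python filter_hits.py results.tsv 0.05
    # Assumes (hypothetically) columns named gene_id and p_value.
    cutoff = float(sys.argv[2]) if len(sys.argv) > 2 else 0.05
    with open(sys.argv[1], newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if float(row["p_value"]) < cutoff:
                print(row["gene_id"], row["p_value"], sep="\t")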

Regarding the age-old question of "should biologists learn CS or should CS people learn biology", I'm firmly in the camp of biologists learning to do their own CS, or at least learning enough CS to productively work with CS people. A little bit of CS really goes a long way towards improving a biologist's workflow. A little bit of biology, however, is almost completely useless for a CS person who wants to get involved in lab science. It really takes a surprising amount of domain knowledge to be productive in a laboratory, or even to understand the nitty-gritty details of an experiment at a deep enough level to write or modify an existing bioinformatics tool.


Haha, you have the opposite opinion of me, but you also pretty much have the exact opposite experiences too. Maybe we both know a lot in our "main" subjects and think all of the underlying knowledge/related material is required to be useful, but in reality you can just pick up the knowledge you need on its own without any background knowledge of how it all works. You'll be confused when that stuff is mentioned or brought up, but if you stay in your niche you'll be fine.


One aspect of the article that I haven't seen much discussion of is the second part -- representing biological processes using an algebraic notation. While this might be really helpful for computational biology, it strikes me as a lousy idea for general work, because it presents an overly reductionist view of what's going on. Biomolecular pathways are almost never as simple as they seem at first, and they always interact in weird and complex ways. Presenting them as a big, gnarly, nasty diagram communicates this to readers... explaining them using nice, neat equations makes the whole thing seem both simpler and better understood than it probably really is.


EVERYONE should study Computer Science. The questions are, how much and what parts?


The article is about computers as "part of biological research". I wish it had been about the real place I want biologists: designing the computer systems themselves.

Biological systems have scaling and reliability that we computer scientists only dream about. (Can you name a self-repairing computer system that runs for 80 years?) I want computers with the kind of systems thinking that biological systems have, not just more x86 cores on a single chip.

The only biologist I know who switched to designing computer systems is Alan Kay. I think we could do with a few more like him.


Short answer: yes. Otherwise, how can anything useful or meaningful be done effectively with the huge volumes of data that biologists now have quite frequently? They should also work on their statistics background so that they can do more sophisticated model/hypothesis testing, but that's a whole separate issue that gets into the matter of education and community incentives, and this is not the appropriate forum for that latter topic.


A computer scientist can effectively analyse large volumes of biological data; I'm not convinced a biologist could do the same, because there is just so much computer science related to visualisation, graphics programming, and data modelling, and their prerequisites.

A person who did 1/2 and 1/2 would likely not have enough knowledge or experience to do either the biology or the CS side particularly well.

Not to mention, there are very few people I know who are good at both biology and CS.


I know of one good example, Alan Kay, the inventor of Smalltalk.


I briefly looked it up; molecular biology would be a good fit, though it's pretty much chemistry. It entirely depends on which areas of biology are studied.


Sure, but what's more interesting is that he credits what he knows about biology as the inspiration for Smalltalk and its particular flavor of OO.


I agree, but only up to a point. Astronomers need to understand telescopes, but astronomy is about studying distant objects. The same applies to biology -- biologists are not bioinformatics specialists.


How about having biologists work side-by-side with experts in data analysis and statistics, rather than requiring the scientists to be experts in all fields?


Often leads to the statisticians giving the biologists tutorials in statistics and the biologists giving the statisticians tutorials in biology. Which is okay, but then you have to wonder if more formal study up front would have been more efficient. Which brings us back to this topic.


Useful links for those interested in this crossover:

Great Principles of Computing: http://cs.gmu.edu/cne/pjd/GP/GP-site/welcome.html

90 min talk by Peter Denning about Great Principles: http://www.youtube.com/watch?v=5a_pO3NYJl0


Corollary: should computer scientists study biology? Yes.


Agreed! If you can get your hands on a copy of a university-level textbook in genetics, that will be some of the best time you could invest in learning something about another field unrelated to the one you are currently in.


From the 10,000-foot level I think it is related, and it will become more and more related as time goes by. DNA seems to be a pretty robust information encoding and executing system. There's a lot we could learn from it.


Absolutely.

The first time I read what a ribosome does, my immediate thought was CPU/Turing machine. There are so many analogies it is scary.

Nanotechnology is here to stay; it's called life.


Not sure why you were getting downmodded -- that's an interesting observation.


Some people seem to have a way of expressing their disagreement with the 'down' vote instead of saying what is on their minds. It comes with the territory, it seems.


Probably both could do with more math.



