Open, rigorous and reproducible research: A practitioner’s handbook (2021) (stanforddatascience.github.io)
277 points by sieste 10 months ago | 24 comments



> While recording the versions of all packages used is a step in the right direction, an even more comprehensive solution is to package up the entire environment using something like Docker, or to use online computation, such as Google’s colaboratory notebooks.

An even better way would be to ship an environment description with a fully pinned dependency chain alongside the code: Guix with channels, a Nix flake, or something similar. Docker can be _forced_ to use fixed versions, but any `apt update` in the build will ruin that completely. Nix and Guix not only provide environments with the same set and versions of tools for executing the code, they can also generate container images that can be shared.
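For the baseline the quoted passage mentions (recording the versions of all packages used), a minimal Python sketch might look like the one below. The function names `snapshot_versions` and `format_pins` and the example package list are hypothetical, chosen only for illustration. Note that this pins Python packages only and says nothing about system libraries or the interpreter itself, which is the gap that fully pinned environment descriptions are meant to close.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Return {name: installed_version} for the packages that are installed."""
    pins = {}
    for name in packages:
        try:
            pins[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment: skip rather than guess.
            continue
    return pins


def format_pins(pins):
    """Render pins as sorted 'name==version' lines, requirements.txt style."""
    return "\n".join(
        f"{name}=={version}" for name, version in sorted(pins.items())
    )


if __name__ == "__main__":
    # Hypothetical package list; substitute whatever the analysis imports.
    print(format_pins(snapshot_versions(["numpy", "pandas"])))
```

Writing the result to a file next to the analysis code gives a record that can be checked later, although it still relies on the package index serving the same artifacts for those versions.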


Very cool that universities like Stanford, which usually cost a lot of money, offer their study material for free.


Anyone used it? Is it any good?


It feels like great material for explaining the field to newcomers.

My team is currently migrating from software to data science, and this publication feels exactly like something that would bridge the gap.


What company do you work at where they're willing to train a bunch of developers up to data science from scratch? That's a multi-year investment if you want a team that can actually solve real problems without causing more problems.


Data science isn't that hard!


It's not hard in the same way software engineering isn't hard. That is to say: it's hard to do well.


Well, it's mostly impossible -- I guess that's a certain sort of way in which it isn't hard.



Ok, we've changed to that from https://datascience.stanford.edu/programs/stanford-data-scie... above. Thanks!


It would be helpful to feed this into a custom ChatGPT and ask the GPT questions directly.


I just tried that, but there is pagination. Would love to know if we can instruct GPT-4 to click through the pages.


Oh, thank you. The original covers almost 50% of the screen (iPhone) with the static heading!


"In summary, this handbook is a guide to making science more open, transparent, and reproducible by presenting best practices"


[flagged]


I genuinely don't see how that's relevant here.

Are you saying that if a president of a university is caught doing something bad and fired/resigns for it then every piece of output from that university related to that field should be disregarded?


I'm saying the timing is not the best. Read the room first. And I'm not saying he "was caught doing something bad"; I am saying he is the president of the university, the leader, and he is a fraud.

This isn't a one-time thing. The fraud stretches back to before he was president, there were already rumors of it in his community, and they hired him anyway knowing he could be a fraud. He also worked to stop this from coming to light, and to this day he hasn't really admitted he was at fault except to say 'I could have done better'. He is also still a professor there. A 17-year-old can figure this out but a university board can't? Come on.

"Marc Tessier-Lavigne, who has spent seven years as president, authored 12 reports that contained falsified information"

What kind of legal threats did you receive? Stephen Neal, the chair emeritus of Cooley, one of the biggest law firms in the Silicon Valley area, represented Marc Tessier-Lavigne and sent a number of aggressive letters requesting retractions or seeking to block the publication of articles that detailed Tessier-Lavigne’s involvement in alleged incidents of fraud. Neal is also a former attorney for [disgraced former Theranos CEO] Elizabeth Holmes.

https://www.latimes.com/science/story/2023-07-21/how-stanfor...


As other commenters have noted, large universities are quite heterogeneous and their administrations are fairly decoupled from their research activity, except in a big-picture sense. The timing is basically irrelevant because you might as well regard Stanford Data Science and the Stanford University Administration as separate entities for all directly-research-related purposes.

I'm a researcher at a large university (admittedly a grad student, and therefore more insulated from university administrivia than a professor), and I don't even know who my university's president is.


> I'm saying the timing is not the best. Read the room first.

Instead of reading the room, perhaps read the page:

>> 2021-12-14

From the preface. This document is at least 2 years old; it wasn't just published.


I didn't know about the story when posting this. I also don't think it's relevant, even in a read-the-room sense. Bad press about the leader of an institution isn't a reason to stop discussing the institution's work.


Jan Hendrik Schön, a researcher at Bell Labs in the field of condensed matter physics, committed scientific fraud by fabricating and manipulating data in his research.

The fraud was discovered in 2002.

I'm thankful that most people can understand the difference between the institution and the fraudster, because Bell Labs has produced some amazing work since then.


For those unfamiliar with the story of Jan Hendrik Schön, I suggest watching this excellent series of videos by BobbyBroccoli on YouTube [1-3]. The series does a good job breaking down the context and the situation that Schön was in, along with the incentives that encouraged Schön to commit fraud.

[1] https://www.youtube.com/watch?v=nfDoml-Db64

[2] https://www.youtube.com/watch?v=Riio1eKOSKg

[3] https://www.youtube.com/watch?v=KsSuhP60qnI


Stanford is a big place. Would not disregard all the work from anyone connected just because of something the president did.


See my comment above. I am sure there are great people who work there, but the timing?


Should I wait 2 years, and then it's fine to use the data science handbook for my interview prep?



