Reddit source code (github.com/reddit)
127 points by anonfunction on Oct 1, 2014 | 55 comments



What causes people to need to sprinkle license boilerplate everywhere, including in files which are otherwise completely empty (like r2/r2/config/__init__.py), and then to have to update them every year? See this commit: https://github.com/reddit/reddit/commit/90cfcaaecc56cf35e758...

It just seems to defy reason that we must make humans increment a number every year in every file in our projects, never mind the fact that the top 20-30 lines of every file in our projects have been taken over by stuff most readers don't actually need to read (again and again).

Is this really the best we can do without somehow letting the bad guys take our home away due to some licensing gotcha? Like simply having this at the top of each file:

    # see the top-level LICENSE file


I do this too, so why do I do it?

In essence it's because there is a very large number of lazy programmers who live by cut and paste. We're not just talking the "I've found a solution on Stack Overflow and will use that", but more "I've searched GitHub for keyword + language and this file does what I need".

The files are copied into their projects in their entirety, sometimes whole libraries are, and those programmers never bother to check how a project is licensed.

Once this process has been repeated a few times the code is firmly detached from the licence and any original license is ignored.

If I use the suggestion you make, then by them copying files into their project they have changed the licence of a file (it now inherits whatever their project uses).

Though I do like the idea of a stub instead of the full thing:

    # Licence: BSD (3-clause) https://github.com/owner/project/LICENCE.md

That would be enough to describe the licence for the file in a way that survives cut and paste, whilst also providing a URL for the full licence details.

In fact, I will now probably shift to that.


> In essence it's because there is a very large number of lazy programmers who live by cut and paste.

I've had programmers copy and paste GPL'd code into proprietary projects I'm responsible for. It's not laziness, it's ignorance. "What's a GPL?"


I've shipped hard product with GPL onboard and in use. It's great! I've also complied, 100%. Also great!

Laziness, ignorance, irresponsibility. Or: usage.


> I've also complied, 100%

The cards I'm dealt are "proprietary projects". I comply 100% too: we don't ship GPL'd code. We've gotten close, though (c/o what I mention above).


Programmers who copy and paste can always copy the fragment they need, no need to pay attention to headers. Ultimately licensing relies on people operating in good faith. Even without explicit licensing, there is an implicit copyright on works such as source code, so taking code from random places without checking out the license is never warranted either way.

> If I use the suggestion you make, then by them copying files into their project they have changed the licence of a file (it now inherits whatever their project uses).

No, the only one who can actually change the license of a file is the rights holder, so the person who copied the code while ignoring the license misrepresents matters but does not change anything about how the work is licensed.


If you think that's bad, you should see the many people who tried to copyright an empty file: http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html



> What causes people to need to sprinkle license boilerplate everywhere, including in files which are otherwise completely empty

The Apache 2 license has language indicating that the intended usage is to put a bit of the license in every file. That's why. It's easy enough to maintain a license header at the top of files with an IDE like IntelliJ.


> Is this really the best we can do

Perhaps in places like GitHub. With central versioning systems where the server is under our control, we simply run daemons that check the copyright. Each user can define how it should work for them. If the copyright is not okay, the user can either a) have the submit fail so he is notified that it needs fixing or b) let it be fixed automatically by the daemon.

This fixing also includes adding a copyright notice to new files that didn't have any. Nicely defined depending on the file type.

The implementation was a one-time effort which now saves us from doing exactly what they are doing now: manually going through thousands of files to fix a copyright.
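
A minimal sketch of the check-or-fix step such a daemon might run on each submitted file, assuming a one-line notice and a small table of comment prefixes per file type. The names `NOTICE`, `COMMENT_PREFIX`, and `ensure_notice` are hypothetical and not taken from any real tool:

```python
import os

# Hypothetical notice text and per-extension comment prefixes (assumptions).
NOTICE = "Copyright (c) Example Corp. See the top-level LICENSE file."
COMMENT_PREFIX = {".py": "# ", ".sh": "# ", ".js": "// ", ".c": "// "}


def ensure_notice(filename, text, autofix=True):
    """Return (ok, new_text) for one file's contents.

    ok is False only when the notice is missing and autofix is off,
    which corresponds to failing the submit.
    """
    ext = os.path.splitext(filename)[1]
    prefix = COMMENT_PREFIX.get(ext)
    if prefix is None:
        return True, text  # unknown file type: leave it alone
    header = prefix + NOTICE
    if header in text:
        return True, text  # notice already present
    if not autofix:
        return False, text  # option (a): fail the submit
    # Option (b): fix automatically, keeping a shebang line first if present.
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return True, first + header + "\n" + "".join(lines[1:])
    return True, header + "\n" + text
```

Working on text rather than on paths keeps the policy separate from whatever version control hook feeds it, so the same function could back either the "reject" or the "auto-fix" mode.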


It is an eyesore for me as well.

The CakePHP project has done away with yearly update by replacing the year(s) with '(c)'.

https://github.com/cakephp/cakephp/commit/7b860debe4731a9cbc...

I remember watching an interview with Stephen Fry, who mentioned that placing the copyright symbol once on your piece of work is sufficient to claim copyright. But is placing the symbol once on a book the same as placing a copyright/license block once in a project directory?


As a matter of fact, you do not need to even declare copyright anywhere in the text to claim copyright (at least in the US). Copyright exists from the moment of the work's creation. [1] And placing a copyright notice does not afford you any other benefits without registration anyway. Once you've registered with the US Copyright Office, you may place a copyright notice if you want, but your work is still protected even if you don't. [2]

[1] http://copyright.gov/help/faq/faq-general.html#register

[2] http://www.copyright.gov/title17/92chap4.html#401


I'm dating a lawyer. According to them, what you've written is true, however, speaking practically, there is a significant advantage to be gained from presenting evidence. If two parties show up to a dispute with identical source code, the one that has a copyright in it has an advantage. Sure, it's easily faked, and that could be argued, however, it would be trying to argue away evidence that exists which is much more difficult than arguing in favor of something that does exist. So if you want to lock in a victory and reduce court time, use copyright notices (and other legal notices like trespassing signs, etc.) liberally.


> It just seems to defy reason that we must make humans increment a number

Ah, but why do you assume a human did that? Writing a script to update the year in all files doesn't take more than a few minutes. Chances are he simply ran "update_license_year" and committed.

Likewise for having a license on every single file: it may simply be a git hook that prepends it to every file with a certain extension.
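
For illustration, the core of such an "update_license_year" script could be a single regex substitution. This is only a sketch: the header wording matched by `YEAR_RE` is an assumption, and the function name simply mirrors the hypothetical command above:

```python
import re

# Matches "Copyright (c) 2006" or "Copyright (c) 2006-2013". The exact
# header wording is an assumption; adjust the pattern to the real notice.
YEAR_RE = re.compile(r"(Copyright \(c\) )(\d{4})(?:-(\d{4}))?")


def update_license_year(text, year):
    """Extend the copyright year (or year range) in text up to `year`."""
    def bump(match):
        prefix, first = match.group(1), match.group(2)
        last = int(match.group(3) or first)
        if last >= year:
            return match.group(0)  # already current, leave untouched
        return f"{prefix}{first}-{year}"
    return YEAR_RE.sub(bump, text)
```

Looping this over the files in a checkout and writing back only the ones whose text changed would produce the kind of bulk "bump the year" commit being discussed.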


The legal department, not the programmer, probably dictates that the copyright headers must be in each and every file.


I know this ruins any potential value of the intellectual property but…

…why not just put a mention in the top-level license that all empty (0 byte long) __init__.py files are in the public domain?

(Yes, yes, it's less confusing to license the entire thing under one license. But attempting to assert copyright on an empty file is humorous.)


An empty __init__.py marks a directory as a Python package. See http://stackoverflow.com/questions/448271/what-is-init-py-fo...
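
A small self-contained demo of the point, using a throwaway directory. The names `demo_pkg` and `config` are made up for the example. Note that Python 3.3+ will also import a directory without `__init__.py` as a namespace package; on Python 2 the file was required:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: a directory with an empty __init__.py
# and one module inside it. All names here are made up for the demo.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()  # 0 bytes
with open(os.path.join(pkg_dir, "config.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Make the new directory importable and load the submodule.
sys.path.insert(0, root)
importlib.invalidate_caches()
config = importlib.import_module("demo_pkg.config")
print(config.ANSWER)  # prints 42
```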


I am well aware of why an empty __init__.py is necessary. It is less clear why one must add license boilerplate to such a file: https://github.com/reddit/reddit/blob/master/r2/r2/config/__...

> you may not use this file except in compliance with the License

This is patently absurd: the file contains no content other than the license itself, and arguably its name (which is shared by millions of other __init__.py files around the world).


Ah. Sorry, I didn't understand this point. I agree it is totally superfluous to declare a license on an empty file.

But I assume that they have this header on every file as part of their internal process. Hence, they don't make an exception for empty files. I would book it as a cost of this process.

Also, it is handy if somebody starts appending to it (e.g. https://github.com/reddit/reddit/blob/master/r2/r2/lib/autho... ): they don't need to make sure that the license is correct.


An empty file with no surrounding context is just an empty file, but is an otherwise blank file embedded within proprietary software whose presence is required for the software to function somehow public domain? The contents of the file are trivial, but its existence may not be.


Wow I am impressed that a commercial social networking software company open sources their entire codebase.


This is not new, reddit has been open sourced since 2008 http://www.redditblog.com/2008/06/reddit-goes-open-source.ht...


They don't open source 100%; the anti-spam and vote obfuscation stuff isn't there.


Recent changes nearly dropped the obfuscation stuff in its entirety, anyway.


What else is missing? It should be stated in the README.


To be honest, their source code probably isn't worth much; it's their userbase and traffic that make up all of their value.


Dear lord that JavaScript is painful to read. I want to submit a pull request and fix all their semicolons.


Not for/against the code formatting, just giving you a target (just hope you know who you're going after) :D

https://github.com/reddit/reddit/commit/8e2737dab409c46d688c...


FWIW, there _is_ a proper styleguide now, it just hasn't been retroactively applied to older js. https://github.com/reddit/styleguide/tree/master/javascript


Nice, looks like they went with a slightly modified version of airbnb's js styleguide. That's what we use at work.


Would've been more interesting if they released the original, Common Lisp code.


It's a pain that they are stuck with Pylons, an unmaintained framework.


Why would that be a pain? Apparently it works for them, and is stable enough.

Now, if pylons turns out to be a roadblock to an expansion they want to make, that'd be a reason to swap it out for something different.


It's a pain because they have no community support, new security bugs can go unnoticed, and many other issues can arise from the lack of maintenance.

Pylons most probably won't roadblock them, but it will definitely bring a lot more challenges.


Isn't reddit already a re-write of slashdot and digg and so on .. and on it goes?

Me, I see no difference between reddit now, and USENET of the 80's/90's. Except that reddit isn't distributed, by nature, but rather .. empirical ..

I still use USENET. It's a quiet place now that the kids have all grown up and left the basements...


I once was an intern in a company that wanted to rewrite the whole Reddit code in .net. The founder was a charming person and managed to raise a huge pile of money. "We can redo this with current technology and elegant design! We will run circles around Reddit!".

We had a great time. Free snacks, lots of parties, luxurious office furniture, skateboarding in the hall... In the end, the company ran out of money before the product reached a useful state.

Good times. It has been some time since then and I have a "normal" job now. Last thing I heard about the founder is that he started a new VC-backed company destined to run circles around something.


That’s odd. As if the programming language itself is what makes Reddit what it is :)


Unfortunately from my experience, people thinking that background technology choices can give them a consumer advantage is not that uncommon.


> FYI, that includes pg's thinking on the use of lisp for viaweb

Yeah, I for one think this was equally flawed. People have made successful and quickly iterable web services/apps in all kinds of languages, including Perl and PHP.

Plus, a single data source like pg had is never that accurate, and the fact that he and Martin were already Lisp gurus helped them in their use of it.


FYI, that includes pg's thinking on the use of lisp for viaweb

http://www.paulgraham.com/avg.html


Well, they're right pretty much by definition - otherwise it wouldn't matter at all what technology you use. You could code your CRUDs in Brainfuck connected to MongoDB. In the real world, you gain consumer advantage by e.g. choosing the right database for the problem, or playing to a language's strengths (which is what pg did at Viaweb).


That or more hardware to overcome bottlenecks caused by bad code. "It's running slow, we need more memory!". After some investigation, really... you've got 8 joins without using keys, and you're getting paid more than us how?


So true, while neglecting UI and usability. I guess developers hate that, for people, the UI is your product.

Spend months rewriting the most beautiful backend and nobody cares; redesign a button and everyone is excited.


I must have spent my entire time between 2004-2008 working for companies that did that sort of thing (rewrite to .Net). Plenty of cash to burn with no real possibility of success and no business plan.

In two cases, the guys running it knew it was going to fail from day one and their business model was to do this in two year chunks, syphon the cash out of the VCs after talking the product up, live the high life and disappear for a bit.

I felt no shame working for them back then but I do now.


Honestly the premise, the model and database matter a whole lot more than the language. The premise is the most important as how it's presented to everyone is determined by it. The model is only to help you get your head around it; users don't (and shouldn't have to) care what you do in the backend. The database will dictate what you choose to store and how often.

Even rubbish code, in any language, can survive for a lot longer by shifting the spotlight of scrutiny to the biggest bottleneck, the database. Which will also be the deciding factor in reducing growing pains.


He was obviously obsessed by .net and circles. But what do you mean by "circles around Reddit" or "circles around something"? Like... g+ circles?


It's a figure of speech. See how people use it:

http://news.gnod.com/search?f=run+circles+around


Thanks! It would be interesting to know what made this guy think that "elegant design" and using .net is the way to run circles around Reddit but...


To be fair, you can build a decent social site out of .net; e.g. Stack Overflow does quite well. But it's not the main factor.


To be fair you can build a decent anything out of any language / framework / environment; the point is, technology doesn't matter that much if your product is good. Look at Twitter or something for example, they built a product in a language they were comfortable with, and evolved from there.


I totally agree with that.


User first, then technology


Where's the Vagrantfile?


On your computer until you git push and submit your pull request? :-)




