Stack Overflow hit 1,000,000 questions last night (stackoverflow.com)
95 points by bjonathan on Oct 13, 2010 | 44 comments



Stack Overflow is the ultimate long-tail resource. My favorite example is what I consider a very obscure question that I asked and then answered (http://stackoverflow.com/questions/570076/netbeans-forwarded...) about getting anti-aliased fonts and sub-pixel rendering working when running Java software on a remote X11 server. This sort of thing is incredibly esoteric, yet close to 1,000 people have looked at it in the last year. When I asked the question and found the answer, I could find no useful information about it anywhere, yet when you search Google today for 'x11 java anti-alias' the first result is my question, and the second is someone else's.

We've come a long way since expertsexchange.


And now that question would get closed and marked "Belongs on ServerFault" before anyone could answer it.


Why would that get moved to ServerFault? It's about an IDE. I wouldn't vote to move it to SF.

If anything, it is more SuperUser-ish, but it's so developer-centric that I don't think any off-topic votes are necessary.


Wouldn't it just get moved to ServerFault, and then someone would have answered it, and the Google result would have pointed at ServerFault instead?


No, unfortunately it wouldn't. The google juice is completely concentrated on the stackoverflow domain — I constantly find insta-moved SO questions at the top of the search results, with the moved question on SF or SU nowhere to be found. Yet another defect of their asinine community silo policy.

They don't do a 301 redirect — how would google know where the real question is?


We do a 301 redirect if the stub is deleted. We should probably be auto-deleting the migration stubs after a period of time, but we haven't gotten to that quite yet.


Ah, gotcha. I was real into SO for the first week, and then stopped contributing. I didn't realize that was what would happen...


It's off topic, but I read expertsexchange as expert sex change.


I actually have trouble contributing to StackOverflow as much as I'd like. The community has gotten so good that most questions within my realm (Scala & Java) seem to be answered within 30 minutes. Since I only look for questions to answer a few times a week, I never find anything.

At least I can get points by asking good questions.


This is great news, if for no other reason than that prowling Google for answers is 100%[1] more effective than it used to be, as most searches lead me straight to stackoverflow.

[1] well, 100%-ish. I've not measured this. Nor am I going to.


There seem to be a number of sites that just poorly mirror content from StackOverflow. They usually have the question and then maybe some answers, but don't usually include comments to answers or even which answer was marked as correct.

It's very frustrating when they come up above StackOverflow in Google searches. I end up slowing down a lot to make sure I don't click on those results. Sure I could just go straight to StackOverflow which I often do, but there are other good sources of programming knowledge on the internet that I want to be able to find too.

These mirror sites are also frustrating because they are blatantly leeching off the work of both the StackOverflow developers and the StackOverflow community. Kind of makes me wish Google did have a blacklist for these cases.


When we find out about these cases, we ask them to conform to our Creative Commons license agreement by linking back to stackoverflow.com, in hopes that this will ultimately teach Google that they are non-canonical and should rank lower than us.


That might be so, but right now there are about three of them, and they clog up the search results with essentially the same answer at lower quality, for the reasons mentioned above. So far I've not seen any that outrank stackoverflow, but it's only a matter of time before one does something blackhat and jumps ahead for certain questions.

You say that they link back, but in the cases I've seen, they link back with a slightly different URL form from the one you use.

For example: http://stackoverflow.com/questions/1037925/recreate-stack-tr... this question appears on scraping sites (not going to name them for fear of feeding them traffic), but they link back with this URL: http://stackoverflow.com/questions/1037925 - which doesn't resolve back to the longer version. I know there's a canonical link on it (for the longer URL) but I'm wondering if this strategy is working or not.
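To make the short-form versus long-form point concrete, here is a minimal sketch of treating both URL shapes as the same question by comparing only the numeric id. It assumes question URLs follow the /questions/<id>[/<slug>] pattern seen in the two links above; the function names and the slug used in the usage note are hypothetical, and this says nothing about how Google or Stack Overflow actually resolve these.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: question URLs look like /questions/<numeric id>[/<slug>].
_QUESTION_PATH = re.compile(r"^/questions/(\d+)(?:/.*)?$")


def question_id(url: str) -> Optional[int]:
    """Return the numeric question id from a stackoverflow.com URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.stackoverflow.com like stackoverflow.com
    if host != "stackoverflow.com":
        return None
    match = _QUESTION_PATH.match(parsed.path)
    return int(match.group(1)) if match else None


def same_question(a: str, b: str) -> bool:
    """True when both URLs point at the same question id."""
    id_a, id_b = question_id(a), question_id(b)
    return id_a is not None and id_a == id_b
```

With this, same_question("http://stackoverflow.com/questions/1037925", "http://stackoverflow.com/questions/1037925/example-slug") is true, which is the equivalence a crawler would have to infer from the canonical link.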

Maybe it's time to start rejecting some bots crawling the site, or at least to slow them right down.

It's a problem for Google and for SO, so someone from both places should probably be talking about it. In the end, SO starts looking spammy because the others are spammy and scrape the content, which is exactly the opposite of how things should be. The spam sites are just cloning SO and shoving crappy ads everywhere; it's not high quality content. When you ask a question, you want 5 different answers from 5 different sites, not the same answer spammed 5 times back at you.


I haven't noticed too much change in Google; I generally just go straight to Stackoverflow first.


I'm almost always doing 'site:www.stackoverflow.com how do I do XYZ ?' in Google.


To conveniently search all of their stackexchange sites using Google including stackoverflow.com, try the search box on http://stackexchange.com/.

This somewhat helps with a problem they are facing which is fragmentation of questions and answers across their sites. For example, for Ubuntu you might need to search three or four of their sites before finding an answer to an issue you are having.


Duck Duck Go makes this a bit easier, just say !so xyz.


Even faster, I've set up "keyword searches" in Firefox (http://kb.mozillazine.org/Using_keyword_searches). I created the following bookmark:

  Name     : [so] Search StackOverflow     // the "[so]" here is just a mnemonic
  Location : http://www.google.com/search?q=site%3Astackoverflow.com+%s
  Keyword  : so
Then, when I want to search for "vim vs emacs" on SO, I open a new tab (Ctrl + T), type "so vim vs emacs", and press enter. This searches "site:stackoverflow.com vim vs emacs" on Google.
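For anyone curious what the browser is doing with that bookmark, here is a rough sketch of the substitution: the first word typed is looked up as a keyword and the rest replaces %s in the Location. The BOOKMARKS table and the expand function are hypothetical names for illustration, and Firefox's exact escaping of the query may differ from quote_plus.

```python
from typing import Dict, Optional
from urllib.parse import quote_plus

# Hypothetical keyword table mirroring the bookmark above.
BOOKMARKS: Dict[str, str] = {
    "so": "http://www.google.com/search?q=site%3Astackoverflow.com+%s",
}


def expand(typed: str, bookmarks: Dict[str, str] = BOOKMARKS) -> Optional[str]:
    """Expand 'keyword query words' into the bookmark URL, or None if no match."""
    keyword, _, query = typed.partition(" ")
    template = bookmarks.get(keyword)
    if template is None or not query:
        return None
    # Only the %s placeholder is replaced; the existing %3A stays untouched.
    return template.replace("%s", quote_plus(query))


print(expand("so vim vs emacs"))
```

Adding another entry to the table (for example an HN search template) is all it takes to get a second keyword.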

I have a similar smart bookmark setup for HN searches, and you could get more creative if you want (DuckDuckGo search, or even a custom StackExchange search like rayvega mentioned, using "http://stackexchange.com/search?q=%s" in the location field).

Chrome has something similar: http://www.google.com/support/chrome/bin/answer.py?hl=en&...


Just to clarify, in Chrome this works automatically. After performing one search on Stack Overflow, you can now start typing "stack overflow" into the address bar. As soon as you see that the first autocompletion is StackOverflow.com, you can hit tab and perform a search.

This works automatically with any site you perform a search on, with no need to configure anything. One of the main reasons I moved to Chrome after years of being an FF user.


Weird... it used to be on duckduckgo that if a Stack Overflow result was your first result, the answer would actually be displayed as well, but that doesn't seem to be working right now: http://duckduckgo.com/?q=Netbeans+Forwarded+over+X11+Font


You can always just search Stack Overflow directly (I consider it to have an adequate search engine), which is easily done in Chrome by typing stackoverflow.com (often only the first two characters), tab, and then your query.


you can drop the "www." prefix, btw.


While it's cool they've had a million questions, I feel more interested in seeing stats on things like:

# non-closed questions (broken out by answered/non)

# questions with an accepted answer

# questions forced to community-wiki vs those that were not


Just go here and type in your SQL query:

http://odata.stackexchange.com/stackoverflow/query/new


Thank you very much for making this data available Joel. This is really interesting stuff.


I think these are right.

# non-closed questions (broken out by answered/non) - http://odata.stackexchange.com/stackoverflow/q/12626/

# questions with an accepted answer - http://odata.stackexchange.com/stackoverflow/q/12627/

# questions forced to community-wiki vs those that were not - Don't know enough about their schema to answer this one. This could be the answer - http://odata.stackexchange.com/stackoverflow/q/12635/


>> # non-closed questions (broken out by answered/non)

Are the labels flipped? When AcceptedAnswerId is NULL, wouldn't that indicate a non-answered question?


Yes. I played with the order, but forgot to adjust the WHERE clauses.

Here's the right version - http://odata.stackexchange.com/stackoverflow/q/12795/
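The label mix-up is easy to reproduce on a toy table. This is a sketch only, run against an in-memory SQLite database rather than the real data explorer: AcceptedAnswerId comes from the discussion above, while the Posts table name and the PostTypeId and ClosedDate columns are assumptions about the schema, and the rows are made up. Note also that "no accepted answer" is only a proxy for "unanswered".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: PostTypeId 1 = question, 2 = answer; ClosedDate NULL = not closed.
conn.execute(
    "CREATE TABLE Posts ("
    "Id INTEGER PRIMARY KEY, PostTypeId INTEGER, "
    "AcceptedAnswerId INTEGER, ClosedDate TEXT)"
)
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, None),            # open question with an accepted answer
        (2, 1, None, None),          # open question, nothing accepted
        (3, 1, None, "2010-10-01"),  # closed question, excluded below
        (4, 1, 11, None),            # open question with an accepted answer
        (10, 2, None, None),         # answers are not counted
        (11, 2, None, None),
    ],
)

# A NULL AcceptedAnswerId must map to the "no accepted answer" label.
QUERY = """
SELECT CASE WHEN AcceptedAnswerId IS NULL
            THEN 'no accepted answer'
            ELSE 'accepted answer'
       END AS Status,
       COUNT(*) AS Questions
FROM Posts
WHERE PostTypeId = 1 AND ClosedDate IS NULL
GROUP BY Status
ORDER BY Status
"""
counts = dict(conn.execute(QUERY).fetchall())
print(counts)
```

Swapping the two THEN/ELSE labels reproduces the flipped result that was pointed out.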


This could be done using their data dumps (however I don't have access to SQL Server so I couldn't tell you this).


The data dumps are in XML format; you don't need SQL Server to work with the data.
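As a rough illustration of working with XML directly, here is a sketch that counts questions and accepted answers using only the Python standard library. The element and attribute names in SAMPLE (posts, row, PostTypeId, AcceptedAnswerId) are assumptions about the dump layout, and the three rows are invented.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Invented sample; the real dump layout is assumed, not verified here.
SAMPLE = """<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" />
  <row Id="2" PostTypeId="1" />
  <row Id="3" PostTypeId="2" />
</posts>"""


def count_questions(xml_text: str) -> Tuple[int, int]:
    """Return (total questions, questions with an accepted answer)."""
    root = ET.fromstring(xml_text)
    total = accepted = 0
    for row in root.iter("row"):
        if row.get("PostTypeId") != "1":  # assumed: "1" marks a question
            continue
        total += 1
        if row.get("AcceptedAnswerId") is not None:
            accepted += 1
    return total, accepted


print(count_questions(SAMPLE))
```

For a full-size dump, a streaming parser such as ET.iterparse would be the safer choice, since fromstring loads everything into memory.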


My default Google query for a programming problem has become:

[keywords] site:stackoverflow.com

And with 1M Q/A it looks like this isn't going to change any time soon!


Congrats and thank you for making the web a better place!


Wow, that truly is amazing.

On a funny note: I wonder how many of those questions were answerable by the first result of a simple google search? :P


My experience tells me that they usually are the first search result in google -- and when they aren't, I specifically look for them in the search results because the site is incredibly accurate and useful and I have come to trust it.


Just as a side note, if you found a question through Google and it helped you, please make sure to upvote both the question and any helpful answers, as that's how Stack Overflow's dynamics work.


It seems like very few people understand that it doesn't cost anything to upvote other answers and especially questions. Lots of good questions get around 0-2 votes, but clearly draw significant activity and interest otherwise. Are people afraid of point inflation?


On the StackOverflow podcast they had talked about some of the issues faced with awarding points for asking questions.

Apparently many people feel that asking a good question isn't worthy of points (or as many points) as answering a question.

They at one point (IIRC) were awarding points for asking a question, but that led to people spamming in low quality questions.

My personal rule is that if the question was interesting enough to make me read it, it should probably be upvoted.


They've also recently created a Gold "Electorate" badge, which is defined as "Voted on 600 questions and 25% or more of total votes are on questions." This should encourage people to vote up good questions as well as good answers.
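Read literally, that badge definition is a two-part check. A minimal sketch, assuming "voted on 600 questions" means at least 600 votes cast on questions and that total votes include those question votes; the function name is hypothetical and this is not Stack Overflow's actual implementation:

```python
def qualifies_for_electorate(question_votes: int, total_votes: int) -> bool:
    """Apply the quoted rule: 600+ question votes and at least 25% of all votes."""
    if question_votes < 600:
        return False
    # question_votes / total_votes >= 0.25, written without division.
    return question_votes * 4 >= total_votes


print(qualifies_for_electorate(600, 2400))
```

So someone with exactly 600 question votes can have at most 2,400 total votes and still qualify.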


I feel that since there are comments, answering a question should automatically upvote it. It depends on what the criterion for upvoting a question is: if it's "not spam/trash or poorly worded" then this would work, but I feel most people go on "it taught me something amazing/unique".


The problem is that it isn't clear WHY you should upvote a question. Upvoting an answer is helpful - I'm saying that I confirm that it's correct and you should believe it - but upvoting a question is just gold star time.


I upvote questions if I've wondered the same thing myself, and it's already been answered. I only use favourite to follow the discussion.


Ah dammit. I noticed a couple of days ago the number of questions was at 997k. I meant to keep an eye out and see if I could ping in number 1 million - but then promptly forgot.


How many of those are duplicates of existing questions — doesn't heralding growth contradict the original site goal of wiki-edited QA that stays fresh forever?


Bet they'd have more questions and answers without that stupid third-party login crap. (OpenID)



