Could you get away with allowing folks to manually paste in the HTML of their own LinkedIn page, or perhaps upload a PDF version of their LinkedIn page?
Are you scraping the page for input params? Would it be possible to just expose an API (or web forms that point there) to let folks put in those parameters manually?
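A minimal sketch of the manual-entry idea, assuming the tool only needs a few profile fields. The field names `title`, `location` and `skills`, and the helper `parse_manual_params`, are hypothetical; the thread never says which parameters are scraped. It parses a URL-encoded form body (what a plain HTML form posts) into the same parameters the scraper would otherwise pull from the profile:

```python
from urllib.parse import parse_qs

# Hypothetical required fields: stand-ins for whatever the scraper currently extracts.
REQUIRED_FIELDS = ("title", "location")


def parse_manual_params(form_body: str) -> dict:
    """Turn a URL-encoded form submission into scraper-equivalent parameters."""
    # parse_qs decodes '+' and %XX escapes and maps each field to a list of values.
    fields = parse_qs(form_body, keep_blank_values=True)
    params = {}
    for name in REQUIRED_FIELDS:
        value = fields.get(name, [""])[0].strip()
        if not value:
            raise ValueError(f"missing required field: {name}")
        params[name] = value
    # Skills come in as one comma-separated text box; split and drop empty entries.
    raw_skills = fields.get("skills", [""])[0]
    params["skills"] = [s.strip() for s in raw_skills.split(",") if s.strip()]
    return params


if __name__ == "__main__":
    print(parse_manual_params("title=Data+Engineer&location=London&skills=python,+sql"))
```

Because the same function can sit behind both a JSON API and a web form, nothing downstream has to change, and no request ever goes to LinkedIn, so there are no rate limits to hit.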
Oy, that would actually have been a much better way of doing it. I was trying to be clever and keep the UI simple, and totally didn't expect to hit rate limits so quickly.
You can always try I guess. Whether or not it works is another story :-p
I started using Tor for this type of thing. It's very handy in that it builds a new circuit every 10 minutes or so, so you usually get a new exit node.
To avoid the LinkedIn issue, I crawled job ads (about 10M UK ads over two years) and applied analysis to them. I love your approach - I'm going to have a play this weekend :)
LinkedIn has so many amazing datasets locked up in the platform. I was keen to see whether I could reproduce the skill graph that LinkedIn has, and apply it to other problems, such as valuing and comparing skill sets and knowledge from job titles alone.
Oops! LinkedIn error. Please check your url.
But I'm pretty sure my URL is correct. It's the little gray URL underneath my profile picture, similar to the example shown in the text input on your page?
I'd love to talk in more detail with you sometime! We (PayScale) don't scrape LinkedIn, but we have been collecting, analyzing and predicting pay for over a decade.
One way to train your model would be to let people tell you their current salary in exchange for the analysis. Then you get more accurate salary data. Of course people could lie, but that's probably no worse than the Glassdoor data anyway.
(2) http://glassbowl.info/ is not loading for me
(3) If you did scrape LinkedIn, you probably violated their ToS, so you may not want to admit this publicly in case this goes anywhere :)