I second this, except for the part about Salesforce being hard to customize. If anything, it's probably too customizable.
At a certain stage of a company, sales reps expect Salesforce. So much so that even when we finally caved and got it, I had reps specifically turn off Lightning and stick with Classic mode. It's like Bloomberg - it's what they know and can move fast in.
As much as it pains me to say this, it may make sense to offer an option that mirrors the Salesforce UI rather than reinventing the wheel. Or maybe even an integration / escape hatch to migrate everything off of Salesforce to this. Basically, if you want reps to adopt it, make it as close as possible to what they know.
In terms of UI, Notion has been our main source of inspiration.
The bucket of salespeople who enjoy the Notion UI is definitely smaller than the bucket who know how to use Salesforce, but I feel like it might be the faster-growing one. It's a good point that this will become harder as we move to larger companies with a higher share of experienced sales reps who don't want to change.
Apologies. We've had a huge problem with bots, so we have a number of security measures in place. The Google Captcha component is probably flagging you as a bot.
Try disabling your VPN if you're on one, or use a different IP.
To be frank, we went back and forth on this, but in the end, thought the original title was ok. The only other large, "open source" dataset we could find was 9M. So after researching, we came to the conclusion that it sounded clickbaity, but was likely accurate.
And yes on "open source". We fully intend for this to be "open source" in the full meaning of the word, but it seems we were moving too fast and missed adding the formal license.
No, the source code is not available. This dataset is a subset of the raw data our system collects. Our final product made available via the API does a variety of processing steps on the raw data (dedupes, joins, ML predictions, etc). The final, processed data is the piece that is proprietary / subject to the terms.
We will update the site terms to reference this dataset, as we aim to continue releasing an updated version each quarter. I'll have to double check with the lawyers, but it will most likely be MIT licensed.
What exactly does "open-source" mean to you? Because it sounds like there's absolutely nothing open about this other than a small scraping of LinkedIn data (which you should probably ask your lawyers if you're even allowed to license out).
The wording isn't just misleading, it's a complete lie.
EDIT: Never mind, the title has been updated to accurately reflect this being a small data dump.
Ok. I'm not sure how this happened, but I think the dataset was somehow mislabeled. It appears that this dataset is the Q1 version, not the latest Q2. Can you please try re-downloading it?
We're probably going to have to make a public announcement about this...
269708: London
124059: Paris
113135: New York
99314: São Paulo
75428: Los Angeles
69691: Madrid
67328: Toronto
63738: Dubai
63456: New Delhi
I would not have expected São Paulo to come fourth in this list, after New York but in front of Los Angeles. I just learned it's the 4th largest city in the world https://en.wikipedia.org/wiki/List_of_largest_cities - after Tokyo, Delhi, Shanghai - but I guess it has much more of a representation on LinkedIn than those other cities.
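For anyone who wants to reproduce a per-city tally like the one above, here is a minimal sketch using only the Python standard library. The column name `locality` and the sample rows are assumptions made up for illustration; check the dataset's field descriptions for the real column name.

```python
import csv
import io
from collections import Counter

# Toy rows for illustration only; these are NOT taken from the real dataset.
# The "locality" column name is an assumption.
sample_csv = """name,locality
A Ltd,London
B SARL,Paris
C Inc,New York
D Ltd,London
E Co,
"""


def top_cities(csv_file, column="locality", n=10):
    """Count non-empty values in `column` and return the n most common."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        city = (row.get(column) or "").strip()
        if city:  # skip records with no city
            counts[city] += 1
    return counts.most_common(n)


top = top_cities(io.StringIO(sample_csv))
for city, count in top:
    print(f"{count}: {city}")
```

With the real file, you would pass an open file handle instead of the `io.StringIO` wrapper. Exact string matching like this treats spelling variants of the same city as different cities, so the counts are only as clean as the location field itself.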
Hey HN, we're thrilled to announce our latest project - the World's Largest Open Source Company Dataset. Our team has been working hard on this product for the past few months, and we're excited to finally share it with you all.
We started off years ago trying to build a B2B app, but getting basic company data at scale was a huge barrier for us. This 15M+ record dataset attempts to solve that and has all the key company fields like name, industry, size, location, LinkedIn handle, etc. We aim to update it quarterly to ensure that you always have the most up-to-date information.
Disclaimer: Okay, we have to admit, we didn't exactly comb through every dataset out there to verify that ours is the world's largest, but we did our research, and we're pretty sure it might be. Whether or not that's true, we believe this dataset is a robust and invaluable resource for anyone interested in company data.
Data is very likely to be from LinkedIn if you look at the field descriptions and stats. The only field that is 100% available is the one based on the LinkedIn URL. I would guess scraping unless LinkedIn provides an API for this data that I can't find.
Most likely from scraping (Crunchbase, Yahoo, etc.), unless they bought it from somewhere. In most countries you can get this kind of data from the chamber of commerce, or from Dun & Bradstreet and similar companies. Some of these data aggregators have partnerships with other companies, and you can also (illegally) scrape it from there.
So this blew up today. Reviewing all the comments now.
1. Yes, this is scraped from public sources.
2. Yes, this is free to use / is open source in the broadest sense. Apologies for the confusion on the lack of a license and no mention about this in our TOS. We probably should update our TOS to be clearer here.
3. This is a raw dump of companies from all over the world, keyed by LinkedIn handle. The handles are deduped, but the websites are not.
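Since the websites are not deduped, anyone who needs one record per website will have to do that step themselves. A minimal sketch of one way to do it in Python follows. The field names `handle` and `website`, the sample records, and the "keep the first record seen" rule are all assumptions for illustration, not a description of how the dataset or the API is actually processed.

```python
from urllib.parse import urlparse


def normalize_website(url):
    """Reduce a website URL to a bare lowercase host, or None if empty."""
    if not url or not url.strip():
        return None
    url = url.strip().lower()
    # urlparse only fills in the host when a scheme is present
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host or None


def dedupe_by_website(records):
    """Keep the first record seen for each normalized website.

    Records with no website are kept as-is, since there is nothing to match on.
    """
    seen = set()
    unique = []
    for record in records:
        key = normalize_website(record.get("website"))
        if key is None:
            unique.append(record)
            continue
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records; the field names are assumptions.
records = [
    {"handle": "acme", "website": "https://www.acme.com/"},
    {"handle": "acme-uk", "website": "acme.com"},
    {"handle": "globex", "website": "http://globex.com"},
    {"handle": "no-site", "website": ""},
]
deduped = dedupe_by_website(records)
```

Keeping the first record is an arbitrary choice. Regional pages and subsidiaries can legitimately share a website, so depending on the use case it may be better to group records by website rather than drop the extras.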
The data is not licensed as open source (only non-commercial use): "The Service and its entire contents, features and functionality (including but not limited to all information, software, text, displays, images, video and audio, and the design, selection and arrangement thereof), are owned by the Company, its licensors or other providers of such material and are protected by United States and international copyright, trademark, patent, trade secret and other intellectual property or proprietary rights laws. You are permitted to use the Service for Your personal, non-commercial use, or legitimate business purposes related to Your role as a customer of BigPicture.io." - https://bigpicture.io/terms
This looks great. I'll have to play around with it.
Related: we built a developer-oriented Zapier clone for event scale automations a while back for our internal product. We've since pivoted and have been debating open sourcing the engine as well.
We built ours using Rust with a DSL for all the triggers, actions, and action inputs/outputs. The actions themselves are defined as APIs, which makes it easy to add functionality in any language. Most of our actions have been built in TypeScript.
Is there interest from anyone in potentially using it?
Especially when it's described along the lines of "nearly limitless energy that will fundamentally change humanity". That's the sort of utopian promise that is automatically very suspect on its face. Combine that with "at some unknown point in the future", and the whole thing gets too ephemeral to get all worked up about.
Basically, we're addressing the community's feedback and providing general updates on the project. Open to any and all thoughts.