Segment Sources – Load Salesforce, Zendesk, Stripe into Redshift and Postgres

samcheng · on April 6, 2016

ETL-as-a-Service is a great idea, particularly one that is visualization/analytics-tool-agnostic!

However, there are so many data sources, and they all require different integrations with their different APIs or export mechanisms. A service isn't really useful unless it can import data from the lion's share of services that a given company uses...

pkrein · on April 6, 2016

You’re right. There are a lot of sources out there. It’s a ton of work for companies to build out their own pipelines and learn every new API. We want to save them from that burden so that they can focus on the analysis. We’ll be adding many more connections in the coming weeks and months, and also opening up the platform for cloud services to add themselves. Stay tuned!

rsobers · on April 6, 2016

Eh, you'd be surprised how much value a company can get just by marrying a few data sources (e.g., marketing automation + google analytics + CRM).

Doing this right now, manually piping data into PostgreSQL via Heroku and using Chartio to visualize and query it.

dwmintz · on April 6, 2016

I don't really agree. I mean, yeah, comprehensiveness is great, and it sounds like Segment is working towards it. But every integration they build is one less custom integration that your data engineers have to build.

georgewfraser · on April 6, 2016

My company (Fivetran, YC W2013) offers the same service and supports a lot more sources (https://fivetran.com/integrate), including relational databases.

far33d · on April 6, 2016

We have been very happy with Sources. It doesn't cover EVERY service we use (yet), but taking even just one or two out of in-house ETL is a huge benefit.

dtjones · on April 6, 2016

Agreed. I don't use any of those services. It seems the limited set of integrations restricts the customer pool quite a bit.

dan_ahmadi · on April 6, 2016

I wonder if this makes BI companies freak out a little bit, because pushing this data into Redshift and adding a visualization layer on top takes care of most smaller-scale BI needs...

greggyb · on April 6, 2016

From a BI consultancy perspective, not in the slightest. The time we spend on ETL is not due to the difficulty of piping data from place to place. The difficulty is in modelling data appropriately to support ad-hoc analyses. The E and L portions don't pose much difficulty (hassle, frustration, sometimes time, sure, but they're not inherently difficult).

The T, transformation, is huge in many ways. Think of it this way: the data model is the primary UI for an analyst or any power user. It also dictates query performance.

Adding a visualization layer directly on top of Salesforce's schema, for example, is not very helpful, regardless of where that data lives. You can answer trivial questions without too much difficulty, but anything beyond that gets hard quickly.

The data access patterns, types of logic necessary, and end-user demands are hugely different between an OLTP and OLAP workload.

There's also potentially huge complexity in conforming dimensions across disparate source systems' data.

Master data management is another huge component that hits a lot of the ETL pipeline.

These concerns are all on top of hooking up the right ends of the hose to one another.

I don't mean to disparage the product or company and hope I don't come across as if I am. I just want to point out that they address only a small component of a large process, which in turn is only a segment of the BI lifecycle.
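To make the "conforming dimensions" point concrete, here is a minimal sketch, assuming two hypothetical extracts (CRM contacts and billing customers) whose only shared attribute is an email address. The field names, the normalized-email match key, and the "first non-empty name wins" rule are illustrative assumptions; real master data management usually needs fuzzier matching and explicit survivorship rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CustomerDim:
    """One row of a conformed customer dimension (hypothetical shape)."""
    customer_key: int                      # surrogate key owned by the warehouse
    email: str                             # conformance key (an assumption)
    name: Optional[str] = None
    source_ids: Dict[str, str] = field(default_factory=dict)  # source -> native id


def normalize_email(raw: str) -> str:
    # Trim and lower-case so "Bob@Example.com " and "bob@example.com" match.
    return raw.strip().lower()


def conform_customers(crm_contacts: List[dict], billing_customers: List[dict]) -> List[CustomerDim]:
    """Merge two sources into one dimension keyed by normalized email."""
    by_email: Dict[str, CustomerDim] = {}

    def upsert(email: str, source: str, native_id: str, name: Optional[str]) -> None:
        key = normalize_email(email)
        row = by_email.get(key)
        if row is None:
            row = CustomerDim(customer_key=len(by_email) + 1, email=key)
            by_email[key] = row
        row.source_ids[source] = native_id
        # Simplistic survivorship rule: keep the first non-empty name seen.
        if row.name is None and name:
            row.name = name

    for c in crm_contacts:
        upsert(c["Email"], "crm", c["Id"], c.get("Name"))
    for b in billing_customers:
        upsert(b["email"], "billing", b["id"], b.get("description"))
    return list(by_email.values())


crm = [{"Id": "003A", "Email": "Bob@Example.com ", "Name": "Bob Smith"}]
billing = [
    {"id": "cus_1", "email": "bob@example.com"},
    {"id": "cus_2", "email": "joe@example.com", "description": "Joe"},
]
for row in conform_customers(crm, billing):
    print(row.customer_key, row.email, row.name, row.source_ids)
# 1 bob@example.com Bob Smith {'crm': '003A', 'billing': 'cus_1'}
# 2 joe@example.com Joe {'billing': 'cus_2'}
```

Bob's CRM and billing records collapse into one row with a single surrogate key, which is what lets facts from either source join to the same customer. Even this toy version forces decisions (match key, which source's name wins) that no E-and-L tool makes for you.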

dwmintz · on April 6, 2016

Agree that it's only one part of the data lifecycle, but if you don't have your own team of engineers to build all those connectors, you can't even get to the point of being able to transform your data.

We at Looker (disclosure: I work there) are totally focused on making transformations fast, flexible and powerful. But without the data to transform, there's nothing we can do to help customers. So we're super psyched that Segment is stepping in to fill this void and get the E and L done, so we can do the T.

greggyb · on April 7, 2016

Absolutely. I was just trying to illustrate where a single tool doesn't pose a threat to BI companies, particularly from a consulting angle.

Tools like Segment and yours are invaluable. I work for a Microsoft partner, so I'm often stuck in that ecosystem for better or worse, but the problems and solutions in the BI space are largely universal and transcend specific tools. Even when tools make individual tasks trivial, the overall architecture and design of a data pipeline and visualization solution leave plenty of room for companies like the one I work for.

pkrein · on April 6, 2016

Actually, we don't have any interest in being a visualization tool, and are super focused on building the customer data infrastructure of the future.

This product release is in close partnership with our BI partners (Looker, Mode, Wagon, Periscope, BIME and Chartio). One of the biggest problems our mutual customers face is getting data into their warehouse so that they can use the BI tool in the first place. This launch significantly expands the possible audience for them.

Even better, all of our BI partners built out-of-the-box reporting and dashboards based on Segment's schemas for these new third party sources. So our mutual customers can get set up even faster.

ngould · on April 6, 2016

Hey pkrein, I am a Segment fan and startup-scale user of your warehouse service. The warehouse service is super convenient, but its main shortfall for us is that you don't offer historical backfilling of data more than 60 days before the warehouse was enabled. That is the data we need in our warehouse most.

There are other ways to load in Salesforce, Zendesk, or Stripe data. It is certainly nice to be able to do that all with Segment, but it is not necessary. Sources is nice to have, but the core warehouse service is not really complete (for us, and I suspect others too) until you can seamlessly support data backfill at all price tiers. A one-time backfill fee would be okay, but saying "no, we don't support that" makes me sad.

pkrein · on April 6, 2016

Yep, as part of this launch we've revised our pricing. If you commit to an annual plan (even a self-service plan) we will load all of your historical data, not just 60 days. Just email friends@segment.com after signing up for the paid annual plan and we'll make it happen on the next sync.

sandGorgon · on April 7, 2016

Interesting. The pricing is a little confusing for those of us who are considering Segment for the first time (the warehouse part, not the integrations), especially regarding data size and how we load it.

Maybe you are estimating data size based on the known types of data in Salesforce, Zendesk, etc. But what if I want to load data from my internal DB as well? And how do the 60-day limit and similar rules apply there?

ngould · on April 6, 2016

Very cool, thanks for the update.

vyrotek · on April 6, 2016

Can someone use Segment to generate per-user reporting data?

So, instead of Total Company Sales this Month, I want Bob's Sales, Joe's Sales, etc. It feels like this is just a filter on top of what you already have, almost like a parameterized query (pass in @UserId for a WHERE clause)?

I originally thought this might be a job for one of your visualization partners, but you really need to filter the results before you perform the aggregation.
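The filter-then-aggregate pattern can be sketched as a parameterized query. This uses Python's built-in sqlite3 as a stand-in for a warehouse; the `sales` table and its columns are hypothetical. The user ID is bound as a parameter in the WHERE clause, so rows are filtered before SUM runs.

```python
import sqlite3

# Hypothetical raw sales rows standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id TEXT, amount REAL, sold_at TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("bob", 100.0, "2016-04-01"),
        ("bob", 50.0, "2016-04-03"),
        ("joe", 75.0, "2016-04-02"),
        ("joe", 20.0, "2016-03-28"),  # previous month, so excluded for April
    ],
)

# Both placeholders are bound at execution time: the first plays the role of
# @UserId, the second selects the month. Filtering happens before SUM runs.
MONTHLY_SALES_FOR_USER = """
SELECT COALESCE(SUM(amount), 0.0)
FROM sales
WHERE user_id = ?
  AND strftime('%Y-%m', sold_at) = ?
"""


def monthly_sales_for(user_id: str, month: str) -> float:
    """Total sales for one user in one month (month formatted as 'YYYY-MM')."""
    (total,) = conn.execute(MONTHLY_SALES_FOR_USER, (user_id, month)).fetchone()
    return total


print(monthly_sales_for("bob", "2016-04"))  # 150.0
print(monthly_sales_for("joe", "2016-04"))  # 75.0
```

Binding the value as a parameter (rather than string-formatting it into the SQL) also keeps the query safe from injection when the ID comes from a dashboard filter.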

n2parko · on April 6, 2016

hey vyrotek - you can definitely build that report using Segment sources + BI tool.

Our Salesforce source pulls in the Salesforce `Opportunities` and `Users` tables (your sales team members). So to get sales by sales rep, you can join the `Opportunities` table to the `Users` table, and then aggregate by sales rep.

Once you get the raw data into your data warehouse, you have a ton of flexibility with how you aggregate and analyze it.
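A sketch of that join, again using an in-memory sqlite3 database as a stand-in for Redshift or Postgres. The table and column names (`opportunities.owner_id`, `opportunities.is_won`, `users.name`) are assumptions for illustration, not necessarily the exact names Segment's Salesforce source produces.

```python
import sqlite3

# In-memory stand-ins for warehouse tables. Table and column names are
# assumptions for illustration, not Segment's exact Salesforce schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE opportunities (
        id TEXT PRIMARY KEY, owner_id TEXT, amount REAL, is_won INTEGER
    );
    INSERT INTO users VALUES ('u1', 'Bob'), ('u2', 'Joe');
    INSERT INTO opportunities VALUES
        ('o1', 'u1', 1000, 1),
        ('o2', 'u1', 500, 1),
        ('o3', 'u2', 700, 1),
        ('o4', 'u2', 900, 0);
    """
)
# 'o4' is not won, so the WHERE clause below leaves it out.

# Join each opportunity to the user who owns it, keep won deals,
# and total the amounts per sales rep.
SALES_BY_REP = """
SELECT u.name, SUM(o.amount) AS won_amount
FROM opportunities AS o
JOIN users AS u ON u.id = o.owner_id
WHERE o.is_won = 1
GROUP BY u.name
ORDER BY won_amount DESC
"""

for name, total in conn.execute(SALES_BY_REP):
    print(name, total)
# Bob 1500.0
# Joe 700.0
```

Swapping `GROUP BY u.name` for another column (region, product, close month) gives the other roll-ups mentioned in this thread without reloading anything.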

dwmintz · on April 6, 2016

That's kind of a broad question, but in theory, for sure. Once you've got the raw data in your own warehouse, you can roll up at any level of specificity. So that could be per salesperson, per product, per date, per office, whatever. The key is first getting the raw data out of the vendors' control and into your own warehouse.

dwmintz · on April 6, 2016

As somebody who works on a data exploration platform (Looker <- disclosure), I'd say that far from making us freak out, it makes us super happy (which is why we're so excited to partner with Segment).

I can't tell you how many potential customers are crazily excited about the idea of centralizing the data that all their apps produce into one central warehouse and putting Looker on top of that, but are stymied by the middle step of actually getting the data OUT of the vendors' APIs and IN to their own warehouse. Being able to point them to an off-the-shelf solution for that problem is a big win for us.

TheLogothete · on April 6, 2016

The BI space is absurd. BI viz is the photo app of the B2B world: everyone and their grandma thinks they can make one. Lots of VC money is gonna get burned.

endlessvoid94 · on April 6, 2016

Agree. But Segment + Looker is a KILLER combo.

dwmintz · on April 6, 2016

Thanks endlessvoid94 :-) (disclosure: work at Looker and totally agree)

primeobsession · on April 6, 2016

RJMetrics has a similar product (ETL as a service) with 10x the number of rows for their free tier. https://rjmetrics.com/product/pipeline/

uberneo · on April 6, 2016

What a coincidence! Just today I came across another ETL as a service, from Pinterest: https://news.ycombinator.com/item?id=11438216

vyrotek · on April 6, 2016

Is this just running the SF SOQL statement for you and storing the aggregated result in a Segment table?

Or does Segment provide a way to completely clone entire SF tables such as Opportunities & Cases, so I can create the aggregate queries later in Segment?

sperand_io · on April 6, 2016

Great question! When you enable a source, we begin running a job on an interval for you that pulls the data from the source, applies some light normalizations and transformations, and sends the data to our Object API (which is in charge of upserting the data into Segment and flushing it to your warehouse).

In Salesforce's case, we issue bulk queries to pull the complete collection on the first run, then modify the queries thereafter to request only data that's changed since the last run.

We don't do any aggregation of the data. We load it into a data warehouse (Redshift or Postgres) in its complete, raw form so that you can use SQL to aggregate/join to your heart's content. Here's an example: https://help.segment.com/hc/en-us/articles/208215583-Salesfo...
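The "full pull first, then only changes" pattern described above can be sketched as a high-water-mark sync with an upsert keyed on record ID. `fetch` is a hypothetical stand-in for a real source query (for Salesforce it would filter on a last-modified timestamp), and the dict plays the warehouse table; this is an illustration of the general technique, not Segment's implementation.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional


class IncrementalSync:
    """Full pull on the first run, then only records changed since the last run.

    `fetch(since)` is a hypothetical source query: it returns every record when
    `since` is None, otherwise records whose `updated_at` is later than `since`.
    """

    def __init__(self, fetch: Callable[[Optional[datetime]], List[dict]]):
        self.fetch = fetch
        self.high_water_mark: Optional[datetime] = None
        self.table: Dict[str, dict] = {}  # stand-in warehouse table keyed by id

    def run(self) -> int:
        changed = self.fetch(self.high_water_mark)
        for record in changed:
            # Upsert: new ids are inserted, existing ids are overwritten.
            self.table[record["id"]] = record
            ts = record["updated_at"]
            if self.high_water_mark is None or ts > self.high_water_mark:
                self.high_water_mark = ts
        return len(changed)


# A fake source: a list of records we can edit between runs.
source = [
    {"id": "a", "amount": 10, "updated_at": datetime(2016, 4, 1)},
    {"id": "b", "amount": 20, "updated_at": datetime(2016, 4, 2)},
]


def fake_fetch(since: Optional[datetime]) -> List[dict]:
    return [dict(r) for r in source if since is None or r["updated_at"] > since]


sync = IncrementalSync(fake_fetch)
print(sync.run())  # 2: the first run pulls everything
source[0] = {"id": "a", "amount": 15, "updated_at": datetime(2016, 4, 3)}
print(sync.run())  # 1: only the edited record
print(sync.table["a"]["amount"])  # 15
print(sync.run())  # 0: nothing changed since the last run
```

Comparing with a strict `>` against the high-water mark can miss records written with the same timestamp just after a sync; pipelines often re-read a small overlap window and let the idempotent upsert absorb duplicates. Deleted records are not handled in this sketch.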

npace12 · on April 6, 2016

You should make this available for trial. No way I'm paying $449 just to try it.

I set it up with the developer account, but when I try to connect the integration, it says it's not available on the developer plan. I was like "OK, fine, I'll pay 50 bucks to try it," but then it says Salesforce is only available on the Growth plan, which is $449.

Closed the tab, and moved on with my life.

sperand_io · on April 6, 2016

Sorry for the confusion there — our integration that sends data to Salesforce is on the integrations growth tier, but the pricing for data sources and warehouses is separate (and does indeed have a free trial :).

Your point prompted us to add a callout in the UI to prevent this sort of confusion going forward (see here: https://www.dropbox.com/s/yoc6a3nvsoupesa/Screenshot%202016-...)... and more generally we'll be working to make the distinction between data sources and data destinations clearer from a UX perspective in the near future.

joshdick · on April 6, 2016

This product has Salesforce replication, and it's free to try: https://rjmetrics.com/product/pipeline/

pinaceae · on April 7, 2016

Just be careful with SFDC: there are a bunch of harsh limits, the killer one being 5k Bulk API requests per rolling 24 hours.

If you think that's a lot, you haven't seen big orgs with a shit ton of integrations and custom stuff on top of them.
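A limit over a rolling window (rather than a calendar day) can be tracked on the client side with a queue of call timestamps. This is a minimal sketch; the class and its methods are hypothetical, the 5k figure is the commenter's, and actual Salesforce Bulk API limits depend on edition and have changed over time, so check the current documentation before relying on a number.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class RollingWindowBudget:
    """Count API calls against a cap over a rolling (not calendar-day) window.

    Assumes `now` values passed in are non-decreasing.
    """

    def __init__(self, limit: int, window: timedelta = timedelta(hours=24)):
        self.limit = limit
        self.window = window
        self.calls: Deque[datetime] = deque()  # timestamps still inside the window

    def _evict(self, now: datetime) -> None:
        # Drop calls that are a full window old or older.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def remaining(self, now: datetime) -> int:
        self._evict(now)
        return self.limit - len(self.calls)

    def try_acquire(self, now: datetime) -> bool:
        """Record a call and return True if budget remains; otherwise return False."""
        if self.remaining(now) <= 0:
            return False
        self.calls.append(now)
        return True


# For the limit quoted above you would use RollingWindowBudget(limit=5000);
# a limit of 3 keeps the demo short.
budget = RollingWindowBudget(limit=3)
t0 = datetime(2016, 4, 7, 9, 0)
print([budget.try_acquire(t0 + timedelta(minutes=i)) for i in range(4)])
# [True, True, True, False]
print(budget.try_acquire(t0 + timedelta(hours=24)))  # True: the 09:00 call aged out
```

Because the window rolls, a burst of calls at 9:00 keeps consuming budget until 9:00 the next day, which is why a large org's many integrations can starve a new one.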

josep2 · on April 6, 2016

Segment is always killing it.

grinich · on April 6, 2016

They're totally doing great! (But we should use less violent language to describe it...)

slachterman · on April 6, 2016

Are there any plans for Warehouses support for IBM's DashDB?

n2parko · on April 6, 2016

Hey slachterman, DashDB isn't on the roadmap yet. Mind filling this out so we can follow up?

https://segment.com/contact/requests/warehouse

vyrotek · on April 6, 2016

Please add options to use Windows Azure as a warehouse!

n2parko · on April 6, 2016

hey vyrotek, thanks for the request! Mind filling out a request and we'll follow up? https://segment.com/contact/requests/warehouse