I'm the author of this piece. I would really appreciate your thoughts and comments. I'd love to learn about other datasets that could really change the trajectory of society if made accessible. I would also love to read a counter-argument to this.
I'm one of the authors of this (along with Will Landing, CEO of FICO). I would really appreciate your comments. I've been getting a lot of comments about good examples of standards that we forgot to include. Please list them here and I will update the piece.
I've been rereading DaaS 1.0 today. I like 1.0 because it explains so many aspects of DaaS in simple words.
My biggest takeaway from 1.0 is about crossing datasets (data enrichment) and giving your clients an easy way to try the data.
I'm building my own DaaS startup[1], a B2B API to search through online news articles. Every month since the launch, more and more things from the DaaS Bible start to make sense to me.
The topic that our CTO and I would like to know more about around DaaS:
Data agreements when selling our service. Which derivatives can be redistributed or republished, and how? Should we always time-bound our data (if the client stops the subscription, they must unload it)?
P.S. According to our data, here's the latest article[2] mentioning Safegraph.
I've sent DaaS 1.0 to lots of people, and I tell them: when you read it, EVERYTHING is going to seem obvious to you. At the end, ask yourself how many of those "obvious" things you could have articulated before reading. People who are honest will say they didn't know most of the main points before reading.
I work in an internal startup at a Fortune 50 company, and DaaS is a very good guide for me to the points that we're not quite getting right.
One of the most underappreciated aspects of data is the processes and devices that create the data. I've found that most people think that all you have to do is link data x to y and, voila, you're on your way. The successful integration of data across systems is more often a challenge of processes not meshing. I see very little in the discussion about how to overcome this process impedance mismatch. There needs to be more agreement on where a process ends and what its output is.
On the technology side, where the systems are complex and there is a need for further understanding, I've seen some success with RDF, where there is more description of relationships and one can form a topology. RDF, though, requires a big jump in knowledge and has only been worth it when there is no other way. Superficially it is simple, but at times it introduces a lot more questions, which can lead to a lot of complexity.
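To make the topology point a bit more concrete, here is a minimal sketch in plain Python. It is not a real RDF library, just subject/predicate/object triples in a list. The node names and the `feeds` predicate are made up for illustration and don't come from CIM or any actual vocabulary.

```python
from collections import defaultdict, deque

# Hypothetical (subject, predicate, object) triples; all names are illustrative.
TRIPLES = [
    ("substation:A", "feeds", "feeder:1"),
    ("feeder:1", "feeds", "transformer:T1"),
    ("transformer:T1", "feeds", "meter:M1"),
    ("transformer:T1", "feeds", "meter:M2"),
    ("meter:M1", "type", "SmartMeter"),
]

def build_graph(triples, predicate):
    """Keep only the triples with the given predicate, as an adjacency list."""
    graph = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            graph[subject].append(obj)
    return graph

def downstream(graph, start):
    """Breadth-first walk returning every node reachable from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

feeds = build_graph(TRIPLES, "feeds")
reachable = downstream(feeds, "substation:A")
print(reachable)
```

The relationships are data rather than schema, so adding a new kind of link is just another triple. That is the part that looks simple; agreeing on what each predicate means across systems is where the extra questions come from.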
I've seen standards generated in the utility space through CIM and IEC via extensive collaboration between vendors and the utilities, but the environment is very different, with a lot less competition and a narrow scope. This has also been achieved at a great cost that most companies won't want to bear. Further, it has still been extremely hard, as utilities have different network standards around the world. Look at things like IEC 61968-9 or CIM for transmission and distribution networks for examples. Green Button is another example, but I would suggest it is easier, as it is more of a final-node problem and not right in the middle of a more complex process.
If the focus is more on FICO-like situations, it is likely a lot easier, as it is more of an MDM-like view where there is a strong centre and much weaker nodes around that centre that get good value from the conformity. But it isn't clear whether there are a lot of opportunities like that, and those likely don't need much of a strong standards base; they just need to hit the right problem at the right time and define it as they like.
In dealing with marketplaces from the position of a supplier, I've found that the supplier position is pretty weak and the marketplace does whatever it wants. When using old standards like EDI, they just put in very dirty (poorly processed) data and let the supplier deal with it. There is a strong preference to work with better marketplaces, but sales teams really don't care and the cycle goes on. The incentive models can be quite skewed.
Agreed, Andrew. We need to democratize access to data. Of course, that does not mean that the underlying data needs to be free. It can be expensive. But the data needs to be available.
If you are a self-driving car start-up today with $120MM in funding, you still have no data on how humans drive or how pedestrians walk. That's something that is going to be really important. And Apple and Google (both working also on self-driving cars) have a massive amount of data on drivers.
> And Apple and Google (both working also on self-driving cars) have a massive amount of data on drivers.
That assertion may be correct regarding purchases, browsing habits (especially for Google) and places you visit via their respective map applications, but it doesn't compare to the staggering amount of raw data [0] that Tesla has on drivers.