In Our Thoughts

I have spent the last few years investing in Vertical Data Companies. Driving this is a belief that there will be a set of companies, built on a core set of insights derived from data, that will each come to dominate their respective industry. I want to explain this thesis in more detail and lay out the reasoning behind it.

When I talk about data companies, I am talking about companies built around a core data pipeline. These companies typically fit the following pattern:

A team that has figured out a novel way to collect and analyze data, leverage cheap compute resources and machine learning, and derive insights that can be applied to solve an existing problem.

That entire data pipeline is important to me. Capturing the data without the ability to derive insights and solve problems often doesn’t capture enough value in the system. Deriving insights from freely available data can be valuable, but it often lacks a competitive barrier and will attract fast followers.

By vertical, I mean companies that are focused on a particular industry or sector. A focus on vertical companies is nothing new in venture capital — large enterprise SaaS investors have seen big wins in this space in the likes of Veeva and Guidewire. Some of the benefits include higher market share, less competitive pressure, concentrated customers, lower CAC, and a lower cost to build a meaningful brand. My deviation from this thesis is that I think that there are additional ways to deliver that insight beyond the seats & tiers SaaS model. Software is always a component of the solution but I’m also happy to support companies that are monetizing their core insight via a broker or marketplace business model. The industry, end customer, and the expertise of the founders drive the product and business model.

Pulled together, the framework looks something like this:

[Figure: Vertical Data Company, Insight & Action Pipeline]

For my portfolio company REsurety:

[Figure: the pipeline applied to REsurety]
Additional examples from my own portfolio include Impact Health, Wunder Capital, Ginger.io, Ceres Imaging, Gridcure, and Mendel. You can read more about my latest investment in Mendel here.

Underlying Reasoning

At the macro level, my view is influenced by Carlota Perez’s Technological Revolutions and Financial Capital, which posits that past technological revolutions have followed similar cycles of roughly 60–75 years. These cycles are summarized in the graphic below:

Using this framework, the previous ~45 years have seen the installation of Information & Communications Technology (ICT). In 2017, we have reached a point where this technology is now available globally: the world is networked, internet penetration is over 80% in most developed nations, mobile penetration is over 60% globally, global IP traffic continues to grow exponentially, the cost of compute power has already dropped 100X+ since 2000, and a single platform has over 2B daily active users.

These dynamics create opportunities for companies that can leverage technology at the forefront of the ICT curve including data storage and processing, machine learning, open source software, connected hardware, and mobile. As compute power and software become commoditized, companies are increasingly relying on proprietary data and distribution to build lasting competitive advantages.

Of the two, building a competitive moat via proprietary data is often the easier and more capital-efficient route in the early stages. Distribution is still crucial, but it is hard and/or expensive to stand up a competitive barrier through distribution alone during the early days. I prefer this sequence: proprietary data first, then large market share in a vertical, then established distribution channels for follow-on products.

What this means for founders

If you think that you are building a company that fits this mold, I would love to connect. To get a jump start, here are the major questions that I always ask of potential investments:

  • What data are you collecting? How is it unique? Is it proprietary?
  • What insights have you drawn out from the data? Tell me more about how the technical team is using AI/ML technologies. Tell me more about the business / product team that is directing the focus and asking the right questions.
  • Tell me about the problem. Why is it important to solve? How big of an opportunity is it?
  • How well do you know your end customer / user? How does your knowledge of the industry, the problem, and the prevailing workflows inform your product decisions? How will you monetize that insight?
