Finding a co-founder is hard

Finding a technical co-founder for a new startup is notoriously hard. After struggling with it for a while, I decided to figure out exactly how hard it is.

To do that I simply estimated how many needles there are in the co-founder haystack. The answer is discouraging but intuitively accurate: there are only about two thousand potential co-founders in the entire United States.

How hard? Let’s use Fermi Estimation

In Fermi estimation, we wave our hands and spin minimal knowledge into approximate truth. The goal is to quickly get an estimate that’s close enough to correct for our purposes. For a link-bait blog post, “close enough” is not very close at all. And that’s close enough®.

We do Fermi estimation by breaking down our final quantity (number of viable co-founders) into a chain of constituent quantities. I model the problem like this:

Viable co-founders =

(% of programmers who are 2 standard deviations above the mean)

* (total number of programmers in the world)

* (% of world programmers living in the USA)

* (% of Americans in a startup-appropriate age range)

* (% of programmers who know a given technology)

Now we just guess each of those quantities (with a Google sense check where possible):

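% of programmers who are 2 standard deviations above the mean = 2.5% (by math: assuming skill is normally distributed, about 2.3% of people sit above +2 standard deviations, rounded up)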

total number of programmers = 25 million (number of StackOverflow accounts)

% of world programmers in the USA = 5% (US pop as % of world pop)[1]

% of Americans in a startup-appropriate age range = 20% [2]

% of programmers who know a given technology = 30% (guess)

Now run the numbers. Multiplied together, that all comes to:

2.5% * 25,000,000 * 5% * 20% * 30% = drum roll 1875
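For anyone who wants to swap in their own guesses, here is a minimal Python sketch of the same multiplication. The factor values are copied from the list above; the dictionary labels and the function name `fermi_estimate` are just illustrative choices of mine, not part of any library.

```python
# Each factor is one of the rough guesses listed above.
# The labels are only for readability; change the values to test your own assumptions.
factors = {
    "share of programmers 2 std devs above the mean": 0.025,
    "total number of programmers": 25_000_000,
    "share of those programmers in the USA": 0.05,
    "share in a startup-appropriate age range": 0.20,
    "share who know a given technology": 0.30,
}


def fermi_estimate(factors):
    """Multiply a chain of guessed factors into a single point estimate."""
    estimate = 1.0
    for value in factors.values():
        estimate *= value
    return estimate


viable_cofounders = fermi_estimate(factors)
print(round(viable_cofounders))  # prints 1875
```

Because the estimate is a plain product, every factor scales the answer linearly: if you think 10% rather than 5% of the world's programmers are in the USA, the result simply doubles.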

So that’s it! Out of a US population of 319 million, a mere ~2,000 people are viable co-founders. That’s roughly 1 in 170,000 people.

Dang that’s hard.

Then factor in that the vast majority of great programmers are already employed (or 3 years into starting their own successful company), and that only a small percentage will share your interests and have a compatible personality. It becomes easy to see why so many startups deliver bad technology or are torn apart by co-founder conflicts.

So…good luck!

P.S. If you’re looking for an opportunity and love music, metadata, and HCIR, shoot me an email at george j london on gmail. Or if you just like Fermi estimations, follow me on Twitter.

Reflections on the Semantic Technology and Business Conference, SF 2012

In June, I attended the enlightening “Semantic Technology and Business” conference in San Francisco. These are my reflections.

First, a little background on myself – I graduated college in 2008 with a degree in philosophy and very little programming experience. I worked for three years at a very large, very technology-oriented hedge fund where I did macroeconomic modeling and built large statistical models. I left there in 2011 and ever since I’ve been pouring all of my energy into learning to program, learning semantic technologies, and learning the entrepreneurial ecosystem.

So, in short, I’ve come to the Semantic Web with a fairly non-traditional perspective compared to most Linked-Datanauts. For one thing, I could count on one hand the number of conference attendees under the age of 30. My informal impression is that most of the Semantic Web community consists of very experienced programmers with extensive backgrounds in either academia or in traditional large-corporation IT.

Some consequences of my relative youth and inexperience are that

1)   I’ve completely missed the decade-long back-history of the Semantic Web;

2)   I have no experience in enterprise IT, and I don’t at all understand the technology problems large corporations face nor the politics that drive technology adoption;

3)   I’m easily befuddled by technical discussions that go deeper than my limited (but growing) technological understanding.

I bother with the extended preamble because it explains why my impressions will have to be very…well…impressionistic. Eighty percent of the content at SemTechBiz was either about the (increasingly successful) war to convince enterprises to deploy Semantic technology, or else was focused on deeply technical low-level details of infrastructure technologies. In other words, it went above my head. Nevertheless, I do have a few hopefully at least mildly interesting remarks.

My impressions all center around one central theme – the Semantic Web is an adolescent industry. And that’s fitting, since it’s around 15 years old. All over the conference there were clear signs that Linked Data is out of its infancy and is being used in real enterprise settings to solve real pressing problems. For example, Merck is using Linked Data to power a widely used internal research tool. The U.S. intelligence community seems to be heavily using Linked Data for counter-terrorism and other purposes. Nearly all the major database vendors (Oracle, IBM, etc.) have released adapters to facilitate working with RDF. Google, Microsoft, and Yahoo all have semantic research teams and have come together to support the Schema.org metadata standard (elements of which purportedly now appear on a startlingly high percentage of all webpages). Facebook has convinced webmasters to mark up something like 40% of all webpages with RDFa Open Graph tags.

The technology stack is also improving rapidly. There were a lot of technologies demonstrated at SemTech that I’m excited about, especially around the theme of “scalability.” There are now billions of triples in the cumulative open Linked Data Cloud, which is far more than existing single-machine Triplestores (Semantic Web databases) can handle. So a move toward distributed computing is going to be critical. Sindice Tech is making RDF processing much more scalable by applying Hadoop batch processing. The Bigdata triplestore (which I’m using for a project) is able to handle unprecedented volumes of RDF on a single machine or easily convert to a cluster. Google Refine is being applied to make ingesting large volumes of “idiosyncratic” RDF more manageable.

There was also a great deal of presumably-exciting-but-hard-for-me-to-understand technology targeted towards various enterprise problems.

For all the great stuff happening, there are also clear signs that the industry is not yet fully mature. I am very optimistic about the Semantic Web (which is why I’m devoting so much of my time to learning about it), and I sincerely believe that all the problems can and will be overcome. But the problems do exist.

From what I’ve been told, much of the (ever more robust) open source Semantic Web technology stack has been built primarily with government research money. Now the tap of money has turned off, and longstanding projects are struggling to generate cash flow to support themselves.

This is leading to what I see as one of the industry’s biggest challenges: fragmentation. There are many different small companies building and trying to sell important but basic infrastructure components like Triplestores or Enterprise Knowledge Management Platforms. Having tried most of the Triplestore options, I can personally attest that the offerings are probably not differentiated enough to really merit the existence of so many choices (and if anything the panoply of similar and usually under-documented options makes learning about and adopting the stack even more intimidating). “Forking” makes some sense in the Open Source world, but it makes a lot less sense in the commercial world where one team of twelve can accomplish a lot more than four teams of three.

The other major problem is the talent pipeline – there are clearly a lot of very smart people working on Semantic technology (I personally met many of them at the conference). But one of the Semantic Web community’s greatest strengths – its internationalism – is also a weakness because it makes finding and consolidating talent to work on a single project very difficult. It’s just not feasible to start a company with employees in 10 different countries. And because the Semantic Web community is still small and the total number of skilled Semantic developers in any given country can be measured in the low thousands, the problem is even worse. We’re caught in a catch-22 where even if an enterprise has a great use case and permission to spend millions of dollars on a Linked Data approach, they may (quite reasonably) demur simply for fear of not being able to find 10 developers to work on the project.

And on the other side of the pipeline, we’re suffering from a distinct lack of “MBA” types who are experienced at making emotionally compelling sales pitches or raising money from investors. Ironically, the rest of the tech world has an enormous glut of MBA types who are desperately scrounging for developers to latch on to, but none of them seem to have discovered the Semantic Web world yet. Perhaps because we don’t have the MBA types to sell it to them!

I think that each of these problems will work itself out over time, but the one guaranteed cure-all is a single highly visible home-run commercial success. And I suspect that the company that will hit that home run is being started right now in a garage. 

It’s an exciting time to work on the Semantic Web. The technology is increasingly solid, powerful and mature, and it’s very clear that the community is still strong and still full of energy and ideas.

Big things are about to happen.