tech, George London

In June, I attended the enlightening “Semantic Technology and Business” conference in San Francisco. These are my reflections.

First, a little background on myself – I graduated college in 2008 with a degree in philosophy and very little programming experience. I worked for three years at a very large, very technology-oriented hedge fund where I did macroeconomic modeling and built large statistical models. I left there in 2011 and ever since I’ve been pouring all of my energy into learning to program, learning semantic technologies, and learning the entrepreneurial ecosystem.

So, in short, I’ve come to the Semantic Web with a fairly non-traditional perspective compared to most Linked-Datanauts. For one thing, I could count on one hand the number of conference attendees under the age of 30. My informal impression is that most of the Semantic Web community consists of very experienced programmers with extensive backgrounds in either academia or in traditional large-corporation IT.

Some consequences of my relative youth and inexperience are that:

1) I’ve completely missed the decade-long back-history of the Semantic Web;

2) I have no experience in enterprise IT, and I don’t at all understand the technology problems large corporations face or the politics that drive technology adoption;

3) I’m easily befuddled by technical discussions that go deeper than my limited (but growing) technological understanding.

I bother with the extended preamble because it explains why my impressions will have to be very…well…impressionistic. Eighty percent of the content at SemTechBiz was either about the (increasingly successful) war to convince enterprises to deploy Semantic technology, or else was focused on deeply technical low-level details of infrastructure technologies. In other words, it went above my head. Nevertheless, I do have a few hopefully at least mildly interesting remarks.

My impressions all center on one theme – the Semantic Web is an adolescent industry. And that’s fitting, since it’s around 15 years old. All over the conference there were clear signs that Linked Data is out of its infancy and is being used in real enterprise settings to solve real pressing problems. For example, Merck is using Linked Data to power a widely used internal research tool. The U.S. intelligence community seems to be heavily using Linked Data for counter-terrorism and other purposes. Nearly all the major database vendors (Oracle, IBM, etc.) have released adapters to facilitate working with RDF. Google, Microsoft, and Yahoo all have semantic research teams and have come together to support the Schema.org metadata standard (elements of which purportedly now appear on a startlingly high percentage of all webpages). Facebook has convinced webmasters to mark up something like 40% of all webpages with RDFa Open Graph tags.

The technology stack is also improving rapidly. There were a lot of technologies demonstrated at SemTech that I’m excited about, especially around the theme of “scalability.” There are now billions of triples in the cumulative open Linked Data Cloud, which is far more than existing single-machine Triplestores (Semantic Web databases) can handle. So a move toward distributed computing is going to be critical. Sindice Tech is making RDF processing much more scalable by applying Hadoop batch processing. The Bigdata triplestore (which I’m using for a project) is able to handle unprecedented volumes of RDF on a single machine or easily convert to a cluster. Google Refine is being applied to make ingesting large volumes of “idiosyncratic” RDF more manageable.
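For readers who haven’t met a triple before, here is a minimal sketch of the idea behind a Triplestore: every fact is a (subject, predicate, object) statement, and a query is a pattern with some slots left open. The class name `TinyTripleStore` and the `ex:` identifiers are made up for illustration. This is a toy, not how Bigdata or any real Triplestore is implemented; real ones add indexing, persistence, and SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is one statement: (subject, predicate, object).
Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # A set, so asserting the same fact twice stores it once.
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        # Full scan in sorted order: simple and deterministic, but this
        # linear scan is exactly what stops scaling at billions of triples.
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)

    def __len__(self) -> int:
        return len(self._triples)


# Made-up example data using a hypothetical "ex:" prefix.
store = TinyTripleStore()
store.add("ex:alice", "ex:attended", "ex:SemTechBiz")
store.add("ex:bob", "ex:attended", "ex:SemTechBiz")
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:alice", "ex:knows", "ex:bob")  # duplicate, ignored

# "Who attended SemTechBiz?" leaves the subject slot open.
attendees = [s for s, _, _ in store.match(predicate="ex:attended", obj="ex:SemTechBiz")]
print(attendees)
```

The full scan in `match` is the part that breaks down at scale, which is one way to see why the distributed and heavily indexed approaches above matter.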

There was also a great deal of presumably-exciting-but-hard-for-me-to-understand technology targeted towards various enterprise problems.

For all the great stuff happening, there are also clear signs that the industry is not yet fully mature. I am very optimistic about the Semantic Web (which is why I’m devoting so much of my time to learning about it), and I sincerely believe that all the problems can and will be overcome. But the problems do exist.

From what I’ve been told, much of the (ever more robust) open source Semantic Web technology stack has been built primarily with government research money. Now that money tap has been turned off, and longstanding projects are struggling to generate cash flow to support themselves.

This is leading to what I see as one of the industry’s biggest challenges: fragmentation. There are many different small companies building and trying to sell important but basic infrastructure components like Triplestores or Enterprise Knowledge Management Platforms. Having tried most of the Triplestore options, I can personally attest that the offerings are probably not differentiated enough to really merit the existence of so many choices (and if anything the panoply of similar and usually under-documented options makes learning about and adopting the stack even more intimidating). “Forking” makes some sense in the Open Source world, but it makes a lot less sense in the commercial world where one team of twelve can accomplish a lot more than four teams of three.

The other major problem is the talent pipeline – there are clearly a lot of very smart people working on Semantic technology (I personally met many of them at the conference). But one of the Semantic Web community’s greatest strengths – its internationalism – is also a weakness because it makes finding and consolidating talent to work on a single project very difficult. It’s just not feasible to start a company with employees in 10 different countries. And because the Semantic Web community is still small and the total number of skilled Semantic developers in any given country can be measured in the low thousands, the problem is even worse. We’re caught in a catch-22 where even if an enterprise has a great use case and permission to spend millions of dollars on a Linked Data approach, they may (quite reasonably) demur simply for fear of not being able to find 10 developers to work on the project.

And on the other side of the pipeline, we’re suffering from a distinct lack of “MBA” types who are experienced at making emotionally compelling sales pitches or raising money from investors. Ironically, the rest of the tech world has an enormous glut of MBA types who are desperately scrounging for developers to latch on to, but none of them seem to have discovered the Semantic Web world yet. Perhaps because we don’t have the MBA types to sell it to them!

I think that each of these problems will work itself out over time, but the one guaranteed cure-all is a single highly visible home-run commercial success. And I suspect that the company that will hit that home run is being started right now in a garage.

It’s an exciting time to work on the Semantic Web. The technology is increasingly solid, powerful and mature, and it’s very clear that the community is still strong and still full of energy and ideas.

Big things are about to happen.

Reflections on the Semantic Technology and Business Conference, SF 2012