George London

[Quickstart] Using Neo4j and Tinkerpop to work with RDF. Part 2!

December 20, 2011 George London

Another day, another reserve of patience to deal with trying to make this configuration work.

This is part 2 in my series of blog posts about how to get Neo4j with Tinkerpop running as an RDF triple store, assuming that you start not knowing/using Java.

When we left off last time, we had just downloaded the project source code from a recent blog post by Davy Suvee (see my last post for the link). We opened it in Eclipse, and noticed that there were, like, 2000 broken dependency errors.

After a couple of hours of face-palming, I finally figured out what was going wrong and fixed it. Which brings us to…

STEP 7: Understand how Eclipse and Maven work together.

So you know how in Ruby, you type ` gem install unicorn_magic `, some stuff downloads, and from then on your Ruby scripts just have to include ` require 'unicorn_magic' `, and everything just works? Well, when you install a gem in Ruby, it downloads it into some magic folder. (I’m not exactly sure where, but unless you care about which version of a gem you’re using, it doesn’t really matter, since the programmer is well abstracted from the guts of the language.) From then on, the Ruby interpreter is smart enough to see ` require 'unicorn_magic' ` and think “I should go look in my special magical jewel box for unicorn_magic, and then automatically make it available to my programmer friend so that he can live in the land of smiles and rainbows.”

Java doesn’t do that. Java does not care about rainbows and smiles.

Instead, Java has “the classpath”. I’m probably butchering this concept, but as far as I understand it, individual Java projects have to explicitly tell the compiler (and later the JVM that runs the code) where to go look for any external packages of code they want to include. They do that by specifying a classpath, which is a set of file-paths that Java will scan through to see if it can find the packages you specified.
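To make the classpath a bit less abstract, here is a minimal sketch (the class name `ShowClasspath` is my own invention) that asks the running JVM for its `java.class.path` system property and prints each entry on its own line. It assumes nothing beyond the standard library.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ShowClasspath {
    // Splits a classpath string into its entries, using the given separator
    // (":" on OS X and Linux, ";" on Windows).
    static List<String> entries(String classpath, String separator) {
        return Arrays.asList(classpath.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // The JVM exposes the classpath it was started with as a system property.
        String classpath = System.getProperty("java.class.path");
        for (String entry : entries(classpath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

If a jar you depend on isn’t somewhere in that list, Java won’t find the classes inside it, which is roughly what all those dependency errors are complaining about.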

So if you want to include packages (like, for example, the entire Tinkerpop framework), you have two options:

1) You download individual .jar files (i.e. “java archive” files, which contain compressed bundles of classes, which, remember, all Java code has to be contained within. For some reason.), and then you tell your project where to find those jars. To do that in Eclipse, you right click on your project, click Build Path –> Configure Build Path –> Add External Jars, then add packages one by one.

Fortunately, we won’t have to do that, because we can:

2) Use Maven. So Maven does a lot of stuff, but for our purpose here, its most important function is automatically managing dependencies.

The basic way that Maven works is by inserting a “pom.xml” (i.e. “Project Object Model”, since everything in Java is an object!) into your project. This xml file specifies the exact configuration of your project, most crucially all of the dependencies.

So if we take a look at the neo4j-sail-test project we have open in Eclipse from the last post, and double click on pom.xml, Eclipse will pop open a set of windows that walk you through the file. Let’s skip those and look directly at the XML, by clicking the “pom.xml” box on the bottom edge of the main sub-window.

Here you’ll see a bunch of <dependency> blocks that specify all the external packages this project is dependent on.

When you’re out on the internets, especially on github trying to pick up an open source tool to use, you’ll often see blocks that look like this:

That chunk of XML is a Maven dependency. So if you wanted to use the Sail Ouplementation pictured above, you would just need to add this XML chunk into your pom.xml file, inside of the <dependencies></dependencies> tags.

Of course, it’s not quite that easy. You still have to tell Maven to actually go get those files so that your project can use them.

To do that, you command-line into your project’s main folder and type:

% mvn clean install

If you have Maven installed correctly (see details in the last post if not), you should see a big rush of text, which will end with something looking like this:

What Maven is (basically) actually doing here is reading through your XML, looking at all the packages you said you needed, finding them in a centralized online repository, and then copying them into a folder on your hard-drive.

By default, that folder is located at

~/.m2

(The “.” in front of a name in OS X means the file/folder is hidden, and won’t be visible in the Finder by default. You can override the Finder settings to show hidden folders (Google to see how), but it does lead to a lot of visual clutter. You can also just command-line your way into any hidden folder.)
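Inside that folder, Maven’s default layout files each jar under `repository/` according to the coordinates from its <dependency> block: the groupId becomes nested folders, followed by the artifactId and the version. Here is a small sketch of that mapping. The class name `LocalRepoPath` and the coordinates (`com.example`, `unicorn-magic`, `1.0`) are made up for illustration, and it only covers the plain-jar case.

```java
public class LocalRepoPath {
    // Builds the path (relative to ~/.m2/repository) where Maven's default
    // layout stores the jar for the given coordinates.
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, for illustration only.
        String relative = jarPath("com.example", "unicorn-magic", "1.0");
        System.out.println(System.getProperty("user.home") + "/.m2/repository/" + relative);
    }
}
```

Knowing the pattern makes it easier to check from the command line whether Maven actually downloaded the package you asked for.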

Now for the somewhat tricky part…

Remember that we confirmed earlier that Eclipse has a plugin called M2Eclipse, i.e. Maven for Eclipse. That plugin is supposed to tell Eclipse to automatically add the .m2 repository to your classpath, so that projects can automatically find the packages they depend on there.

But for me, Eclipse was not looking inside that folder, which is why I was getting all those dependency errors. You need to make sure Eclipse knows where to look. Inside your Preferences menu, you should be able to find this window (Java–>Build Path–>Classpath Variables), which should have the line you see at the bottom here, “M2_REPO” etc…

If that line isn’t there, or if it is and you’re still getting dependency errors, you need to figure out how to make this work correctly. For me, the solution was to go to the command line and type:

% mvn eclipse:clean

then

% mvn eclipse:eclipse

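That seems to have resolved it, though I’m not actually sure why. (As far as I can tell, eclipse:eclipse regenerates the project’s Eclipse settings files, including the .classpath file, so that they point at the jars in the .m2 repository.) If this doesn’t work for you, let me know and I’ll see if I can help.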

Okay…getting close! Just one more irritating tweak.

STEP 8: Fix the memory allocation

So Java has built-in limits on how much memory running processes are allowed to use. If you’re doing anything large scale with the semantic web, you will often hit these limits. And this case is no exception – when I ran the neo4j-sail-test program in Eclipse, I got a “Java heap space” error.

Luckily, there is a fix. You need to explicitly tell the program you’re using that it’s allowed to use more memory, by starting it with the argument

-Xmx[HOW_EVER_MUCH_MEMORY_YOU_WANT]m

e.g.

% java -jar -Xmx5000m unicorn_hunter.jar

With a program you run from the command line, that’s easy enough. For a program you build in Eclipse, it’s less obvious. What we need to do is edit the configuration file Eclipse uses when it starts, which is called “eclipse.ini”. Where is this file? Not in the Eclipse root directory!

Instead, it’s INSIDE THE APP

To get to it, go to the directory where you’ve installed Eclipse. For me, it’s /usr/local/eclipse.

From there, type:

% cd Eclipse.app/Contents/MacOS

Your eclipse.ini file is in here. Open it with a text editor, e.g.

% emacs eclipse.ini

and find the line that says something like -Xmx384m, and change it to the biggest number your system can handle. (For me, -Xmx5000m).
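A quick way to check what limit a Java program actually ended up with is to ask the JVM directly. This is a minimal sketch (the class name `HeapCheck` is made up); `Runtime.getRuntime().maxMemory()` is a standard library call, and the number it reports can come out a little below the -Xmx value you set.

```java
public class HeapCheck {
    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory this JVM will attempt to use,
        // which is governed by the -Xmx setting it was started with.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMegabytes(maxBytes) + " MB");
    }
}
```

If the number doesn’t match what you set, the program is probably being started with its own settings. In Eclipse, the Arguments tab under Run Configurations has a “VM arguments” box where you can put -Xmx for a single program.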

Save that file. Now…

STEP 9: RUN NEO4J-SAIL-TEST!

If you’ve followed these directions (and don’t have any other random, random problems), you should see:

Congratulations! You’ve just used Neo4j and Tinkerpop to execute a SPARQL query!

My plan from here is to use the neo4j-sail-test project and tweak the code to do what I personally need.

Let’s see how that goes…

Hope you’ve enjoyed this intro, that it works for you, and that it doesn’t obsolesce too quickly. If you run into any problems, leave a comment or shoot an email and I’ll see if I can help!

[Quickstart] Using Neo4j and Tinkerpop to work with RDF. Part 1!

December 20, 2011 George London

[Warning: This is another super-technical post. If you don’t know what the Semantic Web and RDF are, this will be incomprehensible.]

In my last post, I talked about my attempt, as a novice programmer currently capable of only rudimentary Python and not much else, to use Neo4j as an RDF triple store so that I could work with the DBpedia dataset on my laptop. Tinkerpop is an open-source set of tools that lets you magically convert Neo4j into a fully functional triplestore.

My conclusion from that attempt was that using only Python to set up and control Neo4j for RDF is basically impossible.

To reiterate why I’m doing this in the first place: the DBpedia dataset is fascinating and I want to explore it. But the web interface has frustrating limitations (especially the fact that it will simply time out for non-trivial SPARQL queries, and also that I can’t easily download the output to feed into other programs.) So, I want to host the data locally so that I can let my laptop chug away for as long as I damn please answering my queries.

I’m still determined to accomplish that goal, so my new plan is to just bite the bullet and teach myself “just enough Java” (JeJ. Palindr-acronym!) to make this all work. I’ve hesitated to learn Java, since it is, well…extremely daunting.

As of six months ago, I knew basically nothing about programming. Since then, I’ve taught myself rudimentary Ruby (+ Rails) and rudimentary Python (+ Django), both of which are very nice, syntactically simple languages with excellent online “getting-started” resources. For Ruby, I recommend The Little Book of Ruby, or if you’re in for a more psychedelic experience, The Poignant Guide to Ruby. For Rails, I used Michael Hartl’s online Ruby on Rails Tutorial (there’s a link to a free HTML version buried on that page somewhere.) For Python, you can’t go wrong with Learn Python the Hard Way. MIT’s Open Courseware Site also has an entire intro to CS class in Python. For Django, I’m working my way through the Django Book. Both languages also have strong, enthusiastic communities in New York which you can easily connect with in person through www.meetup.com. If I get a chance, I’ll write another post sharing all the cool resources I’ve found from trying to learn Ruby and Python.

Now for Java, on all of those points…not so much.

From my perspective as an outsider and a novice, the Java ecosystem looks huge, fragmented, confusing, and uninviting.

Now I will freely concede that I don’t know shit about Java (that’s why I’m trying to learn!), so many of the things I say in this post may be deeply ignorant and wrong. If so, please point out any errors/idiocy to me and I’ll happily correct myself.

In this post, I’m going to try to walk you through the whole process of going

FROM: Knowing nothing but a simple scripting language like Python

TO: Knowing enough Java to set up and run a publicly accessible Neo4j server that uses Tinkerpop to process and serve RDF data.

I’m going to try to stick to as few steps as possible so that you can follow along even if you’re a true beginner like me. I am going to have to presume that you know enough about the Semantic Web to know what RDF and SPARQL are and why you’d want to use them. If you don’t, that’s just too big a subject to tackle here, though I will try to eventually write an introductory blog post about those too. In the meantime, you can start with Wikipedia for a brief overview of RDF and SPARQL, or learn the hard way by reading the W3C specifications for RDF and SPARQL.

So:

STEP 1: Make sure you have Java.

This post presumes that you’re using a Mac. Speaking as a long-time Mac-avoider who just recently ditched his Windows laptop for a new Macbook – if you’re using Windows and want to develop modern software, you need to get a Mac. Just do it.

(Protip: buy used. I got a five-day old Macbook Pro for $2k on Craigslist. It actually had a faulty battery, so the Apple Store gave me a brand new one, no questions asked. Ebay also has substantial markdowns. And if AppleCare is not included, SquareTrade warranties are apparently 90% as good for 50% of the cost.)

So, the basic way that Java works is:

1) You write some code, and save it in a .java file.

2) You compile your source code into .class files, which I presume are in byte-code.

3) A magical machine called “the Java Virtual Machine” magically translates your bytecode into binary which can be executed on whatever system you’re using. The JVM is what makes Java portable to so many different systems…you only have to write code that’s compatible with the JVM, which is the same on every system. Making the JVM compatible with the chipset in your refrigerator is someone else’s job.
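The three steps above can be sketched with the smallest program I could come up with (the class name `HelloWorld` is just the traditional choice). Save it as HelloWorld.java, compile it with `javac HelloWorld.java` to get a HelloWorld.class file, then hand that to the JVM with `java HelloWorld`.

```java
public class HelloWorld {
    // Kept as a separate method so the text is easy to reuse or check.
    static String greeting() {
        return "Hello, world!";
    }

    public static void main(String[] args) {
        // The JVM starts executing a program at its main method.
        System.out.println(greeting());
    }
}
```

Note that the file name has to match the public class name, which is your first taste of how picky Java is about where things live.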

So, from what I can tell, “having Java” on your computer means two different things:

1) You have “the Java Runtime Environment”, or “JRE”, which contains the JVM and lets your computer execute precompiled Java code.

2) You have “the Java Development Kit”, or JDK, which contains all the machinery to compile your raw Java source code into bytecode.

Some blogs are claiming that Apple has stopped shipping a JDK since Lion, though you probably have a JRE. I can’t honestly remember what was installed on my laptop when I got it, but to figure out what you have vs. need, just open a console and type:

% java -version

If you don’t have a JDK, you will apparently get explicit instructions on how to get one from Apple. (Oracle apparently just doesn’t feel like supporting Mac). You can also download the latest JDK and updates from the Apple Developer download site. I can’t find a static link but it should hopefully be obvious what to click. This stackoverflow post also has instructions. The latest version seems to be JDK6, though there seems to possibly be a version 7 on the near horizon.
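The JRE-versus-JDK distinction can also be checked from inside Java itself. This is only a sketch (the class name `JdkCheck` and its messages are mine): it uses the standard `javax.tools.ToolProvider` class, which hands back a compiler when one is available and null otherwise. The catch is that you need a JDK to compile the sketch in the first place, so in practice typing `javac -version` in a console is the quicker test.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JdkCheck {
    // Turns the presence or absence of a compiler into a readable message.
    static String describe(boolean hasCompiler) {
        return hasCompiler ? "JDK (compiler available)" : "JRE only (no compiler found)";
    }

    public static void main(String[] args) {
        // getSystemJavaCompiler() returns null when no compiler is available to this runtime.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        System.out.println(describe(compiler != null));
    }
}
```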

STEP 2: Get Eclipse

Unlike Python, which is happy to run your hello_world.py script by itself in some random folder, Java has fairly rigid requirements for how the filesystem of your project has to be laid out. So while you probably could do everything in emacs, you can save yourself a lot of pain by using an IDE.

One of the most widely-used open source IDEs is called Eclipse. In addition to being free, it has a plugin system that makes it (reasonably) easy to add in new functionality. Neo4j will ask us to install some plugins, so I recommend that you just use Eclipse for your development, unless you have a strong reason not to. You can download it here. Just unzip it and put the decompressed folder in whatever folder you want to keep your Java stuff in (for me it’s /Users/rogueleaderr/Programming/Java).

For some reason the drag-the-app-icon-into-your-applications-folder-to-install-on-Lion didn’t work for me (the app wouldn’t launch), but I was able to just put an alias to the app icon into the applications folder and thus add Eclipse to the launch dock.

STEP 3: Get Maven

Don’t you love how simple adding new packages in Ruby is? Isn’t “gem install cthulu-mod” easy and intuitive? Well, forget about that.

You’re going to be using Maven now. I’m still figuring out exactly what Maven does, but my understanding is that it’s a package manager on steroids. If you have Maven installed, you put an xml file “pom.xml” inside each Java project you do, and it specifies the complete structure and all dependencies of your project. So if you download someone else’s project, you can use Maven to automatically make sure that you have everything you’re going to need to run that project. I recommend scanning the wiki page for a quick overview of what Maven does.

To me, typing in “gem install XYZ” three times sounds easier, but hey…

You can download Maven from the Apache website here. Follow the directions on that page to install on Mac. Basically, decompress the file then put it where Apache tells you to, then add it to your shell path. (To add to your shell path, open your .bashrc or .zshrc file, which is a hidden file located inside your home directory “ ~/ ”. If this file doesn’t exist, just create it by typing “ % emacs .zshrc ” (or whatever your preferred text editor is). Then paste in the lines from the Apache install directions. Make sure you enter the right file locations, as I learned the hard way.)

STEP 4: Get Neo4j

As you hopefully know if you’ve read this far, Neo4j is a graph database. While I’ve been told that a graph database is theoretically formally equivalent to a relational database and can be used for almost all of the same things, graph databases are particularly good at representing graph structures. RDF data naturally forms a graph structure, meaning that Neo4j is pretty well suited for hosting RDF.

Neo4j is not as naturally well suited for RDF as a dedicated triplestore like Sesame or OWLIM. But it has one key advantage, which is why I’m testing it out in the first place:

The free open source version is apparently capable of working with billions of triples. Sesame works fine with up to ~100m triples, but even the pared down DBpedia dataset I’m trying to work with has around 1.5bln. My first attempt to “damn the torpedoes” and load everything into Sesame led to some bizarre behavior. There are commercial solutions like OpenLink Virtuoso and Ontotext OWLIM which claim to work with 10bln+ triples, but those are rather expensive.

Hence, Neo4j gets my attention for now.
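To make the “RDF is naturally a graph” point concrete: every triple is a subject, a predicate, and an object, which is exactly a labeled edge from one node to another. Here is a toy sketch in plain Java with no Neo4j or Tinkerpop involved. The class names and the `ex:` triples are all made up (they are not real DBpedia data).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TripleGraph {
    // One RDF statement: subject --predicate--> object.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Returns the objects reachable from `node` by following `predicate` edges.
    static List<String> neighbors(List<Triple> triples, String node, String predicate) {
        List<String> result = new ArrayList<String>();
        for (Triple t : triples) {
            if (t.subject.equals(node) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    // Made-up example triples, for illustration only.
    static List<Triple> sampleTriples() {
        return Arrays.asList(
                new Triple("ex:Alice", "ex:knows", "ex:Bob"),
                new Triple("ex:Bob", "ex:knows", "ex:Carol"),
                new Triple("ex:Alice", "ex:livesIn", "ex:NewYork"));
    }

    public static void main(String[] args) {
        // Follow the ex:knows edges that start at ex:Alice.
        System.out.println(neighbors(sampleTriples(), "ex:Alice", "ex:knows"));
    }
}
```

Scanning a list like this obviously won’t survive 1.5bln triples, which is the whole reason to want a real graph database underneath.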

Neo4j comes in two forms:

1) A standalone server which you can get by clicking the download button on the Neo4j homepage. The upside of the standalone server is that you can control it through REST. So if you want to stick with Python, this is probably the way to go. Neo4j does have some embedded Python bindings, but they’re fairly limited. The downside of the standalone server is that, as far as I know, there is no way to use additional plugins like Tinkerpop, so you’re limited to what Neo4j can do out of the box.

2) A set of Java libraries. This is what we’re going to need, so that we get the full range of control and so that we can use Tinkerpop. Neo4j has a fairly extensive manual which explains how to get these libraries (the specific page is here.) Follow the directions there (including potentially installing an Eclipse plugin called M2Eclipse to let you use Maven directly inside of Eclipse). On my Eclipse install, M2E was already installed, but I’m not sure how to check the full plugin list (Eclipse is pretty freakin’ complicated). But if you open Eclipse–>Preferences and see a line for “Maven”, you’re probably good.

STEP 5: Learn Java

And this is where the paved roads end. From here on out, we’re going to be tying everything together directly in Java, and fighting bugs and dinosaurs as they attack.

User-friendly resources for learning Java seem to be rather scarce (please let me know if you find any.) My solution pro tem is to just go directly to the Oracle Java Tutorial and work through it. Obviously this leaves you about 3652 days short of the ten years you’re going to need to be any good at Java. But assuming you already know the basics of some object oriented programming language, it will give you just barely enough to muddle your way through getting this basic setup working. And crucially, it will teach you how the Java package system works, which is not particularly intuitive but will be crucial if we want to use Tinkerpop.
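The gist of the package system, as far as I understand it, is that a class’s package name mirrors the folders its source file lives in. Here is a small sketch of that rule. The class name `SourceLayout` and the example class `com.example.unicorn.Hunter` are hypothetical, and the `src/main/java` prefix is the conventional source root for Maven projects.

```java
public class SourceLayout {
    // Maps a fully qualified class name to the source file path that the
    // usual Java layout expects, relative to the source root.
    static String sourcePath(String qualifiedClassName) {
        return qualifiedClassName.replace('.', '/') + ".java";
    }

    public static void main(String[] args) {
        // Hypothetical class name, for illustration only.
        System.out.println("src/main/java/" + sourcePath("com.example.unicorn.Hunter"));
    }
}
```

So when some code says `import com.example.unicorn.Hunter;`, Java goes looking for a `com/example/unicorn/` folder (or the same structure inside a jar) somewhere on the classpath.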

STEP 6: Get Tinkerpop

Well, I hope you enjoyed learning Java. That must have taken a while. You did go learn Java, right?

Well, just in case you didn’t – I’ll walk you through how to create a Neo4j interface using Tinkerpop. Most of this is ripped directly off of a recent blog post by Davy Suvee, found here. Davy provides some very helpful code, but he assumes a high level of Java fluency. I, on the other hand, will assume that you know no more than I do (i.e. nothing.)

So, start by reading Davy’s post. If you can follow and implement that, you don’t need me!

If not, then let’s start by downloading Davy’s code. Head over to the Github repository. If you don’t know how to use Github, Google yourself a tutorial…it’s pretty easy.

Now, within Eclipse, go to File –> Import. A dialog will pop up. Click Git –> Projects from Git

Now click Next. Then copy in the URL of Davy’s project (https://github.com/datablend/neo4j-sail-test.git, last I checked). Now click “clone.”

Make sure the URL is autopopulated in the next window, and click next again. You shouldn’t need to enter any Github credentials to do this, but if you get an error, try entering yours (definitely worth signing up for a free account if you don’t have one.)

Just click next on the next screen:

And on the last screen, make sure you’re creating the repo where you want it. Then click finish. The repo should download. Eclipse will bring up the original import screen again, but just close it.

Now you have the files! But what to do with them?

For some reason, Eclipse does not let you open projects. ಠ_ಠ

So what you have to do is:

1. Create a new Java project. Make sure your Eclipse workspace is set to the same folder where you cloned the project off of Github (go to File –> Switch Workspace if it’s not). Give the new project the same name as the Github repo you cloned. Click okay, and Eclipse should automatically open the neo4j-sail-test project.

2. Now you should have a project open in Eclipse, and you can get started trying to fix all the dependency errors and make this code run.

3. To do that, we’re going to have to get the actual Tinkerpop libraries, and add them to our “classpath”, which is what Java uses to figure out where to look for the files you tell it to import.

That’s hard. And I will try to figure that out tomorrow…stay tuned for part 2.

Using Neo4j Graph Database and Python to Work with RDF

December 19, 2011 George London

Hello hypothetical reader!

As you may know, I’ve recently gotten very interested in something called “the Semantic Web.” If you don’t know what that is, stop reading now because this post will be incomprehensible.

If you do know what the Semantic Web is and have tried to work with it at all, you know that it can be a bit less than user friendly. I’ve spent the last couple of weeks testing various tools (you can find a probably-complete list of all the available semantic web tools here), and have found most of them to

1) Be very hard to learn

2) Require that you interact with them through Java.

As a programming n00b who can barely scrape together some Python, the ubiquitous Java requirement is frustrating. So I thought I would document my latest attempt to make a Java-based tool work with Python, in case anyone else finds him or herself trying to do the same thing, and would like to avoid having to figure everything out from scratch.

So…

What I am doing: Trying to use Neo4j as a triple store to host the “DBpedia ontology-based mapping” on my laptop, and interact with and query it exclusively through Python.

Why I am doing this: The DBpedia dataset is fascinating and I want to explore it. But the web interface has frustrating limitations (especially the fact that it will simply time out for non-trivial SPARQL queries, and also that I can’t easily download the output to feed into other programs.) So, I want to host the data locally so that I can let my laptop chug away for as long as I damn please answering my queries.

The result: After a couple of days of trying to make this work, my tentative conclusion is that doing this exclusively in Python is functionally impossible. Something like 80% of the basic low-level tools (e.g. triple-stores, query processors) I’ve encountered in this space are written in Java and intended to be custom-adapted by Java developers. And something like 80% of the tools that are usable through Python or Ruby are just shims for the Java tools (i.e. hacked-together-plug-the-AC-plug-into-the-DC-outlet adapters). In some cases, it is possible to loosely control the Java tools with Python, but:

1) In most cases, only a small subset of the functionality is accessible through Python

2) The shims are mostly poorly maintained and documented, and if you’re like me, you’ll spend nearly as much time trying to make them work as it would take you to just learn rudimentary Java. And if you stick with the shims, you’ll end up with a much more fragile system.

3) Shims are flat-out not available for some key components in the stack like Tinkerpop, which looks like the only viable way to use Neo4j with RDF without having to custom-write data-upload and querying functionality in Python.

The upshot is that many of these tools do have some form of REST interface (i.e. if you get the Java component running by itself, you can use Python to control it by sending it messages through http).

So my new plan is to:

1) Build a web interface in Django

2) Learn just enough Java (JeJ) to build a server to run the backend.

3) Figure out some way to put the Java server online

4) Control the server through REST using Python, and do all the input/output on the Django webpage.

In my next post, I’ll walk you step by step through my experience learning just-enough-Java to get started using Neo4j as a triple store to back a Django-powered website.

A holiday reminder of the dismal state of Healthcare IT

November 23, 2011 George London

I’m home in the Bay Area for Thanksgiving with the fam’, which means my attempts to work from my old bedroom are periodically punctuated by that most classic of family rituals – driving my mom around to run errands.

Thus, I find myself tapping this out on my iPhone, stuck for the n-th time waiting FOREVER at a pharmacy because some mystifying process glitch has thrown the normally “well-oiled” @Walgreens machine into befuddled disarray.

You see, the Walgreens system can’t seem to figure out that my mother has actually been prescribed one of the medicines she’s getting a refill of. Or that a second refill has been authorized, even though she handed them a physical prescription slip yesterday. One suspects that dealing with such nonsense takes up a substantial portion of at least one of these pharmacists’ days.

Now let’s imagine a different world. A better world.

You walk into your doctor’s office, complaining of a sore throat and unsightly discharge. The doctor presses a button, and up comes your entire health history, including all the medicines you’ve ever taken and all the procedures you’ve had. She enters your symptoms, and an “expert system” like IBM’s Watson automatically checks data published by the Centers for Disease Control and prepares a report showing the empirical probability you have various diseases given your symptoms and history, and proposes the most effective tests to narrow down the options. As a bonus, the system checks your personal genetic information to identify the ideal diagnosis and treatment.

The doctor appraises the automated results and decides that you must have…let’s just say…Lupus (in the spirit of House, MD). This can be easily treated with a few pills of Silver&Garliceral. The doctor presses a button, and the system, which has your address and insurance information, automatically connects with the nearest pharmacy to put in a prescription authorization. The pharmacy system automatically estimates the wait time, and sends you a text when your prescription is ready, or if it’s not urgently needed (which it knows based on the disease it’s prescribed for), just mails it to you.

This world saves a massive amount of time both for patients and administrators, and also substantially reduces the risk that patients will get the wrong medicine, or the wrong dose, or simply neglect to pick up their prescription at all.

And here’s the thing – the world I describe is not fantasy. We could do all of the things I describe just by cobbling together existing technology. In a world where healthcare costs are constantly rising faster than inflation, fixing this completely fixable mess (by fighting the institutional dynamics / inertia that allow it to continue) seems like a great place to start.

Forgive typos…I literally wrote this on my iPhone.

Advertising in online video

November 9, 2011 George London

I watch a lot of online video.

My schedule doesn’t really allow me to be available during “prime-time”, nor can I commit the every-week-consistency that following a serial TV show requires.

So streaming TV shows online is, theoretically, great for letting me watch 10 minute chunks of my favorite shows while I devour Honey Nut Cheerios at 1am. But by-and-large, the experience is just frustratingly awful, given how easy it is to imagine a better solution using only slight re-arrangements of currently available technology.

So let me throw a few exhortations out into the ether, for all you online video executives who surely have nothing better to do than read my tumblr.

1. The exact content I want must be available immediately, in high quality, exactly when I want it and in the format I want it. I know studios want control over how viewers receive their content so they can control brands or whatever blah blah blah. Too bad. That war is already lost. If your customer is sophisticated enough to find your official streaming version, they’re sophisticated enough to find a restriction-free pirated version. And within ten years, nearly every viewer will be that sophisticated. Pass PROTECT-IP or whatever Orwellian-appellated laws you want; you’ll just hasten the advent of strongly encrypted P2P, and it’s game over. Get on the right side of history.

2. I prefer official versions. I don’t want to pirate, and I do want the creators of the content I enjoy to get paid lots of money so that they keep producing those magical moving images. So make the official version clearly better than the pirated version! Make it easy to find! Make it high resolution! Build interactive content in or around the player-window (e.g. make it easy to tweet about an episode, or find Facebook friends who are watching the same thing). Provide behind the scenes photos or director’s commentary IN AN EASY TO FIND place.

3. And the big point. Don’t suck at advertising! If I try to watch three episodes of South Park online in a row, I’ll see probably five different advertisements. But I’ll see each of them repeated four times! Even if I wanted to buy Redbull, and even if I liked the ad, seeing it four times will make me hate Redbull, and hate you.

I can only assume that the extremely thin selection of ads in most online video is because advertisers just aren’t buying the spots in quantity. That’s mystifying to me. Online video seems like such a better place to advertise than TV.

You have:

1. Demographic information about the viewers (especially if they’ve used Facebook to log into the viewer). You may have the site they redirected from. You know what time they’re watching the show, and probably their location.

2. You know they’re already at a computer, and are literally two lazy clicks away from buying whatever you’re advertising in the ad, instead of having to remember it until next time they’re at a computer (and actively track down and buy your random product.)

It seems pretty inevitable to me that within a few years, we’re going to have all of TV available instantly online, supported by ads that are delivered in HD, tailored to the audience, relevant and entertaining enough that users would rather watch them than find pirated sources, and full of compelling “oh I want to click that” elements that will directly drive sales. Cf. Spotify.

Whoever makes that happen is going to make some serious money, and make it seriously easier for me to hate myself for watching another episode of “The Only Way Is Essex.”

So I’m in the “due diligence phase” of starting-up a start-up…

August 5, 2011 George London

And in the process I’m learning a whole lot. And discovering many interesting resources which I wish someone had delivered to me in a neat little set of blog posts. So I’m going to just start posting resources as I find them to tumblr, so that one of my zero followers can come along and someday read them and perhaps find them helpful.

#1!:

http://www.startupcompanylawyer.com/

I just found these people, but they seem to have a pretty comprehensive set of blog posts about the many legal issues facing start-ups. I’ve been trying to give myself a crash course in said issues, and finding a comprehensive resource has been hard. I’ve tried books, including http://www.nolo.com/products/the-small-business-start-up-kit-SMBU.html (which is somewhat helpful but more geared towards people opening flower shops) and http://www.amazon.com/Do-More-Faster-TechStars-Accelerate/dp/0470929839 (which has a few bits of useful advice but isn’t a real guide). I’ve also tried targeted Googling, which has yielded mixed results.

But these guys seem to be both comprehensive and serious. I haven’t read enough yet to know if it’s really useful overall, but what I’ve read so far has been helpful.

A somewhat similar resource is http://startuplawyer.com/startup-lawyer/tech-wildcatters-fall-2011-deadline-approaching, also a blog run by a lawyer. He seems to have some good targeted advice on specific points, but his style is more “here’s what you should actually do, only briefly explained or defended”, which is harder to take on faith since he doesn’t review the overall issues as much.

July 19, 2011 November 18, 2019 George LondonLeave a comment

Chart of the day: approximate value add by occupation!

I was wondering today what percentage of Americans’ time is spent doing different types of work. A quick Google search failed miserably, so I decided to just answer the question myself, with help from our old friend the Bureau of Labor Statistics!

The BLS publishes data showing how many people are employed in each occupation, and how much those people earn on average per year. That’s enough to give us approximately what we want.

If we know how many people are employed in each job, and we assume that people in different jobs work approximately the same number of hours per week (which is obviously not true since an investment banker might regularly work 3x as many hours as a postman), then we could just divide employment in each occupation by total employment to get the % of work time spent in each occupation.

But that’s not quite satisfying, because it doesn’t deal with the fact that, in terms of economic value added, one worker’s time is not the same as another’s. In other words, a teenager who spends an hour working at the GAP may be very diligent and hardworking, but he’s just not adding as much value as a surgeon who spends an hour removing a brain tumor.

One rough measure of how much value different occupations add per hour is relative wages. There are obviously huge distortions and inefficiencies in how much different workers get paid, but it is approximately true to say that “if Susie the Doctor earns 4x as much per hour as Johnnie the retail clerk, then Susie’s time is 4x as valuable to the economy”.

Now luckily for us, the BLS also reports average annual compensation by occupation.

So if we just multiply each occupation’s total employment by its average annual compensation, we end up with total yearly compensation by occupation, which is a pretty good rough measure of how scarce time is allocated in the American economy.
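Here’s a rough sketch of that arithmetic in Python, in case it helps. The function names are my own, and the numbers in `example` are made-up placeholders chosen to keep the math easy, not actual BLS figures. It computes both the plain headcount share and the compensation-weighted share, so you can see how the two differ.

```python
from typing import Dict, Tuple

# Each occupation maps to (number of people employed, average annual pay).
Occupations = Dict[str, Tuple[int, float]]


def headcount_shares(occupations: Occupations) -> Dict[str, float]:
    """Share of total employment per occupation (treats every worker's time as equal)."""
    total_employed = sum(employed for employed, _ in occupations.values())
    return {
        name: employed / total_employed
        for name, (employed, _) in occupations.items()
    }


def compensation_shares(occupations: Occupations) -> Dict[str, float]:
    """Share of total yearly compensation per occupation (employment x average pay)."""
    totals = {
        name: employed * pay
        for name, (employed, pay) in occupations.items()
    }
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Placeholder numbers for illustration only. These are NOT real BLS data.
example: Occupations = {
    "office support": (100, 30_000.0),
    "healthcare": (50, 80_000.0),
    "farming": (50, 20_000.0),
}

by_headcount = headcount_shares(example)
by_compensation = compensation_shares(example)

for name in example:
    print(f"{name}: {by_headcount[name]:.1%} of workers, "
          f"{by_compensation[name]:.1%} of compensation")
```

With these placeholder inputs, the made-up healthcare group is a quarter of the workers but half of the total compensation, which is exactly the kind of gap the wage weighting is meant to capture.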

I was actually a bit surprised by the result, though it makes sense when I look at it. There are an awful lot of people providing office support. And there are surprisingly few farmers given how much attention farming gets in the national discourse.

Electric-blue lobster. The 4th lobster of the apocalypse.

June 11, 2011 George LondonLeave a comment

In space, no one can hear you blast Nine Inch Nails

June 5, 2011 George LondonLeave a comment

NASA Photos + Nine Inch Nails = Spacey Mashup Music Video
“From Digital Kitchen Seattle creative Chris Abbas comes this sweet mashup video featuring music from Nine Inch Nails and pictures from NASA’s Cassini mission. The Cassini Solstice/Equinox Mission is a joint NASA/ESA/ASI robotic spacecraft mission studying Saturn and its natural satellites….”

Have Miso Soup and Eggs for Breakfast to Beat Back a Hangover (via @Lifehacker)

June 4, 2011 George LondonLeave a comment

Japanese breakfast cures hangovers