Future of the Economy #1 – What is An Economy?

Last time, I laid out my roadmap for describing my vision for the future of the economy. Now, again awaiting an algorithm’s end, I take the first step.

Today’s topic is simple – “what is an economy?”

Whether or not you’ve studied economics (and I never have in any academic setting), that is a tricky question.

To some, the economy is “the intersection of supply and demand.” To others, it’s what falls out of comically complex equations like these. To the dictionary, it’s:

3. the management of the resources of a community, country, etc., especially with a view to its productivity.

4. the prosperity or earnings of a place.

My more modest proposal – “the economy is everything.”

Or, less dramatically, everything (even potentially) useful to human life collectively constitutes “the economy.”

Some examples, in order of increasing controversy:

Economy:

Gold coins

Bread

Rapper Mansions

Fighter Jets

Yellowstone National Park

Cats

DNA

Human Life

Not Economy:

Crab Nebula

As I’ll explain more clearly in subsequent posts, it’s significantly more cogent to include immaterial “objects” like interpersonal relationships, human knowledge, or natural beauty in our definition of “economy”.

Every useful thing, whether material or immaterial, can then be labeled a “resource”. The economy is merely the sum of all resources.

A brief aside – taking a broad view of the economy can be a dangerous game, and there are many very powerful ethical arguments to be made against viewing human relationships etc. through an economic lens. The core fear is that we will start treating human relationships with the sort of “non-ethical, anything-for-a-dollar” ethos that permeates the phrase “it’s not personal; it’s just business.” Instead, I am hopeful that we will do the reverse and allow human decency to seep back into traditional economics.

Now, one more step. The size of the whole economy is much less important than prosperity, which is the sum of all resources divided by the number of people utilizing those resources (e.g. Sweden is a very prosperous country, but if we divided the wealth of Sweden across the population of India, the resulting country would not be very rich at all.)
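To make the arithmetic concrete, here’s a toy calculation (the resource total is an invented round number; the populations are rough 2012 figures):

```python
# Prosperity = total resources / number of people using them.
# "Resource units" are invented for illustration; populations are rough 2012 figures.

def prosperity(total_resources, population):
    return total_resources / population

sweden_resources = 4_000_000_000_000  # invented: 4 trillion resource units
sweden_pop = 9_500_000
india_pop = 1_200_000_000

print(prosperity(sweden_resources, sweden_pop))  # ~421,000 units per person
print(prosperity(sweden_resources, india_pop))   # ~3,300 units per person
```

Same total pile of resources, but spread across India’s population it comes out to roughly 126 times less per person.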

Good economic policy should have a single-minded focus on increasing prosperity (properly conceived, so that we aren’t bulldozing geysers to build factories.)

But how exactly do we increase prosperity? Tune in next time…

P.S. It makes me smile that “bulldozing geysers” formerly had no Google hits.

Setting up MusicBrainz Server on EC2 using Postgresql 9.1 and Ubuntu 11.10

[Warning: super technical post to follow]

I sunk a large amount of time this week into trying to get a MusicBrainz server running in the cloud. Since as of a week ago, I didn’t know jack about Postgres, Linux, Perl, or “the Cloud”, this was a rather large challenge for me. But I finally got through it with the help of a lot of scattered web resources and a bit of help from the most excellent “ruaok” in the #MusicBrainz-Devel IRC room (on FreeNode).

So I’d like to offer a few pointers that might save a substantial amount of time for anyone else trying to do this, especially if you (like me) don’t know enough Postgres or Perl to fix things when they break.

I’m going to assume that you know how to set up an EC2 server running the latest (as of 3/9/2012) version of Ubuntu 11.10. You can find AMIs linked on the official Ubuntu website. Building the database is a bit computationally intensive, so I recommend at least a large instance if you don’t want to wait around for a long time. I also recommend starting with a 20GB+ volume to be safe, so you don’t have to waste time resizing if you run out of space. I strongly recommend you make sure you use the latest stable version of Ubuntu and don’t (like I did) accidentally install an unstable beta of the next version, because that will lead to a lot of weird errors.

If you don’t know how to use EC2…consider learning. There are lots of good guides online, and it’s pretty powerful. It’s straightforward, but most of the procedures are fairly arbitrary, so there’s no super-easy way to just jump into it. Note that the VirtualBox MusicBrainz server image (as suggested on the MB website) does not work by default on EC2 with Ubuntu 11.10, so don’t waste your time trying unless you’re already familiar with virtualization. This is complicated enough to start with, even if you’re not trying to run a VM inside a VM.

So…

1.     Follow the instructions. The install instructions are good and will get you most of the way there. Find them here (https://github.com/metabrainz/musicbrainz-server/blob/master/INSTALL)

2.     As of this writing, the latest version of Postgres is 9.1, so type in 9.1 when the guide tells you to enter the version number. Postgres is kind of confusing, and unhelpfully they seem to have changed a lot of the directory names in 9.1 without updating the official manual, so if you try to Google for the paths you want, you’ll often find the wrong ones. As of now, the key directory to care about is `/usr/lib/postgresql/9.1/bin`, which contains the control commands for the server. Some of these will be put on your path by default by the installation, but not all of them.

3.     By default, Ubuntu keeps the Postgres config files at `/etc/postgresql/9.1/main/` (that’s where you can find the files the INSTALL guide references), while the data itself lives in `/var/lib/postgresql/9.1/main/`. If you want to use a different data directory (e.g. if you want to put the data on a separate EBS volume so you can clone it easily), use the control commands and type `initdb -D /your/dir` to create a new data directory with its own configuration. You can then start that server with `postgres -D /your/dir`.

4.     Edit pg_hba.conf as recommended. Inside postgresql.conf, change the line that says `listen_addresses = ''` to `listen_addresses = '127.0.0.1'`, which allows you to connect to your server through TCP. A few lines below, uncomment the line that says `port = 5432`.

5.     Inside your musicbrainz-server directory, edit `lib/DBDefs.pm`. In the block that says READWRITE, change “schema” to “musicbrainz_db” and the username to “postgres” (changing the schema and username might not be necessary, but they helped with some errors I was seeing), and uncomment “port” and set it to 5432. Below, in the “System” block, make sure that the username and password match your postgres account (by default the username is “postgres”, and you have to set the password with `sudo passwd postgres`). Also uncomment “port” and set it to 5432.

6.     Follow all of the other INSTALL instructions. But before building the database in the last step, make sure you log in to the postgres account by doing `sudo su - postgres`

Hopefully, everything will build for you the first time. If it doesn’t, the script can get jammed because it creates a musicbrainz_db before crashing, so clear out any old databases by doing `dropdb musicbrainz_db` from the shell before running the build script again.

If that doesn’t work, feel free to comment here and I’ll see if I can help you. Or, better yet, ask the much more knowledgeable people in the #MusicBrainz-Devel IRC channel or on the musicbrainz-devel mailing list.

Good luck! (Also, if anything here is wrong or doesn’t work for you, please comment with what you had to do differently so future readers can figure out what they need to do.)

The future of the economy, in bite-sized chunks.

I’m stuck waiting for some programs to execute, so now seems like a good time to set down my theory of what the economy of the future is going to look like.

“Whoa, George, that sounds like a big task” says the imaginary critical imp on my shoulder. “Why would you want to do that, especially since much of what you say will turn out to be wrong?”

Well, critical imp, I’m glad you asked. There are two reasons I want to do this:

  1. A recent TED talk has convinced me that it’s important to compellingly and concisely explain to the people around me why I’m doing the things I’m doing with my life. So if you know me and you’re curious, you should find most of the answer here.

  2. Despite being a 26-year-old ignoramus, I think I actually have something interesting and useful to say about this topic.

So here begins a sporadic series of posts, each totaling 400 words or fewer, that will explain my vision of whither and whence the economy is going and why. It’s impossible to adequately treat this topic in anything close to 400 words, so I will modularize my thoughts into semi-self-standing blocks that should eventually build a lovely abstract castle.

Here’s a preview (highly subject to change):

  1. The economy is everything around us, added up. “Growth” basically just means there’s more stuff. Total stuff / number of people = average individual wealth.

  2. In the long run, income growth is just productivity growth. Tautologically, everything man-made must be created. When we can make more, we can have more.

  3. Productive processes are algorithms. Algorithms (loosely speaking) are just recipes for achieving outcomes in the world. Productivity can be improved either by executing an algorithm faster, or by selecting a more efficient algorithm (“work smarter, not harder”).

  4. Computers allow people to outsource both execution and selection of algorithms (I call this “Cognitive Augmentation”, or “CogAug”)

  5. Execution-based CogAug is tied to processing speed which is (effectively) anchored to Moore’s law, but algorithmic selection can improve without bound (…or can it?).

  6. Ergo, selection-based CogAug is the next great frontier of economic development, because better CogAug -> better algorithms -> greater productivity -> greater per capita wealth.

  7. The first step on that path to selection-based CogAug is structured “semantic” data.

Sound cryptic, confusing, or wrong? Then tune in next time.

Modular Subscription Models, Content Exchanges, and the Future of Media

Disclaimer: In this post, I’m going to take a stab at predicting the future of media distribution. Obviously, much of what I’m about to say will turn out to be wrong. I don’t care.  In any predictive activity, I can only ever trade in probabilities and never certainties, and I have no problem with being wrong as long as I’m wrong in interesting ways.

So, let’s talk about media. And let’s start with a bit of probably-obvious-but-nevertheless-critical context.

For a long, long, looong time, media “content” was inherently tied to a physical object, i.e. “the medium” (from the Latin for “middle.”) The medium, whether book or scroll or record, literally stood between the creator of content and the consumer.

The entire network of business models that makes up the current “content industry” (e.g. Hollywood, the Recording Industry, etc.) was built around the notion of media-as-physical-object. When “an album” was not just a set of tracks but also a physical vinyl record, an enormous amount of logistics had to go into manufacturing and distributing that record. While “talent” was necessary, it wasn’t at all sufficient – the most talented singer in the world was unlikely to have the capital and experience in industrial operations to put out a physical record at any meaningful scale. And when the real bottleneck is manufacturing and distribution, it makes perfect sense to have large corporations handle distribution, because they can benefit from accumulated expertise and economies of scale.

But an interesting thing is happening right now – the physicality of media is very rapidly disappearing. And as goes physicality, so goes the entire classical business model of the content industry. The complete digitization of media is here to stay, and it’s only going to accelerate going forward. Why? Because it makes sense – there’s simply zero reason to consume the resources to manufacture and transport physical objects when a tiny electric signal will produce the same effect. At least to me (and anyone younger than me), a CD already seems like a painful anachronism. And even though I still buy physical books, I freely admit that it’s mainly from a persistent collector’s aesthetic appreciation for the physical object. 

So what does this easily anticipated trend mean for the industry?

First, it means that the industry should shrink. The capital that’s being used for manufacturing and physical distribution is being deployed inefficiently and should be put to more productive uses. It’s not at all a bad thing if the revenue of the industry is falling – they’re selling a product that is now cheaper to produce, and if we can all now get music for less than we had to pay before, then we’re richer as a society.

Second, it means that the industry needs to really understand and focus on the areas where it is actually adding value: production, marketing and rights management.

The production and marketing functions of the content industry should be fairly obvious, so I’m going to finally get to the meat of this post and talk about rights management.

In the post-physical media world, the main scarce commodities are rights, i.e. the exclusive power to control access to a creative work and the legal claim to profits generated by the consumption of that work. At the moment, the industry is doing a truly abysmal job of managing rights. While some companies are certainly much better than others, the by-and-large reaction of the industry to the dematerialization of media has been to try to use rights as a weapon to oppose progress by trying to force consumers back into buying CDs or the closest digital analog. Witness the SOPA/PIPA brouhaha for the latest example of the industry’s ham-handed attempts to roll back the clock.

This method will not work. In a democratic society, the law cannot be used to hold back technological progress that unambiguously improves the life of the average Joe. Copyright is not a “natural right” like the right to free expression, but rather a granted license from the government for the purpose of enhancing cultural and scientific progress. In the USA, copyright comes from a clause of the constitution that reads:

“To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”

Even if the content industry can ram through some sort of SOPA-style law, they’ll only be sticking their finger in the dike and increasing the pressure behind the wall. As soon as the average person starts to feel the pain of laws that serve no purpose except to facilitate rent-seeking at the expense of progress, America will (eventually) vote in Congressmen who’ll overhaul the laws to align them back toward promoting economic and cultural progress.

So what’s a content industry to do? Simple – accept that no matter how long it takes for us to get there, we’re going to end up in a world where every form of media (music, movies, “TV”, books, games, etc.) is going to be immediately available to any consumer on any screen. And the notion of consumer “ownership” of media is going to disappear.

The media consumption of the future is going to be all about leasing access rights (i.e. “renting content”). Consumers will pay $X for the right to consume content Y during Z period of time. This means the future of media is going to look a lot more like a financial exchange (e.g. the NASDAQ) than like a record store. And it means that the content companies that are going to win the future are going to be the ones that master financial engineering.

There are a lot of ways that access contracts can work, but they all generally boil down to “how long are you allowed to watch the Little Mermaid, and how much do you have to pay for that?” At one extreme is the one-off rental (“pay me $2, watch the movie once”) while at the other extreme is the purchase (“pay me $10, watch it forever.”) Both are just access contracts with different terms.
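Treating both as access contracts makes them directly comparable; here’s a toy break-even calculation using the two example prices above:

```python
# Compare the two example contracts: a $2 one-off rental vs. a $10 "watch forever" purchase.
RENTAL_PRICE = 2.00     # paid per viewing
PURCHASE_PRICE = 10.00  # paid once

def cheaper_contract(expected_views):
    """Return whichever contract costs less for a given number of expected viewings."""
    return "rental" if RENTAL_PRICE * expected_views < PURCHASE_PRICE else "purchase"

print(cheaper_contract(2))  # rental: $4 beats $10
print(cheaper_contract(7))  # purchase: $10 beats $14
```

Every other contract is just a point somewhere between those two terms.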

From the consumer’s perspective, acquiring rights is currently an enormous, completely unreasonable pain in the ass. Nearly every contract has to be individually negotiated with different producers on different terms – want to watch Parks & Rec? Go to Hulu! Dr. Who? BBC, unless you’re outside the UK, in which case it’s Netflix. Beatles album? iTunes; certainly not Spotify. Like the band Ghost Mice? Mail a physical check to Plan-It-X Records and get a CD. The whole thing is a pain in the ass, and that pain is the main reason that people give up on the whole hassle of “following the law” and turn to piracy.

But I contend that 95% of consumers actually believe that the producers of content (i.e. the talent and crew) deserve to profit handsomely from their work, and they’re willing to pay a price that accurately reflects the cost of production + a reasonable margin that accrues to the people who actually provide a valuable service. Look, for example, at the recent Louis CK special (which I recommend – buy it at https://buy.louisck.net/). Many, many people (including me) happily paid $5 to download the special because we knew that the money was going to the people who created it, and because Louis didn’t try to impose technical inconveniences on us.

So how can we best ensure that content is easily available to consumers, while producers of content still get fairly paid?

Why, with modular subscription models based on a brokered content exchange system, of course!

What the hell does that mean? 

It means that in the future, instead of picking up pieces of media ad hoc, consumers will simply subscribe to services like Netflix or Spotify or the future equivalent for books or video games. Netflix acts like a broker on a commodity exchange – it buys content access from producers (just like a commodity broker buying 10,000 bushels of corn) and then resells those access rights to consumers at a markup that makes the effort worthwhile. Everyone wins – producers get paid for content, brokers have jobs, and consumers have an easy “don’t make me think” way to watch whatever they want.

There are a few obvious problems with the classic subscription model, which is why I introduce the concept of “modular” subscriptions. In a modular system, a consumer pays for a “base” subscription that grants access to the most common content, and then can choose to buy additional subscription modules that grant access to even more specialized or expensive content. For example, Netflix could offer a “new releases” module, where for example the customer pays an extra $10/month and gets the ability to stream Hollywood movies that are still in theaters. Or a consumer could buy an “Indie film” module that grants them access to independent films that require extra effort for Netflix to acquire. Or a Spotify consumer could buy a “Monsters of Rock” module that includes Led Zeppelin and the Beatles for an extra $4/month.

Why is this system great? Because it solves two critical problems. First, it keeps things simple for the consumer – I only ever have to make ~5 economic choices (i.e. I want modules A, C & D but not B & E), and then I can consume whatever I want without any more mental effort. Second, it lets producers dynamically price content, which solves the problem that some content is a) much more expensive to produce and b) much more in demand. A movie like “Transformers” was both way more expensive to make and much more popular than a movie like “Thank You For Smoking,” so why does it make sense that a DVD for each movie would cost the same amount? The main reason why they actually do is “signaling”: producers don’t want to sell a movie more cheaply because it seems like an implicit admission that the movie is not as good, which they fear will drive away more consumers than it wins. The modular subscription system fixes that problem by obscuring the price of individual movies from consumers, so there is minimal signaling risk.

In the modular subscription model, producers set “per-view” prices for each piece of content (say, a movie). The broker (say, Netflix) pays that price to the producer each time someone actually watches the movie. The broker then has the (actually very interesting, to nerds like me) task of sorting the movies into modules and setting the appropriate price for each module so that the broker takes in more money than it pays out to the producers. And all that consumers see is that the price of a module subscription goes up or down a couple of dollars every January 1st (or quarterly, or whatever).
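Here’s a minimal sketch of that broker calculation. All of the per-view prices, viewing forecasts, and the margin are invented for illustration:

```python
# The broker's job: given producer-set per-view prices and a forecast of monthly
# views per subscriber, price the module to cover expected payouts plus a margin.
# All numbers below are invented.

PER_VIEW_PRICE = {              # set by each producer
    "Transformers": 0.80,
    "Thank You For Smoking": 0.25,
}
EXPECTED_MONTHLY_VIEWS = {      # broker's forecast, per subscriber
    "Transformers": 1.5,
    "Thank You For Smoking": 0.4,
}

def module_price(titles, margin=0.20):
    """Monthly subscription price: expected per-subscriber payout plus a markup."""
    expected_payout = sum(PER_VIEW_PRICE[t] * EXPECTED_MONTHLY_VIEWS[t] for t in titles)
    return round(expected_payout * (1 + margin), 2)

print(module_price(["Transformers", "Thank You For Smoking"]))  # 1.56
```

The consumer never sees the per-title prices, only the module price, which is what keeps the signaling risk out of the picture.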

This may seem a bit outlandish, but there’s actually a very large industry that already works exactly this way – insurance. When I buy a fire insurance policy, I agree to give the insurance company $X a month if they agree to completely rebuild my house if it burns down. The insurance company then goes behind the scenes and runs a huge calculation that figures out the likelihood of my house burning down and the cost of rebuilding, then sets a price for the contract that they think will allow them to profit on the whole across many contracts, even if they have to pay for a bunch of burned houses. And because different customers have different needs, insurance companies sell riders, which let me purchase additional chunks of coverage (for example, “this policy holds even if I accidentally burn down my own house” or “if my house burns down, you have to build me an even better one”) in exchange for a fee that the insurance company sets. This system is very well established, and has worked very well for hundreds (if not thousands) of years.
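The insurance version is the same expected-cost-plus-margin arithmetic; here’s a sketch with an invented fire probability, rebuild cost, margin, and rider fee:

```python
# Fire-insurance pricing: charge the expected payout, marked up, plus flat rider fees.
# Probability, cost, margin, and fees are invented round numbers.

def annual_premium(p_fire, rebuild_cost, margin=0.25, rider_fees=()):
    expected_payout = p_fire * rebuild_cost
    return expected_payout * (1 + margin) + sum(rider_fees)

base = annual_premium(p_fire=0.002, rebuild_cost=300_000)
with_rider = annual_premium(p_fire=0.002, rebuild_cost=300_000, rider_fees=(120,))
print(base, with_rider)  # 750.0 870.0
```

Swap “house fires” for “movie views” and a rider for a subscription module, and it’s the same business.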

So, that’s where I think media is going to go. Everyone wins (except companies that used to profit from now completely obsolete distribution methods). Consumers get immediate, easy access to all the content they want and they know that the money they’re paying is really going to people who’ve created the content or added real value in the supply chain. Producers actually get paid for their content, get control over how much to charge, and don’t need to deal with the complexity of distributing to thousands of different outlets. Obviously there are powerful incumbents who lose under this model, so they’ll fight it tooth and nail. But I think they will eventually lose (as they always do) and the solution that’s best for society will win out.

What do you think?

Senator Kirsten Gillibrand responds to my open letter

My letter here.

Her response:

"""

February 1, 2012

Dear George,

Thank you for writing to me regarding S. 968, the PROTECT IP Act of 2011.  I understand the concerns that have been raised over the original approach towards solving the problem online piracy poses to our overall economy and New York jobs. All New Yorkers should be able to agree on the shared goals of cracking down on the illegal piracy of copyrighted material without any unintended consequences of stifling the internet or online innovation. 

After working hard with my colleagues to make important changes and improve the Protect IP legislation, it became clear that a consensus on a balanced approach to achieve these shared goals could not be reached. I believe it is time for Congress to take a step back and start over with both sides bringing their solutions to the table to find common ground towards solving this problem.      

Thank you again for writing to express your concerns, and I hope that you keep in touch with my office regarding future legislation. For more information on this and other important issues, please visit my website at http://gillibrand.senate.gov and sign up for my e-newsletter.  

Sincerely,

Kirsten E. Gillibrand
United States Senator

"""

Obviously it wasn’t on my account, but it does appear that she changed her position in direct response to the internet’s collective action.

My open letter to Senator Kirsten Gillibrand opposing the Protect-IP Act (PIPA)

Dear Senator Gillibrand,

As one of your constituents, I’m writing to urge you as strongly as I can to oppose the passage of the Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act of 2011 (a.k.a. PROTECT-IP or PIPA).

Let me clearly explain why I oppose the bill.

I am a New York City-based entrepreneur. I am currently working to bring together information and metadata about music and to build a website which makes that information fun and easy to navigate and search. I am strongly resolved to only provide legal and non-copyrighted information (which means no pirated content, no enablement of piracy, and no intentional copyright infringement of any kind.) If I’m fortunate, my business will grow to not only provide an invaluable service to music lovers everywhere but also provide high paying jobs to hundreds (if not thousands) of people in the New York area.

So let me say unambiguously, without hyperbole or dramatization, that the passage of PIPA would very likely kill my business.

Here’s why:

1) I am financing this business using my very limited life savings. After years of working and saving, I have just enough to execute on my business plan.  But complying with PIPA would require me to hire and possibly retain a specialized lawyer. That act alone would consume enough of my capital to seriously threaten my plans.

2) A major part of my business plan is to allow users to contribute information, including links. Complying with PIPA would require either developing custom monitoring algorithms or manually visiting each and every link to investigate. Either would cost more than I can afford at this stage without sabotaging my core business. PIPA allows Hollywood and the recording industry to abuse the power of government to unfairly and inefficiently foist the cost of protecting their own commercial interests onto vulnerable entrepreneurs like me.

3) PIPA forcibly sets entrepreneurs apart from our users by casting us as “hall monitors.” Successful entrepreneurs build – and depend on – thriving, passionate online communities by developing deep personal and emotional connections with their users. Nobody connects with a hall monitor.

4) PIPA establishes an infringement notification process that grants plaintiffs a disproportionate and unnecessary preemptive power to interrupt the operation of my website, in direct violation of my Fifth Amendment right to due process. The cost of fighting a groundless notification would almost certainly bankrupt me. And the RIAA has an established track record of abusing even the current better-safeguarded DMCA-based system by serving inaccurate takedown notices for the primary purpose of harassing and stifling legitimate competition. Please see this linked article for a specific recent example (http://bit.ly/A6vASt).

5) I may be lucky enough to successfully start a business using my own savings, but scaling my business to become a large employer will almost certainly require outside capital. This is difficult enough in the current dismal economic climate. But it will become nearly impossible if I’m forced to ask venture capitalists or bankers to risk supporting a business that suddenly has enormous and unpredictable potential litigation liabilities. PIPA will almost certainly have a large chilling effect on investment in exactly the sort of technology our economy most needs to grow and to remain internationally competitive. Someone somewhere will build this business, but it won’t happen in America unless we take a much more balanced and forward-looking approach to protecting intellectual property.

The internet is a singular modern Wonder of the World, and it has brought America into an era of unprecedented and previously unimaginable intellectual and cultural richness. Even cultural realms which are supposedly threatened by piracy, like music and film, are experiencing a renaissance of creative output which is directly linked to the ever increasing availability of new (and old) ideas which are allowed to freely disseminate and cross-pollinate. Content providers absolutely should be fairly compensated for their labor, but with recognition of the simple reality that the internet has enabled an enormous cohort of new content creators (many of whom labor with no expectation of monetary reward) whose contributions also deserve to be respected and valued by our government.

Please do what you know is right and oppose both PIPA and any future version of the bill that contains provisions threatening the fundamental nature of the internet. Like so many parts of American society, the driving spirit behind the internet is sustained and nourished by creativity, ingenuity, openness, and good old fashioned American freedom. I cannot possibly support or vote for any representative whose actions betray those crucial American values.

Senator Gillibrand – please don’t kill my dream. Please don’t stifle the creativity of hundreds of thousands of your voting constituents. And please don’t threaten the monument to human progress by peaceful collaboration that so many millions of people around the world have spent so many years working together to build.

I hope you’ll seriously consider what I’ve written here. I am not an uninformed reactionary; I am a constituent with a direct, tangible, personal stake in the outcome of this legislative process. I know you’re extremely busy, but I would very much appreciate a direct response to my concerns.

Sincerely,

George London

Founder & CEO of HypeJet.com

[Quickstart] Turn your laptop into a public remote SPARQL endpoint (or pretty much any kind of public server).

[Warning to non-technical followers…keep on walking. This is another obscure hyper-technical post. Also, forgive the bizarre looking images, but it’s not worth the effort to force Tumblr to show them correctly.]

So…you, like me, have spent the last few weeks playing around with various Semantic Web triple stores, trying to figure out which is best suited for whatever particular quasi-mysterious application you’re trying to build.

After an exciting but awkward and unfulfilling first time with Sesame’s native repository and a briefly passionate but now apparently fizzled relationship with Neo4j, you’ve finally found that special store that you’re ready to settle down with, at least until it stops scaling gracefully or something faster and better documented enters your field of view.

Let’s say you’ve even built a bit of application code, and have a cute little toy process running on your laptop that executes SPARQL queries against a locally hosted server.

Now what?

Well, if you’re like me, you just might want to start showing off your ugly little duckling to those friends and family who don’t know enough about technology to laugh at the inadequacy of your architectural endeavor.

So, naturally, you’re going to want to make your application publicly accessible.

Now with a normal Django or Rails web app using a MySQL database, deploying your demo is a snap using platform-as-a-service (PaaS) solutions like Heroku, which let you publicly deploy your application from git by typing in about three lines of code. Heroku has even recently added basic support for Java, so you can build your app right out of Eclipse and onto a Heroku server.

But what if you’re using some adapted open source code that builds with Ant instead of Maven? And more importantly, what if your SPARQL server needs a 25GB data file to answer queries?

Well then, my friend, you’re pretty much out of luck on the PaaS side. As far as I can tell, you have two choices:

1) Use an infrastructure-as-a-service (IaaS) provider like Amazon, which lets you spin up your own cloud-hosted servers. If you want to build a robust, scalable, secure solution, this is probably the way to go. And if/when I inevitably move in that direction, I will try to write a blog post explaining how to do this. But it requires quite a bit of “upfront investment” in learning how AWS works and how to create, boot and administer a Linux server that can run your code. Plus, it costs money if you want to use a non-trivial amount of computation or transfer and store a non-trivial amount of data.

2) You can do things the old-fashioned way and turn your home computer into a web server that can handle SPARQL requests from the open internet. This has a lot of disadvantages – it’s probably insecure as all hell, your laptop has to be turned on and connected to the internet for it to work, and if you end up getting any real traffic, you’re going to clog up your bandwidth and CPU cycles handling SPARQL requests (plus many ISPs forbid you from running servers at home.)

But rolling your own has a few trump card advantages.

First, it’s relatively easy (at least if you don’t have to figure out how to do it, which is why I’m writing this guide.)

Second, it lets you run your server with basically no additional configuration or porting or data uploading – if you can run a SPARQL query against localhost, you can use your machine as a remote host.

Third, it’s (nearly) free. You may elect to pay for a dynamic DNS service that costs $30/year (though there are free alternatives), but everything else uses software/services you already have.

So, here’s how to do it:

STEP 1: Make sure you have the pre-requisites

In theory, you can probably make this work with just about any computer and any internet connection. But for my purposes, I’m going to assume you have a configuration roughly similar to mine, i.e.:

OSX Lion

Running a SPARQL endpoint through a Tomcat server

Verizon FiOS or similar “always-on” internet connection, via a home router

STEP 2: Setup a static IP on your laptop

For this to work, you do NOT need a static IP from your ISP (which apparently costs extra). We’re going to use a technique called “dynamic DNS” that lets the internet find your network even when your ISP changes your IP address. But you do need a static IP on your laptop so that your router can figure out what to do with incoming traffic from the internet.

Here’s how to do this on Verizon FiOS if you have a standard Actiontec router. First, open your admin panel by going to 192.168.1.1 in your browser.

Enter your username / password (the default username is “admin” and the default password is, I think, the serial number of the router.) If you can’t remember your login, you can hard-reset the router by pressing the little reset button on the back for ten seconds. This will wipe your configuration, but these routers are pretty good at automatically setting themselves back up.

Now, inside your control panel, click “My Network”, then “Network Connections.” Find the entry for your local area network (in my case “Network (Home/Office)”), and click the little edit button in the rightmost column of the table. Scroll to the bottom and click “Settings”.

Now, find the line that says “End IP Address”. By default, this is set to something like 192.168.1.255. You need to set the last number to something less than 255 to give you some address space that’s not automatically assigned to devices connecting to your router. I set this to 192.168.1.100. Click “apply”.

For some reason, you can’t just give your laptop its own IP and expect the router to talk to it. So next we need to go into the “Advanced” heading on the router control panel and select “IP Address Distribution”.

Click “Connection List”. Then at the bottom of the table of connections click “New Static Connection”.

Type in a name for your laptop, the static IP address you want to use (should be something like 192.168.1.150), and the MAC address of your laptop. On OSX, you can find the MAC address by going to “System Preferences” -> “Network” -> “Wifi” -> “Advanced” -> “Hardware”. (I’m not going to show screens with my particular MAC and network details, to make it slightly harder to hack me.)

Go back to your router control panel and click “Apply”. Now, your laptop should attach to the router using the IP address you specified. If it doesn’t, try refreshing your IP by going to “System Preferences” -> “Network” -> “Wifi” -> “Advanced” -> “TCP/IP” and clicking “Renew DHCP Lease”. If that doesn’t work, restart your computer.

STEP 3: Get a dynamic DNS provider.

You know those DNS servers on the internets that make it so that you can type www.google.com into your browser, and your computer magically starts exchanging packets with the servers at Google’s IP address, and the Google homepage magically loads?

Well, you can use that same basic technology to get around the fact that your ISP gives you an ever-changing address on the internet. The trick is a dynamic DNS service, which gives you a standard “whatever.mysite.com” URL and automatically handles the nasty business of routing anyone who visits that URL to your router’s IP address. There are free services that do this, but they’re harder to use, so I’m just using a fairly slick service called DynDNS (www.dyndns.com).

They require you to sign up for a “Pro-Trial” account which will start charging you after 14 days, but you can apparently cancel the account after a few days and still use them to route to ~5 IP addresses. They’re pretty simple to set up, but this video (http://revision3.com/systm/dyndns) covers the signup/setup process in detail, so I’ll refer you to them instead of repeating it here. At some point in the process, you’ll need to enter your router’s current IP address and download a small client to your computer that will let DynDNS know if your IP address changes.
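For the curious, that little client is essentially doing something like the following – a rough sketch of the dynamic DNS update protocol as I understand it, with a placeholder hostname and credentials (use the official client for real updates):

```python
# Sketch of what a dynamic DNS updater client does under the hood: an
# authenticated GET request reporting your current public IP. The exact
# response strings vary by provider; hostname/credentials are placeholders.
from base64 import b64encode
from urllib.parse import urlencode
from urllib.request import Request
# from urllib.request import urlopen  # needed to actually send the request

UPDATE_HOST = "https://members.dyndns.org/nic/update"

def build_update_request(hostname, new_ip, user, password):
    """Build the authenticated GET request that reports a new IP address."""
    url = UPDATE_HOST + "?" + urlencode({"hostname": hostname, "myip": new_ip})
    creds = b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={
        "Authorization": "Basic " + creds,
        "User-Agent": "home-endpoint-updater/0.1",  # APIs expect a UA string
    })

# Example (needs real credentials; do NOT hard-code them in real code):
# resp = urlopen(build_update_request("whatever.dyndns.org", "1.2.3.4",
#                                     "me", "secret"))
# print(resp.read())
```

Again, the downloadable client handles all of this for you, including detecting when your IP actually changes – this is just to demystify the moving parts.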

STEP 4: Setup port forwarding on your router.

Okay, so now the internet can find your router. But your router still needs to know what to do with incoming traffic. So if someone from the webs comes and gives the secret handshake, Mr. Router needs to know to send them to visit me. We do this with port forwarding.

Let’s go back to our router control panel. Click “Firewall Settings”, then “Port Forwarding”. Pick your laptop out of the dropdown menu, and select “custom port” from the other menu. It seems that, at least for me, Verizon blocks incoming traffic on port 80, the default HTTP port. But that doesn’t matter, since it leaves high-numbered ports unblocked. So just enter a random high number like 60000 under port. Click “Add”.

AND THAT’S PRETTY MUCH IT.

Now, anyone who visits “whatever.yoursite.com:60000/whatever” can access that resource on your local machine.
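To sanity-check the forwarding, here’s a quick sketch that just tries to open a TCP connection to a given host and port (the hostname and port below are placeholders for whatever you chose above):

```python
# Quick reachability check: can a TCP connection be opened to host:port?
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this from OUTSIDE your network (a friend's machine, or a phone off
# wifi) -- many home routers can't loop back to their own public address,
# so testing from inside the network can fail even when forwarding works.
# print(port_open("whatever.yoursite.com", 60000))
```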

If you actually want to run a SPARQL endpoint, there’s a bit more work to do.  So,

STEP 5 (OPTIONAL): Configure Tomcat to deal with remote traffic

Most of the triple stores I’ve experimented with run as applications inside a Tomcat or Jetty servlet instance. If you don’t have one of those set up, you’re in for some not-particularly-fun work that’s way beyond the scope of this post (though you can try this post for a walkthrough of how to get started with a simple Sesame instance).

If you do have a Tomcat server running on your computer, you need one more step to actually use it as a SPARQL endpoint. Tomcat runs on port 8080 by default. We need to set it to run on whatever port we forwarded earlier (e.g. 60000) so that traffic coming in on that port will hit the server and get a response.

To do this, you need to edit the “server.xml” file inside your Tomcat installation. For me, the path to the containing folder is: /usr/local/apache-tomcat-7.0.23/conf

Inside of server.xml, look for the block that says:

<Connector port="XXXXX" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

And change XXXXX to whatever port you forwarded, then restart Tomcat so the change takes effect.

And that’s actually it. Now when anyone sends a URL-encoded SPARQL query to “whatever.yoursite.com:60000/sparql” (or whatever the appropriate URL is), the server will send back an appropriate response!

What this is useful for:

This is actually a pretty cool result in my opinion. While it is almost certainly not a good idea to run an openly accessible SPARQL endpoint off your home network, because it could easily get hacked or flooded with traffic, you CAN use this method to make an endpoint available to trusted friends and rely on “security by obscurity”. As long as you don’t actually list the access address of your endpoint anywhere, you are probably not going to get bombarded with queries.

But the even cooler result is that you can combine this architecture with a REST paradigm to build a fully publicly accessible application using a framework like Rails or Django, throw that up on Heroku, and route all the SPARQL queries to your home server behind the scenes. If you have an always-on broadband connection and an old laptop lying around, you can throw Linux on the laptop, set up Tomcat, copy your data over, and use that laptop as an always-on server to support your public-facing application.
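To make that idea concrete, here’s a hypothetical sketch of the behind-the-scenes routing using only Python’s standard library. A real app would do this inside a Django or Rails controller rather than a raw HTTP server, and the home endpoint URL is a placeholder:

```python
# Toy public-facing proxy: forwards incoming /sparql requests to the home
# endpoint, so visitors never see the home server's address. A sketch only;
# a real app would do this inside a Django/Rails view. HOME_ENDPOINT is a
# placeholder for your forwarded hostname and port.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse
from urllib.request import urlopen

HOME_ENDPOINT = "http://whatever.yoursite.com:60000/sparql"  # placeholder

class SparqlProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/sparql":
            self.send_error(404)
            return
        # Re-attach the visitor's query string to the home endpoint and
        # fetch the result on their behalf.
        with urlopen(HOME_ENDPOINT + "?" + parsed.query, timeout=30) as resp:
            body = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/sparql-results+json")
        self.end_headers()
        self.wfile.write(body)

# To run the proxy on port 8000:
# HTTPServer(("", 8000), SparqlProxy).serve_forever()
```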

That’s obviously not a scalable solution, but it is free, and way easier than trying to set up a whole AWS infrastructure. And if you’re only getting a handful of visitors to your public site each month, even an old laptop should be able to handle the traffic reasonably well.

Anyway, this walkthrough is pretty configuration-specific, but I imagine the process should be at least loosely analogous on any other setup. So hopefully this post will save you the trouble of figuring all this out (which is definitely the hardest part). If you have any questions / problems / suggestions for how to do this better, just leave a comment or send me an email and I’ll try to help out or update the post!