Using Neo4j Graph Database and Python to Work with RDF

Hello hypothetical reader!

As you may know, I’ve recently gotten very interested in something called “the Semantic Web.” If you don’t know what that is, stop reading now, because this post will be incomprehensible.

If you do know what the Semantic Web is and have tried to work with it at all, you know that it can be a bit less than user friendly. I’ve spent the last couple of weeks testing various tools (you can find a probably-complete list of all the available semantic web tools here), and have found that most of them:

1) Are very hard to learn, and

2) Require that you interact with them through Java.

As a programming n00b who can barely scrape together some Python, I find the ubiquitous Java requirement frustrating. So I thought I would document my latest attempt to make a Java-based tool work with Python, in case anyone else finds himself or herself trying to do the same thing and would like to avoid having to figure everything out from scratch.

So…

What I am doing: Trying to use Neo4j as a triple store to host the “DBpedia ontology-based mapping” on my laptop, and interact with and query it exclusively through Python.

Why I am doing this: The DBpedia dataset is fascinating and I want to explore it. But the web interface has frustrating limitations (especially the fact that it simply times out on non-trivial SPARQL queries, and that I can’t easily download the results to feed into other programs). So I want to host the data locally, so that I can let my laptop chug away answering my queries for as long as I damn please.

The result: After a couple of days of trying to make this work, my tentative conclusion is that doing this exclusively in Python is functionally impossible. Something like 80% of the basic low-level tools (e.g. triple stores, query processors) I’ve encountered in this space are written in Java and intended to be custom-adapted by Java developers. And something like 80% of the tools that are usable through Python or Ruby are just shims for the Java tools (i.e. hacked-together-plug-the-AC-plug-into-the-DC-outlet adapters). In some cases it is possible to loosely control the Java tools with Python, but:

1) In most cases, only a small subset of the functionality is accessible through Python.

2) The shims are mostly poorly maintained and documented, and if you’re like me, you’ll spend nearly as much time trying to make them work as it would take you to just learn rudimentary Java. And if you stick with the shims, you’ll end up with a much more fragile system.

3) Shims are flat-out not available for some key components in the stack, like TinkerPop, which looks like the only viable way to use Neo4j with RDF without having to custom-write data-upload and querying functionality in Python.
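To give a sense of what “custom-writing data-upload functionality in Python” would involve, here is a minimal sketch, not a tested loader. It assumes only the simplest N-Triples lines (an IRI subject and predicate, plus an IRI or plain-literal object) and a hypothetical mapping of my own: every IRI becomes a `:Resource` node, literals become `:Literal` nodes, and every predicate becomes a `:REL` relationship carrying the predicate IRI as a property. The helper name `triple_to_cypher` and the labels are made up for illustration.

```python
import re

# Hypothetical pattern for the simplest N-Triples lines only:
#   <subject> <predicate> <object-IRI> .   or   <subject> <predicate> "literal" .
# Blank nodes, language tags, datatypes and escaped quotes are NOT handled.
NTRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def triple_to_cypher(line):
    """Turn one N-Triples line into a (Cypher statement, parameters) pair."""
    match = NTRIPLE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    subj, pred, obj_iri, obj_lit = match.groups()
    if obj_iri is not None:
        # IRI object: both ends become :Resource nodes, linked by one :REL edge.
        statement = (
            "MERGE (s:Resource {uri: $s}) "
            "MERGE (o:Resource {uri: $o}) "
            "MERGE (s)-[:REL {predicate: $p}]->(o)"
        )
        return statement, {"s": subj, "p": pred, "o": obj_iri}
    # Literal object: hang the value off the subject as a :Literal node.
    statement = (
        "MERGE (s:Resource {uri: $s}) "
        "MERGE (s)-[:REL {predicate: $p}]->(:Literal {value: $o})"
    )
    return statement, {"s": subj, "p": pred, "o": obj_lit}


# Example: one DBpedia-style triple.
stmt, params = triple_to_cypher(
    "<http://dbpedia.org/resource/Berlin> "
    "<http://dbpedia.org/ontology/country> "
    "<http://dbpedia.org/resource/Germany> ."
)
print(stmt)
print(params)
```

The predicate lives in a property rather than in the relationship type because Cypher has traditionally not let you pass a relationship type as a query parameter. The `$name` parameter syntax assumes Neo4j 3.x or later.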

The good news is that many of these tools do have some form of REST interface (i.e. if you get the Java component running by itself, you can use Python to control it by sending it messages over HTTP).
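As a rough sketch of what “sending it messages over HTTP” looks like, the snippet below uses only the Python standard library to post a Cypher query to Neo4j’s transactional HTTP endpoint. It assumes a local Neo4j 4.x or later with the default database `neo4j`, reachable at `http://localhost:7474/db/neo4j/tx/commit` (Neo4j 3.x used `/db/data/transaction/commit` instead), and the user name and password are placeholders. The function names are my own hypothetical helpers, not part of any Neo4j library.

```python
import base64
import json
import urllib.request

# Assumption: local Neo4j 4.x+ with the default database "neo4j".
# (Neo4j 3.x exposed the same idea at /db/data/transaction/commit.)
NEO4J_URL = "http://localhost:7474/db/neo4j/tx/commit"


def build_cypher_request(statement, parameters=None, url=NEO4J_URL):
    """Wrap one Cypher statement in the JSON body the endpoint expects."""
    body = {"statements": [{"statement": statement,
                            "parameters": parameters or {}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def rows(response_json):
    """Flatten a decoded response into a list of rows, raising on errors."""
    if response_json.get("errors"):
        raise RuntimeError(response_json["errors"])
    return [entry["row"]
            for result in response_json["results"]
            for entry in result["data"]]


def run_cypher(statement, parameters=None, user="neo4j", password="changeme"):
    """Send one statement to a running server and return its rows.

    The user/password defaults are placeholders, not real credentials.
    """
    request = build_cypher_request(statement, parameters)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return rows(json.load(response))
```

With a server running, `run_cypher("MATCH (n) RETURN count(n)")` would return a list of rows, here a single one-element row holding the node count. Keeping request-building and response-parsing separate from the network call makes those two parts easy to check without a live database.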

So my new plan is to:

1) Build a web interface in Django

2) Learn just enough Java (JeJ) to build a server to run the backend.

3) Figure out some way to put the Java server online

4) Control the server through REST using Python, and do all the input/output on the Django webpage.

In my next post, I’ll walk you step by step through my experience learning just-enough-Java to get started using Neo4j as a triple store to back a Django-powered website.
