Today I woke up at 7:30am, easily the earliest I've woken up on a Sunday in as long as I can remember. After a hearty and healthy breakfast of Frosted Mini-Wheats, I set out into the hoary cold and began the hour-long journey to the New York Botanical Garden.
I was headed there for biological ontologies. Or, rather, an introduction to them led by Barry Smith, of the National Center for Ontological Research at SUNY-Buffalo.
I didn’t quite know what to expect. I was attending the event at the recommendation of the New York Lotico Semantic Web Meetup Group. I’m a fairly recent convert to the Semantic Web way of thinking, so over the last few months I’ve been trying to attend any event I can find and to meet and talk to anyone I can in the field. And I’m now actively working on ontology construction for my current project, LinerNotes.com, so I was curious to see how people in other knowledge domains are approaching ontological problems.
What I did get was confirmation of what I had long suspected: that poorly conceived and poorly labeled schematizations of data inside traditional databases have led to just as much poor findability and poor interoperability in biology as they have in music or (in my past life) in finance.
Dr. Smith opened the talk with a discussion of his previous work with genetics databases that were full of confusing and under-documented labeling, in which the single location where a field specimen had been found could be labeled “Japan”, “Japanese”, “In Japan”, or “J.”. Obviously, trying to query such a database to find all the specimens collected in Japan is going to be an exercise in frustration.
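To make the frustration concrete, here is a minimal sketch of the problem and of the usual workaround, a lookup table from variant labels to one canonical term. Everything in it is my own illustration rather than anything shown in the talk: the `CANONICAL_LOCATIONS` table, the `normalize_location` and `find_specimens` helpers, and the sample records are all hypothetical.

```python
# Hypothetical lookup table (an assumption for illustration): every variant
# spelling found in the data maps to a single canonical label.
CANONICAL_LOCATIONS = {
    "japan": "Japan",
    "japanese": "Japan",
    "in japan": "Japan",
    "j.": "Japan",
}


def normalize_location(raw):
    """Return the canonical label for a raw location string, or None if unknown."""
    # Lowercase and collapse whitespace so "In  Japan " matches "in japan".
    key = " ".join(raw.strip().lower().split())
    return CANONICAL_LOCATIONS.get(key)


def find_specimens(records, location):
    """Return the records whose normalized location equals the canonical label."""
    return [r for r in records if normalize_location(r["location"]) == location]


# Made-up sample records using the four variants mentioned above.
records = [
    {"id": 1, "location": "Japan"},
    {"id": 2, "location": "Japanese"},
    {"id": 3, "location": "In Japan"},
    {"id": 4, "location": "J."},
    {"id": 5, "location": "Peru"},
]

# A naive exact-match query only finds the one record spelled "Japan".
naive = [r["id"] for r in records if r["location"] == "Japan"]

# Querying through the normalization table finds all four.
normalized = [r["id"] for r in find_specimens(records, "Japan")]

print(naive)
print(normalized)
```

The catch, of course, is that somebody has to build and maintain that table by hand, which is exactly the work a shared, well-documented vocabulary is supposed to do once for everyone.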
Dr. Smith then went into the technical details of his project in biological ontology, which was followed by a long discussion of a Plant Ontology project apparently being carried out in part at the Botanical Garden. At that point, unfortunately, I ceased being able to extract much from the discussion as they dove deep into I’m-sure-fascinating-but-way-beyond-me questions a la “is designating a capable_of relationship between a cell type and a plant organ going to cause conflicts with medically-oriented biologists’ intuitive notion of totipotency?”
A few meta-points I could glean from that are:
1) Even when restricted to a single subject matter domain, ontologies are tremendously complex if expressed down to anywhere near the “real individual instances” level. So building out robust ontologies without a huge capital investment is usually going to require broad collaboration, which means that getting a large team of enthusiasts to work together is extremely helpful.
2) Even in biology, ontology is no more suited to rigid exactitude than normal human language is. There is no “right” answer for what “capable_of” should express, and the best we can hope for is a level of clarity that makes it easy to choose the right ontology for our particular purpose and, when necessary, to algorithmically convert between ontologies (which should, by and large, be functionally equivalent).
All in all, I probably wouldn’t recommend this kind of event for those who don’t have a strong abstract interest in the process of ontology creation or in the specific subject matter being covered (in this case biology), because the talk was really a “by experts for experts” type thing.
But I think it was a worthwhile use of my Sunday morning. And certainly an entertaining contrast to my Super Bowl afternoon.