Day 1: The Beginning

Today was the first day of a two-month sabbatical that I am taking to get this plantdata project off the ground. I have decided that every day I work on this, I will write a short blog post outlining what I accomplished that day, things I learned, and next steps. I’m doing this for two reasons:

  1. While I have spent most of my adult life as a full-time coder, at some point not too far away that majority share will be in *managing* full-time coders. It’s been a long time since I’ve had the opportunity to focus on building something, and to put it bluntly, I am so rusty I can actually hear creaking noises sometimes. I’m hoping that keeping decent notes on a schedule will help me get back into the game.
  2. Shame, really. Shame as a motivator. I fully expect to spend about two weeks flailing wildly with very little to show for it, and having to tell everyone about the whole thing should force me to document mental progress better than I would if left entirely to my own unobserved flailing.

I have decided to start this journey with an investigation of the relatively recent docker images and compose work that’s been going on. I knew it wasn’t going to be entirely smooth sailing for me, as I have never before used docker, or successfully installed blazegraph on anything. Nevertheless, I was able to use docker-compose to spin up some instances in a couple hours.

One early takeaway: Good grief, the Wikidata Query Service needs a lot of memory to start up! It wouldn’t run cleanly until I upgraded my docker box from 4GB memory to 8GB. This also doubled the monthly cost of running this little experiment with my web host, but… /me shrugs

Once my test box had a sufficient amount of RAM, the compose command ran cleanly, and I could load the frontends of both the containerized wikibase instance, and the containerized wikidata query service. I did not expect to get that far before lunch. Unfortunately, after lunch it rapidly became clear to me that the wikidata query service wasn’t *quite* connected up to the wikibase instance: Confusingly, the typeahead in the query helper UI could get objects and properties, but no data was ever returned upon running an actual query.

SPARQL isn’t exactly something I’m comfortable with either at this exact moment. Not knowing if it was misconfigured machines or my own inability to write a well-formed SPARQL query, I called in some expert help who was very helpfully watching his email (thanks, Stas!).

For the readers also uncomfortable with SPARQL, here’s an easy query to test if your wikidata query service is actually talking to anything or not:

SELECT * WHERE { ?x ?y ?z } LIMIT 10
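If you would rather run that smoke test from a script than from the query UI, here is a minimal Python sketch. The endpoint URL is an assumption on my part (a placeholder host, port, and path), so point it at wherever your own wdqs container actually exposes its SPARQL endpoint. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder endpoint; replace with the address your wdqs
# container (or its proxy) actually listens on.
ENDPOINT = "http://localhost:8989/bigdata/namespace/wdq/sparql"

# The same "is anything in there at all?" query as above.
SMOKE_TEST_QUERY = "SELECT * WHERE { ?x ?y ?z } LIMIT 10"


def build_request(endpoint, query):
    """Build a GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def count_bindings(results):
    """Count result rows in a SPARQL JSON results document."""
    return len(results.get("results", {}).get("bindings", []))


def run_smoke_test(endpoint=ENDPOINT, query=SMOKE_TEST_QUERY, timeout=10):
    """Send the query and return how many rows came back (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return count_bindings(json.load(resp))
```

Calling `run_smoke_test()` against a live endpoint and getting zero rows back would match what I saw: the service answers, but it has no data to return.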

Turns out that even though the query helper typeaheads work like everything is wired up correctly, my containerized wdqs instance isn’t loading data updates from the adjacent wikibase container. I destroyed those containers and remade them just for fun (isn’t that what containers are for?), and while it did not magically fix the issue, it was genuinely entertaining for a minute.

I did clear up a misconception I’d been carrying with me for a while: The query service gets its data by monitoring the recent changes feed on your target wiki. And here I’d been thinking there was some kind of db dump and import on a cron job I’d have to set up eventually. I’m honestly a little surprised that’s not the case, and now I’m wondering what options exist for recovery/rebuilding if your wdqs instance walks off into outer space…
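To see the kind of feed involved, here is a rough Python sketch that asks a wiki’s standard MediaWiki API for its most recent changes. This only illustrates the feed itself; it is not a description of how the query service’s updater is implemented. The API URL is a placeholder I made up, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder address for the wikibase container's api.php.
API_URL = "http://localhost:8181/w/api.php"


def recent_changes_url(api_url, limit=10):
    """Build a MediaWiki API URL listing the latest recent changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def changed_titles(response):
    """Pull the page titles out of a recentchanges API response."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [change["title"] for change in changes]


def fetch_recent_titles(api_url=API_URL, limit=10, timeout=10):
    """Fetch and return recently changed titles (needs a live wiki)."""
    with urllib.request.urlopen(recent_changes_url(api_url, limit), timeout=timeout) as resp:
        return changed_titles(json.load(resp))
```

If edits made in the wikibase container show up here but never appear in query results, that points at the link between the two containers rather than at the wiki.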

Tomorrow’s Plan:

  • Learn everything about docker. Particularly, if there is a nice way to have containers land their logs somewhere easily accessible. But also everything else.
  • Get the test containers in the wikibase group to talk to each other the way they are supposed to
  • Write my own compose file to forgo the containerized wikibase instance, and instead point all the query service containers to my real pre-existing plantdata wikibase install
  • Verify that the instances are communicating the way I think they are
  • Start populating data in the real plantdata wikibase instance (Data In)
  • Get comfortable with SPARQL and write some sample queries (Data Out)

I’ll be delighted if I accomplish two of those six things tomorrow.

Today’s most useful links:

2 thoughts on “Day 1: The Beginning”

    • K4-713 says:

      Thank you for the heads up! I’ll have a look at that today.
      I’m very grateful for the work you did getting these things containerized in the first place – it’s pretty cool to be able to spin these things up so quickly. It’s great, in this experimental phase, not to have to worry about messing up the one instance I could get installed.
