Day 2: Test-driving the Tools

Today was spent largely playing with docker again, trying to debug the issues I was having with the wikibase and wikidata query service test instances I started playing with yesterday. I am pleased to report that I did manage to fix yesterday's problem of getting the query service to import data from the adjacent wikibase install. It turns out the data wasn't loading because the timestamp of the newest data was older than a timestamp on the updater system. I got a lot of this in the updater container output:

20:51:47.216 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during initialization.

wdqs-updater_1   | java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.  You will have to reload from scratch or you might have missing data.

Thanks again to the same very helpful expert from yesterday (yay again, Stas!), I was able to fix this problem by running the script once with an additional -s parameter to reset the timestamp to something older. That way, it would rebuild more of the data, pick up a fresh timestamp from the latest update, and stop complaining in general.
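To make sense of why resetting the timestamp works: the error above boils down to a window check. Here is a toy sketch of that condition (my own simplification in Python, not the updater's actual Java logic, and the 30-day window is an assumption for illustration):

```python
from datetime import datetime, timedelta

# MediaWiki-style timestamp format, e.g. 20180301000000
FMT = "%Y%m%d%H%M%S"

def needs_reload(last_update: str, now: datetime, poll_window_days: int = 30) -> bool:
    """Return True if the RDF store's last-update time falls before the
    'minimum safe poll time' -- i.e. outside the window of recent changes
    the updater can still replay. Simplified illustration only."""
    last = datetime.strptime(last_update, FMT)
    min_safe_poll_time = now - timedelta(days=poll_window_days)
    return last < min_safe_poll_time

# Passing -s with an older timestamp effectively tells the updater to use
# that as its starting point and replay everything since, refreshing the
# stored last-update time in the process.
```

In other words, once the stored last-update time drifts outside the replayable window, the updater refuses to guess at what it missed, and forcing an older starting point makes it rebuild instead.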

To further complicate things, I quickly ran into another wall: I couldn't reliably run a docker command on the updater container. docker exec kept complaining that the container was in the process of restarting and advising me to try again, even though docker-compose ps consistently reported the container as up. I stopped and restarted that container, and the whole set, several times, and kept getting the same results. Instead of continuing to fight that one container, I ran the following command on the main wdqs container instance, which was stable and based on the same image as the updater container:

docker exec wikibasedocker_wdqs_1 ./ -h http://wdqs.svc:9999 -- --wikibaseHost wikibase.svc --wikibaseScheme http --entityNamespaces 120,122 -s 20180301000000

Fixed! This enabled me to spend some time today basking in the utter confusion of trying to understand what on Earth SPARQL is all about, but with data behind it this time. The fact that this is only Day 2 means I am pretty far ahead of the general schedule of events I had vaguely guessed at before I actually started poking around with these tools. I may even be in a place where I can usefully start importing data to the production wikibase instance before this week is out.
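For the record, the sort of minimal query I've been staring at in confusion looks something like this. (A sketch only: P1 is a placeholder property ID, and I'm assuming the wdt:/wikibase:/bd: prefixes come preconfigured on the local query service endpoint the way they do on the real Wikidata one.)

```sparql
# List up to 10 items that have any value for property P1 (placeholder),
# along with their English labels via the label service.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P1 ?value .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
```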

It's worth noting, though, that the updater container being flaky and restarting all the time is something I'll probably have to revisit later. I can certainly imagine that the instability in that container was what caused the timestamps to go funny in the first place. Then again, this may be completely expected behavior for that container, but it does seem suspicious.

I finished out the day starting to convince the compose file to have the query service look at the production location of the plantdata wikibase, and got just far enough to generate several pages of errors to dig into tomorrow. Nothing like finishing with a small explosion.
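The direction I'm pushing the compose file in looks roughly like the fragment below. This is a hypothetical sketch, not my working config: the hostname is made up, and the environment variable names are my assumption about what the wikibase-docker images expect, so check the images' own documentation before trusting them.

```yaml
# Hypothetical fragment: point the updater at an external wikibase
# instead of the bundled containerized one. Names are illustrative.
services:
  wdqs-updater:
    image: wikibase/wdqs
    environment:
      - WIKIBASE_HOST=plantdata.example.org   # real install, outside docker
      - WIKIBASE_SCHEME=https
      - WDQS_HOST=wdqs.svc
      - WDQS_PORT=9999
```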

Tomorrow’s Plan:

  • Continue learning things about docker, presumably by exploding and unexploding all the test containers until I stop being surprised by its behavior.
  • Finish writing my own compose file to forego the containerized wikibase instance, and instead point all the query service containers to my real pre-existing plantdata wikibase install.
  • Verify that the instances are communicating the way I think they should be, or learn enough to alter my expectations.
  • Start populating data in the real plantdata wikibase instance (Data In).
  • Get comfortable with SPARQL and write some sample queries (Data Out).

Same as yesterday: This plan is clearly too big for one day. If I can manage to successfully import data from outside the docker setup to the containerized wikidata service, I will be positively delighted.

Today's most useful link: – it handily explained some warnings I kept running into while building things in docker. It looks like we're extending a base image with the same issue described in that thread. They're only warnings, but I believe strongly in addressing warnings rather than ignoring them; these are easy to fix with one line of code, and they didn't seem to be breaking anything on their own. It's just… you know. Less red in the compose output means fewer immediate mysteries to follow up on.
