A tale of two Prologs
It’s conventional wisdom in Prolog circles that, if you’re using free tools, you should do your development work in SWI-Prolog (because of its wonderful environment and tooling) and then grit your teeth and port your code to YAP for performance, because YAP is just faster than SWI-Prolog. Now I have no reason to doubt this conventional wisdom, but being the kind of guy I am, I always have questions like “how much faster?” and “how much teeth-gritting?” These questions are actually astonishingly hard to answer.
I’ve looked under chairs / I’ve looked under tables
The problem, you see, is that finding a decent set of benchmarks for Prolog is fiendishly difficult (for me, I mean: I’m sure there are people out there whose search-fu is so good they can find a million detailed Prolog benchmarking suites in only a few seconds). There are two main reasons for this:
- Prolog and portability are still not really on speaking terms after all these years. You know, despite there being an ISO standard for it.
- Much of the Prolog space is taken up with commercial offerings whose marketing literature boasts of performance; actual hard data could be embarrassing to these claims, so there’s not a lot of incentive to make a comprehensive suite of benchmarks.
Still, I did eventually find a benchmark suite for SWI-Prolog and decided to test two things with it:
- How hard is it to port code from SWI-Prolog to YAP?
- How much faster is YAP than SWI-Prolog?
I had my own guess for that second one: I figured that YAP was likely going to be about 50% faster than SWI-Prolog on average, with a range of perhaps 10% to 70% faster. In short, I thought it would be along the lines of the differences between various C compilers.
The benchmark suite is about 7000 lines of Prolog code doing a variety of micro-benchmarks. It turns out that to add YAP support, only 11 lines of code had to be changed. (The code in question deals with the guts of the implementation—performance statistics—so it’s quite reasonable to expect these not to be compatible.) To be specific, the following code:
    ntimes(M, N, T, GC):-
        statistics(gctime, GC0),
        statistics(cputime, T0),
        ntimes(M, N),
        statistics(gctime, GC1),
        statistics(cputime, T1),
        ntimes_dummy(N),
        statistics(gctime, GC2),
        statistics(cputime, T2),
        T is (T1-T0) - (T2-T1),
        GC is (GC1-GC0) - (GC2-GC1).
…had to be changed to:
    ntimes(M, N, T, GC):-
        get_performance_stats(GC0, T0),
        ntimes(M, N),
        get_performance_stats(GC1, T1),
        ntimes_dummy(N),
        get_performance_stats(GC2, T2),
        T is (T1-T0) - (T2-T1),
        GC is (GC1-GC0) - (GC2-GC1).

    get_performance_stats(GC, T):-
        current_prolog_flag(dialect, Dialect),
        get_performance_stats(Dialect, GC, T).

    get_performance_stats(swi, GC, T):-
        statistics(gctime, GC),
        statistics(cputime, T).

    get_performance_stats(yap, GC, T):-
        statistics(garbage_collection, [_,_,TGC]),
        statistics(cputime, [TT,_]),
        GC is TGC / 1000,
        T is TT / 1000.
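Because everything now funnels through a single dialect-keyed predicate, supporting a third Prolog system should just be a matter of adding one more clause along the same lines. Purely as a hypothetical, untested sketch (the sicstus dialect atom and the statistics/2 keys below are my assumptions and would need checking against that system’s manual):

    % Hypothetical sketch only: not part of the actual port.
    get_performance_stats(sicstus, GC, T):-
        statistics(garbage_collection, [_,_,TGC]),  % assumed: GC time in ms
        statistics(runtime, [TT,_]),                % assumed: CPU time in ms
        GC is TGC / 1000,
        T is TT / 1000.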
The actual port, then, is a rather impressively small change (a change, incidentally, that makes testing other Prolog environments that much simpler). Add one simple bash script and I’m ready to go:
    #! /usr/bin/env bash
    scale=1

    time yap -l run -g "run($scale), halt."
    time swipl -s run -g "run($scale)." -t "halt."
Note that I wrapped the whole execution up in a call to the time utility as a redundant sanity check.
I found the results rather surprising, to be honest. As I said, I expected the average improvement to be perhaps 50%. Look at the real numbers:
| Benchmark | SWI-Prolog (s) | YAP (s) | Ratio (SWI / YAP) |
| --- | --- | --- | --- |
| … | … | … | … |
| Total | … | … | … |
There are lessons to be learnt from all this, I think. For starters, the big take-away for me is that assumptions can be bad: I would never have guessed that SWI-Prolog would be, on average, 2.4 times slower than YAP, with ratios ranging from 1.9 to 5.5 (!), just from switching compilers. I also have to take away the lesson that the talk of YAP and SWI working together to ensure as much compatibility as possible is not just empty talk. This porting job was minuscule.
Now I know these are microbenchmarks and thus not really worth much, but this leads to take-away lesson #3: TEST before you make any assumptions. It will save you unpleasant surprises in the future.
So it seems that conventional wisdom was right. For this set of benchmarks at least.