Node:Binary GCD, Next:Accelerated GCD, Previous:Greatest Common Divisor Algorithms, Up:Greatest Common Divisor Algorithms
At small sizes GMP uses an O(N^2) binary style GCD. This is described in many textbooks, for example Knuth section 4.5.2 algorithm B. It simply consists of successively reducing operands a and b using gcd(a,b) = gcd(min(a,b),abs(a-b)), and also that if a and b are first made odd then abs(a-b) is even and factors of two can be discarded.
Variants like letting a-b become negative and doing a different next step are of interest only as far as they suit particular CPUs, since on small operands it's machine dependent factors that determine performance.
The Euclidean GCD algorithm, as per Knuth algorithms E and A, reduces using a mod b but this has so far been found to be slower everywhere. One reason the binary method does well is that the implied quotient at each step is usually small, so often only one or two subtractions are needed to get the same effect as a division. Quotients 1, 2 and 3 for example occur 67.7% of the time, see Knuth section 4.5.3 Theorem E.
When the implied quotient is large, meaning b is much smaller than
a, then a division is worthwhile.  This is the basis for the initial
a mod b reductions in mpn_gcd and mpn_gcd_1 (the latter
for both Nx1 and 1x1 cases).  But after that initial reduction,
big quotients occur too rarely to make it worth checking for them.