Die Zukunft war früher auch besser (Karl Valentin)

A few very nice examples on how GCC optimizes code can be found at ridiculousfish.com.

An example of generalized tail recursion was given, the transformal of

int factorial(int x) {
   if (x > 1) return x * factorial(x-1);
   else return 1;
}

into

int factorial(int x) {
   int result = 1;
   while (x > 1) result *= x--;
   return result;
}

This is a well-known optimization and Iwas wondering whether SBCL also does this optimization. The second declaration becomes

(defun fact2 (x)
 (declare (type fixnum x))
 (let ((res 1))
(loop while (> x 1)
do (setf res (* res (decf x))))))

the firts one becomes

(defun fact1 (x)
(declare (type fixnum x))
(if (> x 1) (* x (fact1 (1- x))) 1))

Of course I have to declare the type here, otherwise it will put code to distinguish between several numeric types. And I did a
(declaim (optimize (speed 3)
(safety 0)
(space 0)
(debug 0)
(compilation-speed 0)))
just to be fair. The second one disassembles into

CL-USER> (disassemble #'fact2)
; disassembly for FACT2
; 248173A4:       BA04000000       MOV EDX, 4                 ; no-arg-parsing entry point
;       A9:       EB18             JMP L2
;       AB: L0:   8BC3             MOV EAX, EBX
;       AD:       83E804           SUB EAX, 4
;       B0:       8BD8             MOV EBX, EAX
;       B2:       895DFC           MOV [EBP-4], EBX
;       B5:       8BF8             MOV EDI, EAX
;       B7:       E8F18E7EFD       CALL #x220002AD            ; GENERIC-*
;       BC:       7302             JNB L1
;       BE:       8BE3             MOV ESP, EBX
;       C0: L1:   8B5DFC           MOV EBX, [EBP-4]
;       C3: L2:   83FB04           CMP EBX, 4
;       C6:       7FE3             JNLE L0
;       C8:       BA0B001022       MOV EDX, 571473931
;       CD:       8BE5             MOV ESP, EBP
;       CF:       F8               CLC
;       D0:       5D               POP EBP
;       D1:       C3               RET
NIL
Somehow I cannot make the GENERIC-* disappear, not even with a the or coerce declaration around the numbers. But the disassembly of the first one is

CL-USER> (disassemble #'fact1)
; disassembly for FACT1
; 24783F0D: L0: L1:83F904           CMP ECX, 4                ; no-arg-parsing entry point
;       10:       7F09             JNLE L3
;       12:       B804000000       MOV EAX, 4
;       17: L2:   8BE5             MOV ESP, EBP
;       19:       5D               POP EBP
;       1A:       C3               RET
;       1B: L3:   894DFC           MOV [EBP-4], ECX
;       1E:       8BC1             MOV EAX, ECX
;       20:       8BD0             MOV EDX, EAX
;       22:       83EA04           SUB EDX, 4
;       25:       8BDD             MOV EBX, EBP
;       27:       8D4424F8         LEA EAX, [ESP-8]
;       2B:       83EC20           SUB ESP, 32
;       2E:       8BCA             MOV ECX, EDX
;       30:       8918             MOV [EAX], EBX
;       32:       8BE8             MOV EBP, EAX
;       34:       E819000000       CALL L5
;       39:       8B4DFC           MOV ECX, [EBP-4]
;       3C:       8BD1             MOV EDX, ECX
;       3E:       8BF8             MOV EDI, EAX
;       40:       E868C387FD       CALL #x220002AD            ; GENERIC-*
;       45:       7302             JNB L4
;       47:       8BE3             MOV ESP, EBX
;       49: L4:   8BC2             MOV EAX, EDX
;       4B:       EBCA             JMP L2
;       4D:       8F4504           POP DWORD PTR [EBP+4]
;       50:       EBBB             JMP L1
;       52: L5:   8F4504           POP DWORD PTR [EBP+4]
;       55:       EBB6             JMP L1
NIL

From 10 to 1A, this seems to check whether the first argument is greater than something (1?), and if so, it jumps to L3, if not, it returns. In L3, it does a lot of things, especially it calls L5, which then jumps to L1, the first line, again. That is, it seems like it really does recursion here.

This is not very nice. Why is it that way?