Perl Compiler Kit, Version alpha2

		 Copyright (c) 1996, Malcolm Beattie

    This program is free software; you can redistribute it and/or modify
    it under the terms of either:

	a) the GNU General Public License as published by the Free
	Software Foundation; either version 1, or (at your option) any
	later version, or

	b) the "Artistic License" which comes with this kit.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See either
    the GNU General Public License or the Artistic License for more details.

    You should have received a copy of the Artistic License with this kit,
    in the file named "Artistic".  If not, you can get one from the Perl
    distribution. You should also have received a copy of the GNU General
    Public License, in the file named "Copying". If not, you can get one
    from the Perl distribution or else write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

INSTALLATION

(1) You need perl5.002 or perl5.003.

(2) You need to apply a one-line patch to perl itself if you want to
compile and run programs with the C backend which undefine (or
redefine) subroutines. One or two of the programs in perl's own test
suite do this. The patch is in file op.patch. It prevents perl from
calling free() on OPs with the magic sequence number (U16)-1. The
compiler declares all OPs as static structures and uses that magic
sequence number.

(3) Type
    perl Makefile.PL
to write a personalised Makefile for your system. If you want the
bytecode modules to support reading bytecode from strings (instead of
just from files) then add the option
    -DINDIRECT_BGET_MACROS
into the middle of the definition of the CCCMD macro in the Makefile.
Your C compiler may need to be able to cope with Standard C for this.
I haven't tested this option yet with an old pre-Standard compiler.

(4) If your platform supports dynamic loading then just type
    make
and you can then use
    perl -Iblib/arch -MO=foo bar baz
to use the compiler modules (see later for details).
If you need/want instead to make a statically linked perl which
contains the appropriate modules, then type
    make bperl
    make byteperl
and you can then use
    ./bperl -MO=foo bar baz
to use the compiler modules.
In both cases, the byteperl executable is required for running standalone
bytecode programs. It is *not* a standard perl+XSUB perl executable.

USAGE

With this alpha2 release, the CC backend now works, in addition to the C
and Bytecode backends (although CC doesn't yet support the whole of
the Perl language and may be less reliable than C or Bytecode). The
file TESTS shows that most of the standard perl test exercise programs
t/*/*.t work with C and Bytecode and about half work with CC.  In any
of the following examples of command-line invocation of perl you'll
need to replace "perl" by
    perl -Iblib/arch
if you have built the extensions for a dynamic loading platform but
haven't installed the extensions completely. You'll need to replace
"perl" by
    ./bperl
if you have built the extensions into a statically linked perl binary.

(1) To compile perl program foo.pl with the C backend, do
    perl -MO=C,-ofoo.c foo.pl
Then use the cc_harness perl program to compile the resulting C source:
    perl cc_harness -O2 -o foo foo.c

If you are using a non-ANSI pre-Standard C compiler that can't handle
pre-declaring static arrays, then add -DBROKEN_STATIC_REDECL to the
options you use:
    perl cc_harness -O2 -o foo -DBROKEN_STATIC_REDECL foo.c
If you are using a non-ANSI pre-Standard C compiler that can't handle
static initialisation of structures with union members then add
-DBROKEN_UNION_INIT to the options you use. If you want command line
arguments passed to your executable to be interpreted by perl (e.g. -Dx)
then compile foo.c with -DALLOW_PERL_OPTIONS. Otherwise, all command line
arguments passed to foo will appear directly in @ARGV.  The resulting
executable foo is the compiled version of foo.pl. See the file NOTES for
extra options you can pass to -MO=C.

There are some constraints on the contents of foo.pl if you want to be
able to compile it successfully. Some problems can be fixed fairly easily
by altering foo.pl; some problems with the compiler are known to be
straightforward to solve and I'll do so soon. The file Todo lists a
number of known problems. See the XSUB section lower down for information
about compiling programs which use XSUBs.

(2) To compile foo.pl with the CC backend (which generates actual
optimised C code for the execution path of your perl program), use
    perl -MO=CC,-ofoo.c foo.pl

and proceed just as with the C backend. You should almost certainly
use an option such as -O2 with the subsequent cc_harness invocation
so that your C compiler uses optimisation. The C code generated by
the Perl compiler's CC backend looks ugly to humans but is easily
optimised by C compilers. Anything involving any of the
perl operators goto, s//e or .. (flip/flop) will not work at the
moment.  To make the most of this compiler backend, you need to tell
the compiler when you're using int or double variables so that it can
optimise appropriately. You currently do that by naming lexical
variables ending in "_i" for ints, "_d" for doubles, "_ir" for int
"register" variables or "_dr" for double "register" variables. Here
"register" is a promise that you won't pass a reference to the
variable into a sub which then modifies the variable. The compiler
ought to catch attempts to use "\$i" just as C compilers catch
attempts to do "&i" for a register int i but it doesn't at the
moment. Bugs in the CC backend (and there may be plenty) will probably
make your program fail in mysterious ways and give wrong answers
rather than just crash in boring ways. But, hey, this is an alpha
release so you knew that anyway. See the XSUB section lower down for
information about compiling programs which use XSUBs.
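
As a rough sketch of the naming convention described above, a numeric
loop might be written as follows. The sub name sum_of_squares and all
variable names are made up for illustration. The code is ordinary Perl
and runs unchanged under standard perl; the suffixes are only hints to
the CC backend, and this sketch has not been checked against the alpha2
compiler itself.

```perl
use strict;

# Hypothetical example: compute 1*1 + 2*2 + ... + n*n.
# Suffixes follow the convention described in the text:
#   _i  int    _d  double    _ir/_dr  "register" int/double
sub sum_of_squares {
    my ($n_i) = @_;        # upper bound, treated as an int
    my $total_d = 0;       # accumulator, treated as a double
    my $k_ir;              # "register" int: no reference to it is ever taken
    for ($k_ir = 1; $k_ir <= $n_i; $k_ir++) {
        $total_d += $k_ir * $k_ir;
    }
    return $total_d;
}

print sum_of_squares(10), "\n";    # prints 385
```

The sketch deliberately avoids goto, s//e and the .. flip/flop operator,
since those are listed above as not working with the CC backend.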

If your program uses classes which define methods (or other subs which
are not exported and not apparently used until runtime) then you'll
need to use -u compile-time options (see the NOTES file) to force the
subs to be compiled. Future releases will probably default the other
way, do more auto-detection and provide more fine-grained control.

Since compiled executables need linking with libperl, you may want
to turn libperl.a into a shared library if your platform supports
it. For example, with Digital UNIX, do something like
    ld -shared -o libperl.so -all libperl.a -none -lc
and with Linux/ELF, rebuild the perl .c files with -fPIC (and I
also suggest -fomit-frame-pointer for Linux on Intel architectures),
do "make libperl.a" and then do
    gcc -shared -Wl,-soname,libperl.so.5 -o libperl.so.5.3 `ar t libperl.a`
and then
    # cp libperl.so.5.3 /usr/lib
    # cd /usr/lib
    # ln -s libperl.so.5.3 libperl.so.5
    # ln -s libperl.so.5 libperl.so
    # ldconfig
When you compile perl executables with cc_harness, append -L/usr/lib;
otherwise the -L for the perl source directory will override it. For
example,
    perl -Iblib/arch -MO=CC,-O2,-ofoo3.c foo3.bench
    perl cc_harness -o foo3 -O2 foo3.c -L/usr/lib
    ls -l foo3
    -rwxr-xr-x   1 mbeattie xzdg        11218 Jul  1 15:28 foo3
You'll probably also want to link your main perl executable against
libperl.so; it's nice having an 11K perl executable.

(3) To compile foo.pl into bytecode do
    perl -MO=Bytecode,-ofoo foo.pl
To run the resulting bytecode file foo as a standalone program, you
use the program byteperl which should have been built along with the
extensions.
    ./byteperl foo
Any extra arguments are passed in as @ARGV; they are not interpreted
as perl options. If you want to load chunks of bytecode into an already
running perl program then use the -m option and investigate the
byteload_fh and byteload_string functions exported by the B module.
See the NOTES file for details of these and other options (including
optimisation options and ways of getting at the intermediate "assembler"
code that the Bytecode backend uses).

(4) There are little Bourne shell scripts and perl programs to aid with
some common operations: assemble, disassemble, run_bytecode_test,
run_test, cc_harness, test_harness, test_harness_bytecode.

(5) Walk the op tree in execution order printing terse info about each op
    perl -MO=Terse,exec foo.pl

(6) Walk the op tree in syntax order printing lengthier debug info about
each op. You can also append ",exec" to walk in execution order, but the
formatting is designed to look nice with Terse rather than Debug.
    perl -MO=Debug foo.pl

XSUBS

The C and CC backends can successfully compile some perl programs which
make use of XSUB extensions. [I'll add more detail to this section in a
later release.] As a prerequisite, such extensions must not need to do
anything in their BOOT: section which needs to be done at runtime rather
than compile time. Normally, the only code in the boot_Foo() function is
a list of newXS() calls which xsubpp puts there and the compiler handles
saving those XS subs itself. For each XSUB used, the C and CC compiler
will generate an initialiser in their C output which refers to the name
of the relevant C function (XS_Foo_somesub). What is not yet automated
is the necessary commands and cc command-line options (e.g. via
"perl cc_harness") which link against the extension libraries. For now,
you need the XSUB extension to have installed files in the right format
for using as C libraries (e.g. Foo.a or Foo.so). As the Foo.so files (or
your platform's version) aren't suitable for linking against, you will
have to fetch the extension source again and rebuild it as a static
extension to force the generation of a suitable Foo.a file. Then you need to make
a symlink (or copy or rename) of that file into a libFoo.a suitable for
cc linking. Then add the appropriate -L and -l options to your
"perl cc_harness" command line to find and link against those libraries.
You may also need to set a platform-dependent environment variable
(e.g. LD_LIBRARY_PATH on many Unix systems) to ensure that
linked-against .so files are found at runtime too.

DIFFERENCES

The result of running a compiled Perl program can sometimes be different
from running the same program with standard perl. Think of the compiler
as having a slightly different implementation of the language Perl.
Unfortunately, since Perl has had a single implementation until now,
there are no formal standards or documents defining what behaviour is
guaranteed of Perl the language and what just "happens to work".
Some of the differences below are almost impossible to change because of
the way the compiler works. Others can be changed to produce "standard"
perl behaviour if it's deemed proper and the resulting performance hit
is accepted. I'll use "standard perl" to mean the result of running a
Perl program using the perl executable from the perl distribution.
I'll use "compiled Perl program" to mean running an executable produced
by this compiler kit ("the compiler") with the CC backend.

Loops
    Standard perl calculates the target of "next", "last", and "redo"
    at run-time. The compiler calculates the targets at compile-time.
    For example, the program

        sub skip_on_odd { next NUMBER if $_[0] % 2 }
        NUMBER: for ($i = 0; $i < 5; $i++) {
            skip_on_odd($i);
            print $i;
        }

    produces the output
        024
    with standard perl but gives a compile-time error with the compiler.
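
One possible way to restructure the example above is to keep the "next"
lexically inside the loop it targets, so that its target is visible at
compile-time, and to let the sub only report whether the value should be
skipped. The sub name is_odd and the variable $out are made up for this
sketch, and the rewrite has not been checked against the alpha2 compiler.

```perl
use strict;

# Hypothetical rewrite: the sub no longer jumps out of its caller's loop;
# it just returns a true value for odd numbers.
sub is_odd { return $_[0] % 2 }

my $out = '';
my $i;
NUMBER: for ($i = 0; $i < 5; $i++) {
    next NUMBER if is_odd($i);    # target is the enclosing loop
    $out .= $i;
}
print "$out\n";                   # prints 024
```

Under standard perl this produces the same output as the original
program.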

Arithmetic
    Compiled Perl programs use native C arithmetic much more frequently
    than standard perl. Operations on large numbers or on boundary
    cases may produce different behaviour.
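
A small illustration of such a boundary case, assuming a platform whose
C int is 32 bits wide (the variable names are made up for this sketch):
under standard perl the addition below is not limited to the range of a
C int. A compiled program that carries out the same addition in a native
C int has no such guarantee, so the result there may differ.

```perl
use strict;

# 2147483647 is the largest value of a 32-bit signed C int
# (assumption: int is 32 bits on your platform).
my $max_i = 2147483647;
my $sum = $max_i + 1;
print "$sum\n";    # standard perl prints 2147483648
```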

Deprecated features
    Features of standard perl such as $[ which have been deprecated
    in standard perl since version 5 was released have not been
    implemented in the compiler.

Others
    I'll add to this list as I remember what they are.

BUGS

Here are some things which may cause the compiler problems.

The following render the compiler useless (without serious hacking):
* Use of the DATA filehandle (via __END__ or __DATA__ tokens)
* Operator overloading with %OVERLOAD
* Use of the (deprecated) magic array-offset variable $[
* goto, s//e and .. (as a scalar flip/flop) are not yet implemented for CC
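
For the DATA filehandle case, one possible workaround (a suggestion
only, not something this kit documents or has been tested with) is to
keep small amounts of inline data in a here-document instead of after
an __END__ or __DATA__ token. The names $inline_data and %value are
made up for this sketch. Note that a very long here-document may run
into the C compiler string-length limits mentioned further down.

```perl
use strict;

# Hypothetical workaround: inline data held in a here-document
# rather than read from the DATA filehandle.
my $inline_data = <<'END_OF_DATA';
alpha 1
beta 2
gamma 3
END_OF_DATA

my %value;
my $line;
foreach $line (split /\n/, $inline_data) {
    my ($name, $number) = split ' ', $line;   # "name number" per line
    $value{$name} = $number;
}
print "beta = $value{beta}\n";    # prints beta = 2
```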

The following may give significant problems:
* BEGIN blocks containing complex initialisation code
* Code which is only ever referred to at runtime (e.g. via eval "..." or
  via method calls): see the -u option for the C and CC backends.
* Run-time lookups of lexical variables in "outside" closures

The following may cause problems (not thoroughly tested):
* Dependencies on whether values of some "magic" Perl variables are
  determined at compile-time or runtime.
* For the C and CC backends: compile-time strings which are longer than
  your C compiler can cope with in a single line or definition.
* Reliance on intimate details of global destruction
* For the Bytecode backend: high -On optimisation numbers with code
  that has complex flow of control.

There is a terser but more complete list in the Todo file.

Malcolm Beattie
22 August 1996