GCC (GNU Compiler Collection) is essentially divided into a front end and a back end. The main reason for this division is for providing code reusability. As all of us know GCC supports a variety of languages. This includes C, C++, Java etc.
If you want to introduce a new language into GCC, you can. The only thing you have to do is to create a new front end for that language.
The back end is same for all the languages. Different front ends exist for different languages. So creating a compiler in GCC means creating a new front end. Experimentally let us try to introduce a new language into the GCC.
We have to keep certain things in mind before we introduce a language. The first thing is that we are adding a segment to a huge code. For perfect working we have to add some routines, declare some variables etc. which may be required by the other code segments. Secondly some errors produced by us may take the back end into an inconsistent state. Little information will be available to us about the mistake. So we have to go through our code again and again and understand what went wrong. Sometimes this can be accomplished only through trial and error method.
Let me give a general introduction to tree structure and rtl. From my experience a person developing a new small front end need not have a thorough idea regarding tree structure and rtl. But you should have a general idea regarding these.
Tree is the central data structure of the gcc front end. It is a perfect one for the compiler development. A tree node is capable of holding integers, reals ,complex, strings etc. Actually a tree is a pointer type. But the object to which it points may vary. If we are just taking the example of an integer node of a tree, it is a structure containing an integer value, some flags and some space for rtl (These flags are common to all the nodes). There are a number of flags. Some of them are flags indicating whether the node is read only , whether it is of unsigned type etc. The complete information about trees can be obtained from the files - tree.c, tree.h and tree.def. But for our requirement there are a large number of functions and macros provided by the GCC, which helps us to manipulate the tree. In our program we won't be directly handling the trees. Instead we use function calls.
RTL stands for register transfer language. It is the intermediate code handled by GCC. Each and every tree structure that we are building has to be changed to rtl so that the back end can work with it perfectly. I am not trying in anyway to explain about rtl. Interested readers are recommended to see the manual of GCC. As with the trees GCC provides us a number of functions to produce the rtl code from the trees. So in our compiler we are trying to build the tree and convert it to rtl for each and every line of the program. Most of the rtl routines take the trees as arguments and emit the rtl statements.
So I hope that you are having a vague idea about what we are going to do. The tree building and rtl generation can be considered as two different phases. But they are intermixed and can't be considered separately. There is one more phase that the front end is expected to do. It is the preliminary optimization phase. It includes techniques like constant folding, arithmetic simplification etc. which we can handle in the front end. But I am totally ignoring that phase to simplify our front end. Our optimization is completely dependent on the back end. I also assume that you are perfect programmers. It is to avoid error routines from front end. All the steps are taken to simplify the code as much as possible.
From now onwards our compiler has only three phases - lexical, syntax and intermediate code generation. Rest is the head ache of the back end.