From Curtis J Blank on Mon, 04 Sep 2000
Answered by: Jim Dennis
Thanks for the answer, that did not dawn on me, I'm perfectly aware of how things exist in an environment and the need to export them. I guess I'd have to say it didn't dawn on me because of the fact that it works in a ksh environment on Solaris and Tru64 UNIX and I wasn't thinking along the lines of forked processes.
You were observing the behavior without understanding the underlying mechanisms.
I'm curious as to why it does work there though, what magic is the shell doing so that the variables exist that were used in the read when the forked read process no longer exists and control returns to the parent? Is the shell transposing the two commands and doing the read in the context of the parent and forking the function so that the variables remain? ...
You still don't understand.
A pipe operator (|) indicates a fork(). However, it doesn't necessitate an exec*(). External commands require an exec*().
In the cases of newer ksh (Korn '93 and later?) and zsh the fork() is on the left of the pipe operator. That is to say that the commands on the left of the operator are performed by a child process. In the other cases the commands on the right are performed by the child. In either case the child executes its commands and exits. Meanwhile the parent executes the other set of commands and continues to live.
Thus the question is whether the parent (current process) or a child will be sending data into pipes or reading data from each pipe.
Arguably it makes more sense for the parent to receive data from the children, as the data is likely to be of persistent use. Of course it also stands to reason that we may want to read the data into a variable --- or MORE IMPORTANTLY into a list of variables. This is why the Korn shell (and zsh) model is better.
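A quick way to see which side of the pipe your shell forks is a sketch like the following. It pipes two words into 'read' and then checks, back in the current shell, whether the variables survived. The variable name 'result' and the message text are just for illustration:

```shell
# Start from a clean slate so stale values can't fool the check.
unset bar bang

# If 'read' runs in a child process, bar and bang are set there and
# are lost when the child exits.
echo "one two" | read bar bang

if [ -n "${bar:-}" ]; then
    result="read ran in the current shell: bar=$bar bang=$bang"
else
    result="read ran in a child process: bar and bang are empty here"
fi
echo "$result"
```

Running the same file under bash and under a recent ksh or zsh should show the difference described above.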
In the case of a single variable we can always just restructure the command into a set of backtick operators (also known as a "command substitution expression"). For example:
foo | read bar
... can always be expressed as:
bar=$( foo ) # (or bar=`foo` in older shells; set bar=`foo` in csh)
However this doesn't work for multiple variables:
foo | read bar bang
... cannot be written in any command substitution form. Thus we end up trying to execute the rest of our script inside of the subshell (enclosing the 'read bar bang' command in a set of braces or parentheses to group it with a series of other commands in our subshell), or we resort to saving all of command 'foo's output into one variable and fussing with it. That greatly limits the flexibility of the 'read' command and makes the IFS (internal field separator: a list of characters on which token splitting will be done for the read command) variable almost worthless.
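The grouping workaround looks roughly like this. Here 'foo' is a hypothetical stand-in function that prints two whitespace-separated fields; substitute whatever command you are really reading from:

```shell
# Hypothetical stand-in for 'foo': prints two fields on one line.
foo() { echo "alpha beta"; }

# Everything that needs bar and bang has to live inside the braces.
# In shells that fork the right-hand side of the pipe, the variables
# are gone again once the closing brace is reached.
foo | {
    read bar bang
    echo "bar=$bar bang=$bang"
}
```

This works in any Bourne-style shell, but only because the rest of the work has been moved into the group along with the 'read'.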
One way to handle this would be to write the output of 'foo' to a temporary file, and then read it back with simple re-direction:
foo > /tmp/somefile.$$ ; read bar bang < /tmp/somefile.$$
... but this introduces a host of potential race conditions and security issues; requires that we clean up the temp file, suggests that we should create a 'trap' (signal handler) to perform the cleanup even if we are hit with a deadly signal, and is generally inelegant. We could also create a named pipe, but that has most of the same problems as a temporary file.
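For comparison, a somewhat more careful version of the temporary file approach might look like the sketch below. It assumes the 'mktemp' utility is available (it is common on Linux but was not part of the original example), and 'foo' is again a hypothetical stand-in:

```shell
# Hypothetical stand-in for 'foo'.
foo() { echo "alpha beta"; }

# Assumption: mktemp exists and creates a private, uniquely named file,
# which avoids the predictable /tmp/somefile.$$ name.
tmpfile=$(mktemp) || exit 1

# Clean up on normal exit, and turn the usual fatal signals into an
# exit so that the EXIT trap still runs.
trap 'rm -f "$tmpfile"' EXIT
trap 'exit 1' HUP INT TERM

foo > "$tmpfile"
read bar bang < "$tmpfile"
echo "bar=$bar bang=$bang"
```

Because 'read' is fed by plain re-direction rather than a pipe, it runs in the current shell and the variables persist. The cost is all of the housekeeping around it.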
So we end up using the process substitution expression as I described (and as you mention below):
... The real use of this technique is in the example script given that includes the function. I was able to get it to work when I did it per your suggestion:
read a b c < <( dafunc )
Of course. This is the same as 'read a b c < /tmp/somefile.$$' except that the output of a process takes the place of the temporary file: the <( ... ) expression expands to a filename. That file is a virtual file --- that is to say that it is a file descriptor connected to another process (just like a pipe) but it can be represented (on many UNIX systems, including Linux) as an entry under /dev/fd/ (the "file descriptor" directory). Under Linux /dev/fd/ is a symlink to /proc/self/fd. Under other forms of UNIX it might have different underlying mechanics. It might actually appear as a directory with a bunch of character mode device nodes or it might be some sort of virtual filesystem (like /proc, /dev/pts, etc).
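You can watch this happen with a small bash sketch. The body of 'dafunc' here is a hypothetical stand-in that prints three fields. The exact name printed by the first 'echo' varies from system to system (typically a path under /dev/fd/ on Linux), so don't rely on any particular value:

```shell
#!/bin/bash
# Requires bash: <( ... ) is not plain Bourne shell syntax.

# Hypothetical stand-in for the 'dafunc' function from the script.
dafunc() { echo "1 2 3"; }

# The expression expands to a filename, so echo simply prints that name.
echo <(dafunc)

# The first '<' is ordinary re-direction; the '<( ... )' supplies the
# filename. 'read' runs in the current shell, so a, b and c persist.
read a b c < <(dafunc)
echo "a=$a b=$b c=$c"
```

Note the space between the two '<' characters. Without it the shell would see '<<', which starts a here-document instead.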
I still think that bash should switch to the Korn shell semantics. The <(...) is sufficient to provide the features. However, it seems to be unique to bash. For bash to offer the best portability it seems that it should conform to the latest Korn shell design. (BTW: If the switch were to break some script that depended on the old semantics, on the subshell "leaning" to the right --- then that script was already broken under different versions of ksh.) However, I could certainly see a good argument for having a shell option (shopt?) that would set this back to the old semantics if that was necessary. I have yet to see a case where the old semantics are actually more desirable than the new ones --- but I haven't really tried to find one either.