Numdiff

by Ivano Primi <ivprimi (a) libero (dot) it>
Last Update: 2006-11-24




About

Numdiff (which I will also write numdiff) is a little program that can be used to compare putatively similar files line by line and field by field, ignoring small numeric differences and/or different numeric formats. In other words, Numdiff can meaningfully compare files containing numerical fields (as well as non-numerical ones). By default, Numdiff assumes that fields are separated by whitespace (blanks, horizontal tabs and newlines), but the user can also specify a custom list of separators through the option -s (see the User Manual).
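
To make the idea of "fields" concrete, here is a minimal Python sketch of splitting a line into numbered fields using a set of separator characters. The function split_fields is hypothetical, not part of Numdiff; it assumes that a run of consecutive separators counts as a single one, which may not match exactly how Numdiff treats the -s option.

```python
import re


def split_fields(line, separators=" \t\n"):
    """Split `line` into fields separated by any character in `separators`.

    Assumption: runs of consecutive separators count as one and empty
    fields are dropped; numdiff's own handling of -s may differ.
    """
    pattern = "[" + re.escape(separators) + "]+"
    return [field for field in re.split(pattern, line) if field]


line = "  1.25\t-3.45\t\t1.23456789E-2   -5.98765432e+5  100.00\n"
for position, field in enumerate(split_fields(line), start=1):
    # 1-based positions, like the #:f tags in numdiff's output
    print(f"#:{position}  {field}")
```

Passing a different string, for example split_fields("a;b;;c", ";"), mimics choosing another list of separators.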

When comparing two such files, what you usually want is a list of the numerical fields in the second file that differ numerically from the corresponding fields in the first file. Well-known tools like diff, cmp or wdiff cannot be used for this purpose: they cannot tell whether a difference between two numerical fields is due only to notation or is an actual difference in numerical value. Moreover, you may also want to ignore differences in numerical values as long as they do not exceed a certain threshold; in other words, you may want to neglect all small numerical differences too. Programs like diff and wdiff cannot do this either, since they do not even know what a numerical difference is. That is why I decided to implement Numdiff.

In writing this program I was inspired by ndiff, a GPL-licensed program by Nelson H. F. Beebe of the University of Utah (Salt Lake City), see

http://www.math.utah.edu/~beebe/software/ndiff

Its author had the same good reasons for writing ndiff as I had. ndiff is indeed a good tool, and I used it for a while, but I did not entirely like the way it works, and so numdiff was born. Although ndiff inspired numdiff, the two share no source code: numdiff has been written entirely from scratch.

I know that many people may find Numdiff simply useless. However, people working in scientific computing or numerical analysis may find it useful in their work. For example, one may need to compare a file containing the output of a numerical program running in a certain environment with another file containing the output of the same program running in a different environment, which could mean a different operating system, or a different compiler on the same system. At other times one has to compare the output of a numerical program that solves a certain problem with the output of another program that solves the same problem using a different algorithm. Finally, one may want to compare the output of a numerical program with a reference file containing a list of expected data (possibly computed theoretically). In all these situations Numdiff can be very helpful, since it also lets the user specify a tolerance for absolute and/or relative differences, and then reports only the fields whose differences exceed these tolerances.

Sample Output

Since an example is often worth many words, let us suppose that file1 contains the following list of numbers:

  1.25	-3.45		1.23456789E-2   -5.98765432e+5  100.00

while file2 contains:

  1.250001  -3.450003	1.23456788E-2   -5.98765431e+5  100.000022

We can compare these two files by calling numdiff (the name of the program must be written in lower case!) and passing it file1 and file2 as arguments:

  numdiff file1 file2

The output of this command will be:

  ----------------
  ##1       #:1   <== 1.25
                  ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7
  ##1       #:2   <== -3.45
                  ==> -3.450003
  @ Absolute error = 3.0000000000e-6, Relative error = 8.6956521739e-7
  ##1       #:3   <== 1.23456789E-2
                  ==> 1.23456788E-2
  @ Absolute error = 1.0000000000e-10, Relative error = 8.1000001393e-9
  ##1       #:4   <== -5.98765432e+5
                  ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
                  ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"

This output should be self-explanatory. The tags ##l and #:f, where l and f are integers, refer respectively to the line number and to the position of the field within the line. Thus

  ##1       #:1   <== 1.25
                  ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7

means that the first field of the first line is 1.25 in the first file and 1.250001 in the second one. The absolute difference between these two numbers is 1.0000000000e-6, while the relative difference is 8.0000000000e-7.
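
The following Python sketch reproduces the five absolute and relative errors of the sample output. The formula is an assumption inferred from those numbers, not taken from the Numdiff sources: the relative error appears to be the absolute difference divided by the smaller of the two absolute values (for the third and fourth fields this is the value from file2, for the others the value from file1). The function name differences and the handling of zero values are hypothetical; Python's decimal module stands in for exact arithmetic.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than these inputs need


def differences(a, b):
    """Return (absolute, relative) difference between two numeric fields.

    Assumption: relative error = |x - y| / min(|x|, |y|), inferred from
    the sample output; the treatment of zero values is a guess.
    """
    x, y = Decimal(a), Decimal(b)
    abs_err = abs(x - y)
    denom = min(abs(x), abs(y))
    if abs_err == 0:
        rel_err = Decimal(0)
    elif denom == 0:
        rel_err = Decimal("Infinity")
    else:
        rel_err = abs_err / denom
    return abs_err, rel_err


pairs = [("1.25", "1.250001"), ("-3.45", "-3.450003"),
         ("1.23456789E-2", "1.23456788E-2"),
         ("-5.98765432e+5", "-5.98765431e+5"), ("100.00", "100.000022")]
for a, b in pairs:
    abs_err, rel_err = differences(a, b)
    print(f"Absolute error = {abs_err:.10e}, Relative error = {rel_err:.10e}")
```

Dividing by file1's value instead would give 1.6701030931e-9 rather than 1.6701030958e-9 for the fourth field, which is why the smaller magnitude is used here.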

Numdiff can also print a sort of statistical report about the numerical differences found in the two files. To this end it is sufficient to specify the option -S. The output of the command numdiff -S file1 file2 will be:

  ----------------
  ##1       #:1   <== 1.25
                  ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7
  ##1       #:2   <== -3.45
                  ==> -3.450003
  @ Absolute error = 3.0000000000e-6, Relative error = 8.6956521739e-7
  ##1       #:3   <== 1.23456789E-2
                  ==> 1.23456788E-2
  @ Absolute error = 1.0000000000e-10, Relative error = 8.1000001393e-9
  ##1       #:4   <== -5.98765432e+5
                  ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
                  ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  Largest absolute error in the set of relevant numerical differences:
  1.0000000000e-3
  Corresponding relative error =
  1.6701030958e-9
  Largest relative error in the set of relevant numerical differences:
  8.6956521739e-7
  Corresponding absolute error =
  3.0000000000e-6
  
  +++  File "file1" differs from file "file2"
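
The summary lines printed by -S can be sketched as follows: pick the pair with the largest absolute error (reporting its relative error), and the pair with the largest relative error (reporting its absolute error). The function summarize is hypothetical and the input is hard-coded from the sample output; how Numdiff breaks ties is not described here, so this sketch simply keeps the first maximum.

```python
from decimal import Decimal


def summarize(diffs):
    """Return (pair with largest absolute error, pair with largest relative error).

    Each element of `diffs` is an (absolute, relative) pair; ties keep the
    first occurrence, which is an assumption about numdiff's behaviour.
    """
    largest_abs = max(diffs, key=lambda d: d[0])
    largest_rel = max(diffs, key=lambda d: d[1])
    return largest_abs, largest_rel


# The five (absolute, relative) pairs from the sample output above
sample = [
    (Decimal("1.0000000000e-6"), Decimal("8.0000000000e-7")),
    (Decimal("3.0000000000e-6"), Decimal("8.6956521739e-7")),
    (Decimal("1.0000000000e-10"), Decimal("8.1000001393e-9")),
    (Decimal("1.0000000000e-3"), Decimal("1.6701030958e-9")),
    (Decimal("2.2000000000e-5"), Decimal("2.2000000000e-7")),
]

(abs_max, rel_of_abs_max), (abs_of_rel_max, rel_max) = summarize(sample)
print(f"Largest absolute error: {abs_max:.10e}, corresponding relative error: {rel_of_abs_max:.10e}")
print(f"Largest relative error: {rel_max:.10e}, corresponding absolute error: {abs_of_rel_max:.10e}")
```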

You can specify an absolute error tolerance (or a relative error tolerance) with the option -a (-r). When the user specifies an absolute error tolerance, numdiff reports only the fields whose absolute difference exceeds that tolerance. For instance, the output of numdiff -a 1.0e-5 file1 file2 will be

  ----------------
  ##1       #:4   <== -5.98765432e+5
                  ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
                  ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"
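
The effect of -a 1.0e-5 on the sample can be sketched as a simple filter that keeps only the fields whose absolute error is strictly greater than the tolerance. The function filter_by_abs_tolerance is hypothetical; only the absolute-tolerance case described above is modelled, and how Numdiff combines -a with -r is deliberately left out.

```python
from decimal import Decimal


def filter_by_abs_tolerance(diffs, abs_tol):
    """Keep (field, absolute, relative) entries whose absolute error exceeds abs_tol.

    Only the -a case is sketched; the interaction with -r is not modelled.
    """
    tol = Decimal(abs_tol)
    return [d for d in diffs if d[1] > tol]


# (field position, absolute error, relative error) from the sample output
sample = [
    (1, Decimal("1.0000000000e-6"), Decimal("8.0000000000e-7")),
    (2, Decimal("3.0000000000e-6"), Decimal("8.6956521739e-7")),
    (3, Decimal("1.0000000000e-10"), Decimal("8.1000001393e-9")),
    (4, Decimal("1.0000000000e-3"), Decimal("1.6701030958e-9")),
    (5, Decimal("2.2000000000e-5"), Decimal("2.2000000000e-7")),
]

for field, abs_err, rel_err in filter_by_abs_tolerance(sample, "1.0e-5"):
    print(f"#:{field}  Absolute error = {abs_err:.10e}, Relative error = {rel_err:.10e}")
```

Only fields 4 and 5 survive, matching the output of numdiff -a 1.0e-5 shown above.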

Numdiff has many more options and features, which are described in detail in its User Manual. In particular, Numdiff can also recognize non-numerical differences between the files given to it as arguments. If a certain field in at least one of the two files is non-numerical, then, instead of doing a numeric comparison, Numdiff simply does a literal (character by character) comparison.
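
A minimal sketch of this rule: compare two fields numerically when both parse as numbers, and literally otherwise. The functions parse_number and fields_differ are hypothetical, and using Python's Decimal parser to decide what counts as "numeric" is an assumption; Numdiff's own number syntax may be stricter or looser.

```python
from decimal import Decimal, InvalidOperation


def parse_number(field):
    """Return a Decimal if `field` looks numeric, else None.

    Assumption: Decimal's parser approximates numdiff's notion of a number;
    NaN and infinities are treated as non-numeric here.
    """
    try:
        value = Decimal(field)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def fields_differ(f1, f2):
    x, y = parse_number(f1), parse_number(f2)
    if x is not None and y is not None:
        return x != y   # numeric comparison: notation does not matter
    return f1 != f2     # at least one non-numeric field: literal comparison


print(fields_differ("100.00", "1.0e2"))  # same value, different notation
print(fields_differ("abc", "abd"))       # literal comparison
```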

Installation

Some tools are required to compile, build and install Numdiff. The first is an ANSI C compiler, which should at least accept the option -o to place its output in a specified file. Numdiff has been successfully compiled and tested on Slackware GNU/Linux 10.2 with version 3.3.6 of the GNU C Compiler (gcc), on Slackware GNU/Linux 11 with gcc 3.4.6, and on SunOS 5.8 with gcc 2.95.3.

Moreover, you need a POSIX implementation of the make utility (I used both GNU make and smake by Joerg Schilling to compile Numdiff) and a POSIX implementation of the commands rm and find. Finally, you need a proper installation of GNU Texinfo (to install the info documentation) and an sh-compatible shell.

Configuration, building and installation of Numdiff can be performed through the standard three steps:

          ./configure
          make
          make install

If you leave Natural Language Support enabled and also want to install the localization files (currently only the Italian localization is supplied), then, after make, you also have to run

          make install-nls

By default, make install will install all the files in /usr/local/bin, /usr/local/info, etc. You can specify an installation prefix other than /usr/local using the option --prefix in the configure step, for instance --prefix=$HOME:

          ./configure --prefix=$HOME

For better control, you can use the options --bindir, --infodir, and so on. Type ./configure --help to obtain the complete list of all the available options.

In any case, the documentation files, including a full User Manual available in several formats (HTML, PDF and plain ASCII text), are always installed in DOCDIR/numdiff, where DOCDIR is the path given with the option --docdir or, if this option is not passed to configure, PREFIX/share/doc. Here PREFIX is the installation prefix specified with the option --prefix, or /usr/local by default.

Once Numdiff has been installed, you can remove all the installed files with a simple make uninstall. If you have also installed the localization files through make install-nls, use make uninstall-nls instead of make uninstall to remove them as well.

Among the options accepted by configure are --enable-mpa, --enable-hpa, --enable-ldpa, --enable-dpa, --enable-debug, --enable-optimization, and --enable-nls.

The option --enable-debug turns on debugging when compiling the source code. This is done by passing the -g option to the compiler, but you can change this default debugging flag (which your compiler might not even recognize) by setting the environment variable DBGFLAGS before running configure.

The option --enable-optimization turns on basic optimization when compiling the source code. This is done by passing the -O option to the compiler, but you can change this default flag (which your compiler might not even recognize) by setting the environment variable OPTFLAGS before running configure.

The option --enable-nls turns on Natural Language Support. You do not need to pass it explicitly, however, since Natural Language Support is enabled by default; you can disable it with --disable-nls. Disabling Natural Language Support is recommended whenever you want to install Numdiff on a system where the GNU gettext library is not available. In this case Numdiff can be installed, for instance, with

          ./configure --disable-nls
          make
          make install

The options --enable-mpa, --enable-hpa, --enable-ldpa, and --enable-dpa enable support for, respectively, multiple precision, high precision, long double precision and double precision arithmetic. If none of them is given, support for multiple precision arithmetic is enabled by default.

The support for high precision arithmetic requires the installation of HPAlib (version 1.6 or later), a free (LGPL-ed) library for high precision computations available at the web address

http://savannah.nongnu.org/projects/hpalib

Be careful! Multiple precision arithmetic is preferable to high precision arithmetic. Support for high, long double and double precision arithmetic is provided only to allow Numdiff to run on very slow computers. Moreover, some features of Numdiff, activated through specific command-line options, are available only if Numdiff has been built with support for multiple precision arithmetic. In particular, when this support is available, the user can select at runtime, with the option -#, the precision Numdiff uses in its computations.
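
To see why precision matters for a tool like Numdiff, the sketch below computes the absolute difference of the fourth sample field twice: once in IEEE-754 double precision (Python's float) and once with Python's decimal module, used here only as a stand-in for exact or multiple precision arithmetic. This does not show how a double-precision build of Numdiff itself behaves; it only illustrates the rounding that any double-based computation incurs when the inputs cannot be represented exactly.

```python
from decimal import Decimal

a, b = "-5.98765432e+5", "-5.98765431e+5"

# Double precision: both inputs are rounded to the nearest binary double
# on conversion, so their difference is not exactly 0.001.
double_diff = abs(float(a) - float(b))

# Decimal arithmetic on the decimal strings is exact for these inputs.
exact_diff = abs(Decimal(a) - Decimal(b))

print(f"double : {double_diff:.17e}")
print(f"decimal: {exact_diff}")
```

The double result agrees with 0.001 only to about seven significant digits, because the representation error of inputs near 6e+5 is about 1e-10, which is large compared with their tiny difference.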

TODO

At the moment Numdiff can handle only text files with a single-byte (8-bit) encoding (ASCII and ISO 8859-* text files). Sooner or later Numdiff should also support the UTF-8 (Unicode) encoding.

License

Numdiff (also written numdiff) is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Numdiff is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with the program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.

Contact and Bug reports

Bug reports should be sent to the address <ivprimi (a) libero (dot) it>. Please put Numdiff in the subject and indicate the version of Numdiff you are using, the version of the operating system you are running and, if you know it, the version of the compiler used to build Numdiff. Moreover, you should specify whether Numdiff has been built with support for double, long double, high or multiple precision arithmetic.

Download and Documentation

The tar-gzipped archive with the source code of Numdiff can be downloaded from

http://savannah.nongnu.org/download/numdiff

The latest stable release of Numdiff is version 4.0.0. Together with the source code, the archive contains a very detailed user manual (in English). The manual, which has been written using GNU Texinfo, is available in several formats, including HTML, PDF and plain ASCII text.

Permission is granted to copy, distribute and/or modify this manual under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being "Numdiff User Manual, version 4.0", and with no Back-Cover Texts. A copy of the license is always included in the section entitled "GNU Free Documentation License". You can also obtain a copy of the GNU Free Documentation License by writing to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.

The Numdiff manual can also be browsed online.

Acknowledgments

First of all, I want to thank everyone who has been involved in the Free Software community, starting with those directly involved in the GNU project (http://www.gnu.org). Without their great work, this little one would never have been possible.

Moreover, I have to thank Aurelio Marinho Jargas (<verde (a) aurelio net>), author of txt2tags (http://txt2tags.sf.net), a free (GPL'ed) and wonderful text formatting and conversion tool, which I used in writing this web page.

I also want to thank Mr. Norman Clerman of Opcon Associates, Inc. for several suggestions to improve the readability and effectiveness of the output produced by Numdiff. He also deserves credit for prompting me to prepare version 4.0.0 of Numdiff.

Special thanks to my friend Mariapia Palombaro, who corrected several errors while reviewing the first version of the Numdiff User Manual.