swconv is a utility to make moving large directory trees of web content from one virtual server (or directory) to another easier.
Every once in a while, the need arises to move web content from on virtual server to another. When that happens, the publisher or webmaster is left with the daunting task of changing/fixing/updating lots of hyperlinks that broke as a result of the move.
Unfortunately, the process of fixing those links is quite tedious and error prone when attempted manually. One must be sure to check (and possibly update references on the old an new virtual server. One must check all places where there might have been links to the moved content, the moved content itself may have broken links to content that used to be ``relative'' (because it was on the old virtual server), and so on.
swconv makes this process easier. It is designed to be run after the web content has been moved from the old virtual server to the new one and before any manual attempts to update the links have been attempted.
swconv scans for links embedded in these tags:
<IMG...> <A...> <LINK...> <FORM...> <INPUT...>
You will need to give swconv a fair amount of information about your setup. This can be accompished either with command-line options (see ``OPTIONS'' below) or via a configuration file specified with the ``--config'' option (not yet implemented).
One may simply invoke swconv with some command-line options (see ``OPTIONS'' below) to customize its behavior and the path to the directory which needs to be processed.
After reading through the available options, you should read the ``EXAMPLES'' section for ideas of how to use swconv when.
The available options are:
--logfile ``/path/to/file''
NOT IMPLEMENTED YET
The location and file in which you'd like swconv to put messages. If a log
file is specified, nothing will be printed to STDOUT unless a problem
creeps up. If you specify the ``--verbose
'' flag, it will increase the verbosity of the messgaes put into this file.
--debug
Enable the output of debugging messages to STDOUT. This is really only
useful for development or if you are trying to track down a problem. The
debugging messages should give you a good idea of what is happening as
swconv goes through its motions. Debugging is disabled by default.
Debugging is not automatically redirected to a log file when the ``--logfile
'' option is used. You must redirect debugging messages from the shell if
you wish to capture them.
--backup
Enable the creation of backup files. Every file that is modified will be
backed up by appending a backup suffix to the full name of hte file (see
the ``--backup-suffix
'' option, below). Backups are disabled
by default.
--backup-suffix "blah"
NOT IMPLEMENTED YET
Use this as the backup suffix. (See the ``--backup
'' option, above) for details about what this does. The default value of
backup-suffix is ``.bak''
--test
Tell swconv to run in test mode. It will go through all the motions, respecting other swtiches you've set, but when it comes to commit the changes to disk, it will not rewrite any files. This is useful for pre-scanning directories to see what the impact of running swconv will be.
Please use this flag when you're testing things. Depending on the ``--backup
'' option isn't always what you want to do.
--old-url-prefix ``http://www.some.thing/foo/''
Required
Tell swconv the URL prefix of the old content location. This defaults to null, so is required that you set this option.
--new-url-prefix ``http://www.new.thing/bar/''
Required
Tell swconv the URL prefix of the new content location. This defaults to null, so it is required that you set this option.
--dir-is-source
Tell swconv that the directory you've specified is the source (meaning that
it may contain links to the moved content). Source is where links start.
This cannot be set if ``dir-is-destination
''is set or an error will result.
--dir-is-destination
Tell swconv that the directory you've specified is the destination (meaning
that it contains the moved content). Destination is where the links will
go. This cannot be set if ``dir-is-source
'' is set or an error will result.
--help
Display a brief help message to remind you of command-line arguments. This won't happen by default.
directory
The directory name/path that you specify may contain either ``/
'' or ``\
'' as path separators (Unix or DOS style). They will be converted properly
internally.
/www/companyX/products
to
/www/companyZ/products
, which are hosted as http://www.companyX.com/products
and http://www.companyZ.com/products
respectively.
You move the content and remove the /www/companyX/products directory
.
There may be links in /www/companyX/about_us
which refer to pages in what is now /www/companyZ/products
, but the links don't know that.
To update any links in /www/companyX/about_us
, do the following:
cd /www/companyX swconv --dir-is-source \ --old-url-prefix "http://www.companyX.com/products/" \ --new-url-prefix "http://www.companyZ.com/products/" \ --backup about_us
There may also be links in /www/comapnyZ/products
which refer to pages in what is now on a different virtual server, the
stuff still sitting in /www/companyX/
.
To adjust those links, do the following:
cd /www/companyZ swconv --dir-is-destination \ --old-url-prefix "http://www.companyX.com/products/" \ --new-url-prefix "http://www.companyZ.com/products/" \ --backup products
Both of those sets of commands will produce backup files in those directories. Only files which change are backed up.
It's a bit too late in the evening for me to come up with one, so let me know if you think of anything.
I should add a ``--sleep-between-files
'' option that can call
sleep()
between processing files. This will make it less of a CPU hog at the
expense of the process taking a bit longer.
swconv can only process one directory tree at a time. You may need to write a small script to have it hit lots of peer directory trees (util I add the necessary code to handle multiple directory names on the command-line).
Also, this may or may not be a bug, depending on your point of view. swconv does not parse the HTML file(s).
It simply applies some clever regexes to the
file based on the command-line arguments you supply. It does not know about embedded code such as
JavaScript. Commented chunks of HTML may get adjusted when you don't want them to.
swconv may be used, copied, and re-distributed under the same terms and conditions as Perl.