Last updated November 12, 2000

NAME

DBD::Teradata - Perl DBI Driver for Teradata

SYNOPSIS


     use DBI;
     my $dbh = DBI->connect(
          "dbi:Teradata:some.host.com",
          "user",
          "passwd")
          or die "Cannot connect\n";
     # more DBI calls...

DESCRIPTION

DBI driver for Teradata (see DBI(3) for details).

BE ADVISED: This is BETA software, and subject to change at the whim of the author(s). While every effort has been made to ensure conformance to DBI-1.13, some peculiarities of Teradata may not be 100% compatible with it. In addition, some niceties of Teradata not found in DBI have been included to make this library useful for common Teradata tasks (MACRO and multistatement request support, support for summary rows, etc.). Look here for updates and support.

NOTE: this driver is 100% pure Perl, and requires no CLI, ODBC, or other interface libraries, other than the standard DBI package.

The package is now available on CPAN.

CURRENT VERSION

Release 1.10

CHANGE HISTORY

Release 1.10:

Release 1.00:

DRIVER-SPECIFIC BEHAVIOR

DATA-SOURCE NAME

The dsn string passed to DBI->connect() must be of the following form:

     dbi:Teradata:host[:port]

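where host is the hostname or numeric IP address of the Teradata DBMS, and the optional port is the TCP port to connect to.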

Note that this driver will NOT perform the random selection algorithm for resolving the hostname (via the "hostnameCOPN" convention) when multiple paths are available. You should either use the full explicit hostname (e.g., "DBCCOP1"), use the numeric IP address (e.g., "1.2.3.4"), create an alias for the explicit hostname, or, better still, use bind(3) the way God intended.

RunStartup execution is not supported.
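
As an illustration of the dsn form described above, here is a minimal sketch that splits a dsn string into host and port. The helper name parse_teradata_dsn and the port value used in the example are hypothetical; this is not how the driver itself parses the string.

```perl
use strict;
use warnings;

# Split "dbi:Teradata:host[:port]" into its parts.
# Returns (host, port); port is undef when the dsn omits it.
# Returns an empty list when the string is not a Teradata dsn.
sub parse_teradata_dsn {
    my ($dsn) = @_;
    my ($host, $port) = $dsn =~ /\Adbi:Teradata:([^:]+)(?::(\d+))?\z/
        or return;
    return ($host, $port);
}

# Example values only; the port number here is an assumption
my ($host, $port) = parse_teradata_dsn('dbi:Teradata:DBCCOP1:1025');
print "host=$host port=$port\n";
```

Using an explicit hostname such as "DBCCOP1" here matches the advice above, since the driver does not resolve the COP convention itself.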

CONNECTIONS, SESSIONS AND TRANSACTIONS

Multiple connections (aka sessions) to a Teradata database are supported. Note that the number of sessions in a single process instance is limited by Perl's select() implementation. Large-scale multisession applications should use fork() early and often to permit more than 10 or 20 sessions to execute concurrently.
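
A minimal sketch of the "fork early" approach follows. The worker count and sessions-per-worker figures are hypothetical, and the DBI->connect call is shown only as a comment so that the sketch runs without a DBMS; each child would log on its own sessions after the fork.

```perl
use strict;
use warnings;

my $workers = 4;              # hypothetical number of worker processes
my $sessions_per_worker = 8;  # hypothetical; keeps each process well under the limit

my @pids;
for my $w (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: log on this worker's sessions here, one connect per session, e.g.
        #   DBI->connect("dbi:Teradata:some.host.com", "user", "passwd")
        # then do the work and exit.
        exit 0;
    }
    push @pids, $pid;
}

# Parent: wait for every worker and count the ones that failed
my $failed = 0;
for my $pid (@pids) {
    waitpid($pid, 0);
    $failed++ if $? != 0;
}
print "workers failed: $failed\n";
```

Forking before any sessions are opened means no child inherits another process's connections.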

Currently, the driver operates in Teradata mode (i.e., *not* ANSI mode). That means that DDL statements and multistatement requests implicitly finish a transaction, and AUTOCOMMIT is the default. We don't flag any non-ANSI SQL, either. See the Teradata SQL documents for all the differences between ANSI and Teradata behavior. Eventually, support for selecting a session mode may be added.

Cursor SQL syntax (i.e., ...WHERE CURRENT OF...) is not supported.

We don't cache statements here...the DBMS does that very nicely, thank you.

Session reconnection is not supported...and probably never will be.

Teradata account strings can be provided by simply appending a single comma, followed by the single-quoted account string, to the password string, e.g.,

     use DBI;
     my $dbh = DBI->connect(
          "dbi:Teradata:some.host.com",
          "user",
          "passwd,'\$H&Lmyaccount'")
          or die "Cannot connect\n";
     # more DBI calls...

HELP statements are not supported, due to an apparent bug in the data returned by the DBMS in RECORD mode.

DATA TYPES

The following list maps DBI defined data types to their Teradata equivalent (if applicable):

     DBI Data Type         Teradata Data Type
     -----------------     ------------------
     SQL_CHAR              CHAR
     SQL_NUMERIC           DECIMAL
     SQL_DECIMAL           DECIMAL
     SQL_INTEGER           INTEGER
     SQL_SMALLINT          SMALLINT
     SQL_FLOAT             FLOAT
     SQL_REAL              FLOAT
     SQL_DOUBLE            FLOAT
     SQL_VARCHAR           VARCHAR
     SQL_DATE              DATE
     SQL_TIME              TIME
     SQL_TIMESTAMP         TIMESTAMP
     SQL_LONGVARCHAR       LONG VARCHAR
     SQL_BINARY            BYTE
     SQL_VARBINARY         VARBYTE
     SQL_LONGVARBINARY     LONG VARBYTE
     SQL_BIGINT            N/A
     SQL_TINYINT           BYTEINT
     SQL_WCHAR             N/A
     SQL_WVARCHAR          N/A
     SQL_WLONGVARCHAR      N/A
     SQL_BIT               N/A

NOTE: INTERVAL types are not yet supported.

CAUTION: Floating point and DECIMAL support within the Teradata DBMS requires specific knowledge of the client platform. Platforms the DBMS currently knows about are:

This driver attempts to auto-detect the platform on which it is running as follows:

     Character Set     Integer format                   Assumed Platform
     -------------     ----------------------------     ----------------
     ASCII             not network-order (MSB last)     Intel
     ASCII             network-order (MSB first)        SPARC/MOTOROLA
     EBCDIC            don't care                       IBM370/390

Autodetection can be overridden by setting the environment variable TDAT_PLATFORM_CODE to the appropriate platform code:

     Platform                               TDAT_PLATFORM_CODE Value
     ----------------------------------     ------------------------
     Any Intel X86/Pentium                  8
     Sun SPARC, Motorola 68XXX, ATT 3b2     7
     VAX                                    9
     IBM 370/390                            3
     AMDAHL UTS                             10
     Honeywell                              4
If your platform's floating point (specifically, double-precision) format does not match that of one of those platforms, AVOID FLOATING POINT VALUES! Substituting DECIMAL or CHARACTER values will often work well, though loss of precision may be a problem.

For some platforms (notably VAXen and mainframes), Teradata returns DECIMALs as binary-coded decimal (BCD) values. For most workstation platforms (Intel and most RISC), DECIMAL is returned as a simple fixed-point integer. Currently, DBD::Teradata only converts the latter format. BCD encoded values may eventually be supported if sufficient hue and cry is raised...but it's currently not a priority.
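
To show what converting a fixed-point integer DECIMAL involves, here is a small sketch that applies a scale to a plain integer. The function name decimal_from_int is hypothetical, and the 4-byte little-endian wire value used in the demonstration is an assumption made for illustration, not a statement about the driver's internals.

```perl
use strict;
use warnings;

# Turn a fixed-point integer plus its scale into a decimal string,
# e.g. the integer 12345 with scale 2 becomes "123.45".
sub decimal_from_int {
    my ($int, $scale) = @_;
    return "$int" if $scale == 0;
    my $sign = $int < 0 ? '-' : '';
    # Zero-pad so there is always at least one digit before the point
    my $digits = sprintf('%0*d', $scale + 1, abs $int);
    substr($digits, -$scale, 0, '.');
    return $sign . $digits;
}

# Assumed for illustration: a value arriving as a 4-byte little-endian integer
my $raw = pack('l<', -12345);
my $value = decimal_from_int(unpack('l<', $raw), 2);
print "$value\n";
```

Keeping the value as a string avoids the floating point precision concerns raised above.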

Finally, note that double-byte character sets (i.e., UNICODE, Kanji, etc.) are not supported. You can try 'em, but I wouldn't count on Perl handling them, much less DBI.

PARAMETERIZED SQL

This driver supports both USING clauses and placeholders to implement parameterized SQL; however, they cannot be mixed in the same request. Also, when using placeholders, all parameter datatypes are assumed to be VARCHAR(16) unless explicit datatypes are specified via a bind_param() call, or the environment variable TDAT_PH_SIZE has been defined to another value. However, upon execute() the driver adjusts the parameter type declaration provided to the DBMS, so that any parameter without an explicit type specification whose value exceeds the default size is presented to the DBMS as VARCHAR(<actual-param-length>).
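
The placeholder sizing rule above can be sketched as follows. The function name placeholder_types is hypothetical and this is only an illustration of the rule as described, not the driver's code: untyped parameters default to VARCHAR(16), or to the size given by TDAT_PH_SIZE, and grow to the actual value length when that is larger.

```perl
use strict;
use warnings;

# Return the VARCHAR declaration that each untyped placeholder value
# would receive under the sizing rule described in the text.
sub placeholder_types {
    my @values = @_;
    my $default = $ENV{TDAT_PH_SIZE} || 16;
    return map {
        my $len = defined $_ ? length $_ : 0;
        'VARCHAR(' . ($len > $default ? $len : $default) . ')';
    } @values;
}

my @types = placeholder_types('abc', 'x' x 40);
print join(', ', @types), "\n";
```

Parameters that need another type (INTEGER, DATE, and so on) still require an explicit bind_param() type, as the text notes.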

MULTI-STATEMENT AND MACRO REQUESTS

Multi-statement and MACRO execution requests are supported. Stored procedures may eventually be supported, but not until the Teradata V2R4.0 release (which adds stored procedure support to the DBMS) has matured.

Reporting the results of multi-statement and MACRO requests presents some special problems. Refer to the DRIVER SPECIFIC ATTRIBUTES section below for detailed descriptions of relevant statement handle attributes. The driver behavior is augmented as follows:

An example of processing multi-SELECT requests:


$sth = $dbh->prepare('SELECT user; SELECT date; SELECT time;');
$names = $sth->{NAME};
$types = $sth->{TYPE};
$precisions = $sth->{PRECISION};
$scales = $sth->{SCALE};
$stmt_info = $sth->{'tdat_stmt_info'};

$sth->execute();
$currstmt = -1;
while (@row = $sth->fetchrow_array()) {
	if ($currstmt != $sth->{'tdat_stmt_num'}) {
#
#	new stmt, get its column range and print the column names
#
		print "\n\n";
		$currstmt = $sth->{'tdat_stmt_num'};
		$stmthash = $$stmt_info[$currstmt];
		$starts_at = $$stmthash{'StartsAt'};
		$ends_at = $$stmthash{'EndsAt'};
		for ($i = $starts_at; $i <= $ends_at; $i++) {
			print "$$names[$i] ";
		}
		print "\n";
	}
#
#	print only the fields that belong to the current stmt
#
	for ($i = $starts_at; $i <= $ends_at; $i++) {
		print "$row[$i] ";
	}
	print "\n";
}

Yeah, this is kinda ugly...but it keeps DBI happy while providing support for important Teradata features, and lets applications either ignore or adapt to the specializations in a reasonably painless way.

SUMMARIZED SELECT REQUESTS

Like multi-statement and MACRO requests, reporting the results of summarized SELECT requests requires some special processing. Refer to the DRIVER SPECIFIC ATTRIBUTES section below for detailed descriptions of relevant statement handle attributes. The driver behavior is augmented as follows:

An example of processing summarized SELECT:


$sth = $dbh->prepare('SELECT Name FROM Employees WITH AVG(Salary), SUM(Salary)');
$names = $sth->{NAME};
$types = $sth->{TYPE};
$precisions = $sth->{PRECISION};
$scales = $sth->{SCALE};
$stmt_info = $sth->{'tdat_stmt_info'};

$sth->execute();
$currstmt = -1;
while (@row = $sth->fetchrow_array()) {
	if ($currstmt != $sth->{'tdat_stmt_num'}) {
#
#	new stmt, get its info
#
		print "\n\n";
		$currstmt = $sth->{'tdat_stmt_num'};
		$stmthash = $$stmt_info[$currstmt];
		$starts_at = $$stmthash{'StartsAt'};
		$ends_at = $$stmthash{'EndsAt'};
		$sumstarts = $$stmthash{'SummaryStarts'};
		$sumends = $$stmthash{'SummaryEnds'};
		for ($i = $starts_at; $i <= $ends_at; $i++) {
			print "$$names[$i] ";	# print the column names
		}
		print "\n";
	}
#
#	check on every row whether it is a summary row
#
	$sumrow = $$stmthash{'IsSummary'};
	if (defined($sumrow)) {
#
#	got a summary row, space it
#	NOTE: this example uses simple tabs to space summary fields;
#	in practice, a more rigorous method to precisely align summaries with their
#	respective columns would be used
#
		$sumpos = $$stmthash{'SummaryPosition'};
		$sumposst = $$stmthash{'SummaryPosStart'};
		print "\n-----------------------------------------------------\n";
		for ($i = $$sumstarts[$sumrow], $j = $$sumposst[$sumrow];
			$i <= $$sumends[$sumrow]; $i++, $j++) {
			print "\t" x $$sumpos[$j];	# tab each column for alignment
			print "$$names[$i]: $row[$i]\n";
		}
	}
	else {
#
#	regular row, just print the values
#
		for ($i = $starts_at; $i <= $ends_at; $i++) {
			print "$row[$i] ";
		}
		print "\n";
	}
}

Yeah, this is ugly too...but, again, it keeps DBI happy while providing support for important Teradata features. Besides, you can always ignore the summary stuff if you want.

UTILITY SUPPORT

Some support for fastload and fastexport may eventually be implemented. Support for these utility interfaces should be useful for large-scale ETL pipelines, or just simply importing/exporting directly from/to other database systems.

DOUBLE BUFFERING

Double buffering (i.e., issuing a CONTINUE to the DBMS while the application is still fetching data from the last received set of rowdata) is supported, but not yet thoroughly tested. Use at your own risk by defining an environment variable TDAT_NO2BUFS=0.

ERROR HANDLING

DBI does not support the notion of warnings; therefore, the hashref provided by the driver specific statement handle attribute tdat_stmt_info provides a Warning attribute that can be queried to retrieve warning messages.
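
A minimal sketch of collecting warnings from tdat_stmt_info follows. It assumes, based on the examples in this document, that the attribute is an array reference of per-statement hash references; the mock data and the helper name collect_warnings are hypothetical, so the sketch runs without a DBMS connection.

```perl
use strict;
use warnings;

# Walk a tdat_stmt_info-style structure and return one message
# for each statement that carries a Warning entry.
sub collect_warnings {
    my ($stmt_info) = @_;
    my @warnings;
    for my $n (0 .. $#$stmt_info) {
        my $info = $stmt_info->[$n] or next;   # skip unused slots
        push @warnings, "statement $n: $info->{Warning}"
            if defined $info->{Warning};
    }
    return @warnings;
}

# Mock data standing in for $sth->{'tdat_stmt_info'} (hypothetical values)
my $stmt_info = [
    undef,
    { StartsAt => 0, EndsAt => 0 },
    { StartsAt => 1, EndsAt => 1, Warning => 'hypothetical warning text' },
];
my @found = collect_warnings($stmt_info);
print "$_\n" for @found;
```

In a real application the same loop would be run against $sth->{'tdat_stmt_info'} after execute().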

DIAGNOSTICS

DBI provides the trace() function to enable various levels of trace information. DBD::Teradata uses this trace level to report its internal operation, as well.

DRIVER-SPECIFIC ATTRIBUTES

There are some additional attributes that the user can query a statement handle for:

DRIVER-SPECIFIC FUNCTIONS

In order to make this driver useful for high-performance ETL applications, support for multiple concurrent sessions is needed. Unfortunately, native DBI doesn't currently support the type of asynchronous session interaction needed to efficiently move data to/from an MPP database system (hopefully, when Perl is more thread safe, DBI will remove its current single-thread-per-DBD mutex restriction, and these special functions won't be needed). To address this need, the "FirstAvailable" and "Realize" functions have been provided.

WARNING: This functionality relies on the select() system I/O call to determine sessions that have pending responses. Perl's implementation of select() limits the usable file descriptor values to the range 0..31. If your application uses a large number of file handles, both for DBMS sessions and for other I/O, some of your sessions may be assigned file descriptors >= 32 and thus be unavailable for processing by FirstAvailable. A future version of DBD::Teradata may remove this limitation by using a multithreaded internal architecture. Until then, applications which intend to use this functionality should log on all their Teradata sessions before opening additional files, and limit the number of sessions to less than 30.

In addition, some platforms (notably Windows 95 and 98) may limit the total number of TCP/IP connections which you can initiate.
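
For readers unfamiliar with select(), the following sketch shows the general mechanism of finding the first handle with a pending response. It is not the driver's FirstAvailable implementation; the helper name first_ready is hypothetical, and two pipes stand in for DBMS sessions (an assumption that holds on Unix-like systems, where select() works on pipes).

```perl
use strict;
use warnings;

# Return the index of the first handle with data pending,
# or -1 if nothing becomes readable within $timeout seconds.
sub first_ready {
    my ($handles, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($_), 1) = 1 for @$handles;   # one bit per file descriptor
    my $rout = $rin;
    my $nfound = select($rout, undef, undef, $timeout);
    return -1 unless $nfound && $nfound > 0;
    for my $i (0 .. $#$handles) {
        return $i if vec($rout, fileno($handles->[$i]), 1);
    }
    return -1;
}

# Two pipes stand in for two sessions; only the second has a "response"
pipe(my $r0, my $w0) or die "pipe: $!";
pipe(my $r1, my $w1) or die "pipe: $!";
syswrite($w1, "x");
my $ready = first_ready([$r0, $r1], 1);
print "first ready: $ready\n";
```

The bit vector built with vec() has one bit per file descriptor number, which is why the descriptor values assigned to sessions matter.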

An example use of these functions to bulkload a table:

# $infile (the import file name) and $len (the fixed record length in
# bytes) are assumed to be set by the application
my $sesscount = 10;
my $drh;
my @dbhlist;
my @sthlist;
my $buffer;
open(IMPORT, $infile) || die "Can't open import file";
binmode IMPORT;

# log on all the sessions before doing anything else
for (my $i = 0; $i < $sesscount; $i++) {
	$dbhlist[$i] = DBI->connect("dbi:Teradata:dbc", "dbc", "dbc")
		|| die "Cannot connect\n";
	if (!defined($drh)) { $drh = $dbhlist[$i]->{Driver}; }
}
my @fa_parms = (\@dbhlist, -1);

# start one INSERT on each session; since USING clauses and placeholders
# cannot be mixed, the USING fields are referenced by name
my $active = 0;
for (my $i = 0; $i < $sesscount; $i++) {
	$sthlist[$i] = $dbhlist[$i]->prepare(
		'USING (col1 INTEGER, col2 CHAR(30), col3 DECIMAL(9,2), col4 DATE) ' .
		'INSERT INTO mytable VALUES(:col1, :col2, :col3, :col4)', {
		tdat_nowait => 1,
		tdat_raw => 'IndicatorMode'
	});
	last unless sysread(IMPORT, $buffer, $len);
	$sthlist[$i]->bind_param(1, $buffer);
	$sthlist[$i]->execute();
	$active++;
}

# as each session completes, hand it the next record
while (sysread(IMPORT, $buffer, $len)) {
	my $i = $drh->func(@fa_parms, 'FirstAvailable');
	my $rowcnt = $sthlist[$i]->func(undef, 'Realize');
	if (!defined($rowcnt)) {
		print STDERR " ** INSERT failed: " . $sthlist[$i]->errstr() . "\n";
	}
	$sthlist[$i]->bind_param(1, $buffer);
	$sthlist[$i]->execute();
}

# collect the results of the statements that are still active
while ($active-- > 0) {
	my $i = $drh->func(@fa_parms, 'FirstAvailable');
	my $rowcnt = $sthlist[$i]->func(undef, 'Realize');
	if (!defined($rowcnt)) {
		print STDERR " ** INSERT failed: " . $sthlist[$i]->errstr() . "\n";
	}
	$sthlist[$i]->finish();
}

CONFORMANCE

DBD::Teradata requires a minimum Perl version of 5.005, and a minimum DBI version of 1.13.

The following DBI functions are not yet supported:

     DBI->data_sources()
     $dbh->prepare_cached()
     $dbh->table_info()
     $dbh->tables()
     $dbh->type_info_all()
     $dbh->type_info()

Also be advised that using either selectall_arrayref() or fetchall_arrayref() is probably a bad idea unless you know the number of rows returned is reasonably small.

SUPPORTED PLATFORMS

The driver has been successfully tested against both MPRAS and NT based DBMSs up to V2R3. The following table lists the client platforms (hardware and O/S) that have been successfully tested to date.

     Hardware              OS                 Perl Version
     -----------------     --------------     ------------------
     Intel/AMD PC          Win98              5.005 (ActivePerl)
     Intel PC              WinNT 4.0 SP6      5.6 (ActivePerl)
     Intel PC              Linux              5.??
     NCR 4XXX (Intel)      MPRAS              5.005
     iMac DV (PowerPC)     LinuxPPC           5.005
     Sun SPARC             Solaris 4.3        5.005
     IBM RS/6000           AIX                5.005

KNOWN BUGS

     Bug     Report        Release         Status     Fix              Description
     Number  Date          Reported in                Release
     ------  ---------     -----------     ------     ------------     -----------
     1       7/16/2000     0.01            Open                        statement handle attributes report field values after preparing a DDL statement
     2       7/19/2000     0.01            Fixed      0.02             Bareword symbol in STORE function when trying to set AutoCommit
     3       7/21/2000     0.01            Fixed      0.02             Undefined symbol value referenced when disconnecting; possibly related to CONTINUE'd datasets
     4       7/25/2000     0.02            Fixed      0.04             Can't connect on non-Intel (i.e., SPARC, PowerPC, PA-RISC) platforms
     5       8/9/2000      0.04            Fixed      0.07             Placeholders/USING clauses generate 2655 error on non-Intel platforms
     6       8/10/2000     0.04            Fixed      0.07nosocks1     Can't connect on SPARC/Solaris or AIX
     7       7/25/2000     0.02            Fixed      0.05             DBMS Errors not properly reported
     8       8/16/2000     0.07            Fixed      0.09             Parameterized INSERT of TIMESTAMP fields causes DBMS error.
     9       9/4/2000      0.08            Fixed      0.09             DECIMAL types with precision > 9 broken on non-Intel platforms.
     10      11/1/2000     1.00            Fixed      1.10             do() reports error for non-data returning statements.
     11      11/1/2000     1.00            Fixed      1.10             err() on database handles fails with undefined reference.

TIPS & TRICKS

TO DO List

ACKNOWLEDGEMENTS

Thanks to all the alpha testers whose patience and input were invaluable.

REFERENCES

AUTHOR

Dean Arnold

COPYRIGHT

Copyright (c) 2000, Dean Arnold, USA
Affiliations:

Permission is granted to use this software according to the terms of the GNU General Public License. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Please note that this license prohibits the incorporation of the source code of this product in publicly distributed, but proprietary products (e.g., something you're trying to sell, as opposed to selling support for). However, proprietary products may invoke DBD::Teradata library functions.

I'm offering the software AS IS, and ACCEPT NO LIABILITY FOR ANY ERRORS OR OMISSIONS IN, OR LOSSES INCURRED AS A RESULT OF USING DBD::Teradata. If you delete your company's entire data warehouse while using this stuff, you're on your own!

I reserve the right to provide support for this software to individual sites under a separate (possibly fee-based) agreement.

Teradata® is a registered trademark of NCR Corporation.

!!!! PLEASE BE ADVISED !!!!

I am an independent software developer. I am NOT an NCR employee. DBD::Teradata was developed entirely independently of NCR. If you call Global Support about a DBD::Teradata problem, you're likely to get the bum's rush.