Title: Machine-readable debian/copyright
DEP: 5
State: DRAFT
Date: 2009-03-22
Drivers: Steve Langasek <vorlon@debian.org>
URL: http://dep.debian.net/deps/dep5
License:
 Copying and distribution of this file, with or without modification,
 are permitted in any medium without royalty provided the copyright
 notice and this notice are preserved.
Abstract:
 Establish a standard, machine-readable format for debian/copyright
 files within packages, to facilitate automated checking and
 reporting of licenses for packages and sets of packages.
  1. Introduction
  2. Rationale
  3. Compatibility and Human-Readability
  4. Implementation

Introduction

This is a proposal to make debian/copyright machine-interpretable. This file is one of the most important files in Debian packaging, yet there is currently no standard format defined for it and its contents vary tremendously across packages, making it difficult to automatically extract licensing information.

This is not a proposal to change the policy in the short term.

Rationale

The diversity of free software licenses means that Debian needs to care not only about the freeness of a given work, but also its license’s compatibility with the other parts of Debian it uses.

The arrival of the GPL version 3, its incompatibility with version 2, and our inability to spot the software where the incompatibility might be problematic are one prominent example of this limitation.

There are earlier precedents, also. One is the GPL/OpenSSL incompatibility. Apart from grepping debian/copyright, which is prone to numerous false positives (packaging under the GPL but software under another license) or negatives (GPL software but with an “OpenSSL special exception” dual licensing form), there is no reliable way to know which software in Debian might be problematic.

And there is more to come. There are issues with shipping GPLv2-only software with a CDDL operating system such as Nexenta. The GPL version 3 solves this issue, but not all GPL software can switch to it and we have no way to know how much of Debian should be stripped from such a system.

A user might also want a way to avoid software under certain licenses they have a problem with, even if those licenses are DFSG-free; the Affero GPL is one example.

Compatibility and Human-Readability

The file must be encoded as UTF-8 and strictly formatted as a superset of RFC 2822, including significant newlines. Free-form text is not allowed.

The debian/copyright file must be machine-interpretable, yet human-readable, while communicating all mandated upstream information, copyright notices and licensing details.

For the sake of human-readability this proposal avoids any complex field names or syntax rules.

Implementation

Sections

Header Section (Once)

Examples:

Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
Name: SOFTware
Maintainer: John Doe <john.doe@example.com>
Source: http://www.example.com/software/project

Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
Name: xyz
Maintainer: Jane Smith <jane.smith@example.com>
Source: http://www.example.com/gitwww

Files Section (Repeatable)

The declaration of copyright and license for files is done in one or more stanzas.

Example:

Files: *
Copyright: 2008, John Doe <john.doe@example.com>
           2007, Jane Smith <jane.smith@example.com>
License: PSF-2
 [LICENSE TEXT]

Standalone License Section

Where a set of files is dual-licensed (or tri-licensed, etc.), or when the same license occurs multiple times, you can use a single-line License field and standalone License stanzas to expand the license short names.

Example 1 (tri-licensed files).

Files: src/js/editline/*
Copyright: 1993, John Doe
           1993, Joe Average
License: MPL-1.1 or GPL-2 or LGPL-2.1

License: MPL-1.1
 [LICENSE TEXT]

License: GPL-2
 [LICENSE TEXT]

License: LGPL-2.1
 [LICENSE TEXT]

Example 2 (recurrent license).

Files: src/js/editline/*
Copyright: 1993, John Doe
           1993, Joe Average
License: MPL-1.1

Files: src/js/fdlibm/*
Copyright: 1993, J-Random Corporation
License: MPL-1.1

License: MPL-1.1
 [LICENSE TEXT]

Extra fields.

Extra fields can be added to any section. Their names start with X-.

Fields Detail

Files

Format

The Files field contains a comma-separated list of patterns:

Files: foo.c, bar.*, baz.[ch]

File names containing spaces or commas should be put within double quotes. The backslash character is an escaping character, be it inside or outside double quotes:

Files: "Program Files/*", manual[english].txt
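
One possible way to split such a field value into individual patterns is sketched below in Python. The name split_files_field is hypothetical. The sketch assumes that a backslash makes the next character literal and is then dropped, and that whitespace around each pattern is not significant; the proposal does not spell out either point, so a real parser may need to behave differently (for instance, by keeping the backslash for the pattern matcher).

```python
def split_files_field(value):
    """Split a Files field value into a list of patterns.

    Hypothetical helper. Assumptions: commas inside double quotes do not
    separate patterns, and a backslash escapes the following character.
    """
    patterns, buf = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            buf.append(ch)       # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            patterns.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    if in_quotes or escaped:
        raise ValueError("unterminated quote or trailing backslash")
    patterns.append("".join(buf).strip())
    # Drop empty entries such as those produced by a trailing comma.
    return [p for p in patterns if p]
```

With the field value of the example above, the sketch yields the two patterns Program Files/* and manual[english].txt.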

Syntax

Patterns are handled as by the find utility’s -name option. Patterns containing a path separator (“/”) are handled as by the find utility’s -path option.

The following matches all Makefile.am files in the tree and all Python scripts:

Files: */Makefile.am, *.py

But this will only match the top-level Makefile.am:

Files: ./Makefile.am

For the first example, the equivalent find command would be:

find . -path "*/Makefile.am" -o -name "*.py"
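
For tools that cannot call find directly, the same rule can be approximated in Python with the standard fnmatch module, as in the sketch below. The function name pattern_matches is hypothetical, and paths are assumed to be written relative to the top of the source tree with a leading "./", as find prints them. Note that fnmatch does not treat the backslash as an escape character, so this is only an approximation of find's behaviour.

```python
import fnmatch
import posixpath


def pattern_matches(pattern, path):
    """Return True if `pattern` matches `path` (e.g. "./src/foo.c").

    Patterns containing "/" are compared with the whole path, as with
    find -path; other patterns are compared with the file name only, as
    with find -name. In both cases "*" may match "/" characters.
    """
    if "/" in pattern:
        return fnmatch.fnmatchcase(path, pattern)
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)
```

Under these assumptions, */Makefile.am matches both ./Makefile.am and ./src/Makefile.am, while ./Makefile.am matches only the top-level file.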

It is quite common for a work to have files with copyright held by different parties and received under different licenses. To accommodate this, multiple stanzas are allowed with different Files declarations.

However it makes for easier reading if the copyright file lists the “main” license first: the one matching the “top level” of the work, with others listed as exceptions. To allow this, the following precedence rule applies for matching files: If multiple Files declarations match the same file, then only the last match counts.

As a result, it is recommended for clarity that the stanzas appear in order from most general (e.g. Files: *) first, through to most specific. In the following example, the file getopt.c matches both Files: * and Files: getopt.*; only the last match counts, so the file getopt.c has the license declaration License: BSD.

Files: *
Copyright: 2003-2005, John Doe <jdoe@example.com>
License: [the main work's license]
 [LICENSE TEXT]

Files: getopt.*
Copyright: 2000, The Corporation Foundation, Inc.
License: BSD
 [LICENSE TEXT]
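
The "last match counts" rule can be sketched in Python as follows. The names matches and license_for are hypothetical, the stanzas are assumed to be already parsed into (patterns, license) pairs in file order, and "MAIN" is a placeholder for the main work's license.

```python
import fnmatch
import posixpath


def matches(pattern, path):
    # Patterns with "/" are compared with the full path, others with the
    # file name only (approximating find -path and find -name).
    target = path if "/" in pattern else posixpath.basename(path)
    return fnmatch.fnmatchcase(target, pattern)


def license_for(path, stanzas):
    """Return the license of the last stanza whose patterns match `path`."""
    result = None
    for patterns, license_name in stanzas:
        if any(matches(p, path) for p in patterns):
            result = license_name  # a later stanza overrides earlier ones
    return result


# The two stanzas of the example above, in file order.
stanzas = [(["*"], "MAIN"), (["getopt.*"], "BSD")]
```

Here license_for("./getopt.c", stanzas) returns "BSD", while any other file falls back to the general Files: * stanza.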

License

Short name

Much of the value of a machine-parseable copyright file lies in being able to correlate the licenses of multiple pieces of software. To that end, this spec defines standard short names for a number of commonly used licenses, which can be used in the first line of a “License” field.

These short names have the specified meanings across all uses of this file format, and must not be used to refer to any other licenses. Parsers may thus rely on these short names referring to the same licenses wherever they occur, without needing to parse or compare the full license text.

From time to time, licenses may be added to or removed from the list of standard short names. Such changes in the list of short names will always be accompanied by changes to the recommended Format-Specification value. Implementors who are parsing copyright files should take care not to assume anything about the meaning of license short names for unknown Format-Specification versions.

Use of a standard short name does not override the Debian Policy requirement to include the full license text in debian/copyright, nor any requirements in the license of the work regarding reproduction of legal notices. This information must still be included in the License field, either in a stand-alone license section or in the relevant files section.

For licenses which have multiple versions in use, the version number is added, using a dash as a separator. If omitted, the lowest version number is implied. When the license grant permits using the terms of any later version of that license, the short name is finished with a plus sign.
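
A short name built this way can be taken apart again mechanically. The Python sketch below is one way to do it; the function name parse_short_name is hypothetical, and the assumption that a version consists of digits separated by dots is ours, not part of this proposal.

```python
import re


def parse_short_name(short):
    """Split a short name into (name, version, or_later).

    `version` is None when no version number is present; resolving it to
    the lowest version of the license is left to the caller.
    """
    or_later = short.endswith("+")
    base = short[:-1] if or_later else short
    name, sep, tail = base.rpartition("-")
    # Assumed: a version looks like "2" or "2.1" (digits separated by dots).
    if sep and re.fullmatch(r"\d+(\.\d+)*", tail):
        return name, tail, or_later
    return base, None, or_later
```

For example, GPL-2+ splits into the name GPL, the version 2 and the "or later" flag, whereas CC-BY-SA has no version part even though it contains dashes.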

keyword       meaning
Apache        Apache license. For versions, consult the Apache Software Foundation.
Artistic      Artistic license. For versions, consult the Perl Foundation.
BSD           Berkeley software distribution license
FreeBSD       FreeBSD Project license
ISC           Internet Software Consortium’s license, sometimes also known as the OpenBSD License
CC-BY         Creative Commons Attribution license
CC-BY-SA      Creative Commons Attribution Share Alike license
CC-BY-ND      Creative Commons Attribution No Derivatives
CC-BY-NC      Creative Commons Attribution Non-Commercial
CC-BY-NC-SA   Creative Commons Attribution Non-Commercial Share Alike
CC-BY-NC-ND   Creative Commons Attribution Non-Commercial No Derivatives
CC0           Creative Commons Universal waiver
CDDL          Common Development and Distribution License. For versions, consult Sun Microsystems.
CPL           IBM Common Public License. For versions, consult the IBM Common Public License (CPL) Frequently asked questions.
Eiffel        The Eiffel Forum License. For versions, consult the Open Source Initiative.
Expat         The Expat license
GPL           GNU General Public License
LGPL          GNU Lesser General Public License (GNU Library General Public License for versions lower than 2.1)
GFDL          GNU Free Documentation License
GFDL-NIV      GNU Free Documentation License, with no invariant sections
LPPL          LaTeX Project Public License
MPL           Mozilla Public License. For versions, consult Mozilla.org.
Perl          Perl license (equates to “GPL-1+ or Artistic-1”)
PSF           Python Software Foundation license. For versions, consult the Python Software Foundation.
QPL           Q Public License
W3C-Software  W3C Software License. For more information, consult the W3C Intellectual Rights FAQ and the 20021231 W3C Software notice and license.
ZLIB          zlib/libpng license
Zope          Zope Public License. For versions, consult Zope.org.
other         Any other custom license. License notice text must be copied verbatim.

Exceptions and clarifications are signalled in plain text, by appending “with keywords exception” to the short name, where keywords identifies the exception. This document provides a list of keywords that refer to the most frequent exceptions.

The GPL “Font” exception refers to the text added to the license notice of each file as specified in “How does the GPL apply to fonts?”. The precise text corresponding to this exception is:

As a special exception, if you create a document which uses this font, and embed this font or unaltered portions of this font into the document, this font does not by itself cause the resulting document to be covered by the GNU General Public License. This exception does not however invalidate any other reasons why the document might be covered by the GNU General Public License. If you modify this font, you may extend this exception to your version of the font, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.

The GPL “OpenSSL” exception gives permission to link GPL-licensed code with the OpenSSL library, which contains GPL-incompatible clauses. For more information, see “The OpenSSL License and The GPL” by Mark McLoughlin and the message “middleman software license conflicts with OpenSSL” by Mark McLoughlin on the debian-legal mailing list. The text corresponding to this exception is:

In addition, as a special exception, the copyright holders give permission to link the code of portions of this program with the OpenSSL library under certain conditions as described in each individual source file, and distribute linked combinations including the two.

You must obey the GNU General Public License in all respects for all of the code used other than OpenSSL. If you modify file(s) with this exception, you may extend this exception to your version of the file(s), but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version. If you delete this exception statement from all source files in the program, then also delete it here.

Problematic Licenses

The following license names are well known, but ambiguously refer to a number of different licenses, in ways not easily addressed by the use of version numbers or exception qualifiers. They are therefore not included as standard short names in this specification.

license name  problem
MIT           Several variants of the MIT license exist: (1) the standard version with three paragraphs (blanket permission, keep this notice, NO WARRANTY), (2) a version with a no-endorsement clause, and (3) other versions with slight wording differences.
PD            Being in the public domain is not a license. See the Linux Journal article “Why the Public Domain Isn’t a License” by Lawrence Rosen. If the work is truly public domain, this should be stated in the Copyright field.
PHP           The PHP license contains terms which can only reasonably be met when applied to the PHP language itself. Since software other than PHP which uses this license will be rejected from Debian, there’s no need for a shared standard keyword.

Syntax

License names are case-insensitive.

In case of multi-licensing, the license short names are separated by “or” when the user can choose between different licenses, and by “and” when use of the work must simultaneously comply with the terms of multiple licenses.

For instance, this is a simple, “GPL version 2 or later” field:

    License: GPL-2+

This is a dual-licensed GPL/Artistic work such as Perl:

    License: GPL-2+ or Artistic-2.0

This is for a file that has both GPL and classic BSD code in it:

    License: GPL-2+ and BSD

For the most complex cases, a comma is used to disambiguate the precedence of “or” and “and”: “and” takes precedence over “or”, unless it is preceded by a comma. For instance:

A or B and C means A or (B and C).

A or B, and C means (A or B), and C.

This is for a file that has Perl code and classic BSD code in it:

    License: GPL-2+ or Artistic-2.0, and BSD
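
The precedence rules above can be turned into a small recursive parser. The Python sketch below is one possible reading, not a reference implementation: the function name parse_license_expr and the nested-tuple result are hypothetical, and expressions that mix “, and” with “, or” are not covered by the rules above, so the order chosen here for them is an arbitrary assumption.

```python
import re


def parse_license_expr(expr):
    """Parse the first line of a License field into a nested structure.

    Returns either a plain string (a single license) or a tuple
    (operator, [operands]) where operator is "and" or "or".
    """
    expr = " ".join(expr.split())  # normalise whitespace
    # Operators preceded by a comma bind loosest, so split on them first.
    for op in ("and", "or"):
        parts = re.split(rf",\s*{op}\s+", expr)
        if len(parts) > 1:
            return (op, [parse_license_expr(p) for p in parts])
    # Without a comma, "and" binds tighter than "or": split on "or" first.
    for op in ("or", "and"):
        parts = re.split(rf"\s+{op}\s+", expr)
        if len(parts) > 1:
            return (op, [parse_license_expr(p) for p in parts])
    return expr
```

For the field shown above, the sketch returns an “and” node whose first operand is the “or” of GPL-2+ and Artistic-2.0 and whose second operand is BSD.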

A GPL-2+ work with the OpenSSL exception is in effect a dual-licensed work that can be redistributed either under the GPL-2+, or under the GPL-2+ with the OpenSSL exception. It is thus expressed as GPL-2+ with OpenSSL exception:

    License: GPL-2+ with OpenSSL exception
     This program is free software; you can redistribute it
     and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
     .
     In addition, as a special exception, the author of this
     program gives permission to link the code of its 
     release with the OpenSSL project's "OpenSSL" library (or
     with modified versions of it that use the same license as
     the "OpenSSL" library), and distribute the linked
     executables. You must obey the GNU General Public 
     License in all respects for all of the code used other 
     than "OpenSSL".  If you modify this file, you may extend
     this exception to your version of the file, but you are
     not obligated to do so.  If you do not wish to do so,
     delete this exception statement from your version.
     .
     This program is distributed in the hope that it will be
     useful, but WITHOUT ANY WARRANTY; without even the implied
     warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
     PURPOSE.  See the GNU General Public License for more
     details.
     .
     You should have received a copy of the GNU General Public
     License along with this package; if not, write to the Free
     Software Foundation, Inc., 51 Franklin St, Fifth Floor,
     Boston, MA  02110-1301 USA
     .
     On Debian systems, the full text of the GNU General Public
     License version 2 can be found in the file
     `/usr/share/common-licenses/GPL-2'.
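
A tool reading the first line of such a field may want to separate the short name from the exception keywords. A minimal Python sketch follows; the function name split_exception is hypothetical, and it assumes that the short name itself contains no whitespace.

```python
import re


def split_exception(term):
    """Split "NAME with KEYWORDS exception" into (NAME, KEYWORDS).

    Returns (term, None) when no exception qualifier is present.
    """
    term = term.strip()
    m = re.fullmatch(r"(\S+)\s+with\s+(.+?)\s+exception", term)
    if m:
        return m.group(1), m.group(2)
    return term, None
```

For the example above, this gives the short name GPL-2+ and the keyword OpenSSL.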

Implementation

It is proposed to implement this proposal in pseudo-RFC-822 format (the one of debian/control). However, other syntaxes could be used, such as YAML.

Examples in pseudo-RFC-822 format

Simple

A possible copyright file for the program ‘X Solitaire’ distributed in the Debian source package xsol:

    Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
    Name: X Solitaire
    Source: ftp://ftp.example.com/pub/games

    Copyright: 1998, John Doe <jdoe@example.com>
    License: GPL-2+
     This program is free software; you can redistribute it
     and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
     .
     This program is distributed in the hope that it will be
     useful, but WITHOUT ANY WARRANTY; without even the implied
     warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
     PURPOSE.  See the GNU General Public License for more
     details.
     .
     You should have received a copy of the GNU General Public
     License along with this package; if not, write to the Free
     Software Foundation, Inc., 51 Franklin St, Fifth Floor,
     Boston, MA  02110-1301 USA
     .
     On Debian systems, the full text of the GNU General Public
     License version 2 can be found in the file
     `/usr/share/common-licenses/GPL-2'.

    Files: debian/*
    Copyright: 1998, Jane Smith <jsmith@example.net>
    License:
     [LICENSE TEXT]

Complex

A possible copyright file for the program ‘Planet Venus’, distributed in the Debian source package planet-venus:

    Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
    Name: Planet Venus
    Maintainer: John Doe <jdoe@example.com>
    Source: http://www.example.com/code/venus

    Copyright: 2008, John Doe <jdoe@example.com>
               2007, Jane Smith <jsmith@example.org>
               2007, Joe Average <joe@example.org>
               2007, J. Random User <jr@users.example.com>
    License: PSF-2
     [LICENSE TEXT]

    Files: debian/*
    Copyright: 2008, Dan Developer <dan@debian.example.com>
    License:
     Copying and distribution of this package, with or without
     modification, are permitted in any medium without royalty
     provided the copyright notice and this notice are
     preserved.

    Files: debian/patches/theme-diveintomark.patch
    Copyright: 2008, Joe Hacker <hack@example.org>
    License: GPL-2+
     [LICENSE TEXT]

    Files: planet/vendor/compat_logging/*
    Copyright: 2002, Mark Smith <msmith@example.org>
    License: MIT
     [LICENSE TEXT]

    Files: planet/vendor/httplib2/*
    Copyright: 2006, John Brown <brown@example.org>
    License: MIT2
     Unspecified MIT style license.

    Files: planet/vendor/feedparser.py
    Copyright: 2007, Mike Smith <mike@example.org>
    License: PSF-2
     [LICENSE TEXT]

    Files: planet/vendor/htmltmpl.py
    Copyright: 2004, Thomas Brown <coder@example.org>
    License: GPL-2+
     This program is free software; you can redistribute it
     and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
     .
     This program is distributed in the hope that it will be
     useful, but WITHOUT ANY WARRANTY; without even the implied
     warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
     PURPOSE.  See the GNU General Public License for more
     details.
     .
     You should have received a copy of the GNU General Public
     License along with this package; if not, write to the Free
     Software Foundation, Inc., 51 Franklin St, Fifth Floor,
     Boston, MA  02110-1301 USA
     .
     On Debian systems, the full text of the GNU General Public
     License version 2 can be found in the file
     `/usr/share/common-licenses/GPL-2'.

Appendix: Note about the use of this format in Debian

The Debian Policy (§12.5) requires that each package be accompanied by a file, debian/copyright in source packages and /usr/share/doc/package/copyright in binary packages, that contains a verbatim copy of its copyright and distribution license. In addition, it requires that copyrights be extractable by mechanical means. This proposal for machine-readable copyright and license summary files has been crafted for Debian’s use, but it is our hope that other software distributions, as well as upstream developers, will adopt it, so that review efforts can be easily reproduced and shared.

The copyright of the Debian packaging and the history of package maintainers is simply indicated in a Files: debian/* section.

Policy §12.5 requires that packages distributed in the non-free and contrib sections of the Debian archive carry a disclaimer in debian/copyright stating that these packages are not part of the Debian operating system and explaining why they cannot be distributed in the main section. The Disclaimer field was created for this purpose.

For a non-free package to be autobuilt, debian/copyright must contain an explanation that autobuilding is not forbidden (see 20061129152824.GT2560@mails.so.argh.org). It is proposed to use an extra field in the header, named X-Autobuild, that would contain yes on the first line and the explanation on the following lines.