Network Working Group                                         D. Brasher
Internet-Draft                                            Interlinux LTD
Intended status: Informational                         November 26, 2009
Expires: May 30, 2010


              Distributed Internet Archive Protocol (DIAP)
                         draft-brasher-diap-11

Abstract

   DIAP has been created to meet the long term archiving requirements,
   mid-range and below, of the small and medium enterprise (SME). Where
   tape has been deployed in the past, DIAP now offers an alternative
   solution designed to be more robust and manageable in the long term
   than network attached storage devices or simple disk storage alone.
   The system provides a well defined structure for storing and
   managing long term archives.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 30, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Brasher                   Expires May 30, 2010                  [Page 1]

Internet-Draft    Distributed Internet Archive Protocol    November 2009

   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the BSD
   License.

Table of Contents

   1.  Introduction
   2.  Architecture
     2.1.  Disk storage structure
       2.1.1.  DIAP storage units
       2.1.2.  Replication
       2.1.3.  Node roles
       2.1.4.  Archive date range
       2.1.5.  Storage structure tables
     2.2.  Data transfer mechanisms
       2.2.1.  Lowest maximum bandwidth (LMB)
       2.2.2.  Data transfer timing
       2.2.3.  Phases
       2.2.4.  Data flow, table
       2.2.5.  Hyper virtual autochanger (HVA)
       2.2.6.  Copy types
       2.2.7.  Leap years
       2.2.8.  Fill mechanism
   3.  Security considerations
     3.1.  Passwords
     3.2.  User space
     3.3.  Application layer
     3.4.  Checksum
     3.5.  Virtual private network
     3.6.  Encrypted partitions, logical volumes and volumes
   4.  Community project and UK trademarks
   5.  Conclusion
   6.  Acknowledgements
   7.  Change log
   8.  Informative references
   Author's Address

1.  Introduction

   The architecture of DIAP has been designed to maximise archive
   storage capacity, availability, restoration and recovery speed,
   scalability, modularity in code and storage resilience; to minimise
   operating resource overheads, the impact of network outages and
   management overheads; and to simplify the code development cycle,
   deployment, data recovery and integration with existing systems.

   DIAP architecture consists of two main parts: the disk storage
   structure and the data transfer mechanisms. The data transfer
   mechanisms in turn consist of four algorithms: one responsible for
   collecting data and three that are collectively known as the HVA
   (hyper virtual autochanger). Data transfer calculations are based on
   the LMB (lowest maximum bandwidth) between storage nodes.

   A DIAP node can be located on any computer, usually a server. Nodes
   are named and allocated roles which are abstract and can be changed.
   DIAP is designed to operate on three distinct operating system nodes
   while allowing more than one instance of DIAP to exist on each node.
   DIAP I/O operation is asynchronous (non-blocking) but the
   architecture design co-ordinates and records transfer activity.

2.  Architecture

2.1.  Disk storage structure

2.1.1.  DIAP storage units

   DIAP storage units are approximately equivalent to the volume
   created on tape. Most backup software products generate something
   equivalent to the tape volume on disk. To conserve space by
   minimising the duplication of data, these disk volumes are organised
   into full, differential and incremental types. For constant data
   streams, for example CCTV footage, the volumes are all effectively
   of type full.

   DIAP is designed to store full and differential types. The disk
   storage structure is designed to store a full once a month with its
   corresponding differentials, or a full followed by x months of
   matching differential volumes. Some control is available when
   choosing a name for newly generated volumes. If the data is streamed
   and cannot be differentiated against a full then the option not to
   collect full volumes is available. All volumes are then full but are
   archived in the same way as differentials.

2.1.2.  Replication

   The disk structure ensures volumes are replicated across three
   operating systems which can be located within IP sight of each other
   over LAN, WLAN or WAN. This provides geographical resilience. The
   structure on the disk is organised into slots, which can be
   directories.

2.1.3.  Node roles

   The nodes are allocated one of three roles: A, B or C. The role a
   machine is allocated can be changed in future. This allows machines
   to be re-located. For each node, year slots are created, then month
   slots, then day slots, including the special slots ad0 and aFull$$,
   where $ is an integer between 0 and 2. Full01 will store a Full
   volume at the beginning of the month and skip d1. Full02 is there
   for additional redundancy and to cope with the scenario where the
   current month is the last (this is not default behaviour). The
   special slot d0 has more than one purpose and is described in more
   detail in the section describing the data transfer mechanisms.

2.1.4.  Archive date range

   The start year and end year can be chosen according to storage
   requirements. Retrospective archiving can be achieved by creating a
   DIAP disk structure starting from any chosen date in the past. This
   allows the user to transfer archives from old tapes into DIAP. The
   number of years of archiving required in the future is selectable.

2.1.5.  Storage structure tables

   These tables show the disk storage structure. The root of the
   structure holds n year slots; each year is filled with month slots
   and each month with day slots, including the special slots.

   Slots for each year; extendable, and years before 2009 are
   retrospective

   +-------+--------+--------+--------+
   | Slots | node A | node B | node C |
   +-------+--------+--------+--------+
   |       | ...    | ...    | ...    |
   |       | 2006   | 2006   | 2006   |
   |       | 2007   | 2007   | 2007   |
   |       | 2008   | 2008   | 2008   |
   |       | 2009   | 2009   | 2009   |
   |       | 2010   | 2010   | 2010   |
   |       | 2011   | 2011   | 2011   |
   |       | ...    | ...    | ...    |
   +-------+--------+--------+--------+

                          Table 1: Year slots

   Slots for each month located in each year

   +-------+--------+--------+--------+
   | Slots | node A | node B | node C |
   +-------+--------+--------+--------+
   |       | mth1   | mth1   | mth1   |
   |       | mth2   | mth2   | mth2   |
   |       | mth3   | mth3   | mth3   |
   |       | mth4   | mth4   | mth4   |
   |       | mth5   | mth5   | mth5   |
   |       | mth6   | mth6   | mth6   |
   |       | mth7   | mth7   | mth7   |
   |       | mth8   | mth8   | mth8   |
   |       | mth9   | mth9   | mth9   |
   |       | mth10  | mth10  | mth10  |
   |       | mth11  | mth11  | mth11  |
   |       | mth12  | mth12  | mth12  |
   +-------+--------+--------+--------+

                         Table 2: Month slots

   Slots for each day located in each month, including special slots

   +--------+--------+--------+--------+
   | Slots  | node A | node B | node C |
   +--------+--------+--------+--------+
   | (Dirs) | Full01 | Full01 | Full01 |
   |        | Full02 | Full02 | Full02 |
   |        | d0     | d0     |        |
   |        | d01    | d01    | d01    |
   |        | d02    | d02    | d02    |
   |        | d03    | d03    | d03    |
   |        | d04    | d04    | d04    |
   |        | d05    | d05    | d05    |
   |        | d06    | d06    | d06    |
   |        | d07    | d07    | d07    |
   |        | d08    | d08    | d08    |
   |        | d09    | d09    | d09    |
   |        | d10    | d10    | d10    |
   |        | d11    | d11    | d11    |
   |        | d12    | d12    | d12    |
   |        | d13    | d13    | d13    |
   |        | d14    | d14    | d14    |
   |        | d15    | d15    | d15    |
   |        | d16    | d16    | d16    |
   |        | d17    | d17    | d17    |
   |        | d18    | d18    | d18    |
   |        | d19    | d19    | d19    |
   |        | d20    | d20    | d20    |
   |        | d21    | d21    | d21    |
   |        | d22    | d22    | d22    |
   |        | d23    | d23    | d23    |
   |        | d24    | d24    | d24    |
   |        | d25    | d25    | d25    |
   |        | d26    | d26    | d26    |
   |        | d27    | d27    | d27    |
   |        | d28    | d28    | d28    |
   |        | d29    | d29    | d29    |
   |        | d30    | d30    | d30    |
   |        | d31    | d31    | d31    |
   +--------+--------+--------+--------+

                      Table 3: Day of month slots

2.2.  Data transfer mechanisms

2.2.1.  Lowest maximum bandwidth (LMB)

   [Recalibrate this section to adjust for the most recent
   developments]

   LMB = the lowest maximum bandwidth of any link between the three
   nodes

   NB: the actual maximum transfer will vary, so test transfers are
   recommended for accuracy. LMB assumes all available bandwidth is
   allocated to running DIAP.

      Max Full01 = LMB x 6 hrs

   This assumes no transfer interruptions and that the maximum
   bandwidth is constant.

   The following example calculations are for a month of archiving
   using a single full volume and its corresponding daily
   differentials.

      Ave diff = (sum of 29 (or a month of) daily differentials) / 29

   The average differential varies with your storage growth. It
   represents a trend and can be an estimate to start with; by watching
   the trend of differential growth, more accurate calculations can be
   made. It is assumed your differentials are always smaller in size
   than the initial full copy.

      Min DIAP slot size (node a)   = (Max Full01 x 2)
                                      + (29 x ave diff)
                                      + (1 x ave diff)

   The final term, 1 x ave diff, accounts for d0.

      Min DIAP slot size (node b/c) = (Max Full01 x 2)
                                      + (29 x ave diff)

   Transfer log files can be included in the Min DIAP slot size; for
   simplicity they have been omitted.
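
   The slot size arithmetic above can be sketched in Python. This is a
   minimal illustration only and is not taken from the DIAP
   implementation: the function names max_full_gib and
   min_slot_size_gib are hypothetical, and the sketch assumes that
   1 Mbit means 2^20 bits, which is the reading that reproduces the
   2.6 GiB figure in Table 4 (with 10^6 bits the result is nearer
   2.5 GiB).

```python
"""Sketch of the section 2.2.1 slot size arithmetic (names are hypothetical)."""

SECONDS_PER_HOUR = 3600
BITS_PER_BYTE = 8
GIB = 2 ** 30  # bytes in one GiB


def max_full_gib(lmb_bits_per_sec: float, window_hours: float = 6.0) -> float:
    """Max Full01 = LMB x 6 hrs, returned in GiB.

    Assumes an uninterrupted transfer at constant maximum bandwidth,
    as the text does.
    """
    total_bytes = lmb_bits_per_sec * window_hours * SECONDS_PER_HOUR / BITS_PER_BYTE
    return total_bytes / GIB


def min_slot_size_gib(max_full: float, ave_diff: float, node: str, days: int = 29) -> float:
    """Minimum DIAP slot size in GiB for one month on the given node.

    Every node holds two fulls (Full01, Full02) and `days` differentials.
    Node a holds one extra average differential to account for d0.
    """
    size = (max_full * 2) + (days * ave_diff)
    if node.lower() == "a":
        size += ave_diff  # the d0 entry point, counted on node a only
    return size


if __name__ == "__main__":
    # Assumption: 1 Mbit/s is taken as 2**20 bits per second.
    full = round(max_full_gib(2 ** 20), 1)
    print(f"Max Full01            : {full:.1f} GiB")
    print(f"Min slot size node a  : {min_slot_size_gib(full, 0.5, 'a'):.1f} GiB")
    print(f"Min slot size node b/c: {min_slot_size_gib(full, 0.5, 'b'):.1f} GiB")
```

   With an LMB of 1 Mbit/s and an average differential of 0.5 GiB, the
   sketch gives a Max Full01 of about 2.6 GiB; feeding the rounded
   2.6 GiB back in gives 20.2 GiB for node a and 19.7 GiB for nodes b
   and c. Transfer log files are left out, as in the formulas.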

   Example system

   +----------------------+------------+------------------+------------+
   | Example system       | LMB        | Ave diff         | Max        |
   |                      |            |                  | aFull01    |
   +----------------------+------------+------------------+------------+
   | LMB occurs between   | 1 Mbit/s   | Estimated 500    | 2.6 GiB    |
   | b->c                 |            | MiB              |            |
   +----------------------+------------+------------------+------------+

                        Table 4: Example system

      Min DIAP slot size (node a)   = (2.6 x 2) + (29 x 0.5) + 0.5
                                    = 20.2 GiB

      Min DIAP slot size (node b/c) = (2.6 x 2) + (29 x 0.5)
                                    = 19.7 GiB

   If a copy fails then the system will retry the next day, but you
   lose the day of failure. Logging can be used to trace successful
   copies.

2.2.2.  Data transfer timing

   [Maybe re-write this section and incorporate into phases, or
   simplify and keep both sections]

   There are two entry points: Full01 at the beginning of the month and
   d0 for the remaining days. The entry points are assumed to be filled
   during the day, before the cycle begins at night. Scheduled jobs are
   split between the three nodes, and d0 is cleared after the copy to
   d$.

   The system reduces single points of failure by creating a single
   full copy on each node at the beginning of the month and again at
   the end of the month, to cover the next 30 day DIAP cycle. The
   copies a-a and a-b occur in the first three hours; the copy b-c
   happens after three hours. These times can be changed as required.

   Nightly copies to and between nodes are made to new slots. If a node
   is unavailable because of some failure condition then the copy is
   not made; the next day, when communication is restored, copies
   continue to the next nightly directory. This increases robustness
   over the previous layout, as the next nightly copy does not depend
   on the success of the previous night's copy. Only two copies between
   nodes are made on each of days 3-30. Day 1 carries a single full and
   day 2 a full, and day 30 additionally makes an internal copy on all
   nodes.

2.2.3.  Phases

   To be written.
   Transfer from implementation.

2.2.4.  Data flow, table

   Flow of data (* for all in column)

   +------------+----------------+----------------+----------------+
   | Day - Time | A              | B              |                |
   +------------+----------------+----------------+----------------+
   | D1-T=0     | Full01->       | Full01         |                |
   | D2-T=0     |                |                |                |
   | D2-T=0     | d0->d1 *(a->a) |                |                |
   | D2-T=0     | d0->d1 *(a->b) |                |                |
   | D2-T=3     |                | d1->d1         |                |
   | D3-T=0     | d0->d2         |                |                |
   | D3-T=0     | d0->d2         |                |                |
   | D3-T=3     |                | d2->d2         |                |
   | D4-T=0     | d0->d3         |                |                |
   | D4-T=0     | d0->d3         |                |                |
   | D4-T=3     |                | d3->d3         |                |
   | D5-T=0     | d0->d4         |                |                |
   | D5-T=0     | d0->d4         |                |                |
   | D5-T=3     |                | d4->d4         |                |
   | D6-T=0     | d0->d5         |                |                |
   | D6-T=0     | d0->d5         |                |                |
   | D6-T=3     |                | d5->d5         |                |
   | D7-T=0     | d0->d6         |                |                |
   | D7-T=0     | d0->d6         |                |                |
   | D7-T=3     |                | d6->d6         |                |
   | D8-T=0     | d0->d7         |                |                |
   | D8-T=0     | d0->d7         |                |                |
   | D8-T=3     |                | d7->d7         |                |
   | D9-T=0     | d0->d8         |                |                |
   | D9-T=0     | d0->d8         |                |                |
   | D9-T=3     |                | d8->d8         |                |
   | D10-T=0    | d0->d9         |                |                |
   | D10-T=0    | d0->d9         |                |                |
   | D10-T=3    |                | d9->d9         |                |
   | D11-T=0    | d0->d10        |                |                |
   | D11-T=0    | d0->d10        |                |                |
   | D11-T=3    |                | d10->d10       |                |
   | D12-T=0    | d0->d11        |                |                |
   | D12-T=0    | d0->d11        |                |                |
   | D12-T=3    |                | d11->d11       |                |
   | D13-T=0    | d0->d12        |                |                |
   | D13-T=0    | d0->d12        |                |                |
   | D13-T=3    |                | d12->d12       |                |
   | D14-T=0    | d0->d13        |                |                |
   | D14-T=0    | d0->d13        |                |                |
   | D14-T=3    |                | d13->d13       |                |
   | D15-T=0    | d0->d14        |                |                |
   | D15-T=0    | d0->d14        |                |                |
   | D15-T=3    |                | d14->d14       |                |
   | D16-T=0    | d0->d15        |                |                |
   | D16-T=0    | d0->d15        |                |                |
   | D16-T=3    |                | d15->d15       |                |
   | D17-T=0    | d0->d16        |                |                |
   | D17-T=0    | d0->d16        |                |                |
   | D17-T=3    |                | d16->d16       |                |
   | D18-T=0    | d0->d17        |                |                |
   | D18-T=0    | d0->d17        |                |                |
   | D18-T=3    |                | d17->d17       |                |
   | D19-T=0    | d0->d18        |                |                |
   | D19-T=0    | d0->d18        |                |                |
   | D19-T=3    |                | d18->d18       |                |
   | D20-T=0    | d0->d19        |                |                |
   | D20-T=0    | d0->d19        |                |                |
   | D20-T=3    |                | d19->d19       |                |
   | D21-T=0    | d0->d20        |                |                |
   | D21-T=0    | d0->d20        |                |                |
   | D21-T=3    |                | d20->d20       |                |
   | D22-T=0    | d0->d21        |                |                |
   | D22-T=0    | d0->d21        |                |                |
   | D22-T=3    |                | d21->d21       |                |
   | D23-T=0    | d0->d22        |                |                |
   | D23-T=0    | d0->d22        |                |                |
   | D23-T=3    |                | d22->d22       |                |
   | D24-T=0    | d0->d23        |                |                |
   | D24-T=0    | d0->d23        |                |                |
   | D24-T=3    |                | d23->d23       |                |
   | D25-T=0    | d0->d24        |                |                |
   | D25-T=0    | d0->d24        |                |                |
   | D25-T=3    |                | d24->d24       |                |
   | D26-T=0    | d0->d25        |                |                |
   | D26-T=0    | d0->d25        |                |                |
   | D26-T=3    |                | d25->d25       |                |
   | D27-T=0    | d0->d26        |                |                |
   | D27-T=0    | d0->d26        |                |                |
   | D27-T=3    |                | d26->d26       |                |
   | D28-T=0    | d0->d27        |                |                |
   | D28-T=0    | d0->d27        |                |                |
   | D28-T=3    |                | d27->d27       |                |
   | D29-T=0    | d0->d28        |                |                |
   | D29-T=0    | d0->d28        |                |                |
   | D29-T=3    |                | d28->d28       |                |
   | D30-T=0    | Full01->Full02 |                |                |
   | D30-T=0    |                | Full01->Full02 | Node C         |
   | D30-T=0    |                |                | Full01->Full02 |
   | D30-T=0    | d0->d29        |                |                |
   | D30-T=0    | d0->d29        |                |                |
   | D30-T=3    |                | d29->d29       |                |
   +------------+----------------+----------------+----------------+

                          Table 5: Data flow

   Start 00:00 - End 06:00. T=0 is 00:00 and T=3 is 03:00.

   Copies from a-a are not used in bandwidth calculations.

2.2.5.  Hyper virtual autochanger (HVA)

   HVA is the term used to describe collectively the three algorithms
   that work together, but operate independently on each node, to
   ensure the data transfers occur as described in the previous two
   sections. The term is derived from the term virtual autochanger. A
   virtual autochanger still requires hardware tape drives; 'Hyper'
   takes this one stage further by emulating the virtual autochanger in
   software.

2.2.6.  Copy types

   Transfer from implementation.

2.2.7.  Leap years

   Transfer from implementation.

2.2.8.  Fill mechanism

   The filling mechanism works as follows.

   The start time is an integer between 0 and 11. The fill is triggered
   by a scheduling application such as cron. A check is then made to
   see whether the previous day's copy was successful. If not, an alert
   is raised and logged for later use. If so, a search using a
   pre-defined string is made in the directory containing the backup
   volumes.

   If full volumes have been selected for collection then a check for
   the day of the month is made. [Currently in implementation this is
   day 2 - this will be settable to any day]. The name of the full
   volume is a pre-defined string. If a full needs to be transferred
   then the most recent full volume is located and a check is made to
   see whether it has been collected before. If it has not, the full is
   copied to the appropriate slot, and a date and sha1sum are created
   and placed in the slot with the volume [see section 3.4, Checksum,
   for more detail]; the activity is logged and the algorithm ends. If
   it has, the activity is logged and the algorithm ends.

   If a full volume does not need to be transferred then the most
   recent differential volume is located using a pre-defined string.
   The contents of d0 are cleared. A check is made to verify that the
   differential has not been collected before. If it has, the activity
   is logged and the algorithm ends. If it has not, the latest
   differential is copied to the appropriate slot, and a date and
   sha1sum are created and placed in the slot with the volume. The
   activity is logged and the algorithm ends.

3.  Security considerations

   It is recommended that implementations follow these security
   precautions.

3.1.  Passwords

   Do not store any passwords on file. Passwords should be held in
   memory only temporarily. When a password is requested, the entry is
   hidden from view. New account passwords are quality checked and a
   warning is given if they are not secure.

3.2.  User space

   The implementation should operate in user space to reduce both the
   risk and the effect of system compromise.

3.3.  Application layer

   Handle network communications with OpenSSH [1]. Generate unique RSA
   or better certificates.
   Passwordless certificates should be regenerated often. Rsync uses
   OpenSSH to transfer data. Use a port other than the standard SSH
   port 22 and set it individually for each node. See RFC 4251
   [RFC4251].

3.4.  Checksum

   Use a SHA-1 checksum (RFC 3174 [RFC3174]) and date stamp volumes as
   they enter DIAP. This information can be used to validate the
   integrity of stored archives in the future.

3.5.  Virtual private network

   Use a virtual private network between nodes for an additional layer
   of security.

3.6.  Encrypted partitions, logical volumes and volumes

   Use encrypted partitions or logical volumes to enhance physical
   security. Use encrypted archive volumes. The encryption may be
   applied by the backup software initially responsible for generating
   the volumes.

4.  Community project and UK trademarks

   A community software implementation resides at DIASER (R) [2]. A UK
   trademark exists for DIAP and DIASER to protect the acronyms for
   Open Source community development.

5.  Conclusion

   To be written.

6.  Acknowledgements

   Thanks are due to my wife Marisa, to Myles McClelland and to a
   number of individuals from various groups. Thanks also to Stephen
   Pelc of MPE Forth [3] for SME deployment context, advice and IPR
   consultancy, and to JISC [4] for providing technical development
   funding through OMII-UK [5] and ECS [6] (Southampton University) in
   collaboration with Interlinux Ltd. [7]

7.  Change log

   26 Nov 09  - Error corrections and additions.
   25 Nov 09  - Complete document overhaul incorporating six months
                technical development.
   09 Nov 09  - Change log correction.
   09 Nov 09  - Fill algorithm described.
   27 Jul 09  - Spell check.
   27 Jul 09  - Remove section to avoid IP infringement.
   15 Apr 09  - Extended data retention.
   03 Dec 08  - Corrected Architecture.
   02 Dec 08  - Refined Architecture.
   06 July 08 - Architecture - arithmetic. Acknowledgements.
   16 May 08  - Address.

8.  Informative references

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [DIAP]     Brasher, D., "Distributed Internet Archive Protocol
              (DIAP)", Nov 2009.

   [DIASER]   Brasher, D., "Distributed Internet Archiving for
              Educational Repositories (DIASER)", April 2009.

   [DIASER manual]
              Brasher, D., "DIASER manual", November 2009.

   [1]  [2]  [3]  [4]  [5]  [6]  [7]

Author's Address

   Damian Brasher
   Interlinux LTD
   PO Box 1623
   Southampton, Hampshire  SO15 9AE
   United Kingdom

   Email: dbrasher@interlinux.co.uk