Introduction to smcryptoR

Yu Meng

2024-03-18

The goal of smcryptoR is to make China’s national standard cryptographic algorithms (the SM series) available in R. smcryptoR uses Rust FFI (Foreign Function Interface) bindings to a Rust crate. It covers the following algorithms:

SM3: message digest

SM2: encrypt/decrypt, sign/verify, key exchange

SM4: encrypt/decrypt

SM3

SM3 is a cryptographic hash function similar to other well-known hash functions such as SHA-256 in its security properties and structure. It produces a fixed-size output of 256 bits.

The sm3_hash function accepts a raw vector, which is the byte array type in R and is printed with each byte in hexadecimal. The charToRaw() or serialize() functions can be used to convert strings or objects into raw vectors.
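The two conversions mentioned above can be sketched in base R alone. This is a minimal illustration and does not call smcryptoR; the names msg and obj_bytes are only for this example. Note that serialize() adds header information around the object, so the bytes of a serialized string are not the same as the bytes returned by charToRaw(), and the two will hash differently.

```r
# Convert a character string to its bytes
msg <- charToRaw("abc")
msg              # raw vectors print each byte in hexadecimal
as.integer(msg)  # the same bytes as decimal integers

# Convert an arbitrary R object to bytes
obj_bytes <- serialize(list(a = 1, b = "x"), connection = NULL)
class(obj_bytes)

# A serialized string is longer than its plain bytes because of the header
length(serialize("abc", connection = NULL)) > length(msg)
```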

msg <- charToRaw('abc')
sm3_hash(msg)
#> [1] "66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0"

You can also use sm3_hash_string() to hash a character string directly.

sm3_hash_string('abc')
#> [1] "66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0"

sm3_hash_file() hashes a local file on your machine, for example sm3_hash_file('/etc/hosts').
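Since '/etc/hosts' does not exist on every system, a portable way to try sm3_hash_file() is to hash a temporary file. The sketch below assumes that sm3_hash_file() hashes the exact bytes stored in the file, so its result should match sm3_hash() applied to the same bytes. The variable names are only for this example.

```r
library(smcryptoR)

# Write a small temporary file so the example does not depend on system paths
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("abc"), path)

# Hash the file directly, and hash its bytes read back into a raw vector
file_digest <- sm3_hash_file(path)
raw_digest <- sm3_hash(readBin(path, what = "raw", n = file.size(path)))

# Assumption: both calls digest the same bytes, so the results should agree
same_digest <- identical(file_digest, raw_digest)
same_digest

unlink(path)
```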

SM2

SM2 is based on elliptic curve cryptography (ECC), which provides comparable security with shorter keys than traditional public-key algorithms such as RSA.

Keypair

In asymmetric encryption, public keys and private keys appear in pairs. The public key is used for encryption and verification, while the private key is used for decryption and signing. The public key can be derived from the private key, but not the other way around.

## generate a keypair
keypair <- sm2_gen_keypair()
sk <- keypair$private_key
pk <- keypair$public_key
sk
#> [1] "0dbf3ea63efd867a41822a1cd2ee485ebe3993432fbcf2e96bfbadd3be2b6ac3"
pk
#> [1] "eab671dfeed05b8c4c89f7d5c4872f3985501dd9c7764063c13303d97ef899d611387457af41cb9ada26bb99559452fe19a88d74e16107a600f76e4b10d087c5"

You can also export the public key from a private key.

pk <- sm2_pk_from_sk(sk)
pk
#> [1] "eab671dfeed05b8c4c89f7d5c4872f3985501dd9c7764063c13303d97ef899d611387457af41cb9ada26bb99559452fe19a88d74e16107a600f76e4b10d087c5"

Sign/Verify

Signing ensures the integrity and authenticity of data. Typically, the data owner computes a hash of the data with the SM3 message digest algorithm and signs it with the private key, producing a signature. The owner then sends the original data together with the signature to the receiver. The receiver verifies the signature using the public key. If verification succeeds, the received data is considered not to have been tampered with.

id <- 'someone@company.com' |> charToRaw()
data <- 'abc' |> charToRaw()
sign <- sm2_sign(id, data, sk)
## return 1 or 0
sm2_verify(id, data, sign, pk)
#> [1] 1
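To see the tamper detection described above, one can verify the same signature against modified data. This is a minimal sketch using only the functions shown in this section. It generates a fresh keypair so that it runs on its own, and it assumes that sm2_verify() reports a failed check by returning 0 rather than raising an error.

```r
library(smcryptoR)

keypair <- sm2_gen_keypair()
id <- charToRaw("someone@company.com")
data <- charToRaw("abc")

# Sign the original data with the private key
signature <- sm2_sign(id, data, keypair$private_key)

# Verification against the original data should succeed
ok <- sm2_verify(id, data, signature, keypair$public_key)

# Verification against altered data should fail (assumed to return 0)
tampered <- charToRaw("abd")
bad <- sm2_verify(id, tampered, signature, keypair$public_key)

c(original = ok, tampered = bad)
```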

Encrypt/Decrypt

SM2 is an asymmetric encryption algorithm that can also be used to encrypt data directly. Typically, A encrypts a file or data using B’s public key and passes the ciphertext to B, who decrypts it using the corresponding private key. SM2 encryption and decryption are suitable for short messages only. For larger files, the process can be very slow.

## encrypt using public key
enc <- sm2_encrypt(data, pk)
## cipher text
enc
#>  [1] b2 dd a3 01 79 64 de 69 5c a2 ea 7e 61 61 5f 2c fe dc 4d 1c a6 af ec 40 51
#> [26] e3 84 34 83 3a 94 7c d4 e2 bc e2 ac 57 90 a0 8a a9 95 c7 d3 d2 23 7f a0 b1
#> [51] 72 f5 dd 02 e8 70 3e 81 89 64 a2 b4 bf ac 0c 5e c6 7f 99 e9 13 67 af 1c ea
#> [76] 37 35 5b a9 8d bc 01 9b f9 77 07 ec 51 0e 73 de 3b 77 1f c4 0f 3f 43 ca
## decrypt using private key
dec <- sm2_decrypt(enc, sk)
## plain text
dec
#> [1] 61 62 63
## convert to character string
rawToChar(dec)
#> [1] "abc"

For ease of use, we have provided functions to encrypt data into hex or base64 format and decrypt them from these formats.

enc <- sm2_encrypt_base64(data, pk)
## cipher text as base64
enc
#> [1] "IKKpuCTG0TgI0OwLek/nY/i7/iy9737Xe57GbmiTOxyBB4Ua+N/cZ5oVLrHknHM1EXL488JUiaDmU2d6rYu6lEGWvpTD+qyNS5t3a98u2VI8n+ZjoUx33PXVM2W6Vm7Lzmf2"
sm2_decrypt_base64(enc, sk) |> rawToChar()
#> [1] "abc"

Or you can use hex as output instead.

enc <- sm2_encrypt_hex(data, pk)
## cipher text as hex
enc
#> [1] "7dad0f006314f93d1e30126d1e436b5a104f1ffd9555cfa03e245b399f8933df8238109021ffc3c75df633c3d8be2efd605f39d9163823ff788b5dbf2402f386ffc486cb32aedb05bf72e679d76d2b2f50952e5bd2b6caf79f946516aabe2dc45bdcc1"
sm2_decrypt_hex(enc, sk) |> rawToChar()
#> [1] "abc"
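Because SM2 is slow on large inputs, a common pattern is hybrid encryption: encrypt the bulk data with a symmetric cipher and use SM2 only for the short symmetric key. The sketch below combines sm2_encrypt() with the SM4 CBC functions described in the SM4 section. It is an illustration under stated assumptions, not a scheme prescribed by the package: the key and IV are drawn with sample(), which is not a cryptographically secure random source, and the IV is passed along in the clear.

```r
library(smcryptoR)

keypair <- sm2_gen_keypair()

# Illustration only: sample() is NOT a cryptographically secure generator
sm4_key <- as.raw(sample(0:255, 16, replace = TRUE))
iv <- as.raw(sample(0:255, 16, replace = TRUE))

# Sender: encrypt the large payload with SM4, and the small SM4 key with SM2
payload <- charToRaw(strrep("a long message ", 1000))
enc_payload <- sm4_encrypt_cbc(payload, sm4_key, iv)
enc_key <- sm2_encrypt(sm4_key, keypair$public_key)

# Receiver: recover the SM4 key with the private key, then decrypt the payload
dec_key <- sm2_decrypt(enc_key, keypair$private_key)
dec_payload <- sm4_decrypt_cbc(enc_payload, dec_key, iv)

round_trip <- identical(dec_payload, payload)
round_trip
```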

Key Exchange

If A and B want to agree on a shared key for encryption or authentication, this algorithm ensures that the key itself is never transmitted over untrusted channels and that the private keys of A and B are not disclosed. Even if an attacker intercepts the data exchanged by A and B, the attacker cannot compute the key agreed upon by A and B.

## Step 1
klen <- 16
id_a <- "a@company.com" |> charToRaw()
id_b <- "b@company.com" |> charToRaw()
private_key_a <- sm2_gen_keypair()$private_key
private_key_b <- sm2_gen_keypair()$private_key
step_1_a <- sm2_keyexchange_1ab(klen, id_a, private_key_a)
step_1_b <- sm2_keyexchange_1ab(klen, id_b, private_key_b)

## Step 2
step_2_a <- sm2_keyexchange_2a(id_a, private_key_a, step_1_a$private_key_r, step_1_b$data)
step_2_b <- sm2_keyexchange_2b(id_b, private_key_b, step_1_b$private_key_r, step_1_a$data)
step_2_a$k
#> [1] "00c365484451c918e6e30a43ffa33478"
step_2_b$k
#> [1] "00c365484451c918e6e30a43ffa33478"

The output key k should be klen (here 16) bytes long, printed as 32 hexadecimal characters, and step_2_a$k and step_2_b$k should be equal.
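The agreed key is printed as a hexadecimal string, whereas the SM4 functions in this document take their key as a raw vector. A small base R helper can bridge the two. The function hex_to_raw() below is a hypothetical helper written for this example and is not part of smcryptoR. The key string is copied from the output above.

```r
# Hypothetical helper: convert a hex string into a raw vector
hex_to_raw <- function(hex) {
  stopifnot(nchar(hex) > 0, nchar(hex) %% 2 == 0)
  starts <- seq(1, nchar(hex), by = 2)
  # Split into two-character pairs and parse each pair as a base-16 number
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

k <- "00c365484451c918e6e30a43ffa33478"
key_raw <- hex_to_raw(k)

length(key_raw)  # number of bytes in the key
key_raw
```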

SM4

The SM4 algorithm is a symmetric block cipher with a block size and key length of 128 bits. SM4 supports both ECB (Electronic Codebook) mode and CBC (Cipher Block Chaining) mode. ECB mode is a simple mode that encrypts each data block independently of the others. CBC mode, on the other hand, chains the blocks so that the encryption of each block depends on the previous ciphertext block. It therefore requires an initialization vector (IV) of the same 128-bit length. CBC mode provides better security than ECB mode.

Encrypt/Decrypt - ECB mode

In ECB mode, each block of plaintext is encrypted independently, without any chaining with previous blocks. This means that the same plaintext block will always produce the same ciphertext block, given the same key.

## ecb mode
key <- '1234567812345678' |> charToRaw()
enc <- sm4_encrypt_ecb(data, key)
## cipher text
enc
#>  [1] 06 6f eb d7 55 4a 8f ed 55 5b a2 6c f8 2a ff 3b
## plain text
sm4_decrypt_ecb(enc, key) |> rawToChar()
#> [1] "abc"
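The property described above, that equal plaintext blocks give equal ciphertext blocks, can be checked directly. The sketch below encrypts two identical 16-byte blocks and compares the first two ciphertext blocks. It assumes the output is laid out block by block in 16-byte chunks, which is how ECB works in general. The variable names are only for this example.

```r
library(smcryptoR)

key <- charToRaw("1234567812345678")

# Two identical plaintext blocks of exactly 16 bytes each
block <- charToRaw("0123456789abcdef")
plaintext <- c(block, block)

enc_ecb <- sm4_encrypt_ecb(plaintext, key)

# In ECB mode the two ciphertext blocks are expected to be identical
ecb_repeats <- identical(enc_ecb[1:16], enc_ecb[17:32])
ecb_repeats
```

This repetition is why ECB can leak patterns in structured data.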

Encrypt/Decrypt - CBC mode

In CBC mode, each block of plaintext is combined through an XOR operation with the previous ciphertext block before being encrypted, and the first block is combined with the IV. This chaining ensures that even if there are repeated blocks in the plaintext, the resulting ciphertext blocks will differ because each one depends on the ciphertext that precedes it.

iv <- '0000000000000000' |> charToRaw()
enc <- sm4_encrypt_cbc(data, key, iv)
## cipher text
enc
#>  [1] 4d 2b cf dc f0 c1 13 34 4b 54 0e 76 fa a2 2f 08
sm4_decrypt_cbc(enc, key, iv) |> rawToChar()
#> [1] "abc"
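The effect of chaining can be checked by encrypting two identical plaintext blocks and comparing the resulting ciphertext blocks. This is a minimal sketch reusing the key and the all-zero IV from the examples above. A fixed IV is used here only to keep the example reproducible. In practice the IV is normally chosen unpredictably for each message.

```r
library(smcryptoR)

key <- charToRaw("1234567812345678")
iv <- charToRaw("0000000000000000")

# Two identical plaintext blocks of exactly 16 bytes each
block <- charToRaw("0123456789abcdef")
plaintext <- c(block, block)

enc_cbc <- sm4_encrypt_cbc(plaintext, key, iv)

# With chaining, the two ciphertext blocks are expected to differ
cbc_repeats <- identical(enc_cbc[1:16], enc_cbc[17:32])
cbc_repeats

# Decryption with the same key and IV recovers the plaintext
cbc_round_trip <- identical(sm4_decrypt_cbc(enc_cbc, key, iv), plaintext)
cbc_round_trip
```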