Clement Huge
04.07.2018

Phase 2: AE and Search Patterns

This time, I implemented the same code but with encryption and hashing, encryption with a symetric key for the email addresses and hashing for the dictionary.

This would be similar as implementing random encryption for the email address field and deterministic encryption for the dictionary under Always Encrypted (AE) setup.

Before I do the next step and implemented this under always encrypted on another article, I wanted to implement such encryptions outside AE. Always encrypted is a SQL server feature. This way, the encryption/hashing mecanism I use here can be implemented on any database server with encryption at the application layer or database layer.

For testing purposes, I use the database to generate the encryption keys and certificates and cypher/decypher values but I recommend to do it on the application level, so that you minimize the circuit where the values are potentially unencrypted.

  1. Setup the encryption keys:

As my main goal is not to implement the best secured keys under the best practices but just to test the solution, I looked at a simple way to implement it using the web page here: How to setup symetric encryption on SQL server.

Here is the main code to set this up using this link:

-- Create database Key
USE AETest;
GO
CREATE MASTER KEY 
ENCRYPTION BY PASSWORD = 'Password123';
GO
CREATE CERTIFICATE Certificate1
WITH SUBJECT = 'Protect Data';
GO
-- Create symmetric Key
CREATE SYMMETRIC KEY SymmetricKey1
WITH ALGORITHM = AES_128
ENCRYPTION BY CERTIFICATE Certificate1;
GO

Once this is done we can easily use the encryption and decryption functions. I also checked whether I can do search on it. It is not possible, so it acts exactly as a random encryption, which is what we wanted to test.

  1. Setup the table with the encrypted data and the dictionary of Email Address

CREATE TABLE [Encrypt].[Person]
(
    [Id] [bigint] NOT NULL,
    [LastName] [varbinary](max) NOT NULL,
    [EmailAddress] [varbinary](max) NULL,
    CONSTRAINT [PK_EPerson] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    )
    WITH 
    (
        PAD_INDEX = OFF, 
        STATISTICS_NORECOMPUTE = OFF, 
        IGNORE_DUP_KEY = OFF, 
        ALLOW_ROW_LOCKS = ON, 
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [Encrypt].[PersonEmailAddressSearchText]
(
    [Id] [int] NOT NULL,
      NOT NULL,
    [Rank] [smallint] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IX_EncryptPersonEmailAddressearchText_IdRank] 
ON [Encrypt].[PersonEmailAddressSearchText]
(
    [Id] ASC,
    [Rank] ASC
)
WITH 
(
    PAD_INDEX = OFF, 
    STATISTICS_NORECOMPUTE = OFF, 
    SORT_IN_TEMPDB = OFF, 
    DROP_EXISTING = OFF, 
    ONLINE = OFF, 
    ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_EncryptPersonEmailAddressSearchText_SearchPattern] 
ON [Encrypt].[PersonEmailAddressSearchText]
(
    [SearchPattern] ASC
)
WITH 
(
    PAD_INDEX = OFF, 
    STATISTICS_NORECOMPUTE = OFF, 
    SORT_IN_TEMPDB = OFF, 
    DROP_EXISTING = OFF, 
    ONLINE = OFF, 
    ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO

  1. Fill up the data

The data would be filled up by the application but for testing puposes, we will fill up the data from the unencrypted dataset (please see previous article regarding the dataset).

OPEN SYMMETRIC KEY SymmetricKey1
DECRYPTION BY CERTIFICATE Certificate1;
GO
INSERT INTO Encrypt.Person
SELECT 
    Id, 
    EncryptByKey(Key_GUID('SymmetricKey1'), LastName) AS LastName, 
    EncryptByKey(Key_GUID('SymmetricKey1'), EmailAddress) AS EmailAddress
FROM Unencrypted.Person
GO
-- Closes the symmetric key
CLOSE SYMMETRIC KEY SymmetricKey1;
GO
TRUNCATE TABLE Encrypt.PersonLastNameSearchText

INSERT INTO Encrypt.PersonLastNameSearchText
SELECT 
    Id,
    SearchPattern,
    Rank 
FROM Unencrypted.Person
CROSS APPLY [String].[Fn_GetHashedLetterPosition](LastName)

I used the following function to hash the values for the dictionary (of course, we would use a better hashing mecanism with a specific Salt. I just wanted here to simplify the test.

  1. Hash function

Here is the hash function used (for purpose of readiness in the table I converted the varbinary value of the hashbyte result to hexadecimal values before storing onto the database but this is not necessary):

CREATE FUNCTION [dbo].[fn_varbintohexstr] (@pbinin varbinary(max))
RETURNS nvarchar(max)
AS
BEGIN
    RETURN sys.fn_varbintohexsubstring(1, @pbinin, 1, 0)
END
GO
CREATE FUNCTION [String].[Fn_GetHashedLetterPosition] (@String NVARCHAR(4000))
RETURNS TABLE
AS
RETURN
    SELECT 
        dbo.fn_varbintohexstr(HASHBYTES('SHA1', SUBSTRING(@String, N.Num, 1))) AS SearchPattern,
        RANK() OVER (ORDER BY N.Num) AS [Rank]
    FROM [Performance].[Number] AS N
    WHERE (N.Num >= 0 AND N.Num <= LEN(@String))

  1. Search the encrypted data

Here is the search stored procedure. It does work similarly than the one that worked on the unencrypted data, including the underscore wild card usage.

/* Prerequisites for performance:
CREATE INDEX IX_EncryptedPersonEmailAddressSearchText_SearchPattern 
ON Encrypted.PersonEmailAddressSearchText(SearchPattern)
*/

ALTER PROCEDURE [Encrypt].[GetPersonFromEmailAddress]
    @SearchPattern NVARCHAR(128) = 'ph__@adven',
    @Accuracy SMALLINT = 0,   --> if 0 this means no difference, 1 means 1 letter can be off, etc...
    @IsRegExpr BIT = 1        --> this means that underscore can be used (more possible)
AS
BEGIN TRY

    -- 1. get the length
    DECLARE @Len SMALLINT = LEN(@SearchPattern)

    -- 2. get the gram
    DECLARE @Gram SMALLINT = 1
    DECLARE @ErrMsg VARCHAR(256) = 
        'Search pattern must be above ' + CONVERT(VARCHAR(16), @Gram) + ' characters'

    IF @Len < @Gram
        RAISERROR(@ErrMsg, 16, 1)

    -- 3. Create the search values
    DECLARE @tSearchPattern TABLE 
    (
        SearchPattern NVARCHAR(128),
        [Rank] SMALLINT,
        Underscore NVARCHAR(128)
    )
    DECLARE @TId TABLE 
    (
        Id INT,
        Accuracy SMALLINT
    )
    INSERT INTO @tSearchPattern (SearchPattern, [Rank], Underscore)
    SELECT 
        SearchPattern,
        [Rank],
        UnderScore
    FROM [String].[Fn_GetHashedLetterPosition](@SearchPattern) A
    CROSS JOIN 
    (
        SELECT CONVERT(NVARCHAR(4000), SearchPattern) AS UnderScore
        FROM [String].[Fn_GetHashedLetterPosition]('_')
    ) B
    -- If regex with underscore
    IF @IsRegExpr = 1 AND CHARINDEX('_', @SearchPattern) > 0
    BEGIN
        WITH CTE AS
        (
            SELECT A.Id, A.[Rank]
            FROM Encrypt.[PersonEmailAddressSearchText] A
            JOIN @tSearchPattern B 
              ON A.SearchPattern = B.SearchPattern
            WHERE B.[Rank] = 1
        )
        INSERT INTO @TId
        SELECT 
            C.Id,
            COUNT(*) AS Accuracy
        FROM CTE C
        JOIN Encrypt.PersonEmailAddressSearchText D 
          ON C.Id = D.Id
        JOIN @tSearchPattern E 
          ON CASE WHEN E.SearchPattern = E.Underscore 
                  THEN D.SearchPattern 
                  ELSE E.SearchPattern 
             END = D.SearchPattern
         AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
        GROUP BY C.Id, C.[Rank]
        HAVING COUNT(*) >= @Len - @Accuracy
    END
    ELSE
    BEGIN
        WITH CTE AS
        (
            SELECT A.Id, A.[Rank]
            FROM [Encrypt].[PersonEmailAddressSearchText] A
            JOIN @tSearchPattern B 
              ON A.SearchPattern = B.SearchPattern
            WHERE B.[Rank] = 1
        )
        INSERT INTO @TId
        SELECT 
            C.Id,
            COUNT(*) AS Accuracy
        FROM CTE C
        JOIN Encrypt.PersonEmailAddressSearchText D 
          ON C.Id = D.Id
        JOIN @tSearchPattern E 
          ON D.SearchPattern = E.SearchPattern
         AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
        GROUP BY C.Id
        HAVING COUNT(*) >= @Len - @Accuracy
    END

    -- Decrypt results
    OPEN SYMMETRIC KEY SymmetricKey1
    DECRYPTION BY CERTIFICATE Certificate1;

    SELECT 
        B.Id,
        @Len - Accuracy AS LetterOff,
        CONVERT(NVARCHAR(4000), DecryptByKey(A.EmailAddress)) AS EmailAddress
    FROM Encrypt.[Person] A
    JOIN @TId B 
      ON A.Id = B.Id
    ORDER BY LetterOff

    CLOSE SYMMETRIC KEY SymmetricKey1;

END TRY
BEGIN CATCH
    SET @ErrMsg = ERROR_MESSAGE()
    RAISERROR(@ErrMsg, 16, 1)
END CATCH
GO

The main thing to consider is to hash the underscore value as you will compare only hashed values here. In this stored procedure, the result is purposely decrypted but obviously we can keep the result encrypted and let the application decrypt it (better security).