Hello again,

This time, I implemented the same code but with encryption and hashing, encryption with a symetric key for the email addresses and hashing for the dictionary. This would be similar as implementing random encryption for the email address field and deterministic encryption for the dictionary under Always Encrypted (AE) setup.

Before I do the next step and implemented this under always encrypted on another article, I wanted to implement such encryptions outside AE. Always encrypted is a SQL server feature. This way, the encryption/hashing mecanism I use here can be implemented on any database server with encryption at the application layer or database layer.

For testing purposes, I use the database to generate the encryption keys and certificates and cypher/decypher values but I recommend to do it on the application level, so that you minimize the circuit where the values are potentially unencrypted.

  1. Setup the encryption keys:

As my main goal is not to implement the best secured keys under the best practices but just to test the solution, I looked at a simple way to implement it using the web page here: How to setup symetric encryption on SQL server.

Here is the main code to set this up using this link:

-- Create database Key

USE AETest;

GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password123';

GO

CREATE CERTIFICATE Certificate1

WITH SUBJECT = 'Protect Data';

GO

-- Create symmetric Key

CREATE SYMMETRIC KEY SymmetricKey1

WITH ALGORITHM = AES_128

ENCRYPTION BY CERTIFICATE Certificate1;

GO

Once this is done we can easily use the encryption and decryption functions. I also checked whether I can do search on it. It is not possible, so it acts exactly as a random encryption, which is what we wanted to test.

2. Setup the table with the encrypted data and the dictionary of EmailAddress

CREATE TABLE [Encrypt].[Person](

[Id] [bigint] NOT NULL,

[LastName] [varbinary](max) NOT NULL,

[EmailAddress] [varbinary](max) NULL,

CONSTRAINT [PK_EPerson] PRIMARY KEY CLUSTERED

(

[Id] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

And:

CREATE TABLE [Encrypt].[PersonEmailAddressSearchText](

[Id] [int] NOT NULL,

[SearchPattern] [nvarchar](4000) NOT NULL,

[Rank] [smallint] NOT NULL

) ON [PRIMARY]

GO

CREATE CLUSTERED INDEX [IX_EncryptPersonEmailAddressearchText_IdRank] ON [Encrypt].[PersonEmailAddressSearchText]

(

[Id] ASC,

[Rank] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_EncryptPersonEmailAddressSearchText_SearchPattern] ON [Encrypt].[PersonEmailAddressSearchText]

(

[SearchPattern] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

3. Fill up the data

The data would be filled up by the application but for testing puposes, we will fill up the data from the unencrypted dataset (please see previous article regarding the dataset).

OPEN SYMMETRIC KEY SymmetricKey1

DECRYPTION BY CERTIFICATE Certificate1;

GO

INSERT INTO Encrypt.Person

SELECT Id, EncryptByKey (Key_GUID('SymmetricKey1'),LastName) LastName, EncryptByKey (Key_GUID('SymmetricKey1'),EmailAddress) EmailAddress

FROM Unencrypted.person

GO

-- Closes the symmetric key

CLOSE SYMMETRIC KEY SymmetricKey1;

GO

TRUNCATE TABLE Encrypt.PersonLastNameSearchText

INSERT INTO Encrypt.PersonLastNameSearchText

select Id,SearchPattern,Rank from Unencrypted.Person

cross apply [String].[Fn_GetHashedLetterPosition](LastName)

I used the following function to hash the values for the dictionary (of course, we would use a better hashing mecanism with a specific Salt. I just wanted here to simplify the test.

4. Hash function

Here is the hash function used (for purpose of readiness in the table I converted the varbinary value of the hashbyte result to hexadecimal values before storing onto the database but this is not necessary):

CREATE function [dbo].[fn_varbintohexstr]

(

@pbinin varbinary(max)

)

returns nvarchar(max)

as

begin

return sys.fn_varbintohexsubstring(1,@pbinin,1,0)

end

GO

CREATE FUNCTION [String].[Fn_GetHashedLetterPosition]

(@String NVARCHAR (4000))

RETURNS TABLE

AS

RETURN

SELECT

dbo.Fn_varbintohexstr(HASHBYTES('SHA1', SUBSTRING(@String, N.Num, 1))) AS SearchPattern,

RANK() OVER (ORDER BY N.Num) AS [Rank]

FROM  [Performance].[Number] AS N

WHERE (N.Num >= 0

AND N.Num <= LEN(@String))

5. Search the encrypted data

Here is the search stored procedure. It does work similarly than the one that worked on the unencrypted data, including the underscore wild card usage.

/*

Prerequisites for performance:

CREATE INDEX IX_EncryptedPersonEmailAddressSearchText_SearchPattern ON Encrypted.PersonEmailAddressSearchText(SearchPattern)

*/

ALTER PROCEDURE [Encrypt].[GetPersonFromEmailAddress]

@SearchPattern nvarchar(128) = 'ph__@adven',

@Accuracy smallint = 0, --> if 0 this means no difference, 1 means 1 letter can be off, etc...

@IsRegExpr bit = 1 --> this means that underscore can be used (more possible)

AS

BEGIN TRY

-- 1. get the lenth

DECLARE @Len smallint = LEN(@SearchPattern)

-- 2. get the gram

DECLARE @Gram smallint = 1

DECLARE @ErrMsg varchar(256) = 'Search pattern must be above '+CONVERT(VARCHAR(16),@Gram)+' characters'

IF @Len < @Gram

RAISERROR(@ErrMsg,16,1)

-- 3. Create the search values

DECLARE @tSearchPattern TABLE (SearchPattern NVARCHAR(128),[Rank] smallint, Underscore NVARCHAR(128))

DECLARE @TId table (Id Int, Accuracy smallint)

INSERT INTO @tSearchPattern (SearchPattern, [Rank], Underscore)

SELECT SearchPattern,[Rank],UnderScore FROM [String].[Fn_GetHashedLetterPosition](@SearchPattern) A

CROSS JOIN (SELECT convert(nvarchar(4000),SearchPattern) UnderScore FROM [String].[Fn_GetHashedLetterPosition]('_') ) B

-- If the gram is less than the len of the text to search:

IF @IsRegExpr = 1 AND CHARINDEX('_',@SearchPattern)>0

BEGIN

WITH CTE AS (

SELECT A.Id, A.[Rank]

FROM Encrypt.[PersonEmailAddressSearchText] A

JOIN @tSearchPattern B ON A.SearchPattern = B.SearchPattern

WHERE B.[Rank] = 1

)

INSERT INTO @TId

SELECT C.Id, COUNT(*) Accuracy

FROM CTE C

JOIN Encrypt.PersonEmailAddressSearchText D ON C.Id = D.Id

JOIN @tSearchPattern E ON

CASE WHEN E.SearchPattern = E.Underscore THEN D.SearchPattern ELSE E.SearchPattern END = D.SearchPattern

AND E.[Rank] = D.[Rank] + 1 - C.[Rank]

GROUP BY C.Id, C.[Rank]

HAVING COUNT(*) >= @Len-@Accuracy

END

ELSE

BEGIN

WITH CTE AS (

SELECT A.Id, A.[Rank]

FROM [Encrypt].[PersonEmailAddressSearchText] A

JOIN @tSearchPattern B ON A.SearchPattern = B.SearchPattern

WHERE B.[Rank] = 1

)

INSERT INTO @TId

SELECT C.Id, COUNT(*) Accuracy

FROM CTE C

JOIN Encrypt.PersonEmailAddressSearchText D ON C.Id = D.Id

JOIN @tSearchPattern E

ON

D.SearchPattern = E.SearchPattern

AND E.[Rank] = D.[Rank] + 1 - C.[Rank]

GROUP BY C.Id

HAVING COUNT(*) >= @Len-@Accuracy

END

OPEN SYMMETRIC KEY SymmetricKey1

DECRYPTION BY CERTIFICATE Certificate1;

SELECT B.Id, @Len - Accuracy LetterOff, convert(nvarchar(4000),DecryptByKey(A.EmailAddress)) EmailAddress FROM Encrypt.[Person] A

JOIN @TId B ON A.Id = B.Id

ORDER BY LetterOff

CLOSE SYMMETRIC KEY SymmetricKey1;

END TRY

BEGIN CATCH

SET @ErrMsg = ERROR_MESSAGE()

RAISERROR(@ErrMsg,16,1)

END CATCH

go

The main thing to consider is to hash the underscore value as you will compare only hashed values here. In this stored procedure, the result is purposely decrypted but obviously we can keep the result encrypted and let the application decrypt it (better security).

Here are a couple of tests that works totally similarly to the previous article:

or with the usage of underscore wild card:

Please do not hesitate to share your point of view.

Clement