This would be similar as implementing random encryption for the email address field and deterministic encryption for the dictionary under Always Encrypted (AE) setup.
Before I do the next step and implemented this under always encrypted on another article, I wanted to implement such encryptions outside AE. Always encrypted is a SQL server feature. This way, the encryption/hashing mecanism I use here can be implemented on any database server with encryption at the application layer or database layer.
For testing purposes, I use the database to generate the encryption keys and certificates and cypher/decypher values but I recommend to do it on the application level, so that you minimize the circuit where the values are potentially unencrypted.
- Setup the encryption keys:
As my main goal is not to implement the best secured keys under the best practices but just to test the solution, I looked at a simple way to implement it using the web page here: How to setup symetric encryption on SQL server.
Here is the main code to set this up using this link:
-- Create database Key
USE AETest;
GO
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'Password123';
GO
CREATE CERTIFICATE Certificate1
WITH SUBJECT = 'Protect Data';
GO
-- Create symmetric Key
CREATE SYMMETRIC KEY SymmetricKey1
WITH ALGORITHM = AES_128
ENCRYPTION BY CERTIFICATE Certificate1;
GO
Once this is done we can easily use the encryption and decryption functions. I also checked whether I can do search on it. It is not possible, so it acts exactly as a random encryption, which is what we wanted to test.
- Setup the table with the encrypted data and the dictionary of Email Address
CREATE TABLE [Encrypt].[Person]
(
[Id] [bigint] NOT NULL,
[LastName] [varbinary](max) NOT NULL,
[EmailAddress] [varbinary](max) NULL,
CONSTRAINT [PK_EPerson] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
WITH
(
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [Encrypt].[PersonEmailAddressSearchText]
(
[Id] [int] NOT NULL,
NOT NULL,
[Rank] [smallint] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IX_EncryptPersonEmailAddressearchText_IdRank]
ON [Encrypt].[PersonEmailAddressSearchText]
(
[Id] ASC,
[Rank] ASC
)
WITH
(
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_EncryptPersonEmailAddressSearchText_SearchPattern]
ON [Encrypt].[PersonEmailAddressSearchText]
(
[SearchPattern] ASC
)
WITH
(
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
- Fill up the data
The data would be filled up by the application but for testing puposes, we will fill up the data from the unencrypted dataset (please see previous article regarding the dataset).
OPEN SYMMETRIC KEY SymmetricKey1
DECRYPTION BY CERTIFICATE Certificate1;
GO
INSERT INTO Encrypt.Person
SELECT
Id,
EncryptByKey(Key_GUID('SymmetricKey1'), LastName) AS LastName,
EncryptByKey(Key_GUID('SymmetricKey1'), EmailAddress) AS EmailAddress
FROM Unencrypted.Person
GO
-- Closes the symmetric key
CLOSE SYMMETRIC KEY SymmetricKey1;
GO
TRUNCATE TABLE Encrypt.PersonLastNameSearchText
INSERT INTO Encrypt.PersonLastNameSearchText
SELECT
Id,
SearchPattern,
Rank
FROM Unencrypted.Person
CROSS APPLY [String].[Fn_GetHashedLetterPosition](LastName)
I used the following function to hash the values for the dictionary (of course, we would use a better hashing mecanism with a specific Salt. I just wanted here to simplify the test.
- Hash function
Here is the hash function used (for purpose of readiness in the table I converted the varbinary value of the hashbyte result to hexadecimal values before storing onto the database but this is not necessary):
CREATE FUNCTION [dbo].[fn_varbintohexstr] (@pbinin varbinary(max))
RETURNS nvarchar(max)
AS
BEGIN
RETURN sys.fn_varbintohexsubstring(1, @pbinin, 1, 0)
END
GO
CREATE FUNCTION [String].[Fn_GetHashedLetterPosition] (@String NVARCHAR(4000))
RETURNS TABLE
AS
RETURN
SELECT
dbo.fn_varbintohexstr(HASHBYTES('SHA1', SUBSTRING(@String, N.Num, 1))) AS SearchPattern,
RANK() OVER (ORDER BY N.Num) AS [Rank]
FROM [Performance].[Number] AS N
WHERE (N.Num >= 0 AND N.Num <= LEN(@String))
- Search the encrypted data
Here is the search stored procedure. It does work similarly than the one that worked on the unencrypted data, including the underscore wild card usage.
/* Prerequisites for performance:
CREATE INDEX IX_EncryptedPersonEmailAddressSearchText_SearchPattern
ON Encrypted.PersonEmailAddressSearchText(SearchPattern)
*/
ALTER PROCEDURE [Encrypt].[GetPersonFromEmailAddress]
@SearchPattern NVARCHAR(128) = 'ph__@adven',
@Accuracy SMALLINT = 0, --> if 0 this means no difference, 1 means 1 letter can be off, etc...
@IsRegExpr BIT = 1 --> this means that underscore can be used (more possible)
AS
BEGIN TRY
-- 1. get the length
DECLARE @Len SMALLINT = LEN(@SearchPattern)
-- 2. get the gram
DECLARE @Gram SMALLINT = 1
DECLARE @ErrMsg VARCHAR(256) =
'Search pattern must be above ' + CONVERT(VARCHAR(16), @Gram) + ' characters'
IF @Len < @Gram
RAISERROR(@ErrMsg, 16, 1)
-- 3. Create the search values
DECLARE @tSearchPattern TABLE
(
SearchPattern NVARCHAR(128),
[Rank] SMALLINT,
Underscore NVARCHAR(128)
)
DECLARE @TId TABLE
(
Id INT,
Accuracy SMALLINT
)
INSERT INTO @tSearchPattern (SearchPattern, [Rank], Underscore)
SELECT
SearchPattern,
[Rank],
UnderScore
FROM [String].[Fn_GetHashedLetterPosition](@SearchPattern) A
CROSS JOIN
(
SELECT CONVERT(NVARCHAR(4000), SearchPattern) AS UnderScore
FROM [String].[Fn_GetHashedLetterPosition]('_')
) B
-- If regex with underscore
IF @IsRegExpr = 1 AND CHARINDEX('_', @SearchPattern) > 0
BEGIN
WITH CTE AS
(
SELECT A.Id, A.[Rank]
FROM Encrypt.[PersonEmailAddressSearchText] A
JOIN @tSearchPattern B
ON A.SearchPattern = B.SearchPattern
WHERE B.[Rank] = 1
)
INSERT INTO @TId
SELECT
C.Id,
COUNT(*) AS Accuracy
FROM CTE C
JOIN Encrypt.PersonEmailAddressSearchText D
ON C.Id = D.Id
JOIN @tSearchPattern E
ON CASE WHEN E.SearchPattern = E.Underscore
THEN D.SearchPattern
ELSE E.SearchPattern
END = D.SearchPattern
AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
GROUP BY C.Id, C.[Rank]
HAVING COUNT(*) >= @Len - @Accuracy
END
ELSE
BEGIN
WITH CTE AS
(
SELECT A.Id, A.[Rank]
FROM [Encrypt].[PersonEmailAddressSearchText] A
JOIN @tSearchPattern B
ON A.SearchPattern = B.SearchPattern
WHERE B.[Rank] = 1
)
INSERT INTO @TId
SELECT
C.Id,
COUNT(*) AS Accuracy
FROM CTE C
JOIN Encrypt.PersonEmailAddressSearchText D
ON C.Id = D.Id
JOIN @tSearchPattern E
ON D.SearchPattern = E.SearchPattern
AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
GROUP BY C.Id
HAVING COUNT(*) >= @Len - @Accuracy
END
-- Decrypt results
OPEN SYMMETRIC KEY SymmetricKey1
DECRYPTION BY CERTIFICATE Certificate1;
SELECT
B.Id,
@Len - Accuracy AS LetterOff,
CONVERT(NVARCHAR(4000), DecryptByKey(A.EmailAddress)) AS EmailAddress
FROM Encrypt.[Person] A
JOIN @TId B
ON A.Id = B.Id
ORDER BY LetterOff
CLOSE SYMMETRIC KEY SymmetricKey1;
END TRY
BEGIN CATCH
SET @ErrMsg = ERROR_MESSAGE()
RAISERROR(@ErrMsg, 16, 1)
END CATCH
GO
The main thing to consider is to hash the underscore value as you will compare only hashed values here. In this stored procedure, the result is purposely decrypted but obviously we can keep the result encrypted and let the application decrypt it (better security).