Phase 2: AE and search patterns
Hello again,
This time, I implemented the same code but with encryption and hashing, encryption with a symetric key for the email addresses and hashing for the dictionary. This would be similar as implementing random encryption for the email address field and deterministic encryption for the dictionary under Always Encrypted (AE) setup.
Before I do the next step and implemented this under always encrypted on another article, I wanted to implement such encryptions outside AE. Always encrypted is a SQL server feature. This way, the encryption/hashing mecanism I use here can be implemented on any database server with encryption at the application layer or database layer.
For testing purposes, I use the database to generate the encryption keys and certificates and cypher/decypher values but I recommend to do it on the application level, so that you minimize the circuit where the values are potentially unencrypted.
- Setup the encryption keys:
As my main goal is not to implement the best secured keys under the best practices but just to test the solution, I looked at a simple way to implement it using the web page here: How to setup symetric encryption on SQL server.
Here is the main code to set this up using this link:
-- Create database Key
USE AETest;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password123';
GO
CREATE CERTIFICATE Certificate1
WITH SUBJECT = 'Protect Data';
GO
-- Create symmetric Key
CREATE SYMMETRIC KEY SymmetricKey1
WITH ALGORITHM = AES_128
ENCRYPTION BY CERTIFICATE Certificate1;
GO
Once this is done we can easily use the encryption and decryption functions. I also checked whether I can do search on it. It is not possible, so it acts exactly as a random encryption, which is what we wanted to test.
2. Setup the table with the encrypted data and the dictionary of EmailAddress
CREATE TABLE [Encrypt].[Person](
[Id] [bigint] NOT NULL,
[LastName] [varbinary](max) NOT NULL,
[EmailAddress] [varbinary](max) NULL,
CONSTRAINT [PK_EPerson] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
And:
CREATE TABLE [Encrypt].[PersonEmailAddressSearchText](
[Id] [int] NOT NULL,
[SearchPattern] [nvarchar](4000) NOT NULL,
[Rank] [smallint] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IX_EncryptPersonEmailAddressearchText_IdRank] ON [Encrypt].[PersonEmailAddressSearchText]
(
[Id] ASC,
[Rank] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_EncryptPersonEmailAddressSearchText_SearchPattern] ON [Encrypt].[PersonEmailAddressSearchText]
(
[SearchPattern] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
3. Fill up the data
The data would be filled up by the application but for testing puposes, we will fill up the data from the unencrypted dataset (please see previous article regarding the dataset).
OPEN SYMMETRIC KEY SymmetricKey1
DECRYPTION BY CERTIFICATE Certificate1;
GO
INSERT INTO Encrypt.Person
SELECT Id, EncryptByKey (Key_GUID('SymmetricKey1'),LastName) LastName, EncryptByKey (Key_GUID('SymmetricKey1'),EmailAddress) EmailAddress
FROM Unencrypted.person
GO
-- Closes the symmetric key
CLOSE SYMMETRIC KEY SymmetricKey1;
GO
TRUNCATE TABLE Encrypt.PersonLastNameSearchText
INSERT INTO Encrypt.PersonLastNameSearchText
select Id,SearchPattern,Rank from Unencrypted.Person
cross apply [String].[Fn_GetHashedLetterPosition](LastName)
I used the following function to hash the values for the dictionary (of course, we would use a better hashing mecanism with a specific Salt. I just wanted here to simplify the test.
4. Hash function
Here is the hash function used (for purpose of readiness in the table I converted the varbinary value of the hashbyte result to hexadecimal values before storing onto the database but this is not necessary):
CREATE function [dbo].[fn_varbintohexstr]
(
@pbinin varbinary(max)
)
returns nvarchar(max)
as
begin
return sys.fn_varbintohexsubstring(1,@pbinin,1,0)
end
GO
CREATE FUNCTION [String].[Fn_GetHashedLetterPosition]
(@String NVARCHAR (4000))
RETURNS TABLE
AS
RETURN
SELECT
dbo.Fn_varbintohexstr(HASHBYTES('SHA1', SUBSTRING(@String, N.Num, 1))) AS SearchPattern,
RANK() OVER (ORDER BY N.Num) AS [Rank]
FROM [Performance].[Number] AS N
WHERE (N.Num >= 0
AND N.Num <= LEN(@String))
5. Search the encrypted data
Here is the search stored procedure. It does work similarly than the one that worked on the unencrypted data, including the underscore wild card usage.
/*
Prerequisites for performance:
CREATE INDEX IX_EncryptedPersonEmailAddressSearchText_SearchPattern ON Encrypted.PersonEmailAddressSearchText(SearchPattern)
*/
ALTER PROCEDURE [Encrypt].[GetPersonFromEmailAddress]
@SearchPattern nvarchar(128) = 'ph__@adven',
@Accuracy smallint = 0, --> if 0 this means no difference, 1 means 1 letter can be off, etc...
@IsRegExpr bit = 1 --> this means that underscore can be used (more possible)
AS
BEGIN TRY
-- 1. get the lenth
DECLARE @Len smallint = LEN(@SearchPattern)
-- 2. get the gram
DECLARE @Gram smallint = 1
DECLARE @ErrMsg varchar(256) = 'Search pattern must be above '+CONVERT(VARCHAR(16),@Gram)+' characters'
IF @Len < @Gram
RAISERROR(@ErrMsg,16,1)
-- 3. Create the search values
DECLARE @tSearchPattern TABLE (SearchPattern NVARCHAR(128),[Rank] smallint, Underscore NVARCHAR(128))
DECLARE @TId table (Id Int, Accuracy smallint)
INSERT INTO @tSearchPattern (SearchPattern, [Rank], Underscore)
SELECT SearchPattern,[Rank],UnderScore FROM [String].[Fn_GetHashedLetterPosition](@SearchPattern) A
CROSS JOIN (SELECT convert(nvarchar(4000),SearchPattern) UnderScore FROM [String].[Fn_GetHashedLetterPosition]('_') ) B
-- If the gram is less than the len of the text to search:
IF @IsRegExpr = 1 AND CHARINDEX('_',@SearchPattern)>0
BEGIN
WITH CTE AS (
SELECT A.Id, A.[Rank]
FROM Encrypt.[PersonEmailAddressSearchText] A
JOIN @tSearchPattern B ON A.SearchPattern = B.SearchPattern
WHERE B.[Rank] = 1
)
INSERT INTO @TId
SELECT C.Id, COUNT(*) Accuracy
FROM CTE C
JOIN Encrypt.PersonEmailAddressSearchText D ON C.Id = D.Id
JOIN @tSearchPattern E ON
CASE WHEN E.SearchPattern = E.Underscore THEN D.SearchPattern ELSE E.SearchPattern END = D.SearchPattern
AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
GROUP BY C.Id, C.[Rank]
HAVING COUNT(*) >= @Len-@Accuracy
END
ELSE
BEGIN
WITH CTE AS (
SELECT A.Id, A.[Rank]
FROM [Encrypt].[PersonEmailAddressSearchText] A
JOIN @tSearchPattern B ON A.SearchPattern = B.SearchPattern
WHERE B.[Rank] = 1
)
INSERT INTO @TId
SELECT C.Id, COUNT(*) Accuracy
FROM CTE C
JOIN Encrypt.PersonEmailAddressSearchText D ON C.Id = D.Id
JOIN @tSearchPattern E
ON
D.SearchPattern = E.SearchPattern
AND E.[Rank] = D.[Rank] + 1 - C.[Rank]
GROUP BY C.Id
HAVING COUNT(*) >= @Len-@Accuracy
END
OPEN SYMMETRIC KEY SymmetricKey1
DECRYPTION BY CERTIFICATE Certificate1;
SELECT B.Id, @Len - Accuracy LetterOff, convert(nvarchar(4000),DecryptByKey(A.EmailAddress)) EmailAddress FROM Encrypt.[Person] A
JOIN @TId B ON A.Id = B.Id
ORDER BY LetterOff
CLOSE SYMMETRIC KEY SymmetricKey1;
END TRY
BEGIN CATCH
SET @ErrMsg = ERROR_MESSAGE()
RAISERROR(@ErrMsg,16,1)
END CATCH
go
The main thing to consider is to hash the underscore value as you will compare only hashed values here. In this stored procedure, the result is purposely decrypted but obviously we can keep the result encrypted and let the application decrypt it (better security).
Here are a couple of tests that works totally similarly to the previous article:
or with the usage of underscore wild card:
Please do not hesitate to share your point of view.
Clement
Related Posts
Leave a Reply Cancel reply
Categories
- Inspiration (1)
- Security (2)
- SQL Architecture (21)
- SQL Monitoring (1)
- SQL Performance (16)
- Tips & tricks (1)
- Uncategorized (5)
- Utilities (1)