When the company’s organization become large and production database servers have hundreds of jobs running with different schedule, the DBA might lose track of what job must be run successfully and in a timely matter.

Various reasons such as sql server maintenance windows could require disabling steps of jobs in its entirety. Human error is possible when the jobs is not reenabled.

In my experience, DBA/DevOps could forget to reenable a job such as:

  • A job that extracts data for archiving
  • A job that processes a cube
  • A job that updates the operations db at night
  • A job that monitors specific tables
  • A job that reindexes tables or update statistics

When having a major release such jobs could be disabled. They must however be reenabled but the release might be long, go the distance (until late in a week-end) and DBA/Devops might forget to reenable it. Later during the week, we might realize the issue but it might be too late…

As the job is disabled, the natural email sent to notify success, failure or completion of jobs do not arrive and DBA/DevOps might believe all is good, business as usual.

So how to prevent such issues?

Implementing a monitor:

The best solution is always a solution that is agnostic to what the DBA/DevOps believe is supervised. By agnostic, I mean that the monitor MUST be running and supervise all jobs including the ones disabled disregarding what the DBA or the DevOps beliefs.

Here is one simple and free solution but there might be many solution to implement. This one relates to a solution implemented on a SQL server on premises but it can work as well on the cloud with an ETL solution such as an SSIS package that would just inserts and queries remotely or via a sql server job on the cloud server (depending whether Sql server agent is available for you on the cloud).

The table I designed under a repository where I collect statistics to monitor database (here DBATool) is the following one:

USE [DBATool]

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [Tool].[JobStepToMonitor](

[ServerName] [varchar](32) NOT NULL,

[JobName] [varchar](128) NOT NULL,

[JobStep] [varchar](128) NOT NULL,

[IsMonitored] [bit] NULL,

[CheckRunWithinMin] [int] NULL,

[AlertWhenRunningMoreThan] [int] NULL,

CONSTRAINT [pkToolJobStepToMonitor] PRIMARY KEY CLUSTERED

(

[ServerName] ASC,

[JobName] ASC,

[JobStep] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

I decided to use the names instead of Ids as the table is small and to make it easier to migrate such data in case we create the same jobs and steps on other servers which could be the case in a federated server farms serving the same operational databases with the same sets of jobs.

3 fields are interesting:

  • IsMonitored field enables to monitor the job/step or not. This can be very handy as not all the jobs/steps need to be monitored.
  • CheckRunWithinMin field indicates that monitor how to flag the last successful run of the jobs when querying the history. For example, if the job step needed to be run in the last 60 minutes, the value would be 60 and the monitor would make sure that the job step was run within those 60 minutes.
  • AlertWhenRunningMoreThan field enables the monitor to send an alert when the job has been running for more than X minutes. In my experience, I have seen jobs that usually takes seconds bein locked and run minutes to hours and even days. This helped the DBA/Devops to be alerted on jobs that are running abnormally a long time and prevent catastrophic issues.

How to fill jobs and steps:

This stored procedure executed on regular basis add the jobs and steps on the table (with the flag IsMonitored = 0 and some default values the fields for the monitor. It is then relatively easy for the DBA/DevOps to add specific values and enable the monitor onto the job by changing the value of those fields:

CREATE PROCEDURE [Tool].[FillMonitorJobStep]

AS

INSERT INTO Tool.JobStepToMonitor

SELECT @@ServerName, A.Name, B.Step_Name, 0, 60*24*3, null

FROM Msdb.dbo.Sysjobs A JOIN Msdb.dbo.SysjobSteps B ON A.Job_Id = B.Job_Id

WHERE NOT EXISTS (SELECT 1 FROM Tool.JobStepToMonitor C WHERE A.Name = C.JobName AND C.JobStep = B.Step_Name)

GO

How to monitor the jobs:

Add such stored procedure execution to a sql server job and run it as frequently as you seem fit. In my case, I run it daily in the morning as I authorize DBA/DevOps to disable the jobs for more than a day. However, it depends on how long the jobs are allowed to be disabled or fail. Note that I use a table called Alert.T_DatabaseMailConfiguration to manage the recipients of my alerts and use SqlAlerts as a database mail profile.

CREATE PROCEDURE [Tool].[MonitorJobStep]

@nId tinyint = 1

AS

--SET NOCOUNT ON

DECLARE @recipients VARCHAR(512)

SELECT @recipients = Recipients FROM Alert.T_DatabaseMailConfiguration WHERE Id = @nId

SELECT *, convert(tinyint,0) IsChecked

INTO #tmpJobStep

FROM Tool.JobStepToMonitor WHERE IsMonitored = 1 AND ServerName = @@SERVERNAME

UPDATE PUB

SET IsChecked = 1

FROM #tmpJobStep PUB

JOIN

(

SELECT DISTINCT Jobname, JobStep FROM

(

SELECT b.*,

CONVERT(DATETIME, RTRIM(run_date)) +

((run_time/10000 * 3600) + ((run_time%10000)/100*60) + (run_time%10000)%100 /*run_time_elapsed_seconds*/) / (23.999999*3600 /* seconds in a day*/) AS Start_DateTime

, CONVERT(DATETIME, RTRIM(run_date)) + ((run_time/10000 * 3600) + ((run_time%10000)/100*60) + (run_time%10000)%100) / (86399.9964 /* Start Date Time */)

+ ((run_duration/10000 * 3600) + ((run_duration%10000)/100*60) + (run_duration%10000)%100 /*run_duration_elapsed_seconds*/) / (86399.9964 /* seconds in a day*/) AS End_DateTime

FROM #tmpJobStep b

LEFT JOIN msdb..sysjobhistory a

ON a.step_name = b.jobstep AND run_status = 1 -- success

LEFT JOIN msdb..sysjobs c

ON a.job_id = c.job_id AND c.name = b.JobName

) A

WHERE End_DateTime >= DATEADD(MI,-CheckRunWithinMin,GETDATE())

) SUB

ON PUB.JobName = SUB.JobName AND PUB.JObStep = SUB.JobStep

IF EXISTS (   SELECT * FROM #TmpJobStep ) --WHERE IsChecked = 0)

BEGIN

DECLARE @tableHTML  NVARCHAR(MAX) ;

SET @tableHTML =

N'<Div style="font-weight: bold;font-size: x-small">

' + @@servername + ' - Job Step monitor threshold Report:<br><br>

</Div>' +

N'<table border="1">' +

N'<tr style="background-color: #FFC;font-weight: bold;font-size: x-small;">' +

N'     <td>Server Name</td>

<td>Job Name</td>

<td>Job Step</td>

<td>Run</td>

<td>Check Run within X minutes</td>' +

N'</tr>'+

CAST ( ( SELECT td = a.ServerName,       '',

td = a.JobName, '',

td = a.JobStep, '',

td = CASE WHEN a.IsChecked = 1 THEN 'OK' ELSE 'KO' END, '',

td = a.CheckRunWithinMin, ''

FROM #TmpJobStep a

ORDER BY 4,1,2,3

--WHERE      IsChecked = 0

FOR XML PATH('tr_row'), TYPE

) AS NVARCHAR(MAX) ) +

N'</table>' ;

SET @tableHTML = REPLACE(@tableHTML,'<tr_row>','<tr style="font-weight: normal;font-size: x-small;">')

SET @tableHTML = REPLACE(@tableHTML,'&lt;br&gt;','<br>')

DECLARE @Subject VARCHAR(256)

SET @Subject = 'Sql Server Alert Job: Job Step Monitor'

IF EXISTS (SELECT 1 FROM #TmpJobStep WHERE IsChecked = 0)

SET @Subject = @Subject + ' - KO'

ELSE

SET @Subject = @Subject + ' - OK'

 

EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'SqlAlerts',

@recipients= @recipients,

@subject = @subject,

@body = @tableHTML,

@body_format = 'HTML' --,

--@query = 'Select * from DBATool.Alert.DataFileSizeWarning' ,

--@attach_query_result_as_file = 2 ;

END

GO

Here is a screenshot of the email I receive :

As you noticed, the email warns on the subject that there is an issue by informing a KO. We should send an email when all is good as well so that we are sure the job is not disabled.

This way DBA/DevOps are notified even for a step that was skipped in a job or when a job is disabled or simply if a step or a job failed.

Additionally we can monitor jobs that run for an abnormal duration with the following stored procedure:

CREATE PROCEDURE [Tool].[JobRunningMonitor]

@Minutes smallint = 200,

@nId tinyint = 1

AS

SET NOCOUNT ON

DECLARE @recipients VARCHAR(512)

SELECT @recipients = Recipients FROM Alert.T_DatabaseMailConfiguration WHERE Id = @nId

SELECT @@Servername ServerName, b.Name JobName, DATEDIFF(MI,start_execution_date,GETDATE()) MinutesRunning

INTO #JobWithLongRunning

from msdb..sysjobactivity a

JOIN msdb..sysjobs b on a.job_id = b.job_id

WHERE start_execution_date >= DATEADD(DAY,-5,GETDATE())

AND job_history_id IS NULL

AND DATEDIFF(MI,start_execution_date,GETDATE()) >=  @Minutes

order by start_execution_date desc

IF EXISTS (   SELECT * FROM #JobWithLongRunning ) --WHERE IsChecked = 0)

BEGIN

DECLARE @tableHTML  NVARCHAR(MAX) ;

SET @tableHTML =

N'<Div style="font-weight: bold;font-size: x-small">

' + @@servername + ' - Job Running monitor threshold Report (above '+CONVERT(VARCHAR(16),@Minutes)+' minutes):<br><br>

</Div>' +

N'<table border="1">' +

N'<tr style="background-color: #FFC;font-weight: bold;font-size: x-small;">' +

N'     <td>Server Name</td>

<td>Job Name</td>

<td>Running for x minutes</td>' +

N'</tr>'+

CAST ( ( SELECT td = a.ServerName,       '',

td = a.JobName, '',

td = a.MinutesRunning, ''

FROM #JobWithLongRunning a

--WHERE      IsChecked = 0

FOR XML PATH('tr_row'), TYPE

) AS NVARCHAR(MAX) ) +

N'</table>' ;

SET @tableHTML = REPLACE(@tableHTML,'<tr_row>','<tr style="font-weight: normal;font-size: x-small;">')

SET @tableHTML = REPLACE(@tableHTML,'&lt;br&gt;','<br>')

DECLARE @Subject VARCHAR(256)

SET @Subject = 'Sql Server Alert Job: Job Running Monitor'

EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'SqlAlerts',

@recipients=@recipients,

@subject = @subject,

@body = @tableHTML,

@body_format = 'HTML' --,

--@query = 'Select * from DBATool.Alert.DataFileSizeWarning' ,

--@attach_query_result_as_file = 2 ;

END

GO

Similarly, we would receive an alert if some jobs would run abnormally and such monitoring can be frequently run (in my case it is run hourly):

In our case, the job had been running for more than 330 minutes while we authorized only 200 minutes for this particular job in the table that we showed above.

Such monitor is very handy especially in the cloud as there is a lack of customized monitoring tool. Please do not hesitate to comment and/or use the code above. I engage no responsibility if the code does not work properly (legal stuff…).