Hello guys,

 

Today we will discuss the different high availability (HA) strategy you have on SQL server.

 

Basically, it all depends on your needs, your SLA towards your clients and the level of satisfactory you would like to have to your client.

Also, it depends on your wallet. Some HA strategies are very expensive, some are very cheap.

I am going to rate all solution from 1 to 10, 10 being the best and 1 is the worse for the criteria.

 

Here are the question you should consider:

 

1. How much money would I lose if my system would not work for 24 hours?

 

It is pretty reasonnable to think that you would need one full day to rebuild your database server, giving a robust backup regime and a good database administrator. Contrarily to a web server, database server can take more time to rebuild. If your primary server crashed and you do not have an HA strategy in place, you would have to reinstall Sql server on another machine, restore all databases, rebuild all server objects that are not within msdb or master and/or remap all your datasource or swith DNS lookup towards the new server.

If 1 day downtime does not make your company suffer, you do not need an HA strategy! It would be too expensive to maintain, dardware and software wise and it is not necessary for your company. Obviously as a DBA, it is hard for me to say it but it makes sense.

The How much money you can lose should also indicate how much budget you should dedicate on your Disaster Revovery Plan and High Availability solution, if any.

 

2. What is your SLA or satisfactory level?

 

If 1 day is too much, then it all depends on your degree of HA desired. This can be real-time and low maintenance, real-time and high maintenance, not real-time and low maintenance, etc...

For example, log shipping is less ressource intensive than mirroring but does not supply a real-time solution. For many companies this would be fine to have the risk to lose up to 15 mn. of data and recover immediately a crash. In the mean time, log shipping as well as mirroring (until the next release of Microsoft SQL server 2010, according to my internal contacts at Microsoft) are not providing HA for your server objects. This means you would need to care about SQL server jobs, DTSX packages, credentials, DBMail, Logins, Service brokers, separately. LS and Mirroring would only cover the databases.

Another solution is clustering and give you real-time coverage of your databases but also your server objects. As your storeage is shared on this solution, you need to make sure your storeage area does not become a single point of failure. For performance, you would also need to make all possible for your SAN to be setup for optimized database files' storeage.

 

Here is the rate for all the solution for this particular criteria: minimizing downtime and data loss

Sql server cluster: 9

(not much chance to lose significant downtime and data)

Mirroring: 8

(not much chance to lose significant downtime (maybe logins not deployed correctly) and data but maybe some server objects)

Log Shipping: 6

(chance to lose some downtime as this is a manual failover, also logins not deployed correctly on secondary server and some other server objects)

Replication: 6

(chance to lose some downtime as this is a manual failover, also the publication need to be broken to make your data available and proper installation needs to be done (to avoid identities reseed, not for replication foreign key and triggers activation, etc...)

Sharding: 5

(with proper architecture, sharding can also be good and not losing significant downtime but it is hard to find the right architecture. Sharding is almost th eopposite of one centralized database)

 

3. How much maintenance is required?

 

Some solution require very reccurent care. Some others not. Upgrading/patching can also be far more difficult for one solution compare to another one.

 

Here is the rate for all the solution for this particular criteria: high maintenance

Sql server cluster: 7

(once installed, maintenance is minimal but still require senior level management)

Mirroring: 7

(once installed, maintenance is low but still require specific maintenance)

Log Shipping: 8

(once installed, maintenance is low but still require proper supervision)

Replication: 7

(once installed, maintenance is low, especially if properly installed (separate distributor from publisher) but still require proper supervision)

Sharding: 5

(versioning and deployment is much more challenging, also merging shards or spliting shards while activity is on, more backups to do, more work on datawarehousing...: this is probably the most important drawback)

 

4. How much human ressources (DBA, system engineers) you like to dedicate

 

At the install, SQL server cluster is intensive in human ressource, it requires network engineer to set up all the proper ips, private Vlan. It requires system engineers to set up the shared disk ressources and set up the cluster groups. It requires DBA to install SQL server services in cluster (but SSIS, installed as active/active). It is much easier though to manage SQL server cluster on a daily maintenance as all server objects are maintained in cluster, even the server ressource allocation such as memory, CPU allocation or degree of parallelism. With a virtual IP, application just connect onto one virtual IP or Server Name. When in need of failover, the management is fairly easy as you just move the cluster group that brings in less than 30 seconds all the ressources onto the secondary server, becoming the active one. Your SPOF at the storeage level results in controller cards that fail or disk that fail. Usually this is where your cost will go onto: securing your storeage tier.

All of this require senior level engineers even though on daily basis, there is nothing to do.

 

At the install, mirroring is not too difficult to install. Network engineer would ensure DNS suffix and proper private network and firewall settings and DBA would set up all the security correctly as well as synchronising the databases. Once all done, daily maintenance is not too difficult but activity needs to be supervize as mirroring can suffer in performance during large operations such as massive reindexing for example (suspend mirroring during reindexing). Also server objects have to be created on both side with server specificitt=y (keep login SID, make sure to use localhost as server name for the server on DTSX, etc...). Automatic failover is ensure with a witness server. Manual failover is also fairly easy to deal with. Application datasource needs to handle a failover parner to switch to the partner in cas of a failover.

Log shipping might be the easiest solution and might be the best to implement for most of the companies. It just requires proper networking between the two servers. DBA will set up very easily a log shippment between the two servers. Proper monitoring.

 

Here is the rate for all the solution for this particular criteria: human ressources

Sql server cluster: 9

(once installed, DBA only, mid-level)

Mirroring: 9 (nowdays: Always-on replaced Mirroring)

(once installed, DBA only, mid-level)

Log Shipping: 9

(once installed, DBA only, mid-level)

Replication: 6

(once installed, DBA only, senior level)

Sharding: 6

(once installed, DBA only, senior level)

 

5. How much performance can you accept to lose?

 

This is an obvious equation. Adding redundancy usually adds up in ressource usage. For example, high safety mirroring does not give as much performance as asynchroneous mirroring. The first one gives real-time mirroring of all your database-specific operations but require a two-commit transaction, whereas asynchroneous mirroring does not provide real-time but does not influence much performance. Asynchroneous mirroring is still a good deal because it is quasi-real-time.

SQL server clustering does provide high performance as it does not require your sql code to be played at two database servers. Redundancy is offered at the storeage level. A software, cluster administration, enables you to automatically failover.

That is probably the best advantage of sharding and grid solution. The ability to lose performance is almost none. You will probably gain performance by sharding or setting up a grid. Writing on grid might be a bit impacted but with asynchroneous transaction on the other grid node, you will not have too much impact. Concerning the sharding, the write should be done only on one shard and limit in terms of ressources the load to the dedicated shard. Scalability is your key in these two solutions.

Log shipping is also a good solution as backup and transfer should not hinder much your performance on the principal server. You can also sue your secondary servers as read-only databases for your near-real-time  datasource revenues. Mirroring can give a small advantage on entreprise version with the snapshot of your mirrored databases but is limited to one passive node exposure.

SQL server clustering does not give you more performance but is not impacted either. So this is a pretty good solution overall.

Replication can also help exposing data onto a different server to avoid ressource usage of your reporting tools and cubes onto the primary sources. Replication can severely hinder, as well as mirroring performance when logs are not getting through due to a network failure, security disruption (replication and mirroring), or human errors (especially on replication, it can quickly happen). Replication can for example make your transaction log grow as the transactions are not marked bein processed by the distributor. Mirroring can also be abottleneck if your log does not get through, your mirroring is suspended for maintenance (reindexing for example).

 

Here is the rate for all the solution for this particular criteria: High performance

Sql server cluster: 7

(once installed, your ressources are not very impacted by the cluster. However it does not help.)

Mirroring: 5

(once installed, your ressources are somewhat impacted but you might use your mirrored databases with snapshots on entreprise to expose your data to reporting)

Log Shipping: 8

(once installed, DBA only, your ressources are not impacted by much and is often using ressources your disaster recovery plan would use anyway (I am thinking of your log backups and transfer). Your secondary server can expose your database with a standby version)

Replication: 6

(once installed, replication is ressource intensive unless you dedicate a distribution server, your data can be exposed for reporting)

Sharding: 9

(once installed, the solution can load-balance your reads and also your writes as you can access several shards)

 

Conclusion:

Depending on where your company is located on the criteria above, the best solution for your company can be one of the solution above or even another (like third party replication even at the subsystem level. I have seen success in all high availability strategy.

I think a lot of time log shipping is not considered quickly but one thing I like a lot with log shipping is the fact that your disastry recovery plan usually handles as well log backups. Log shipping does not involve a lot more work and is very inexpensive.

Also, HA can also be done geographically. Based on network speed between the two datacenters or two subnets, all the solution can be done but that comes with a definite price. Log shipping stays the cheapest solution. Mirroring is not far behind. Sharding is not that expensive either. Geoclustering is expensive definitely as you need to implement serious sussystem redundancy (like recoverypoint from EMC).

 

I hope this article can help. Do not hesite to comment.