Published on: April 6th 2010

Hey,

This is my first article. My friends on Facebook commented on my blog and asked for a first article. The challenge is on: try to write an article frequently (I wish daily, but that is too presumptuous; plus, I am going to have a baby and I am not sure I will be able to focus on the blog).

This blog is not intended to show how to implement things. I can definitely help people with that, no problem, but I want the blog to stay lightweight and focused on architecture rather than implementation.

So the first topic is transactional replication, requested by my good friend John Park, with whom I spent several poker nights and football beer trips at the fifth bar on the rue Mouffetard.... good times!

This is an interesting topic. Too often, companies completely misjudge what replication is for. Replication is seen as a path to high availability. It is not a high-availability solution, far from it!

Replication should be used to spread information across SQL Servers. A typical need is between a SQL Server handling the back office / configuration of a website and front-office servers that are completely denormalized and ready to rock.

The nice thing with replication is that you do not have to worry about writing all the code to transfer the data. Data is moved by simple techniques that anybody could implement: bcp and stored procedures to prepare the snapshot and the transactional data, a log reader feeding the distributor to control the overall flow, and at the target some local procedures that apply the changes without resorting to distributed transactions. Overall, a good mechanism.
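To make that mechanism concrete, here is a hypothetical sketch of publishing a single table with transactional replication; the database, table, and server names are made up for illustration:

```sql
-- Hypothetical sketch: publish one back-office table (names are illustrative).
USE BackOffice;
EXEC sp_replicationdboption @dbname = N'BackOffice',
     @optname = N'publish', @value = N'true';

EXEC sp_addpublication @publication = N'ProductCatalog',
     @repl_freq = N'continuous',   -- transactional, not a scheduled snapshot
     @status = N'active';

EXEC sp_addarticle @publication = N'ProductCatalog',
     @article = N'Product', @source_object = N'Product',
     @type = N'logbased';          -- changes are read from the transaction log

EXEC sp_addsubscription @publication = N'ProductCatalog',
     @subscriber = N'FRONTOFFICE01', @destination_db = N'FrontOffice',
     @subscription_type = N'push';
```

Behind the scenes, the snapshot agent uses bcp to seed the subscriber, then the log reader and distribution agents take over for the continuous flow.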

Obviously this comes with drawbacks, and one of the main ones is administration. Replication is super easy to implement, but an inexperienced DBA will have a hard time maintaining it. A large reindexing job, transactions stuck at the source server, or any incorrectly sized log file can mean trouble! He will end up breaking the replication and rebuilding it, even though there are techniques to release the pressure: publish only one article at a time, etc...
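When things look stuck, a couple of quick checks can tell you whether you can release the pressure rather than rebuild; a minimal sketch, assuming sufficient server permissions:

```sql
-- Quick health checks before tearing replication down:
EXEC sp_replcounters;    -- per-database latency and pending replicated transactions
DBCC SQLPERF(LOGSPACE);  -- log file size and percent used for every database
```

If latency is high but the pending count is shrinking, the distributor is catching up and patience beats a rebuild.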

Servers and databases should be sized correctly before implementing it, and if replication is very heavy, the distributor should be moved to a dedicated server.

Another aspect systematically overlooked is the stickiness of replication. Some DBAs at smaller companies implement replication correctly, architecturally speaking, but want very easy administration on the source server, so they put the database in simple recovery, thinking it will make the database easy to maintain since no point-in-time restore nor any log backup needs to be implemented. Probably a good call. But replication does not let you empty the log file while transactions are still marked for replication. The DBA has to be aware of this particularity: when your application writes data faster than the transactions get buffered and delivered by the distributor, your log will keep growing until the pressure is released.
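A minimal way to see this in action (the database name is hypothetical):

```sql
-- Even in SIMPLE recovery, the log cannot truncate past transactions
-- not yet read by the log reader agent. Check what is pinning the log:
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'BackOffice';  -- hypothetical published database
-- log_reuse_wait_desc = 'REPLICATION' means pending replicated transactions
-- are holding the log, regardless of the recovery model.
```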

Another neglected aspect is that snapshot replication, as opposed to real-time transactional replication, is heavy. The bigger your database, the bigger your snapshot. It is probably safe to say that in most circumstances, real-time transactional replication is your best bet.

Another important piece of architectural advice: replicate only your tables, not your stored procedures. You cannot change a replicated stored procedure at the source without it changing at the target, which you are unlikely to want immediately. Just follow a simple development lifecycle and take the time to deploy your procedures in packages.

More advice: put your articles (the tables, then) in a separate database on the subscriber, and point to the tables with synonyms from the real target database. This keeps your connection to the subscribed articles, and if you need to reinitialize the snapshot, you can do it with almost no downtime by creating a new subscription in a different database and switching the synonyms at the last moment. Also, a separate database can run in simple recovery even when your target database requires full recovery. Replicated data is just redundant data that does not need full recovery at the subscriber.
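The synonym trick above can be sketched like this; all database and table names are hypothetical:

```sql
-- Articles land in a dedicated ReplTarget database (SIMPLE recovery);
-- the application database only sees them through synonyms.
USE FrontOffice;
CREATE SYNONYM dbo.Product FOR ReplTarget.dbo.Product;

-- When the snapshot must be reinitialized: subscribe into a fresh database
-- (ReplTarget2), let it sync, then swap the synonym in one short transaction.
BEGIN TRANSACTION;
DROP SYNONYM dbo.Product;
CREATE SYNONYM dbo.Product FOR ReplTarget2.dbo.Product;
COMMIT;
```

The swap is metadata-only, so the application sees at most a brief blocking instead of a full reinitialization outage.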

Well, I will let you comment a bit on this first article. As you noticed, I am not the kind of DBA who really enjoys replication. My main client, a very visible one, like a flagship customer for Microsoft, implemented replication everywhere, even for Informatica PowerExchange, which relies on replication to extract real-time data (I cannot believe a third-party provider makes Microsoft rely on one additional layer)! The thing is, this client has opened a case with Microsoft on a bug concerning the purge done by the log reader, and the case has been open far too long, like 6 months. Microsoft engineers come and go at the office, making good money out of it, but cannot fix the issue. It has to be said: very large replication is not handled very well by Microsoft!

So here I will say it as well: use replication for configuration data, but for data in the terabyte range, forget it or you will suffer! Better to use simple data warehouse extraction techniques.