Should you shard or centralize your database?
Nowadays, businesses tend to accumulate very large databases very quickly, and they usually need more efficient ways to organize that data, both functionally and physically, for security as well as performance reasons.
In many ways, centralizing the operating database makes your life easier. However, scaling a single database tends to be difficult to put in place, and you will need to hire highly skilled data architects and administrators to do it.
In many cases, compliance requirements such as GDPR and other heavy regulations (KYC, PCI) can also put that centralized approach in jeopardy.
So I would like to ask my peers on LinkedIn: what do you prefer, sharding or centralization? Of course, I expect the “it depends” answer, but I also want to know what you like to work with, not just what is preferable.
Personally, I tend to prefer sharding. And you?
1. A centralized operating database helps you deploy features faster.
If your business ships features very quickly and requires frequent changes to the data model, it is easier to deploy a new model to one database than to x databases.
2. Sharding the operating database by business helps mitigate maintenance and regression risks, as well as downtime.
When you deploy to only one small shard first, you limit the impact of bugs. Even after extensive testing, a large release can still turn out to be faulty once deployed, for many reasons. Deploying it first to a small business shard gives you one more validation step before the release reaches everyone.
Additionally, in a global business you can reduce downtime by rolling the deployment out shard by shard and scheduling each one around its time zone. With a centralized database that serves many businesses in different time zones, a single maintenance window inevitably falls during business hours for some segments, which then suffer traffic disruption.
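The rolling deployment described above can be sketched in Python. This is a minimal illustration, not a production scheduler: the shard names, client counts, fixed UTC offsets (daylight saving time is ignored), the 02:00 local quiet hour and the 24-hour soak period are all hypothetical. The release goes first to the smallest shard as a canary, and every other shard is then scheduled at its next local quiet window once the canary has run for the soak period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical: 02:00 local time is assumed to be the low-traffic hour everywhere.
MAINTENANCE_HOUR_LOCAL = 2


@dataclass
class Shard:
    name: str
    utc_offset_hours: int  # fixed offset; daylight saving time is ignored here
    client_count: int


def next_window_utc(shard: Shard, after_utc: datetime) -> datetime:
    """Next local maintenance hour for this shard strictly after `after_utc`, in UTC."""
    local_tz = timezone(timedelta(hours=shard.utc_offset_hours))
    local_now = after_utc.astimezone(local_tz)
    window = local_now.replace(hour=MAINTENANCE_HOUR_LOCAL, minute=0, second=0, microsecond=0)
    if window <= local_now:
        window += timedelta(days=1)
    return window.astimezone(timezone.utc)


def rollout_plan(shards: list[Shard], now_utc: datetime,
                 soak: timedelta = timedelta(hours=24)) -> list[tuple[str, datetime]]:
    """Canary (smallest shard) first, then the others in order of their next quiet window."""
    canary = min(shards, key=lambda s: s.client_count)
    canary_at = next_window_utc(canary, now_utc)
    # The remaining shards only start once the canary has been live for `soak`.
    others = [(s.name, next_window_utc(s, canary_at + soak)) for s in shards if s is not canary]
    return [(canary.name, canary_at)] + sorted(others, key=lambda item: item[1])


if __name__ == "__main__":
    shards = [
        Shard("eu-small", 1, 50),
        Shard("us-east", -5, 400),
        Shard("apac", 8, 300),
    ]
    for name, when in rollout_plan(shards, datetime(2024, 1, 1, tzinfo=timezone.utc)):
        print(name, when.isoformat())
```

With these example shards, the small European shard is deployed first, and the larger US and APAC shards follow a day later, each at 02:00 in its own time zone, so no segment takes the maintenance hit during its business hours.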
3. Sharding gives you a natural, functional way to make your operational system scalable
With sharding, you can dedicate resources (databases, and servers as well) to specific segments or even to individual clients (as in SaaS). However, with cloud services and data partitioning techniques, you can now achieve better management and scalability than before, which in a way simplifies things.
Sharding also gives you a natural form of high availability, with many active nodes at the same time. For example, if a shard goes down, your other business segments stay up. You can also keep a “spare” shard on the side to bring the affected segment back up in no time, or run multiple shards for the same business segment so that your “cluster” of nodes is active/active.
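A minimal Python sketch of the failover idea: each business segment maps to an ordered list of nodes, the active shard first and a “spare” after it, and requests go to the first healthy node. The segment names, connection strings and the `healthy` flag are hypothetical stand-ins for a real health check.

```python
from dataclasses import dataclass, field


@dataclass
class ShardNode:
    dsn: str              # hypothetical connection string
    healthy: bool = True  # in practice this would come from a health check


@dataclass
class ShardRouter:
    # Business segment -> ordered nodes: active node first, "spare" node(s) after.
    segments: dict[str, list[ShardNode]] = field(default_factory=dict)

    def route(self, segment: str) -> ShardNode:
        """Return the first healthy node for the segment, failing over to the spare."""
        nodes = self.segments.get(segment)
        if not nodes:
            raise KeyError(f"no shard configured for segment {segment!r}")
        for node in nodes:
            if node.healthy:
                return node
        raise RuntimeError(f"all shards for segment {segment!r} are down")


if __name__ == "__main__":
    router = ShardRouter({
        "retail": [ShardNode("db-retail-1"), ShardNode("db-retail-spare")],
        "wholesale": [ShardNode("db-wholesale-1")],
    })
    router.segments["retail"][0].healthy = False  # simulate the retail shard going down
    print(router.route("retail").dsn)     # the spare takes over for retail
    print(router.route("wholesale").dsn)  # other segments are unaffected
```

Picking the first healthy node models the active/spare setup; an active/active cluster would instead spread traffic across all healthy nodes of the segment (round-robin, for example).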
4. A centralized database facilitates data warehousing
With a centralized database, ids are easier to manage. Sharding not only requires routing logic at the application layer (to hit the right shard), but data warehouse extraction also requires a more elaborate configuration. However, if you industrialize the processes in your data warehouse, sharding is actually not much more difficult. Done properly, it might even mitigate the issue of dealing with large record sets, since each shard can be extracted on its own.
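One common way to keep ids manageable across shards is to fold the shard id into every key during warehouse extraction, so rows from different shards never collide. The Python sketch below assumes a hypothetical 16-bit shard id plus 47-bit local id layout and a simple row format (a dict with an integer `"id"`); it illustrates the idea rather than any specific warehouse tool.

```python
# Hypothetical layout: 16 bits of shard id + 47 bits of local id = 63 bits,
# so every global id still fits in a signed 64-bit BIGINT column.
SHARD_BITS = 16
LOCAL_BITS = 47


def make_global_id(shard_id: int, local_id: int) -> int:
    """Pack a shard id and a shard-local id into one warehouse-wide id."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError(f"shard_id {shard_id} out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError(f"local_id {local_id} out of range")
    return (shard_id << LOCAL_BITS) | local_id


def split_global_id(global_id: int) -> tuple[int, int]:
    """Recover (shard_id, local_id) from a packed id."""
    return global_id >> LOCAL_BITS, global_id & ((1 << LOCAL_BITS) - 1)


def merge_extracts(extracts: dict[int, list[dict]]) -> list[dict]:
    """Combine per-shard extracts into one record set without id collisions."""
    merged = []
    for shard_id, rows in extracts.items():
        for row in rows:
            merged.append({**row, "shard_id": shard_id,
                           "id": make_global_id(shard_id, row["id"])})
    return merged


if __name__ == "__main__":
    # Two shards that both issued local id 1 to different customers.
    rows = merge_extracts({1: [{"id": 1, "name": "Acme"}], 2: [{"id": 1, "name": "Globex"}]})
    for row in rows:
        print(row, split_global_id(row["id"]))
```

Because each shard is processed independently in `merge_extracts`, the same pattern also lets you extract shards one at a time rather than pulling one huge record set.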
5. Sharding vs centralization for compliance
I am not sure about this one. It seems easier to keep one database compliant. At the same time, with shards you can roll out compliance work progressively (encrypting personal data, applying certificates, patching servers that require downtime, etc.).
Some businesses may also be required to keep data in specific countries (for example, under Saudi Arabian compliance regulations). Sharding lets you keep a client’s data in its country while other clients’ data lives elsewhere.
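A small Python sketch of residency-aware placement. The shard topology, the default shard and the set of countries whose data must stay local are all hypothetical examples, not a statement of what any regulation requires. A new client goes to a shard hosted in its own country when one exists; for countries flagged as “must stay local”, the function refuses to fall back to a shard abroad.

```python
# Hypothetical topology: which country each shard is hosted in.
SHARD_COUNTRY = {"sa-1": "SA", "eu-1": "DE", "us-1": "US"}
DEFAULT_SHARD = "eu-1"  # hypothetical home for clients with no local shard


def place_client(country: str, must_stay_local: frozenset[str] = frozenset({"SA"})) -> str:
    """Pick the shard that will hold a new client's data.

    `must_stay_local` lists countries whose (assumed) rules forbid storing
    the data abroad; for those, no out-of-country fallback is allowed.
    """
    local_shards = sorted(s for s, c in SHARD_COUNTRY.items() if c == country)
    if local_shards:
        return local_shards[0]  # keep the data in the client's own country
    if country in must_stay_local:
        raise ValueError(f"no in-country shard for {country}; refusing to store data abroad")
    return DEFAULT_SHARD


if __name__ == "__main__":
    for country in ["SA", "US", "FR"]:
        print(country, "->", place_client(country))
```

Failing loudly, instead of silently using the default shard, is the design choice that matters here: a misplaced client in a regulated country is a compliance incident, while an error at onboarding time is just a missing shard to provision.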