Developers running large transaction processing systems sometimes have to partition their production relational databases for both performance and capacity reasons. This action is known as sharding, but can be quite difficult to do… until now. We are going to look at a possible solution, known as the ScaleBase Database Load Balancer.
The ScaleBase Database Load Balancer, launched yesterday, is a proxy server that sits in front of your actual database. It breaks a monolithic relational database into smaller sections and spreads them across multiple physical servers.
Partitioning relational databases (sharding) has been done for many years and is a popular way to increase the performance of databases running on very large servers or spread across multiple clustered nodes. However, if you shard your database yourself, you are basically rewriting the whole data access layer of a database management system, which is what Oracle Real Application Clusters does. Basic sharding algorithms often spread data over a fixed number of nodes, and reports and applications based on the database have to be tweaked to be aware of the shards. Backing up and tuning each database node has traditionally had to be done by hand.
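To see why a fixed node count is limiting, consider a minimal Python sketch of modulo-based sharding. It is an illustration only, not ScaleBase's or any vendor's algorithm; the function names and the choice of CRC32 as the hash are assumptions made for the example.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash modulo the shard count.

    CRC32 is used only because it gives the same value on every run;
    it is an assumption for this sketch, not a recommendation.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose shard changes when the shard count changes."""
    return [
        k for k in keys
        if shard_for(k, old_count) != shard_for(k, new_count)
    ]


if __name__ == "__main__":
    # Hypothetical customer keys, purely for illustration.
    keys = [f"customer-{i}" for i in range(1000)]
    moved = moved_keys(keys, 4, 5)
    print(f"{len(moved)} of {len(keys)} keys change shard going from 4 to 5 nodes")
```

Under a scheme like this, adding a single node typically relocates the majority of keys, which is part of why growing a hand-sharded database tends to be painful.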
The ScaleBase Database Load Balancer wants to trick those applications, backup programs, and report writers into thinking they are talking to one database even though they are in fact talking to many. To any application, it behaves exactly as a MySQL or Oracle database would at the network level, but it shards the database across multiple nodes automatically. The database proxy accepts SQL commands and, depending on what those commands are, runs the query either against the appropriate subset of the database or across all the shards at once. You don't have to change one line of your application code, but you may have to work out a different license with your database vendor.
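The routing decision described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how the ScaleBase product is implemented: the shard key (`customer_id`), the shard count, the modulo mapping, and the regex-based inspection of the SQL text are all hypothetical, and a real proxy would parse SQL properly.

```python
import re
from typing import List

NUM_SHARDS = 4  # hypothetical shard count for this sketch

# Looks for an equality filter on the assumed shard key, e.g. "customer_id = 42".
SHARD_KEY_FILTER = re.compile(r"\bcustomer_id\s*=\s*(\d+)", re.IGNORECASE)


def shard_for(customer_id: int) -> int:
    """Map a shard-key value to a shard index (simple modulo, for illustration)."""
    return customer_id % NUM_SHARDS


def route(sql: str) -> List[int]:
    """Return the shard indices a statement would be sent to.

    If the statement pins the shard key to one value, only that shard is
    targeted; otherwise the statement goes to every shard.
    """
    match = SHARD_KEY_FILTER.search(sql)
    if match:
        return [shard_for(int(match.group(1)))]
    return list(range(NUM_SHARDS))


if __name__ == "__main__":
    print(route("SELECT * FROM orders WHERE customer_id = 42"))
    print(route("SELECT COUNT(*) FROM orders"))
```

A statement that names a single customer goes to one shard, while an unfiltered aggregate fans out to all of them; in the fan-out case the proxy would also have to merge the partial results before replying, a step this sketch leaves out.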
The Database Load Balancer is packaged up in a virtual machine that is compatible with Amazon's EC2 compute cloud as well as VMware's ESXi hypervisor. The database shards themselves can be run inside virtual machines or on bare metal. Each node of the ScaleBase tool can manage from 8 to 12 database nodes, according to ScaleBase, with the load balancer itself being just a two-socket machine with four-core x64 processors and 16GB of memory.
Currently the Database Load Balancer from ScaleBase supports the open source MySQL database, now controlled by Oracle; the next database to get front-ended and sharded will likely be Oracle's eponymous database. Depending on customer demand, ScaleBase will add support for IBM's DB2, Microsoft's SQL Server, and other open source databases such as PostgreSQL. The sharding program was written in Java and requires a Java SE 6-compliant runtime to operate. While all of the beta testers have deployed the tool on top of Linux, the program will run atop AIX, Solaris, HP-UX, and any other box that has the right Java support. Customers should cluster their ScaleBase sharding nodes for high availability, of course, and the architecture recommends having standby servers for each shard as well.
The automated sharding software is the brainchild of Doron Levari, currently the CEO at ScaleBase, and Liran Zelkha, vice president of business development. Levari has been a database administrator for 15 years, and ran Aluna, a database consulting firm that was eventually sold to Matrix, the largest system integrator in Israel. Zelkha has worked on a number of large-scale database and cloud hosting projects and kept running into the same issues of performance and scalability.
“The third time we wrote a sharding layer for a customer, we knew we were on to something,” Zelkha said.