Monday, October 30, 2006

Scaling MySQL

I have been experimenting with MySQL 5. It's not optimized right out of the box, which is expected, but I can't find much literature on the web about how to tune it. It seems MySQL is mostly used by kiddies running their websites who do no performance tuning. Of course, it could also mean I haven't looked hard enough.

Anyway, my concern is with scaling. Cluster or replication... that is the question. Replication works because my workload is mostly reads. You can have a central master with slaves replicating from it. Thankfully, I don't need real-time replication. The data size is now in gigabytes and will eventually approach terabytes. More data means more time spent replicating. The slaves will be down during replication, but I need them to be up 24/7. One way to solve this is to have two sets of slaves: replicate one set while the other is up, then switch over. However, this is a waste of resources, as one set will always be unused.
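Here is a rough sketch of the two-set switch-over idea in Python. The class name, method names and host names are all made up for illustration, and nothing here talks to MySQL; it only shows how reads could be pointed at one set while the other set is offline replicating.

```python
import itertools


class SlavePool:
    """Routes reads to the active slave set while the other set replicates."""

    def __init__(self, set_a, set_b):
        self.sets = [list(set_a), list(set_b)]
        self.active = 0  # index of the set currently serving reads
        self._rr = itertools.cycle(self.sets[self.active])

    def read_host(self):
        """Return the next slave in the active set, round-robin."""
        return next(self._rr)

    def replicating_hosts(self):
        """Slaves that are out of rotation and free to pull from the master."""
        return list(self.sets[1 - self.active])

    def switch_over(self):
        """Swap roles once the offline set has finished replicating."""
        self.active = 1 - self.active
        self._rr = itertools.cycle(self.sets[self.active])


# Hypothetical host names, for illustration only.
pool = SlavePool(["slave-a1", "slave-a2"], ["slave-b1", "slave-b2"])
first = pool.read_host()   # served by set A
pool.switch_over()         # set B is now fresh, so it takes over reads
after = pool.read_host()   # served by set B
```

The waste is visible in the sketch: at any moment, everything returned by `replicating_hosts()` serves no reads at all.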

Basically, the problem with replication is that it doesn't scale writes. Every slave has to apply every write the master takes, so adding slaves only adds read capacity. Expensive RAID can speed things up, but I can't afford that. As soon as the master hits 100% of its write capacity, I've hit the scaling wall with replication.
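The scaling wall can be put into a back-of-the-envelope model. This is a deliberately simplified sketch: it assumes each server can do a fixed number of operations per second, that a write costs the same on a slave as on the master, and that every slave applies every write. The function name and the numbers are hypothetical.

```python
def spare_read_capacity(slaves, capacity, write_load):
    """Total capacity left over for reads across all slaves.

    Assumes each slave must apply every write from the master, so it
    spends `write_load` of its `capacity` just keeping up.
    """
    per_slave = max(capacity - write_load, 0)
    return slaves * per_slave


# Made-up numbers: servers that handle 1000 operations per second.
light_writes = spare_read_capacity(slaves=4, capacity=1000, write_load=200)
at_the_wall = spare_read_capacity(slaves=8, capacity=1000, write_load=1000)
```

With light writes, four slaves leave 3200 operations per second for reads. Once the write load equals what a single server can handle, the result is 0 no matter how many slaves are added.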

I need to investigate vertical and horizontal partitioning and clustering. I'm basing my theory on my textbook, Distributed Systems by Coulouris, Dollimore and Kindberg. The book is supposed to be a bible on distributed systems, but it's only general theory. I need something more geared towards MySQL. Does anyone have any leads?
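For horizontal partitioning, the simplest scheme I can think of is to hash each row's key and use the result to pick a server, so writes are spread across machines instead of all landing on one master. A minimal sketch, assuming four hypothetical servers named `db0` to `db3` and a made-up `shard_for` helper:

```python
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the row key."""
    # crc32 gives the same value on every run and every machine,
    # unlike Python's built-in hash() for strings.
    h = zlib.crc32(str(key).encode("utf-8"))
    return shards[h % len(shards)]


# The same key always maps to the same server.
home = shard_for("user:42")
```

One catch with plain modulo hashing is that changing the number of servers changes where most keys map, so existing rows would have to be moved.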

Google uses BigTable, which is a distributed storage system rather than a SQL database. Too bad I can't use that. I think at some point I'll have to develop my own custom thingy. Of course, the other bottleneck with replication is the network. Gigabit Ethernet isn't going to cut it anymore. I have seen some cheap optical network cards on eBay for $30, but optical cables cost about $100 per 10 meters. I can't afford that either. Oh well.
