Monday, October 30, 2006
Scaling MySQL
Anyway, my concern is with scaling. Cluster or replication... that is the question. Replication works because the load is mostly reads. You can have a central master with slaves replicating from it. Thankfully, I don't need real-time replication. The data size is now in gigabytes and will eventually approach terabytes. More data means more time spent replicating. The slaves will be down during replication, but I need them to be up 24/7. One way to solve this is to have two sets of slaves: replicate one set while the other is up, and then switch over. However, this is a waste of resources, as one set will always be unused.
Basically, the problem with replication is that it doesn't scale the writes. Expensive RAID can speed it up, but I can't afford that. As soon as the master is at 100% of its write capacity, I've hit the scaling wall with replication.
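A back-of-the-envelope sketch of that wall, under a deliberately simple assumption: every slave has to replay every write the master takes, so writes eat the same share of each box no matter how many slaves there are. The function name and all the numbers are made up for illustration; they are not measurements.

```shell
#!/bin/sh
# Hypothetical model: each server can do `capacity` operations per second,
# and every slave must replay all `writes` per second from the master.
# Whatever is left over on each slave is available for reads.
replica_read_capacity() {
    slaves=$1
    capacity=$2
    writes=$3
    echo $(( slaves * (capacity - writes) ))
}

# Illustrative numbers only.
replica_read_capacity 4 1000 250     # 4 slaves, 25% write load -> 3000
replica_read_capacity 8 1000 250     # twice the slaves, twice the reads -> 6000
replica_read_capacity 8 1000 1000    # writes saturate each box -> 0
```

Adding slaves multiplies read capacity, but once writes fill a single box, no number of slaves helps. That is the case partitioning is meant to address.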
I need to investigate vertical and horizontal partitioning and clustering. I'm using my textbook, Distributed Systems by Coulouris, Dollimore and Kindberg, as the basis for the theory. The book is supposed to be the bible on distributed systems, but it's only general theory. I need something more geared towards MySQL. Does anyone have any leads?
Google uses BigTable, which is IO based instead of SQL based. Too bad I can't use that. I think at some point I'll have to develop my own custom thingy. Of course, the other bottleneck with replication is the network. Gigabit Ethernet isn't going to cut it anymore. I have seen some cheap optical network cards on eBay for $30, but optical cables cost about $100 per 10 meters. I can't afford that either. Oh well.
Thursday, October 26, 2006
Thesis Demo
Basically, what my thesis did was implement a metric monitoring and reporting platform. You can use it to record metrics from different points, and then the application will allow you to do whatever you want with the data. The project is actually implemented in the largest bank in the state, where the data is hooked into Cognos for balanced scorecarding and other business intelligence thingies. At the bank, what used to take 2 days to perform now takes less than 5 minutes with my product.
The people who appreciated my product were mostly managers who frequently have to collect and analyze data for monitoring and reporting. When they saw my product, they started grilling me on how it was implemented and the nitty-gritty. One guy even asked if I wanted to commercialize the product. I didn't, since the bank had sponsored me on the project and they were clear from the beginning that they would own the code. Well, the actual reason I don't want to commercialize my product is that there is nothing innovative about it. It basically leveraged existing products to create a new product. Anyone can do it in 6 months (which is how long it took me). Now, after learning from my mistakes, I can make a better one. But I'm still not keen on commercializing it. If I do and it is successful, then I know competitors will spring up in less than 6 months, since they can work on it full time (I worked on it part-time while I had a full load at uni and was also working). Hell, with Cognos's resources, they can probably do it in 2 weeks.
So, yup, no commercial product from me. I wish I had asked the bank to pay me for it. I can't believe I did it for free just so I could write a good thesis. Another of life's lessons.
No more free lunch.
Wednesday, October 25, 2006
Finished My Thesis
I recommend LaTeX to anyone planning to write a thesis. Basically, LaTeX is a typesetting system that takes care of the formatting for you so you can concentrate on the content. It creates the table of contents, list of figures, chapters, bibliographies, etc. LaTeX might seem like overkill, but when you approach more than 50 pages, you'll be thankful you chose it. Don't go with Word and EndNote like I did in the beginning.
Friday, October 20, 2006
Debian Etch
Just installed the latest Debian Etch (Debian 4.0) on my server using the netinst CD. The netinst CD contains the base install and is about 150MB in size (better than downloading the full CD). Wow, is it me or is Etch much faster than Sarge?
Since netinst only has the base packages, I need to manually install everything I need, which is better because I get to keep the server lean. I need ssh, svn, apache, mysql, and perl (basically the whole LAMP thingy). Java, Tomcat, JUnit, Ant and a Continuous Integration server will come later on. But I ran into a bit of trouble. Since I installed from CD, I don't have the web mirrors set up. The fix is simple:
1. Add to /etc/apt/sources.list:
deb http://mirror.optus.net/debian/ etch main
2. apt-get update
Which threw up the following error:
# apt-get update
......
Fetched 5562B in 13s (421B/s)
Reading package lists... Done
W: There are no public key available for the following key IDs: A70DAF536070D3A1
W: You may want to run apt-get update to correct these problems
A bit of googling showed the fix is simple:
3. apt-get install debian-archive-keyring
4. apt-get update
NOTE: You need to be root for all of these steps.
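Steps 1 to 4 above can be collected into a small script. This is a sketch, not a tested installer: `add_mirror` and `setup_apt` are hypothetical helper names introduced here, and the check that avoids adding the mirror line twice is an addition to the steps above. The mirror line itself is the one from step 1.

```shell
#!/bin/sh
# Sketch only: add_mirror and setup_apt are hypothetical helper names.

# Append a line to a sources file unless that exact line is already there.
add_mirror() {
    line=$1
    file=${2:-/etc/apt/sources.list}
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Steps 1 to 4 from above, in order. Must be run as root.
setup_apt() {
    add_mirror "deb http://mirror.optus.net/debian/ etch main"
    apt-get update                          # step 2: may warn about the missing key
    apt-get install debian-archive-keyring  # step 3
    apt-get update                          # step 4
}
```

Nothing runs until `setup_apt` is called as root. Because `add_mirror` checks for the exact line first, running it again should not duplicate the entry in sources.list.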
Saturday, October 07, 2006
New House
I will move in a week later, as the new house has no internet and I need it to complete my thesis and an assignment.
Need to check up on the status of the DSL and Foxtel connections.