Tuesday, November 28, 2006

Snippet://java/internet connectivity

The following code checks whether the Internet can be reached. Note that this is not a ping: the Java Socket class isn't capable of such a low-level function. However, since JDK 5, java.net.InetAddress.isReachable(int) can be used to check whether a server is reachable.

isReachable() will use ICMP ECHO REQUESTs if the privilege can be obtained; otherwise it will try to establish a TCP connection on port 7 (Echo) of the destination host. However, most Internet sites have disabled the Echo service or block the requests (a few university sites, such as web.mit.edu, are exceptions).



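import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Despite its name, url must be a host name or IP address, not a full URL.
public static boolean checkinternet(String url) {
    try {
        InetAddress address = InetAddress.getByName(url);
        System.out.println("Name: " + address.getHostName());
        System.out.println("Addr: " + address.getHostAddress());
        boolean reachable = address.isReachable(1000); // timeout in milliseconds
        System.out.println("Reach: " + reachable);
        return reachable;
    } catch (UnknownHostException e) {
        System.err.println("Unable to lookup " + url);
    } catch (IOException e) {
        System.err.println("Unable to reach " + url);
    }
    return false; // lookup failed or a network error occurred
}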
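Since ICMP and port 7 are so often blocked, a common workaround is to open a plain TCP connection to a port the server is known to serve, such as 80. Below is a minimal sketch of that idea, not a replacement for the snippet above. The class name TcpCheck, the method canConnect, and the choice of port 80 are my own assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class TcpCheck {
    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unknown host, refused connection, and timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example target only; substitute a host and port you expect to be up.
        System.out.println("Reach: " + canConnect("web.mit.edu", 80, 1000));
    }
}
```

This only proves that one port on one host answers, so it is best pointed at a server you actually depend on.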

Monday, November 27, 2006

Snippet://Java/InputStream

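Java code snippet to read an InputStream into a String, one buffer at a time.

NOTE: StringBuffer won't insert extra \n, so the returned string will be exactly as the InputStream. Also, the unusual for statement in the snippet below is about 10 times faster than the traditional line-by-line while loop; the gain presumably comes from reading whole blocks, not from the for syntax itself.

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

private static String slurp(InputStream in) throws IOException {
    // Decode through a Reader so that a multi-byte character split across
    // two reads is not corrupted. UTF-8 is an assumption; use the stream's real encoding.
    Reader reader = new InputStreamReader(in, "UTF-8");
    StringBuffer out = new StringBuffer();
    char[] buf = new char[4096];
    for (int n; (n = reader.read(buf)) != -1;) {
        out.append(buf, 0, n);
    }
    return out.toString();
}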

java.lang.NoClassDefFoundError: org/apache/commons/

I'm working on a project where I needed to make HTTP requests. Instead of reinventing the wheel, I decided to use the Apache Commons HttpClient library. When I ran the code, it blew up in my face. The debugger says:

java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

Hmm... so Apache Commons HttpClient depends on the Apache Commons Logging library. After downloading and adding the required library, I ran the code, only to have it blow up in my face again. This time the debugger says:

java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException

Good god, another dependency, this time Apache Commons Codec. That's what I hate about external libraries: none of them is self-contained. I need the code small enough to fit on embedded devices like a cell phone. Good thing the Apache Commons libraries are open source, so the source code is available and I can strip it down. So much for not reinventing the wheel.
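One way to find all the missing pieces in a single run, instead of one crash at a time, is to probe the classpath up front. Here is a rough sketch of that idea. The class name DependencyCheck and the method isOnClasspath are made up for illustration; the class names being probed are the two from the errors above plus org.apache.commons.httpclient.HttpClient, the main class of Commons HttpClient 3.x.

```java
class DependencyCheck {
    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.commons.httpclient.HttpClient",
            "org.apache.commons.logging.LogFactory",
            "org.apache.commons.codec.DecoderException"
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

It only checks the classes you list, so it will not discover dependencies you don't already suspect.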

Friday, November 17, 2006

High Availability - 5 Nines - 99.999%

Recently machinehead asked me how to build high availability. For the less technically inclined, high availability is a measure of the reliability of a system and is sometimes expressed as "five nines", or 99.999% uptime. Basically, it means the system has a total downtime of no more than about five minutes per year.

Having done a course on distributed systems, I decided to take a crack at it. There are three main issues that need to be addressed:
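The "five minutes" figure falls straight out of the arithmetic: a non-leap year has 525,600 minutes, and 0.001% of that is roughly 5.26 minutes. A small sketch of the calculation (the class and method names are just for illustration):

```java
class Downtime {
    // 365 days * 24 hours * 60 minutes = 525,600 (leap years ignored)
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    /** Maximum downtime per year, in minutes, for a given availability percentage. */
    static double allowedMinutesPerYear(double availabilityPercent) {
        return (1.0 - availabilityPercent / 100.0) * MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        double[] levels = {99.9, 99.99, 99.999};
        for (double level : levels) {
            System.out.printf("%.3f%% -> %.2f minutes/year%n", level, allowedMinutesPerYear(level));
        }
    }
}
```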

  • Hardware

  • Software

  • Data

Hardware
Everyone thinks high availability means only hardware. It doesn't; hardware is only the beginning. Firstly, you need at least two of everything, located geographically apart. For example, most people believe RAID is insurance against data loss. But RAID only protects against one or two drive failures. What happens if the PSU shorts and pumps 240V instead of 5V into the drives? That will fry every disk in the array. Plus, rebuilding a disk takes time, which isn't high availability. The copies should also be located geographically apart to protect against fire, earthquake, tsunami, plane crash, etc.

Secondly, you need smooth, automatic switchover in case of a crash. For example, use a heartbeat to monitor the servers and, when one fails, change the IP address at the DNS server to point to the failover server. This keeps the switchover smooth, and no other server needs to be made aware of the failure.
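The decision logic behind heartbeat-based switchover can be sketched in a few lines. This is only an illustration under my own assumptions: the class FailoverMonitor, its methods, and the "N missed heartbeats in a row" rule are hypothetical, and the actual probing and DNS update are left out.

```java
class FailoverMonitor {
    private final String primary;
    private final String standby;
    private final int maxMissed;   // consecutive misses tolerated before failing over
    private int missed = 0;
    private boolean failedOver = false;

    FailoverMonitor(String primary, String standby, int maxMissed) {
        this.primary = primary;
        this.standby = standby;
        this.maxMissed = maxMissed;
    }

    /** Record the result of one heartbeat probe of the primary. */
    void record(boolean heartbeatReceived) {
        if (failedOver) {
            return; // stay on the standby until someone resets things by hand
        }
        if (heartbeatReceived) {
            missed = 0;
        } else if (++missed >= maxMissed) {
            // This is the point where a real setup would update the DNS record.
            failedOver = true;
        }
    }

    /** The address clients should currently be sent to. */
    String activeServer() {
        return failedOver ? standby : primary;
    }

    public static void main(String[] args) {
        FailoverMonitor monitor = new FailoverMonitor("10.0.0.1", "10.0.0.2", 3);
        boolean[] probes = {true, false, false, false};
        for (boolean ok : probes) {
            monitor.record(ok);
            System.out.println("heartbeat=" + ok + " active=" + monitor.activeServer());
        }
    }
}
```

Requiring several misses in a row avoids failing over on a single dropped packet.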

Software
Software should be built from the ground up with high availability in mind, meaning it should be scalable and clusterable. The best way to do this is to make it stateless. For example, when you click on "2" to go to the second result page on Google, the second page doesn't have to be processed by the same server that produced the first page. It can be done by any server. This is the power of statelessness.
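As a toy illustration of what stateless means in code, here is a paging function where everything needed to answer arrives with the request (the page number and page size), so any server holding the same results could run it. The names StatelessPager and page are invented for this sketch and have nothing to do with how Google actually does it.

```java
import java.util.ArrayList;
import java.util.List;

class StatelessPager {
    /** Returns the requested page; no per-user session is read or written. */
    static List<String> page(List<String> allResults, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        int from = Math.min((pageNumber - 1) * pageSize, allResults.size());
        int to = Math.min(from + pageSize, allResults.size());
        return new ArrayList<>(allResults.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            results.add("result " + i);
        }
        // Clicking "2" just means sending pageNumber=2 with the request.
        System.out.println(page(results, 2, 10));
    }
}
```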

Data
Data is the hardest problem to solve. If you fragment and replicate the data for performance and scalability, you need to address sync issues. How would you lock and commit across multiple partitions? How would you detect deadlocks? Two-phase locking and two-phase commit are not easy answers. eBay takes down the site on Monday 12-4AM every week to archive sold items so as to keep the fragments small (smaller fragments mean faster searching by the database). Yes, these 4 hours are "planned" downtime, and no, that doesn't exempt them: planned downtime still counts against high availability.
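To give a feel for two-phase commit, here is a bare-bones sketch of the coordinator's happy path and abort path. It deliberately leaves out everything that makes the real thing hard (timeouts, crashes between phases, logging, recovery). The Participant interface and the Partition and TwoPhaseCommit classes are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

interface Participant {
    boolean prepare();  // phase 1: vote yes (true) or no (false)
    void commit();      // phase 2, if everyone voted yes
    void rollback();    // phase 2, if anyone voted no
}

/** A stand-in for one data partition that records what happened to it. */
class Partition implements Participant {
    final boolean healthy;
    String state = "idle";

    Partition(boolean healthy) { this.healthy = healthy; }

    public boolean prepare() { state = healthy ? "prepared" : "aborted"; return healthy; }
    public void commit() { state = "committed"; }
    public void rollback() { state = "rolled back"; }
}

class TwoPhaseCommit {
    /** Returns true if all participants committed, false if the transaction was aborted. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // One "no" vote aborts everything; undo those that had already prepared.
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        Partition a = new Partition(true);
        Partition b = new Partition(false);
        System.out.println("committed=" + run(Arrays.asList(a, b)) + " a=" + a.state + " b=" + b.state);
    }
}
```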

Tuesday, November 14, 2006

Sun Just Killed Java

Yup, it's official: Sun open-sourced Java by changing its license to GPLv2. The announcement is at http://www.sun.com/software/opensource/java/.

This is good news for the open-source community, especially GNU/Linux, which can now bundle all Java software with the platform, like JBoss, IBM WebSphere, and BEA WebLogic, because all of them are now GPL. Hehe.

The bad news is for all commercial firms whose proprietary Java applications are now GPL. The GPL requires that any code combined with GPL code must be distributed under the same license. Developers must provide their contributions back to the community. This provision provides a mechanism to ensure that Java continues as a unifying platform for innovation. Most probably this will give a boost to .NET technology as commercial firms move to .NET.

P.S. This does not affect Java applications developed in-house for use in-house ONLY. It mostly affects commercial development of Java.

Tuesday, November 07, 2006

Format External SATA Drive

I just bought a Seagate SATA 250GB HD and an external SATA enclosure on eBay. Windows 2003 recognized the enclosure in Device Manager, but the HD did not show up in My Computer. The reason was that the drive was uninitialized and unformatted (since it's new).

So how do you format a drive that does not show up in My Computer? Normally, a new HD connected directly to IDE shows up as an unallocated drive in My Computer, which you can then partition and format. However, what do you do when it's connected to USB?

After half an hour of playing around, the answer was simple enough:

1. Start -> Administrative Tools -> Computer Management -> Disk Management.
2. The HD will show up as Disk1 (unallocated)
3. Right click on Disk1 and select Initialize.
4. A Wizard will walk you through initializing, partitioning and formatting.

I partitioned it into 4 primary partitions (the maximum allowed by an MBR partition table).

Monday, November 06, 2006

Debian Hard Disk Spindown

The article "Debian hdd spindown" on Kurobox Central has an interesting note on how to spin down the hard drive in Debian. When will Linux be as easy to use as Windows?

Install hdparm: apt-get install hdparm

Hard drive info: hdparm -i /dev/hda

Hard drive performance: hdparm -tT /dev/hda

Here's the performance of the 6GB hard drive in my 7-year-old Toshiba Satellite 2755 laptop, which runs as a Debian server:

Timing cached reads: 62 MB in 2.01 seconds = 30.86 MB/sec
Timing buffered disk reads: 40 MB in 3.07 seconds = 13.01 MB/sec

Set spin down to 35 seconds: hdparm -S7 /dev/hda
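The -S value is not a number of seconds, which is why 7 gives 35 seconds. As I understand the hdparm man page, values 1 to 240 are multiples of 5 seconds and 241 to 251 are multiples of 30 minutes, with 0 meaning disabled. A small sketch of that conversion follows; the class and method names are made up, and the special values 252 to 255 are deliberately left unhandled.

```java
class SpindownTimeout {
    /**
     * Converts an hdparm -S value to seconds.
     * Returns 0 for "disabled" and -1 for values this sketch does not handle.
     */
    static int toSeconds(int value) {
        if (value == 0) {
            return 0;                          // spindown disabled
        }
        if (value >= 1 && value <= 240) {
            return value * 5;                  // units of 5 seconds
        }
        if (value >= 241 && value <= 251) {
            return (value - 240) * 30 * 60;    // units of 30 minutes
        }
        return -1;                             // 252-255 are special cases; check the man page
    }

    public static void main(String[] args) {
        int[] samples = {7, 120, 241};
        for (int value : samples) {
            System.out.println("-S" + value + " = " + toSeconds(value) + " seconds");
        }
    }
}
```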

To turn on spin down at 35 seconds by default, edit /etc/hdparm.conf (spindown_time uses the same units as -S, so 7 means 35 seconds):

/dev/hda {
    mult_sect_io = 16
    write_cache = off
    dma = on
    spindown_time = 7
}

Run the following to start hdparm in runlevel 2 and stop it in runlevels 0 (halt), 1 (single-user mode), and 6 (reboot):

update-rc.d -f hdparm remove
update-rc.d hdparm start 19 2 . stop 19 0 1 6 .

Add the following to /etc/init.d/sysklogd so that syslogd doesn't write a MARK line to /var/log/messages every 20 minutes and force the hard drive to spin up:

SYSLOGD="-m 0"

Sunday, November 05, 2006

Installing .rpm in Debian

Installing a RedHat .rpm in Debian means converting to .deb and then installing it. And here's how to do it:

Convert .rpm to .deb: alien -k package.rpm

NOTE: The -k option keeps the package's original version number (by default, alien adds one to the minor version).

Install .deb: dpkg -i packagename.deb

Uninstall .deb: dpkg -r packagename

Wednesday, November 01, 2006

How to Install WordPress On Debian Etch

Complete the following steps:
  1. # apt-get install mysql-server
  2. # apt-get install apache2
  3. # apt-get install php5
  4. # apt-get install wordpress
  5. # mysql -u root
  6. mysql> CREATE DATABASE wordpress;
  7. mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'localhost'
    IDENTIFIED BY 'wordpresspassword';
  8. mysql> FLUSH PRIVILEGES;
  9. # mv wp-config-sample.php wp-config.php
  10. Edit wp-config.php to enter the correct settings:
    // ** MySQL settings ** //
    define('DB_NAME', 'wordpress');
    define('DB_USER', 'wordpress');
    define('DB_PASSWORD', 'wordpresspassword');
    define('DB_HOST', 'localhost');
  11. Add to /etc/apache2/sites-available/default:
    Alias /blog "/usr/share/wordpress/"
    <Directory "/usr/share/wordpress/">
    Options FollowSymLinks
    AllowOverride Limit Options FileInfo
    DirectoryIndex index.php
    </Directory>
  12. # /etc/init.d/apache2 restart
  13. Go to: http://localhost/blog/wp-admin/install.php
  14. Follow the instructions to create a login
  15. Use generated login to log in and start blogging.

Sun Blackbox

Sun just released Sun Blackbox, a new readymade data-center in a shipping container complete with 25 racks, 1.5PB storage, networking, cooling and shock absorbers. The idea is they ship you the whole container and you just power it up. A la data-center in a box.

How cool is that? Do you see what's happening here? Sun Microsystems is changing its whole business model. I remember a few years ago Cemex, a cement manufacturer based in Mexico, changed its business model from '$ per weight' to 'just-in-time'. Cement manufacturing is a highly standardized business and the margins are thin, so Cemex came up with a new idea. See, the problem is that cement starts setting from the moment it leaves the factory, so you can't have it lying around. Also, sometimes the site is not ready, or labourers stand around too long waiting for the cement to arrive. Cemex identified these issues and told its customers that from now on they would pay only for the right amount of cement at the right time, i.e., time is the business model. Customers phone the call center to say when and how much cement they want, and Cemex's ERP system optimizes and routes trucks in the field (fitted with GPS monitoring and communicators) to the customer. The benefit for customers is that they get the right amount of cement at the right time (usually within plus or minus an hour). The benefit for Cemex is that it became a multibillion-dollar company and the 3rd largest cement manufacturer in the world, all by changing its business model to 'just-in-time'.

And this is what Sun Microsystems is doing: shortening the time and effort it takes customers to set up a data center. Sure, you won't get the Blackbox within an hour of ordering. But even then, it will be there in a few days, all set up and ready to go. Compare this with the time taken to plan and build a data center, from evaluating vendors to purchasing, building and testing infrastructure, etc.