Thursday, June 18, 2009

Tomcat Event Handler - privileges

In the previous post about Tomcat and the Nagios event handler, there was one major problem: privileges. NRPE is a daemon that runs in the background, and Nagios launches the event handler through it. The event handler runs "kill" when Tomcat does not stop gracefully. When NRPE tried to execute the script, the following error appeared:
sudo: sorry, you must have a tty to run sudo
The solution is to have a service script launch the main script. In this example, restart-tomcat-eventhandler.sh is the service script, and it calls restart-tomcat.sh. I also made restart-tomcat.sh run in the background. But first, here are the changes needed in the sudoers file (edit it with visudo): turn off the requiretty default for the nagios user and extend its privileges to:
Defaults:nagios    !requiretty
...
nagios ALL=(ALL) NOPASSWD:/opt/tomcat/bin/catalina.sh,/bin/kill,/opt/tomcat/bin/startup.sh
The service script also takes care of logging and verifies that only one restart-tomcat.sh process is running at a time:
#!/bin/sh
#
# Service script that launches the Tomcat restart.
# The restart runs in the background and its output
# is appended to the file named in LOGGER.
#
LOGGER=/usr/local/nagios/libexec/eventhandlers/restart-tomcat.log
EVENT_HANDLER_APP=/usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh
echo "Restarting Tomcat `date`...."
# Count running copies of restart-tomcat.sh; the [r] keeps grep from matching itself
count=`ps -ef | grep -c '[r]estart-tomcat\.sh'`
echo "Total process running: $count"
if [ "$count" -ge 1 ]
then
    echo "Another process is running and so the script will stop `date`"
    exit 0
fi
# Launch the restart in the background and send stdout and stderr to the log
"$EVENT_HANDLER_APP" >> "$LOGGER" 2>&1 &
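
Counting processes with ps and grep leaves a small window in which two handlers can start at the same moment. One common alternative, sketched here and not taken from the original setup, is a lock directory, since mkdir either creates the directory or fails in a single step. The names acquire_lock and release_lock and the lock path are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch: single-instance guard based on an atomic mkdir.
# LOCK_DIR is a hypothetical path chosen for this example.
LOCK_DIR="${LOCK_DIR:-/tmp/restart-tomcat.lock}"

# Try to take the lock; succeeds only if the directory did not exist yet.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Give the lock back so the next restart can run.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}
```

In the service script this could be used as "acquire_lock || exit 0" before launching restart-tomcat.sh, with release_lock called once the restart has finished.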

The actual event handler is the following:

#!/bin/bash

#
# restart-tomcat.sh - Tomcat restart script run by the Nagios event handler
# Needs sudo access to restart Tomcat,
# so the sudoers file must be modified (visudo)

echo "---------------------`date`---------------------"
CATALINA_PATH=/opt/apache-tomcat-6.0.18
CATALINA_SCRIPT=catalina.sh

echo "CATALINA_HOME : $CATALINA_PATH"

# Print the PID (second column of ps -ef) of the Tomcat java process
get_tomcat_pid() {
    ps -ef | grep java | grep tomcat | awk '{print $2}'
}

# Verify that tomcat is not running. If it is, stop it gracefully
tomcat_pid=`get_tomcat_pid`
echo "Tomcat PID is: $tomcat_pid"

if [ -n "$tomcat_pid" ]
then
    echo "Stopping tomcat ..."
    sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT stop
    # give tomcat 60 seconds to shutdown gracefully
    sleep 60
fi

tomcat_pid=`get_tomcat_pid`
# if tomcat_pid exists, kill the process
if [ -n "$tomcat_pid" ]
then
    echo "Noticed that process is still running trying to kill it"
    sudo kill $tomcat_pid
    sleep 60
fi

tomcat_pid=`get_tomcat_pid`
# if tomcat_pid still exists, really kill the process (SIGKILL)
if [ -n "$tomcat_pid" ]
then
    echo "Forcefully killing the process for tomcat $tomcat_pid..."
    sudo kill -9 $tomcat_pid
    sleep 60
fi

# restart tomcat
echo "`date` Starting tomcat..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT start
echo "`date` Finished starting tomcat"
echo "---------------------------------------------"
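
The fixed "sleep 60" pauses make every restart take minutes even when Tomcat stops quickly. A possible refinement, shown only as a sketch, is to poll until the process is gone or a timeout expires. The helper name wait_for_exit is hypothetical, and the sketch assumes a single PID and a caller that is allowed to signal the process.

```shell
#!/bin/sh
# Sketch: wait for a process to disappear instead of always sleeping 60 seconds.
# wait_for_exit is a hypothetical helper; it is not part of the original script.
#
# $1 = PID to watch, $2 = maximum number of seconds to wait
# Returns 0 as soon as the process is gone, 1 if it is still alive at the timeout.
wait_for_exit() {
    pid=$1
    remaining=$2
    # kill -0 sends no signal; it only reports whether the PID can be signalled
    while kill -0 "$pid" 2>/dev/null
    do
        if [ "$remaining" -le 0 ]
        then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, the stop step could be followed by wait_for_exit "$tomcat_pid" 60 in place of the sleep. Note that kill -0 fails when the caller lacks permission to signal the process, so if Tomcat runs as root the check inside the loop would probably need to go through sudo as well.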

Tuesday, June 16, 2009

Starting to learn Ruby on Rails

Started to learn Ruby on Rails. Bought a couple of books (I'm a book addict). I ended up purchasing Simply Rails 2 by Patrick Lenz and Advanced Rails Recipes by Mike Clark, since I need to have a project up and running in a couple of weeks.

IT Project Kill Switch

Waldo Moreira pointed me to the article How to Make Profit, which carries an important lesson for IT and project decisions. The article is based on a decision made by the CEO of Rackspace Managed Hosting, Graham Weston, when he passed on a $20 million deal with Morgan Stanley. His decision was based on the fact that the deal was not profitable enough. To be more precise, its margin was 5 percentage points below the original 15% profit margin for Rackspace.

The article explains that many companies lack the discipline of true profit or economic value added,
...Lots of big corporations don't make a true profit. That is equally true of small businesses, which can be so desperate to close deals early on that they neglect to really look at the numbers. As a result, line managers are clueless about the cost of capital and the returns — or the lack thereof — they are generating.

Jim Collins wrote in his masterpiece Good to Great how leaders "Confront the Brutal Facts". The great leaders shared the following pattern: all of them gather data before making a decision, make excellent use of it, and finally use it to confront their decisions head-on. This is what Weston ended up doing. After analyzing the venture with Morgan Stanley, he noticed that Rackspace was going to make a 10% profit, 5 points less than the original 15% profit margin. In his new book, How the Mighty Fall, Jim Collins talks about the five stages that companies go through before failing. The second stage is called "Undisciplined Pursuit of More":
...More scale, more growth, more acclaim, more of whatever those in power see as success...Although complacency and resistance to change remain dangers to any successful enterprise, overreaching better captures how the mighty fall.
There has to be some type of threshold that allows senior management to take the bold step and say "no" to specific projects. Senior management needs to look beyond the numbers. In the HBR essay The Truths About IT Costs, Susan Cramm writes about what drives up IT costs. She identified seven such truths. Perhaps the most interesting is "Project Failures are too High". As an IT director, I'm faced with different "wish lists" of projects from marketing, sales, and senior management. IT should not be the one to decide whether or not a project should launch. As Cramm explains,
Managing these truths is tricky. IT can't do it alone, because simply saying no to business partners harms relationships with them.
Senior management should provide a threshold, a magic number like Weston's 15%, and IT should raise the flag when a project is going down the wrong path. As Cramm says,
Establish "kill switch" rules for projects.
If a project is over its initial budget, has been modified twice, and beta deployment has still not occurred, KILL IT!

Nagios and Tomcat Event Handler


The first thing we need to do is understand how Nagios works. Assuming that Tomcat is on a remote server, there is a "nagios" user on that server, and this user needs rights to restart Tomcat (CATALINA_HOME/bin/catalina.sh stop). If the nagios user tries to stop Tomcat without those rights, it gets the following error:
su nagios
/usr/local/tomcat-18version/bin/catalina.sh stop
Jun 15, 2009 2:54:18 PM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.io.FileNotFoundException: /opt/apache-tomcat-6.0.18/conf/server.xml (Permission denied)

The best thing to do is to create a group "tomcat", give this group privileges on CATALINA_HOME, and add the user "nagios" to the group. In this case, Tomcat was downloaded to the following directory: /opt/apache-tomcat-6.0.18/. Use the "root" user for the following steps.
I created a symbolic link so I don't have to change anything in case Tomcat is upgraded:
ln -s /opt/apache-tomcat-6.0.18/ /opt/tomcat
Now, listing /opt shows the link:
ls -l /opt
tomcat -> /opt/apache-tomcat-6.0.18/
Create a group named "tomcat" using the groupadd command:
groupadd tomcat
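Add the existing nagios user to the tomcat group. Note that -g makes tomcat the user's primary group; usermod -a -G tomcat nagios would instead keep the current primary group and add tomcat as a supplementary one.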
usermod -g tomcat nagios
Give the group "tomcat" ownership of /opt/tomcat and of the original directory. First check the id of the user, then change the group (run from /opt):
[root@dev opt]# id nagios
uid=501(nagios) gid=503(tomcat) groups=503(tomcat)
chgrp -R tomcat apache-tomcat-6.0.18
chgrp -R tomcat tomcat
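
Changing the group only helps when the group permission bits actually grant access, and the "Permission denied" on conf/server.xml suggests checking those bits as well. The following is a small sketch for that check. The helper name group_can_read is hypothetical, and it assumes the usual "ls -ld" output format.

```shell
#!/bin/sh
# Sketch: report whether the group permission bits of a file allow reading.
# group_can_read is a hypothetical helper, not a standard command.
# In "ls -ld" output such as -rw-r----- the 5th character is the group read bit.
group_can_read() {
    mode=`ls -ld "$1" | cut -c5`
    [ "$mode" = "r" ]
}
```

If group_can_read /opt/tomcat/conf/server.xml fails after the chgrp, something like chmod -R g+rX /opt/apache-tomcat-6.0.18 (run as root) would add group read access; whether that is appropriate depends on your own security policy.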

To test that the nagios user is able to restart Tomcat, run the following commands:
su nagios
/usr/local/tomcat-18version/bin/catalina.sh stop
Privileges are also needed to restart the Tomcat server and to kill it in case it doesn't shut down. Since only root can bind to certain ports (e.g. port 80), edit the sudoers file (visudo):
##add the following line below "root    ALL=(ALL)       ALL"
nagios ALL=(ALL) NOPASSWD:/opt/tomcat/bin/catalina.sh,/bin/kill
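
Before wiring the handler into Nagios, it may be worth confirming that the sudoers entry is picked up. The sketch below relies on two standard sudo options: -n, which never prompts, and -l followed by a command, which succeeds only if the policy allows that command. The helper name sudo_allows is an assumption for this example, and the check says nothing about whether the command itself will work.

```shell
#!/bin/sh
# Sketch: check that sudoers permits a command, without ever prompting.
# sudo_allows is a hypothetical helper name.
#   sudo -n            never asks for a password; fails instead
#   sudo -l <command>  succeeds only if the policy allows that command
sudo_allows() {
    sudo -n -l "$1" >/dev/null 2>&1
}
```

Run as the nagios user, for example: sudo_allows /opt/tomcat/bin/catalina.sh && echo "catalina.sh is allowed".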
Now, add the event handler. Create the file /usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh:
#!/bin/bash

#
# restart-tomcat.sh - Tomcat restart script run by the Nagios event handler
#
echo "`date`------------ Shutting down tomcat---------------"
# Set this to the Tomcat directory, for example the /opt/tomcat link created above
CATALINA_PATH=
CATALINA_SCRIPT=catalina.sh

# Print the PID (second column of ps -ef) of the Tomcat java process
get_tomcat_pid() {
    ps -ef | grep java | grep tomcat | awk '{print $2}'
}

# Verify that tomcat is not running. If it is, stop it gracefully
tomcat_pid=`get_tomcat_pid`
echo "Tomcat PID is: $tomcat_pid"

if [ -n "$tomcat_pid" ]
then
    echo "Stopping tomcat ..."
    sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT stop
    # give tomcat 60 seconds to shutdown gracefully
    sleep 60
fi

tomcat_pid=`get_tomcat_pid`
# if tomcat_pid exists, kill the process
if [ -n "$tomcat_pid" ]
then
    echo "Noticed that process is still running trying to kill it"
    sudo kill $tomcat_pid
    sleep 60
fi

tomcat_pid=`get_tomcat_pid`
# if tomcat_pid still exists, really kill the process (SIGKILL)
if [ -n "$tomcat_pid" ]
then
    echo "Forcefully killing the process for tomcat $tomcat_pid..."
    sudo kill -9 $tomcat_pid
    sleep 60
fi

# restart tomcat
echo "`date` Starting tomcat..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT start
echo "`date` Finished starting tomcat"
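
Picking the PID out of "ps -ef" output by fixed character columns is fragile, because the column positions shift with long user names or wide PIDs; taking the second whitespace-separated field is sturdier. A minimal sketch of that idea follows. The helper name tomcat_pids_from is hypothetical, and it reads ps-style text on standard input so it can be tried on sample data.

```shell
#!/bin/sh
# Sketch: pick PIDs out of "ps -ef" style output by field, not by character column.
# tomcat_pids_from is a hypothetical helper; it reads ps output on stdin and
# prints the second field (the PID) of every line mentioning both java and tomcat.
# The [j] and [t] brackets keep this awk command from matching its own ps entry.
tomcat_pids_from() {
    awk '/[j]ava/ && /[t]omcat/ { print $2 }'
}
```

Typical use would be: tomcat_pid=`ps -ef | tomcat_pids_from`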
Create a wrapper script (restart-tomcat-eventhandler.sh) that runs the event handler. This way, whenever Tomcat is restarted, everything is recorded in a log:
#!/bin/sh
# Wrapper that runs the restart script and appends everything to one log file
LOG=/usr/local/nagios/libexec/eventhandlers/restart-tomcat.log

echo "Restarting Tomcat `date`" >> $LOG
# 2>&1 also captures error messages from the restart script
/usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh >> $LOG 2>&1
echo "Finished `date`" >> $LOG
echo "-------------------------Finished `date`-----------------------------"

On the Nagios server
Create the event handler: /opt/user/local/nagios/event-handler/restart-tomcat-eventhandler.sh

#!/bin/sh
#
# Event handler script for restarting Tomcat on a remote host through NRPE
#
# Arguments: $1 = service state, $2 = state type, $3 = check attempt, $4 = host address
#
# Note: This script will only restart Tomcat if the service is
# retried 3 times (in a "soft" state) or if the web service somehow
# manages to fall into a "hard" error state.
#

# What state is the HTTP service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Aha! The HTTP service appears to have a problem - perhaps we should restart the server...

    # Is this a "soft" or a "hard" state?
    case "$2" in

    # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
    # check before it turns into a "hard" state and contacts get notified...
    SOFT)
        # What check attempt are we on? We don't want to restart the web server on the first
        # check, because it may just be a fluke!
        case "$3" in
        # Attempt number
        3)
            echo -n "Soft-> Restarting Tomcat..."
            echo -n "/usr/local/nagios/libexec/check_nrpe -H $4 -c restart-tomcat"
            /usr/local/nagios/libexec/check_nrpe -H "$4" -c restart-tomcat
            ;;
        esac
        ;;

    # The HTTP service somehow managed to turn into a hard error without getting fixed.
    # It should have been restarted by the code above, but for some reason it didn't.
    # Let's give it one last try, shall we?
    # Note: Contacts have already been notified of a problem with the service at this
    # point (unless you disabled notifications for this service)
    HARD)
        echo -n "Hard-> Restarting Tomcat..."
        echo -n "/usr/local/nagios/libexec/check_nrpe -H $4 -c restart-tomcat"
        /usr/local/nagios/libexec/check_nrpe -H "$4" -c restart-tomcat
        ;;
    esac
    ;;
esac
exit 0
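
The nested case statements boil down to one rule: restart only on a CRITICAL state, and then either on the third soft retry or on any hard state. The sketch below restates that rule as a single function so it can be checked in isolation. The name should_restart is hypothetical and the function is an illustration, not a replacement for the handler.

```shell
#!/bin/sh
# Sketch: the restart decision of the event handler as a single function.
# should_restart is a hypothetical name; the arguments mirror $1, $2 and $3
# of the event handler (service state, state type, check attempt).
# Returns 0 when a restart should be triggered, non-zero otherwise.
should_restart() {
    state=$1
    state_type=$2
    attempt=$3
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
    SOFT)
        # Only act on the third soft retry; earlier failures may be flukes
        [ "$attempt" = "3" ]
        ;;
    HARD)
        # Last chance once the problem has become a hard state
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}
```

For example, should_restart CRITICAL SOFT 3 succeeds, while should_restart CRITICAL SOFT 1 and should_restart WARNING HARD 3 do not.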


Finally, register the event handler as an NRPE command by editing /usr/local/nagios/etc/nrpe.cfg on the Tomcat server:
command[restart-tomcat]=/usr/local/nagios/libexec/eventhandlers/restart-tomcat-eventhandler.sh
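
A typo in the command name is easy to make, so a quick check that the definition really is in the file can save a debugging round. This is only a sketch: nrpe_has_command is a hypothetical helper, and it simply looks for an uncommented line that starts with command[NAME]=.

```shell
#!/bin/sh
# Sketch: confirm that an NRPE command is defined in a config file.
# nrpe_has_command is a hypothetical helper.
# $1 = path to nrpe.cfg, $2 = command name
nrpe_has_command() {
    # index(...) == 1 means the line starts with command[NAME]=,
    # so commented-out definitions are ignored
    awk -v want="command[$2]=" 'index($0, want) == 1 { found = 1 } END { exit !found }' "$1"
}
```

For example: nrpe_has_command /usr/local/nagios/etc/nrpe.cfg restart-tomcat && echo "defined". NRPE usually has to be restarted or reloaded before a newly added command is picked up.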
Test that the command is working correctly by executing the following command from the Nagios server:
/usr/local/nagios/libexec/check_nrpe -H tomcatserver -c restart-tomcat -t 30

Now, add the service to restart the server:
define service{
use generic-service
host_name midc
service_description check_midc_login_page
process_perf_data 1
check_command check_http!-H midc.up-mobile.com -u /midc/doLogin.do -w 5 -c 10
event_handler restart-tomcat
}

Now restart nagios (service nagios restart) and you should be ready.

Friday, June 12, 2009

AT&T and Verizon are making a mess out of PSMS

U.S. telcos have tried over and over again to get a handle on spammers who use MT premium SMS (PSMS), but instead they have made a mess by imposing APIs on aggregators. Verizon tried OIOO, and now AT&T has come up with its own version, OPPC. Both flows try to do the same thing: enforce an opt-in so that users know what they are being billed for. OPPC's flow uses a "start" message that aggregators have to send to AT&T. This message contains the parameters AT&T needs to build the opt-in message (price, description, shortcode, etc.). However, what if your campaign targets non-English speakers and you want to control the opt-in message? What happens if you are running a trivia game or a chat? Should users receive an opt-in for every question or every chat message that they send? Eventually, AT&T came up with an "except tag", the exception to the OPPC rule. I think that eventually everyone will want an "except tag". And what about the spammers, wouldn't they eventually use this except tag too? This is just frustrating. I understand the purpose of these APIs, but why do we need a different API for each operator? Can we all just get a standard? Perhaps we need to think about doing the billing via MO, as Europe and Latin America do.

Tuesday, June 9, 2009

Boxing Business Model - HBR did it!

They say that Latinos have three sports: football (soccer), baseball, and boxing. I love boxing! Being Nicaraguan and married to a Mexican, I watch boxing and follow most of the lightweight fighters. At the moment, my favorite fighter is Manny Pacquiao. I can't get enough of this guy! He is awesome! You can learn so much from boxing. I wondered if anyone had made a business model out of this sport. Finally, I found out that Donald Sull made one in How to Thrive in Turbulent Markets. In it he uses the memorable fight between Muhammad Ali and George Foreman, the "Rumble in the Jungle". Not my favorite fight, but definitely in my top 5. He explains how to be agile like Muhammad Ali to quickly spot and exploit market changes, and how to use an absorption model like Foreman's to weather unexpected threats.

mBlox needs to get with it - give up your IM

I have been working with mBlox for quite some time. We had our rough times, but as of now I've been happy. The only thing that I don't like is that I don't have anyone's IM. I once asked my sales rep if I could just have his MSN, Yahoo!, or Skype and he just did not answer. I can always call my sales rep at his office or on his cell phone, but sometimes I just need to know from someone technical if everything is OK.

The other day I had to look for another SMSC aggregator since mBlox does not have Cricket Wireless. I ended up doing business with Motricity. The first thing they provided was IM contacts for their technical team and business reps. I think this is nothing more than customer service. Why wouldn't mBlox do this? Something so simple should be provided. It has been a while since I had a problem with mBlox, but when I do, I have to open a ticket. When I have a problem with Motricity, I just IM one of the technical guys or my sales rep. Yes, it's all about customer service.

PSMS beware of Twitter


I believe that a disruptive technology for PSMS might be Twitter. It's free!! In the book What Would Google Do?, Jeff Jarvis explains how free is a business model:
Free is impossible to compete against. The most efficient marketplace is a free marketplace. Money gets in the way. It costs money to market and to acquire customers so you can sell things to them.
This is contrary to the concept of PSMS, especially for subscriptions. Why would anyone subscribe to a "joke" subscription that charges $6.99 monthly, when anyone can follow George Lopez, Dane Cook, Dave Chappelle, or even your funniest friend on Twitter for a standard rate (zero, zip, nada)? I encouraged my company to move away from subscriptions and to start thinking about this business model, even if we currently have a "cash cow".

Another reason for considering Twitter a disruptive technology is its simplicity. It's so simple to tweet. Even TV shows such as Meet the Press, whose average viewer is not your average techie, can be followed on Twitter. Christensen's The Innovator's Dilemma explains that,
Two additional important characteristics of disruptive technologies consistently affect product life cycles and competitive dynamics: First, the attributes that make disruptive products worthless in mainstream markets typically become their strongest selling points in emerging markets; and second, disruptive products tend to be simpler, cheaper, and more reliable and convenient than established products.

It is obvious that this disruptive technology is coming, and it's coming down hard. I personally think it is going to be a good thing. Users will get better service and will be more in control. We are looking forward to Twitter, especially for Latin America.