Monday, November 16, 2009

Scalability and Performance - learning from the best (Amazon and eBay)

I go back to these presentations at least once a year to review performance and scalability.
Amazon and eBay have the kind of architectures that make any developer salivate or wonder, "How do they do it?" Werner Vogels and Randy Shoup are (at least to this humble developer) awesome architects, and we can learn MUCH from them about what they do best: designing highly scalable, high-performance, and highly available architectures. Werner Vogels explains the CAP theorem very well, and how it's a KILLER for current applications. Shoup discusses the forces at play when you develop these types of architectures.

Thursday, November 12, 2009

GMail and log4j e-mail appender - error STARTTLS

I couldn't get log4j's e-mail appender working with Gmail. I kept getting the following error:

com.sun.mail.smtp.SMTPSendFailedException: 530 5.7.0 Must issue a STARTTLS


Later I found out that the best way to get it working was to create my own SMTP appender. What I wanted was a flexible application that I could configure once with a Gmail username (marcelo@gmail) and then have the users/clients configure the rest of the log4j appender through the log4j properties file.

Below is how I created the application. First you need the parameters for your Gmail account, including the STARTTLS setting. I placed all of these in an enum.

public enum EmailEnum {
    HOST("smtp.googlemail.com"),
    PORT("587"),
    USERNAME("marcelo@gmail.com"),
    PASSWORD("secret"),
    AUTHORIZED("mail.smtp.auth"),
    STARTTTS("mail.smtp.starttls.enable"),
    TRUE("true");

    private final String value;

    private EmailEnum(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}


Then I created the class that actually builds the mail appender. The main part is the append method. As you can see, I extend the SMTPAppender class and override its append method, and I use Spring's SimpleMailMessage and JavaMailSenderImpl classes to send the message.


package com.upmobile.midccore.commons.logger;

import java.util.Properties;

import org.apache.log4j.helpers.LogLog;
import org.apache.log4j.net.SMTPAppender;
import org.apache.log4j.spi.LoggingEvent;
import org.springframework.mail.MailException;
import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSenderImpl;

/**
 * Builds the SMTP appender for Google Mail (Gmail).
 *
 * @author marceloolivas
 */
public class GMailAppender extends SMTPAppender {
    private final JavaMailSenderImpl javaMail;
    private static final String NL = System.getProperty("line.separator");

    public GMailAppender() {
        javaMail = new JavaMailSenderImpl();
        javaMail.setHost(EmailEnum.HOST.getValue());
        javaMail.setPort(Integer.parseInt(EmailEnum.PORT.getValue()));
        javaMail.setUsername(EmailEnum.USERNAME.getValue());
        javaMail.setPassword(EmailEnum.PASSWORD.getValue());
        // Enable SMTP authentication and STARTTLS, which Gmail requires on port 587
        Properties props = new Properties();
        props.setProperty(EmailEnum.AUTHORIZED.getValue(), EmailEnum.TRUE.getValue());
        props.setProperty(EmailEnum.STARTTTS.getValue(), EmailEnum.TRUE.getValue());
        javaMail.setJavaMailProperties(props);
    }

    @Override
    public void append(LoggingEvent event) {
        super.append(event);
        // Build a fresh message for every logging event
        SimpleMailMessage msg = new SimpleMailMessage();
        StringBuilder builder = new StringBuilder();
        builder.append(getLayout().format(event));
        builder.append(event.getMessage().toString());
        if (event.getThrowableInformation() != null) {
            builder.append(NL);
            String[] stackTrace = event.getThrowableInformation().getThrowableStrRep();
            for (int i = 0; i < stackTrace.length; i++) {
                builder.append(stackTrace[i]).append(NL);
            }
        }
        // Set the body whether or not the event carries a stack trace
        msg.setText(builder.toString());
        // The comma-separated recipients come from the appender's "to" property
        String[] recipients = getTo().trim().replace(" ", "").split(",");
        msg.setTo(recipients);
        msg.setSubject(this.getSubject());
        try {
            javaMail.send(msg);
        } catch (MailException ex) {
            // Report through log4j's internal logger instead of swallowing the failure
            LogLog.error("Could not send the log e-mail", ex);
        }
    }
}
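
The two pieces of the appender that are easiest to get wrong are the STARTTLS properties (their absence is what triggers the "530 5.7.0 Must issue a STARTTLS" error) and the parsing of the comma-separated "to" list. Below is a minimal sketch that isolates both using only the JDK. The class name GmailSmtpSettings and its method names are hypothetical, made up for illustration; they are not part of log4j, JavaMail, or Spring.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of log4j, JavaMail, or Spring.
final class GmailSmtpSettings {
    private GmailSmtpSettings() {}

    // Builds the two JavaMail properties the appender above passes to the mail sender.
    static Properties startTlsProperties() {
        Properties props = new Properties();
        props.setProperty("mail.smtp.auth", "true");
        props.setProperty("mail.smtp.starttls.enable", "true");
        return props;
    }

    // Splits a comma-separated recipient list, ignoring any spaces around the addresses.
    static String[] parseRecipients(String to) {
        return to.replace(" ", "").split(",");
    }

    public static void main(String[] args) {
        System.out.println(startTlsProperties().getProperty("mail.smtp.starttls.enable"));
        for (String recipient : parseRecipients("abc@acme.com, xyz@acme.com")) {
            System.out.println(recipient);
        }
    }
}
```

Keeping these two steps in small static methods makes them easy to check without opening a real SMTP connection.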


The actual log4j properties file is the following:

# Global logging configuration
log4j.rootLogger=ERROR, stdout, EMAIL
# SqlMap logging configuration...
log4j.logger.com.ibatis=DEBUG
log4j.logger.com.ibatis.common.jdbc.SimpleDataSource=DEBUG
log4j.logger.com.ibatis.sqlmap.engine.cache.CacheModel=DEBUG
log4j.logger.com.ibatis.sqlmap=DEBUG
log4j.logger.com.ibatis.sqlmap.engine.builder.xml.SqlMapParser=DEBUG
log4j.logger.com.ibatis.common.util.StopWatch=DEBUG
log4j.logger.java.sql.Connection=DEBUG
log4j.logger.java.sql.Statement=DEBUG
log4j.logger.java.sql.PreparedStatement=DEBUG
log4j.logger.java.sql.ResultSet=DEBUG

log4j.logger.org.springframework=INFO
log4j.logger.com.upmobile=DEBUG
# Console output...
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4p [%d{ISO8601}][%t] (%c.%M()line %-4L ) %m%n

# e-mail appender
log4j.appender.EMAIL=com.upmobile.midccore.commons.logger.GMailAppender
log4j.appender.EMAIL.Threshold=ERROR
log4j.appender.EMAIL.Subject=Error with the application
log4j.appender.EMAIL.to=abc@acme.com,xyz@acme.com
log4j.appender.EMAIL.SMTPDebug=true
log4j.appender.EMAIL.layout=org.apache.log4j.PatternLayout
log4j.appender.EMAIL.layout.ConversionPattern=[%d] [%t] %-5p %c %x - %m%n
log4j.appender.EMAIL.BufferSize=1

Monday, November 9, 2009

Motivating programmers

Dan Pink gave an extraordinary presentation at TED about the disconnect in the traditional reward system. We have learned over and over about the power of incentives and how they should improve motivation. However, science has shown that for complex problems this does not work. Instead, incentives can dull thinking and slow creativity. It is striking that for the past forty years business has not noticed what science has shown.

Contingent motivators (if you do this, then you get that) are a good approach for 20th-century tasks with a simple set of rules, because they narrow our focus. But for most jobs/problems (especially for programmers/software engineers), intrinsic motivators are better because we need to see the entire context (working with both our left and right brain). He explained it extremely well using the "candle problem".

I can watch this presentation over and over and always find something useful. I highly recommend it.

Eliminating obsolete objects reference with profiling tools

I'm getting ready for my Miami Java User Group (MJUG) talk in December with my buddy Antonio Llanos. We thought about doing a "performance and scalability" talk. I wanted to focus on the tools (especially open-source ones) that programmers should have to help them analyze their applications. I also wanted the attendees to come out of the talk knowing exactly what they should do the next day.

I also wanted to come up with a snippet of code whose problem would not be easy to detect, to show the importance of these tools. That was when I stumbled upon a piece of code from Effective Java, 2nd Edition, "Item 6: Eliminate obsolete object references". The code is a simple stack class (LIFO), which has a memory leak.

package com.midc.spikes.effectivejava.obsoleteobject;

// Can you spot the "memory leak"?

import java.util.*;

public class Stack {
    private Object[] elements;
    private int size = 0;
    private static final int DEFAULT_INITIAL_CAPACITY = 16;

    public Stack() {
        elements = new Object[DEFAULT_INITIAL_CAPACITY];
    }

    public void push(Object e) {
        ensureCapacity();
        elements[size++] = e;
    }

    public Object pop() {
        if (size == 0)
            throw new EmptyStackException();
        return elements[--size];
    }

    /**
     * Ensure space for at least one more element, roughly
     * doubling the capacity each time the array needs to grow.
     */
    private void ensureCapacity() {
        if (elements.length == size)
            elements = Arrays.copyOf(elements, 2 * size + 1);
    }

    public int getSize() {
        return size;
    }
}


I couldn't find the memory leak. According to the book,
If a stack grows and then shrinks, the objects that were popped off the stack will not be garbage collected, even if the program using the stack has no more references to them.

To fix the problem, we just need to null the reference of the object that is popped.

public Object pop() {
    if (size == 0)
        throw new EmptyStackException();
    Object result = elements[--size];
    elements[size] = null;
    return result;
}

The book also explains that these bugs are hard to detect, and that the best way to find them is through "careful code inspection or with the aid of a debugging tool known as a heap profiler". As always, it is very desirable to find these types of problems ahead of time, before they cause trouble. In other words, exactly what I needed for my talk.
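
To make the leak visible without a profiler, here is a minimal sketch that counts the references a stack still holds beyond its logical size. The class InspectableStack, its clearOnPop flag, and the obsoleteReferences method are hypothetical names of my own; they are not from the book. A heap profiler would surface the same retained objects in a real application.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Hypothetical class for illustration; the flag switches between the leaky and the fixed pop().
final class InspectableStack {
    private Object[] elements = new Object[16];
    private int size = 0;
    private final boolean clearOnPop;

    InspectableStack(boolean clearOnPop) {
        this.clearOnPop = clearOnPop;
    }

    void push(Object e) {
        if (elements.length == size) {
            elements = Arrays.copyOf(elements, 2 * size + 1);
        }
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        if (clearOnPop) {
            elements[size] = null; // drop the obsolete reference
        }
        return result;
    }

    // Counts array slots at or beyond the logical size that still point at an object.
    int obsoleteReferences() {
        int count = 0;
        for (int i = size; i < elements.length; i++) {
            if (elements[i] != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        InspectableStack leaky = new InspectableStack(false);
        InspectableStack fixed = new InspectableStack(true);
        for (int i = 0; i < 10; i++) {
            leaky.push(new Object());
            fixed.push(new Object());
        }
        for (int i = 0; i < 10; i++) {
            leaky.pop();
            fixed.pop();
        }
        System.out.println("leaky stack still references: " + leaky.obsoleteReferences());
        System.out.println("fixed stack still references: " + fixed.obsoleteReferences());
    }
}
```

After ten pushes and ten pops, the leaky variant still holds all ten objects in its array, while the fixed variant holds none, which is exactly the difference the garbage collector cares about.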

Tuesday, November 3, 2009

Application Performance Management

I have been working on a topic for the Miami Java User Group. Lately, I have been doing a great deal of performance and scalability work for my projects. Thanks to my friend Edson Cimionatto, who let me borrow Apress' Pro Java EE 5 by Steven Haines. It is a nice book and I have been reading some of the chapters, so today I reviewed some of my notes. Looking back, there are four things that really caught my attention:
  1. Have SLAs
  2. Performance follows the rule of "better to do it from the start" or "earlier is cheaper"
  3. Have a set of tools to diagnose the application (memory, code profiling, and code coverage)
  4. There is a "culture" aspect to tackling performance. Not only do you need to run tests and have a decent architecture, but you also have to be relentless about keeping the performance metrics (SLAs)

Wednesday, October 28, 2009

Technical debt

InfoQ has a great article about technical debt. Most of all, I like Uncle Bob's definition,
A mess is not a technical debt. A mess is just a mess. Technical debt decisions are made based on real project constraints. They are risky, but they can be beneficial. The decision to make a mess is never rational, is always based on laziness and unprofessionalism, and has no chance of paying off in the future. A mess is always a loss.
I have always known that some technical debt is unavoidable, but I had a hard time deciding who should dictate what becomes technical debt. The answer: the team. The team should be the one that decides (as long as we can provide all the parameters and weigh all the risks and benefits) what should be included in the technical debt. I agree with Martin Fowler,
The key lies in making sure that a team is not introducing reckless debts which contribute to the mess and are very difficult, if not impossible to deal with.

Thursday, September 24, 2009

Metaphors, valuable assets in conversations.

The company is in negotiations to sell some of my source code to one of our competitors. I was asked by the CEO to talk to the CFO about the risks and to come up with a price. The conversation was going nowhere. The CFO was totally lost regarding the source code. It was obvious that he did not understand the context of the conversation. Finally, I told him to think of the source code as a recipe. "Think of us as Coca-Cola (the company) and we have the recipe, the "secret formula", and now a competitor like Pepsi wants to purchase it."

After that, we were both engaged in the conversation. The CFO was asking questions such as:
  • If we give them the recipe, can they (the competitors) make as many Cokes as they please?
  • How long did it take your team to create the recipe (in terms of hours and, most of all, money)?
All these questions contributed to a really good conversation. Afterwards, the CFO and I talked to the CEO and discussed our assessment.

In the book Pragmatic Thinking and Learning, Andy Hunt explains metaphors and software as follows:
The idea is that any software system should be able to be guided by an appropriate metaphor

I couldn't agree more with him.

Tuesday, September 22, 2009

Hudson CI

I love the whole concept of Continuous Integration (CI). My first interaction with CI was using CruiseControl along with a team from ThoughtWorks (great guys!). However, I noticed that the configuration is all XML and it can be cumbersome. In my previous project, only a few individuals really knew how to configure it.

I have been slacking on using CI (not proud of it) for two reasons:
1. I only have two programmers and they are here in Miami. Most of the time they pair program together.
2. I didn't want to bring the pain of XML into the company.

However, I just acquired a new programmer and he is located in Mexico. I have also noticed broken builds due to items not being checked in, some broken unit tests, etc. Perfect candidates for CI. Last week, I decided that instead of calling the programmers and asking them to solve the problem, I would install a CI server, but not CruiseControl. I have heard a lot of great things about Hudson. Today I finished the installation. Really nice and easy. Now I need to finish the configuration for the deployment, SVN, and a few other items. But I'm really excited to be back in the CI world. My team should be more effective, and hopefully we can increase the number of deployments. It was wrong of me not to use CI from the beginning. Even if we had implemented CruiseControl, we would be better off than we are today.

Monday, September 21, 2009

mysqldump from external db to local db

Sometimes I need to transfer data from my remote database (production/test) to my local box. I usually use dbUnit to avoid the transfer. However, sometimes I need specific data to solve a specific bug. If you are using MySQL, then most likely you are using mysqldump. It's a really good tool. Here is how I transfer data from my remote environment (marcelo.prod) to my local box without affecting my current schema (no deletes, no truncates, no changes of any kind):

mysqldump -h marcelo.prod -uprod_username -pprod_password --skip-create-options --no-create-info --no-create-db --compact --skip-add-drop-database --insert-ignore --where="OPERATOR_SERVICE = 'MBLOX_US'"  upmobile OPERATOR_TBL | mysql -uroot -psecret upmobile

I also use this command to transfer data from my production database to my historic database.

Sunday, August 2, 2009

Twitter Business Model?

I have been reading a couple of articles about Twitter, simply because I believe that it is a disruptive technology for SMS.

Two articles that really got my attention:
Both articles were really impressive and very insightful, but at the end of the day I came up with one question: how in the world are they paying for all those BILLIONS of messages? How come companies such as Facebook or Twitter, which have 400 to 500 million users, aren't profitable? Everyone is saying that they are waiting for an IPO, but I'm not sure that's going to happen any time soon. I mean, I doubt that anyone will be OK with getting an ad on their phone.

Time provided a great example of how Twitter is changing the way we hold conversations:
Injecting Twitter into that conversation fundamentally changed the rules of engagement. It added a second layer of discussion and brought a wider audience into what would have been a private exchange. And it gave the event an afterlife on the Web. Yes, it was built entirely out of 140-character messages, but the sum total of those tweets added up to something truly substantive, like a suspension bridge made of pebbles.

Understood, but wait... someone is paying for all these standard-rate messages. You see, I'm in the business of monetizing these types of messages. I provide what people call Premium Short Messages (PSMS). Indeed, the market for downloadable content such as ringtones, wallpapers, subscriptions, etc. has paid my bills. I don't hate Twitter; as a matter of fact, I think it is the future, just like I think the "Free" business model that Chris Anderson described in his book is the 21st-century model. However, it seems crazy to me to have such financial hemorrhaging (the cost of short codes and standard-rate messages is quite high).

The Economist's article caught my attention with the amount of money the company expects to make,
... a hacker recently leaked documents after gaining access to the private e-mail accounts of a Twitter employee and the wife of one of its founders, the blogosphere was abuzz. The haul included a spreadsheet showing revenues reaching $140m by the end of 2010, up from $4.4m this year.
Later it described the most likely scenario for Twitter:
Embedding advertisements in “tweets”, short text messages that can be up to 140 characters long, is unlikely to appeal to users. A better bet would be for the firm to charge corporate users for premium services. For example, it could pocket a fee from businesses for verifying their Twitter accounts, so that users following their postings would know the firms’ tweets are genuine. It could also develop a statistical toolkit that measures the effectiveness of tweets in generating sales.

Sunday, July 19, 2009

Unix Admin Mantra - "Only the paranoid survive"

I am not a Unix admin. I had to pick it up because I work at a small start-up company. I quickly learned that the mantra of Andrew Grove from Intel, "Only the paranoid survive", fits this type of job best. System admins need to be ahead of the curve. The other day, I went to restart the only Windows server that I have and noticed an error on one of my Unix servers. The message was that I had a bad memory chip. How can I check that everything is OK, especially since my servers are in a data center? Then I found out from Linux Journal that I can use SNMP and Nagios to get this type of monitoring. I will be playing with it, along with Ruby, for the next couple of weeks. I hope to get the status of memory modules, fans, and power supplies in each of my servers.
SNMP (Simple Network Management Protocol) is a network protocol designed for monitoring network-attached devices. It uses OIDs (Object IDentifiers) for defining the information, known as MIBs (Management Information Base), that can be monitored. The design is extensible, so vendors can define their own items to be monitored.

Teams - different breeds

I am a big believer in teams. When I did my MBA, it was a huge shock for me. Most of my work was in teams. Being a Computer Science graduate, I always wanted to do the work either alone or with other individuals who were developers or knowledgeable in the computer field. During my MBA at the University of Miami, I learned much about teams. One thing that stuck with me is that cohesive teams become more united in times of great adversity. Think about Hurricane Andrew, Hurricane Katrina, or 9/11. During these times, people came together for one particular cause: their communities. The same thing happens when the team itself feels threatened.

I recently moved, and noticed that the dryer in my new home was not working. Yesterday I called my landlord to explain the problem to him. His voice started to break, and he told me that his 17-year-old son had drowned the day before and that he was going to take care of the dryer issue later. Words cannot explain how I felt. I have a seven-year-old, and he and my wife are my life. I cannot imagine what I would do if something similar happened to me. I told my landlord that he shouldn't worry about it, that I would fix it, and that he should focus on his family.

A day later, three gentlemen came to the house and fixed the problem with my dryer. They said that they came on behalf of my landlord. I found out that my landlord is the chief of the Miami Fire Department and they are proud to help their chief. One of the gentlemen told me,
Fire Departments are a different animal. We take care of our people.
I was deeply moved. It also made me think: would my team do the same thing for me? Would my friends? My wife and I picked up a sympathy card to send to our landlord. I can't imagine what he is going through, but I will take this opportunity to remind myself how vulnerable we are. Indeed, the hardest times reveal the true friendships.

Thursday, June 18, 2009

Tomcat Event Handler - privileges

In the previous post regarding Tomcat and the event handler, there was one major problem: privileges. NRPE is a daemon that runs in the background, so anything Nagios launches through it runs without a terminal. The event handler ran "kill" when Tomcat did not stop gracefully. When it tried to execute the command, the following error occurred:
sudo: sorry, you must have a tty to run sudo
The solution is to have a service script run the main script. In this example, restart-tomcat-eventhandler.sh is the service script, which calls restart-tomcat.sh. I also made restart-tomcat.sh run in the background. But first, below are the changes needed in sudoers (visudo): disable requiretty for the nagios user and grant it the following privileges:
Defaults:nagios    !requiretty
...
nagios ALL=(ALL) NOPASSWD:/opt/tomcat/bin/catalina.sh,/bin/kill,/opt/tomcat/bin/startup.sh
The service script also takes care of logging, and verifies that only one restart-tomcat.sh process is running:
#!/bin/sh
#
# Launches the Tomcat restart script.
# The restart runs in the background and
# its output is appended to the LOGGER file.
#
LOGGER=/usr/local/nagios/libexec/eventhandlers/restart-tomcat.log
EVENT_HANDLER_APP=/usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh
echo "Restarting Tomcat `date`...."
# Count running copies of restart-tomcat.sh; the [r] keeps grep from matching itself
count=`ps -ef | grep -c '[r]estart-tomcat.sh'`
echo "Total process running: $count"
if [ "$count" -ge 1 ]
then
echo "Another process is running and so the script will stop `date`"
exit
fi
$EVENT_HANDLER_APP >> $LOGGER 2>&1 &

The actual event handler is the following:

#!/bin/bash

#
# tomcat-restart.sh - tomcat restart script for cron
# Need to have access to the sudo to restart the tomcat
# Also, modify the visudo

echo "---------------------`date`---------------------"
CATALINA_PATH=/opt/apache-tomcat-6.0.18
CATALINA_SCRIPT=catalina.sh

echo "CATALINA_HOME : $CATALINA_PATH"

# Verify that tomcat is not running. If it is, stop it gracefully
# get the tomcat pid
tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
echo "Tomcat PID is: $tomcat_pid"

if [ -n "$tomcat_pid" ]
then
echo "Stopping tomcat ..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT stop
# give tomcat 60 seconds to shutdown gracefully
sleep 60
fi

tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
# if tomcat_pid exists, kill the process
if [ -n "$tomcat_pid" ]
then
echo "Noticed that process is still running trying to kill it"
sudo kill $tomcat_pid
sleep 60
fi

tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
# if tomcat_pid still exists, really kill the process
if [ -n "$tomcat_pid" ]
then
echo "Forcefully killing the process for tomcat $tomcat_pid..."
sudo kill -9 $tomcat_pid
sleep 60
fi

# restart tomcat
echo "`date` Starting tomcat..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT start
echo "`date` Finished starting tomcat"
echo "---------------------------------------------"

Tuesday, June 16, 2009

Starting to learn Ruby on Rails

I started to learn Ruby on Rails and bought a couple of books (I'm a book addict). I ended up purchasing Simply Rails 2 by Patrick Lenz and Advanced Rails Recipes by Mike Clark, since I need to have a project up and running in a couple of weeks.

IT Project Kill Switch

Waldo Moreira pointed me to the article How to Make Profit, which brings an important lesson for IT and project decisions. The article is based on a decision made by the CEO of Rackspace Managed Hosting, Graham Weston, when he passed on a $20 million deal with Morgan Stanley. His decision was based on the fact that the deal was not profitable enough. To be more precise, it was 5% less than Rackspace's original 15% profit margin.

The article explains that many companies lack the discipline of true profit or economic value added,
...Lots of big corporations don't make a true profit. That is equally true of small businesses, which can be so desperate to close deals early on that they neglect to really look at the numbers. As a result, line managers are clueless about the cost of capital and the returns — or the lack thereof — they are generating.

Jim Collins wrote in his masterpiece Good to Great about how leaders "Confront the Brutal Facts". The great leaders shared the following pattern: they all gathered data before making a decision, made excellent use of it, and finally used it to confront their decisions head-on. This is what Weston ended up doing. After analyzing the venture with Morgan Stanley, he noticed that Rackspace was going to make a 10% profit, 5% less than its original 15% profit margin. In his new book, How the Mighty Fall, Jim Collins talks about the five stages that companies go through before failing. The second stage is called "Undisciplined Pursuit of More",
...More scale, more growth, more acclaim, more of whatever those in power see as success...Although complacency and resistance to change remain dangers to any successful enterprise, overreaching better captures how the mighty fall.
There has to be some type of threshold that allows senior management to take the bold step and say "no" to specific projects. Senior management needs to look beyond the numbers. In the HBR essay The Truths About IT Costs, Susan Cramm writes about what drives up IT costs. She identified seven such truths. Perhaps the most interesting is "Project Failures are too High". Being an IT director, I'm faced with different "wish lists" of projects from marketing, sales, and senior management. IT should not be the one to decide whether or not a project should launch. As Cramm explains,
Managing these truths is tricky. IT can't do it alone, because simply saying no to business partners harms relationships with them.
Senior management should provide a threshold, a magic number like Weston's 15%, and IT should raise the flag when a project is going down the wrong path. As Cramm says,
Establish "kill switch" rules for projects.
If a project is over its initial budget, has been modified twice, and the beta deployment still has not occurred, KILL IT!
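
Both thresholds are simple enough to write down as code, which is one way to keep them from being renegotiated project by project. The sketch below is only an illustration of the rule as stated above: the class KillSwitch, its method names, and its parameters are my own assumptions, not something taken from Cramm's essay or from Rackspace.

```java
// Hypothetical illustration of the two thresholds discussed above.
final class KillSwitch {
    private KillSwitch() {}

    // Kill rule: over the initial budget, modified at least twice, and no beta deployment yet.
    static boolean shouldKill(double initialBudget, double spentSoFar,
                              int modifications, boolean betaDeployed) {
        return spentSoFar > initialBudget && modifications >= 2 && !betaDeployed;
    }

    // Margin rule: a deal is acceptable only if it reaches the required profit margin.
    static boolean meetsMarginThreshold(double projectedMargin, double requiredMargin) {
        return projectedMargin >= requiredMargin;
    }

    public static void main(String[] args) {
        // The Morgan Stanley deal: 10% projected against a 15% requirement.
        System.out.println(meetsMarginThreshold(0.10, 0.15));
        // A project over budget, modified twice, with no beta in sight.
        System.out.println(shouldKill(100000, 120000, 2, false));
    }
}
```

The value is less in the code than in forcing senior management to agree on the inputs (budget, number of allowed modifications, required margin) before the project starts.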

Nagios and Tomcat Event Handler


The first thing we need to do is understand how Nagios works. Assuming that Tomcat is on a remote server, there is a "nagios" user on that server, and it needs rights to restart Tomcat (CATALINA_HOME/bin/catalina.sh stop). If the nagios user tries to stop Tomcat without those rights, it gets the following error:
su nagios
/usr/local/tomcat-18version/bin/catalina.sh stop
Jun 15, 2009 2:54:18 PM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.io.FileNotFoundException: /opt/apache-tomcat-6.0.18/conf/server.xml (Permission denied)

The best thing to do is to create a "tomcat" group, give this group privileges on CATALINA_HOME, and add the "nagios" user to the group. In this case, Tomcat was downloaded to the following directory: /opt/apache-tomcat-6.0.18/. Run the following steps as the "root" user.
First, I created a symbolic link so I don't have to change anything in case Tomcat is upgraded:
ln -s /opt/apache-tomcat-6.0.18/ /opt/tomcat
Now, if you list the directory, you should see the link:
ls -l /opt
tomcat -> /opt/apache-tomcat-6.0.18/
Create the group using the groupadd command:
groupadd tomcat
Add the existing nagios user to the tomcat group (note that -g makes tomcat the user's primary group):
usermod -g tomcat nagios
Give the "tomcat" group ownership of both /opt/tomcat and the original directory. First check the id of the user:
[root@dev opt]# id nagios
uid=501(nagios) gid=503(tomcat) groups=503(tomcat)
chgrp -R tomcat apache-tomcat-6.0.18
chgrp -R tomcat tomcat

#To test that the nagios user is able to stop Tomcat, run the following commands:
su nagios
/usr/local/tomcat-18version/bin/catalina.sh stop
Privileges are also needed to restart the Tomcat server and to kill it in case Tomcat doesn't shut down. Since only root can bind to certain ports (e.g. port 80), edit the sudoers file (visudo):
##add the following line below "root    ALL=(ALL)       ALL"
nagios ALL=(ALL) NOPASSWD:/opt/tomcat/bin/catalina.sh,/bin/kill
Now, add the event handler. Create the file /usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh:
#!/bin/bash

#
# tomcat-restart.sh - tomcat restart script for cron
#
echo "`date`------------ Shutting down tomcat---------------"
CATALINA_PATH=
CATALINA_SCRIPT=catalina.sh

# Verify that tomcat is not running. If it is, stop it gracefully
# get the tomcat pid
tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
echo "Tomcat PID is: $tomcat_pid"

if [ -n "$tomcat_pid" ]
then
echo "Stopping tomcat ..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT stop
# give tomcat 60 seconds to shutdown gracefully
sleep 60
fi

tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
# if tomcat_pid exists, kill the process
if [ -n "$tomcat_pid" ]
then
echo "Noticed that process is still running trying to kill it"
sudo kill $tomcat_pid
sleep 60
fi

tomcat_pid=`ps -ef | grep java | grep tomcat | cut -c10-14`
# if tomcat_pid still exists, really kill the process
if [ -n "$tomcat_pid" ]
then
echo "Forcefully killing the process for tomcat $tomcat_pid..."
sudo kill -9 $tomcat_pid
sleep 60
fi

# restart tomcat
echo "`date` Starting tomcat..."
sudo $CATALINA_PATH/bin/$CATALINA_SCRIPT start
echo "`date` Finished starting tomcat"
Configure a wrapper script that runs the event handler (restart-tomcat-eventhandler.sh). This way, every time the application restarts, a log records everything:
#!/bin/sh

echo "Restarting Tomcat `date`" >> /usr/local/nagios/libexec/eventhandlers/restart-tomcat.log
/usr/local/nagios/libexec/eventhandlers/restart-tomcat.sh >> /usr/local/nagios/libexec/eventhandlers/restart-tomcat.log
echo "Finished `date`" >> /usr/local/nagios/libexec/eventhandlers/restart-tomcat.log
echo "-------------------------Finished `date`-----------------------------"

On the Nagios server
Create the event handler: /opt/user/local/nagios/event-handler/restart-tomcat-eventhandler.sh

#!/bin/sh
#
# Event handler script for restarting the web server on the local machine
#
# Note: This script will only restart the web server if the service is
# retried 3 times (in a "soft" state) or if the web service somehow
# manages to fall into a "hard" error state.
#


# What state is the HTTP service in?

case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Aha! The HTTP service appears to have a problem - perhaps we should restart the server...

    # Is this a "soft" or a "hard" state?
    case "$2" in

    # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
    # check before it turns into a "hard" state and contacts get notified...
    SOFT)

        # What check attempt are we on? We don't want to restart the web server on the first
        # check, because it may just be a fluke!
        case "$3" in

        # Attempt number
        3)
            echo -n "Soft-> Restarting Tomcat..."
            echo -n "/usr/local/nagios/libexec/check_nrpe -H " $4 " -c restart-tomcat"

            /usr/local/nagios/libexec/check_nrpe -H $4 -c restart-tomcat

            ;;
        esac
        ;;

    # The HTTP service somehow managed to turn into a hard error without getting fixed.
    # It should have been restarted by the code above, but for some reason it didn't.
    # Let's give it one last try, shall we?
    # Note: Contacts have already been notified of a problem with the service at this
    # point (unless you disabled notifications for this service)
    HARD)
        echo -n "Hard-> Restarting Tomcat..."
        echo -n "/usr/local/nagios/libexec/check_nrpe -H " $4 " -c restart-tomcat"

        /usr/local/nagios/libexec/check_nrpe -H $4 -c restart-tomcat

        ;;
    esac
    ;;
esac
:
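
The decision table hidden in those nested case statements is small: restart only on a CRITICAL state, and then only on the third soft attempt or once the state has gone hard. As a minimal sketch, here is the same logic in Java so it can be checked in isolation. The class RestartDecision and its method are hypothetical names for illustration; they are not part of Nagios or NRPE, and the three arguments are assumed to be the state, state type, and attempt number that the script receives as $1, $2, and $3.

```java
// Hypothetical sketch mirroring the shell handler's decision; not part of Nagios.
final class RestartDecision {
    private RestartDecision() {}

    // state: OK, WARNING, UNKNOWN or CRITICAL; stateType: SOFT or HARD; attempt: check attempt number.
    static boolean shouldRestart(String state, String stateType, int attempt) {
        if (!"CRITICAL".equals(state)) {
            return false; // OK, WARNING and UNKNOWN never trigger a restart
        }
        if ("SOFT".equals(stateType)) {
            return attempt == 3; // earlier failures may just be a fluke
        }
        return "HARD".equals(stateType); // last try once the state has gone hard
    }

    public static void main(String[] args) {
        System.out.println(shouldRestart("CRITICAL", "SOFT", 1));
        System.out.println(shouldRestart("CRITICAL", "SOFT", 3));
        System.out.println(shouldRestart("CRITICAL", "HARD", 4));
        System.out.println(shouldRestart("WARNING", "HARD", 4));
    }
}
```

Writing the rule this way makes it obvious that a restart can fire twice for one outage (once on the third soft attempt and once more on the hard state), which is why the wrapper script's single-process check matters.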


Finally, add these event handler as a command by editing the /usr/local/nagios/etc/nrpe.cfg:
command[restart-tomcat]=/usr/local/nagios/libexec/eventhandlers/restart-tomcat-eventhandler.sh
Test that the command is working correctly by executing the following command from the Nagios server:

/usr/local/nagios/libexec/check_nrpe -H tomcatserver -c restart-tomcat -t 30

Now, add the service to restart the server:

define service{
use generic-service
host_name midc
service_description check_midc_login_page
process_perf_data 1
check_command check_http!-H midc.up-mobile.com -u /midc/doLogin.do -w 5 -c 10
event_handler restart-tomcat
}

Now restart nagios (service nagios restart) and you should be ready.

Friday, June 12, 2009

AT&T and Verizon are making a mess out of PSMS

U.S. telcos have tried over and over again to get a hold on spammers using MT premium SMS (PSMS), but instead they have made a mess by imposing APIs on aggregators. Verizon tried OIOO, and now AT&T has come up with its own version, OPPC. Both flows try to do the same thing: enforce an opt-in so that users know what they are being billed for. OPPC's flow uses a "start" message that aggregators have to send to AT&T. This message contains the parameters AT&T needs to build the opt-in message (price, description, shortcode, etc.). However, what if your campaign is targeting non-English speakers and you want to control the opt-in message? What happens if you are running a trivia or a chat service? Should users receive an opt-in for every question or every chat message they send? Eventually, AT&T came up with an "except-tag", the exception to the OPPC rule. I think that eventually everyone will want an "except-tag". And what about the spammers, won't they eventually use this except-tag too? This is just frustrating. I understand the purpose of these APIs, but why do we need a different API for each operator? Can we all just get a standard? Perhaps we need to think about doing the billing via MO, as Europe and Latin America are doing.

Tuesday, June 9, 2009

Boxing Business Model - HBR did it!

They say that Latinos have three sports: football (soccer), baseball, and boxing. I love boxing! Being Nicaraguan and married to a Mexican, I watch boxing and follow most of the lightweight fighters. At the moment, my favorite fighter is Manny Pacquiao. I can't get enough of this guy! He is awesome! You can learn so much from boxing. I wondered if anyone had made a business model out of this sport. Finally, I found out that Donald Sull made one, Thrive in Turbulent Markets. In it he uses the memorable fight of Muhammad Ali vs. George Foreman, the "Rumble in the Jungle". Not my favorite fight, but definitely in my top 5. He explains how to be agile like Muhammad Ali to quickly spot and exploit market changes. He also explains how to use an absorption model like Foreman to weather unexpected threats.

mBlox needs to get with it - give up your IM

I have been working with mBlox for quite some time. We had our rough times, but as of now I've been happy. The only thing that I don't like is that I don't have anyone's IM. I once asked my sales rep if I could just have his MSN, Yahoo!, or Skype and he just did not answer. I can always call my sales rep at his office or on his cell phone, but sometimes I just need to hear from someone technical whether everything is OK.

The other day I had to look for another SMSC aggregator since mBlox does not have Cricket Wireless. I ended up doing business with Motricity. The first thing that they provided was IM for their technical team and business reps. I'm thinking this is nothing but customer service. Why wouldn't mBlox do this? Something so simple should be provided. It has been a while since I had a problem with mBlox, but when I do I have to open a ticket. When I have a problem with Motricity, I just IM one of the technical guys or my sales rep. Yes, it's all about customer service.

PSMS beware of Twitter


I believe that Twitter might be a disruptive technology for PSMS. It's free!! The book What Would Google Do? explains how free is a business model.
Free is impossible to compete against. The most efficient marketplace is a free marketplace. Money gets in the way. It costs money to market and to acquire customers so you can sell things to them.
This is contrary to the concept of PSMS, especially for subscriptions. Why would anyone subscribe to a "joke" subscription that charges $6.99 monthly, when anyone can follow George Lopez, Dane Cook, Dave Chappelle, or even their funniest friend on Twitter for a standard rate (zero, zip, nada)? I encouraged my company to move away from subscriptions and to start thinking about this business model, even if we currently have a "cash cow".

Another reason for considering Twitter a disruptive technology is its simplicity. It's so simple to tweet. Even TV shows such as Meet The Press, whose average viewer is not your average techie, can be followed on Twitter. Christensen's The Innovator's Dilemma explains that,
Two additional important characteristics of disruptive technologies consistently affect product life cycles and competitive dynamics: First, the attributes that make disruptive products worthless in mainstream markets typically become their strongest selling points in emerging markets; and second, disruptive products tend to be simpler, cheaper, and more reliable and convenient than established products.

It is obvious that this disruptive technology is coming - and it's coming down hard. I personally think it is going to be a good thing. The users will be getting better service, and will be more in control. We are looking forward to Twitter, especially for Latin America.

Thursday, May 21, 2009

Fail fast - so true!

Yesterday, one of my developers ended up messing up the firewall settings of my test environment. He was so worried, but I think it was the best thing that ever happened to us. We are a start-up shop, which means that capital is scarce and that we need to do more with less. I told him that it was OK. We don't know much Unix, but we could get it working. In the book What Would Google Do? Jeff Jarvis says,
Corrections enhance credibility... Being willing to be wrong is a key to innovation.
He is right! My team got together from different parts of the world using nothing but our laptops, Skype, Unix screen (awesome), and gray matter. You can learn so much more from mistakes. The only thing is that you need to fail fast, or make your mistakes well.

Wednesday, May 20, 2009

Kannel 1.4.1 SMPP connection problem (bug)

For a while, I had a problem with Kannel where it would eventually stop sending MOs for some reason. The messages were not lost; they queued up, but they were never sent. Later, I found the cause. It only happens when transceiver-mode is set. For example:
group = smsc
smsc = smpp
host = 123.123.123.123
port = 600
transceiver-mode = true
smsc-username = "STT"
smsc-password = foo
system-type = "VMA"
address-range = ""
Apparently, according to Donald Jackson (another Kannel guru), this error also applies to SMPP connections that use TRX/RX:

group = smsc
smsc = smpp
host = 123.123.123.123
port = 600
receive-port=1234
smsc-username = "STT"
smsc-password = foo
system-type = "VMA"
address-range = ""

Once again, Stipe Tolj (aka: the messiah of Kannel) helped me with this problem. The way to fix it is to configure a separate transmitter and receiver and remove the transceiver. For example:
####TRANSMITTER#######
group = smsc
smsc = smpp
host = 123.123.123.123
port = 600
smsc-username = "STT"
smsc-password = foo
system-type = "VMA"
address-range = ""

####RECEIVER#######
group = smsc
smsc = smpp
host = 123.123.123.123
receive-port=2345
smsc-username = "STT"
smsc-password = foo
system-type = "VMA"
address-range = ""

Kannel HTTP Connection


I had to configure an HTTP connection for Kannel today. Below is how you have to set up the config file:

group = smsc
smsc = http
smsc-id = http_connection
system-type = kannel
smsc-username = tester
smsc-password = foobar
port = 11030
send-url = "http://localhost:9090/outgoingmessage"

Once this part was configured, my biggest question was how to send the MO (incoming message) to Kannel. It's implied that the MT will use the "send-url".

Later I found out from my friend Stipe Tolj (best Telco consultant and one of the senior developers of Kannel) that it does not matter. You can do the following to send the MO:
http://localhost:11030/?username=tester&password=foobar&from=1234567890&to=4444&text=Hello

All you need are the port, username, and password from the configuration. Then you use the same syntax as the sendsms API. You also need to make sure the port is reachable. You can always open it with the iptables rule below:
iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 11030 -j ACCEPT
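
The MO URL shown earlier only works as-is for text without spaces or special characters. Here is a minimal sketch that builds the same URL with the parameters percent-encoded. The function names urlencode and build_mo_url are mine, the host, port, and credentials are copied from the example config above, and the encoder only handles plain ASCII text.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode everything except the unreserved
# characters (letters, digits, ".", "_", "~", "-"). ASCII input only.
urlencode() {
    s="$1"
    out=""
    while [ -n "$s" ]; do
        rest="${s#?}"        # string without its first character
        c="${s%"$rest"}"     # the first character itself
        case "$c" in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s="$rest"
    done
    printf '%s' "$out"
}

# Build the MO URL for the HTTP smsc group configured above.
# $1 = sender number, $2 = shortcode, $3 = message text
build_mo_url() {
    printf 'http://localhost:11030/?username=tester&password=foobar&from=%s&to=%s&text=%s\n' \
        "$(urlencode "$1")" "$(urlencode "$2")" "$(urlencode "$3")"
}

build_mo_url 1234567890 4444 "Hello world"
# prints http://localhost:11030/?username=tester&password=foobar&from=1234567890&to=4444&text=Hello%20world
```

To actually deliver the message, pass the result to an HTTP client, for example curl "$(build_mo_url 1234567890 4444 'Hello world')".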

Wednesday, May 13, 2009

Software Architecture and Management

The business should understand the risks involved when you put speed in front of architecture. I worked on a project that was considered such a cash cow that they rushed to get the application out without considering the architecture. Soon after it was deployed, we noticed several problems:
  • Complexity - it took a long time to fix problems because we did not have a domain model and the business rules were scattered throughout the code
  • Poor performance - the application was slow
  • Scalability - we had to invest a lot of money in hardware because the application was not scaling to the amount of users
Ultimately, the team had to explain to the business why we had to spend a long time refactoring the code. Not only did we sacrifice architecture for speed, but we also failed to explain the causes and effects of this decision. I stumbled on a book in Barnes and Noble named 97 Things Every Software Architect Should Know. If the team had applied some of the following, the project would have been a great success:
All these items are very true indeed and I am taking them into consideration in my current and future projects. Eventually the company was hit hard by the US recession and Miami's real estate. The entire team was laid off.

Tuesday, May 12, 2009

Bad Development Managers

I believe that development managers should be developers themselves. I was on a team where a former PMO became our manager. He has been by FAR the worst manager I've ever had. It nearly destroyed the team. He didn't know what we were doing, and yet he considered himself a developer.

One day I mentioned some trends that I'd seen. We were doing a series of small websites (micro-sites) that were pure admin, had to be done quickly, and were constantly changing. I mentioned that we should look into Ruby on Rails. He quickly told me about "standardization". I talked to him about what Neal Ford calls Polyglot Programming. The manager just did not get it. He also noticed that the team had started to reject him, so he started to put some distance between us and him. Meetings were held over e-mail, requirements were discussed over IM, and status updates were given over the phone. It got so bad that the team eventually talked to HR.

I talked to one of my mentors (a finance guru who loves technology and the best CIO I ever had) about this manager and about some of the incidents. He told me that one should see the "silver lining" of the situation. One can really learn from bad managers. He was right! Through this bad manager, I learned that managers should monitor the "pulse" of their team members. They are the ones that will help you achieve your goals. HBR published an article, How Not to Lose the Top Job, where they talked about this issue,

The leadership feedback that you (and your CEO) receive from your direct reports can be an excellent predictor of your ability to lead the company. Poor feedback from a direct report can sabotage an heir apparent, just as poor feedback from peers can

Eventually the manager was laid off, but by that time I had left the company. I'm happy to report that one of my friends, and frankly the best guy for the position (a developer), is in charge of this team.

Monday, May 11, 2009

Agile and Management - using one-on-ones model

I've been programming for over 10 years, and I think that Agile methodologies are KEY for any IT field, but when I moved to management there was something missing: I did not know how to manage my team.

I joined a start-up company focused on the Latin American market and SMS applications. I started as a senior developer and eventually senior management made me Director of Technology. I was under the impression that not much was going to change. I had only a handful of directs, and I had worked with them for quite some time. Therefore, I continued doing my stand-up meetings, showcases, Sprint Planning Meetings, retrospectives, etc. However, several incidents quickly made me realize that I didn't know my directs as well as I thought.

One incident happened right after I joined the company. One of my developers (who was also a senior) became very distant and also showed some attitude. Also, some of the developers started coming in late. Worst of all, I was terrible at providing feedback. I was stressed and always talking harshly to them.

Luckily enough, I stumbled on the podcast "Manager Tools". They had an episode named "Boss One-on-Ones - Professional Updates". It really changed the way I managed my team. The model works great and it's very simple. It is based on a weekly 30-minute meeting with each direct:
  1. The first 10 minutes is for them,
  2. Next 10 minutes for you
  3. Last 10 minutes for development.
It took a while for my team to open up, but eventually I learned that one of my developers wanted my position, and another said that he had always needed help meeting deadlines and coming in on time. Not only did I get to know my directs, but I was becoming a better manager. Because I knew my directs better, I was able to help them with their careers, and I was able to delegate efficiently since I knew their strengths and weaknesses. Highly recommend it!