Wednesday, February 23, 2011

My rant about legacy code

"Ya ni llorar es bueno", this is a Mexican saying that translates to "it's not even worth crying anymore". I thought about this saying after a talked with few companies and friends regarding legacy application, specially legacy code that directly hit the ROI of the company. To add some context around this post, let me explain what does "legacy code" means to me:
  • It is using a technology relatively "old" (+5 years)
  • It is in production
  • The application plays an important role in the company

I have worked on a few of these applications and I hate most of them! Not because the they are hard but because of politics. It has been my perception that managers are "gun shy" when it comes to handle the re-write of legacy code when they are clearly out of control. I am not sure why this is, but this are some my reasons based on my interviews with tech leaders and developers:

  1. Time: aah, the Pandora box of managers. Lets face it, it is hard to estimate software projects, and worse the re-factoring of a major legacy code. The stakes are high if the re-factoring goes wrong. Managers have mentioned that these refactoring are a double edge sword because it can be really good, or it could eventually damage their reputation and the morale of the team.
  2. Resources: many manager are fighting with other departments when it comes on resources/programers. Many companies are trying to innovate and create new applications to latch on to the new business models. These projects take priority over the legacy systems that they have in place (if it's not broken, don't fix it).
  3. Domain experts: many of the large dev shops (+20 developers) have two types of development departments: engineer and support. The engineers are senior developers with at least 5 years of expertise. They develop the application based on the specs of the project manager and product owners. The support team are a mixture of junior and junior-senior developers. They take over the code when the engineers deploy the application to production. The problem with this model is the lack of mentoring/couching of the junior team. The support team does not have the know-how or the domain expertise as the engineers. The consequences are the introduction of bad code or bad practices when they need to add a feature or fix a bug.

Some of the dark side of the legacy codes are the following:
  • Bloated controllers and DAOs
  • Business logic is ALL over the code (DAO, stored procedures, controllers, DTO, the list goes on)
  • Silo effect - just a few developers know about the application
  • Fragile code - tightly coupled code
  • Bug identification and turn-around time is large
  • Application is slow

I doubt that I can get the right answer to this problems, but here is my advise to current and aspiring managers regarding this matter:
  1. Bite the bullet: if the legacy application is a major part of your business, don't wait any longer. Get a grip on this application before it get any worse and start doing a re-factoring by doing small iterations (not longer than 2 weeks old).
  2. Candid conversations: ask the tough questions regarding the application like "does the architecture needs to be changed?" Have all the developers review the current app and see what should be changed. Identify the large parts of the code to change (architecture) and what are the smaller (low hanging fruits) projects that can give more momentum to the project.
  3. Go agile! An agile methodology like XP, Scrum, and Kanban add great value to projects specially like pair programming, test-driven development (tdd), domain-driven development (ddd), and continuous integration (CI). The greatest thing about Agile is its rapid time to market. Small iterations means that your customers can see what you did in a couple of weeks and get their feedback. This also provides early risk reduction since you can find bugs relatively quickly. Pair programming helps mentor junior developers and avoid the silo effect. TDD guarantees that any code added to the application is tested before shipping it to production and gives better quality to your customers. DDD adds "depth" to your code and isolates all the business logic in one area of the code. This way, any developer know that if there is a change in the business logic, he/she needs to look for the domain packages. Finally, CI (building pipelines) it is primarily focused on asserting that the code compiles successfully and passes the suit of unit and acceptance tests (including performance and scalability tests).
Legacy code is a dreaded word for developers. The fact is that no developer wants to develop in Java 1.4 or Struts 1.x. If your want a company to attract good talented individuals, try to handle your legacy code.

Again, I'm sure that I missed or don't understand ALL the reasons why so many companies have large legacy code in their core systems, so please...I welcome your thoughts in this matter.

Thursday, February 17, 2011

My take on Ruby and Java

Following the advise from Debasish Ghosh, I put my list of new languages to learn and one of them was Ruby.

Why Ruby?
  • Many web developers have told me great deals about the language and aboutRails.
  • Puppet is a system management that I want to learn and it uses Ruby.
  • Some of my pending books use Ruby for its examples - Programming Amazon Web Services
  • I have a few projects in my pipeline that requires a lot of code with basic funcionality (CRUD) and a nice GUI
  • I want to avoid the "gold hammer" antipattern

Questions that I had up-front:
  • Is the syntax that nice? EVERY SINGLE Ruby guy swears by the syntax. What is so sexy about the syntax?
  • How slow is it? I have been hearing a lot of rumors regarding the horrible GC of Ruby. Java programmers have blog that JRuby was faster than Ruby
  • Multithreaded - green threads? Heard a lot from colleagues that Ruby is not well equipped for multithreaded
  • How long would it take me to learn it? Try to measure the learning curve.
  • Where would I use this in real life? Where are the best places/projects that I could use Ruby and most important which ones to avoid (Twitter).
  • How does it compare to Java? I'm sure that there are some pros and cons regarding the syntax, but what is difference of using Groovy, JRuby, or Scala vs using Ruby?

Analysis
Syntax
Indeed, the syntax is very nice. It is very similar to Groovy. It does has a few features in the language that I haven't seen in others. Here are some examples:

Virtual Attributes
I created a BookInStock class along with a few required parameters, then I wanted to get the price in cents.
class BookInStock
attr_reader :isbn, :price
attr_writer :price

def initialize(isbn, price)
@isbn = isbn
@price = Float(price)
end

def price_in_cents
Integer(@price*100 + 0.5)
end

def price_in_cents=(cents)
@price = cents / 100.0
end
end

book = BookInStock.new("isbn1", 33.80)
puts "Price = #{book.price}"
puts "Price in cents = #{book.price_in_cents}"
book.price_in_cents = 1234
puts "Price = #{book.price}"
puts "Price in cents = #{book.price_in_cents}"

Result:
Price = 33.8 Price in cents = 3380
Price = 12.34
Price in cents = 1234

Here we've used attribute method to create a virtual instance variable. To the outside world, price_in_cents seems to ben attribute like any other. Internally, though, it has no corresponding instance variable.

Default Hash
Lets say that you need to count the amount of words in a file. The way that would do the counting, is that each line would be treated like an array of words. Create a hash, check if the hash has the word, if it doesn't then add the count of 1; otherwise, increment it.

if(counts.containsKey(word))
counts.put(word, counts.get(word) + 1);
else {
counts.put(word, 1);
}

In Ruby you can create a Hash.new(0), the parameter (0 in this case) will be used as the hash's default value - it will be the value returned if you look up a key that isn't yet there.

def count_frequency(word_list)
counts = Hash.new(0)
for word in word_list
counts[word] += 1
end
counts
end

Regular Expression
You can give name to the pattern and retrieve the value of the matched regular expression
pattern = /(\d\d):(\d\d):(\d\d)/
string = "It is 12:34:56 precisely"
if match = pattern.match(string)
puts "Hour = #{match[:hour]}"
puts "Min = #{match[:min]}"
puts "Sec = #{match[:sec]}"
end

The result is the following:
Hour = 12
Min = 34
Sec = 56

Lambda
Lambda is like a call done assuming that it is like a method:
Form example:
say_hi = lambda {|a| "Hello #{a}"}
say_hi.("quintin")
Will return, "Hello quintin"

Another example, lets say that you have to calculate ten consecutive numbers but the calculation is based on a parameter passed by the user (multiplication or addition).
if operator == :multiplication && number != nil
calc = lambda{|n| n * number}
else
calc = lambda{|n| n + number}
end

puts ((1...10).collect(&calc).join(","))

Here, line 7 does the iteration of each of the 10 numbers and uses the appropriate calculation based on the parameter and passes each number to it. Then, it joins each number with a comma.

Slow?
The Ruby team has done a lot to its virtual machine, but (coming from a static typed language perspective) it is still hugs a lot resources.

Multithreaded
In the previous version of Ruby (1.8), the way that it handle the threads was through its VM. This process is called green threads. In Ruby 1.9, threading is now performed by the operating system. The advantage is that it is available for multiprocessors. However, it operates only a single thread at a time.

Learning Curve
If you are using Groovy, Python, or any similar dynamic language, the learning curve is almost null. The only problem is the lack of good support for editors. NetBeans used to be a very nice editor, but they have decided to discontinue the support for Ruby. I've been a Mac guy for quiet some time now, so I've been using TextMate and it worked great.

Where should I use it?
I try to "pigeon hole" the applications carefully before me or my team decide which type of language to use. For example, if the application needs to be highly efficient with a large amount of transactions, and performance needs to be immediate, I do NOT trust Ruby (sorry). I would turn to Java, Spring, Hibernate/iBatis, EHCache stack. However, if the application is a quick and simple CRUD pages with low usage like a series of admin pages, then Ruby would be my choice.

Comparing Ruby to Java
It is worth to keep Ruby in your toolbox in case you need to do quick scripts or sites. I did end up learning different things, like Ruby's nice components. For example, if you would like to get the top-five words in the previous word count example, you can just do the following:
sorted    = counts.sort_by {|word, count| count}
top_five = sorted.last(5)
The sort_by and inject are two handy component. Here are some other examples:
[ 1, 2, 3, 4, 5 ].inject(:+) # => 15

( 'a'..'m').inject(:+) # => "abcdefghijklm"

Also, being a TDD guy, I really enjoyed the Behavior-Driven Development or BDD. Based on the books and forums that I've read, this is the Ruby community's choice of tests. It encourages people to write tests in terms of your expectations of the program’s behavior in a given set of circumstances. In many ways, this is like testing according to the content of user stories, a common requirements-gathering technique in agile methodologies. Some of the frameworks are RSpec and Shoulda. With these testing frameworks, the focus is not on assertions. Instead, you write expectations.

The class that I created is a class for tennis tournament:
class TennisScorer
OPPOSITE_SIDE_OF_NET = {:server => :receiver, :receiver => :server}

def initialize
@score = { :server => 0, :receiver => 0 }
end

def score
"#{@score[:server]*15}-#{@score[:receiver]*15}"
end

def give_point_to(player)
other = OPPOSITE_SIDE_OF_NET[player]
fail "Unkown player #{player}" unless other
@score[player] += 1
end

end

The test is the following:

require 'simplecov'
SimpleCov.start

require_relative "tennis_scorer"

describe TennisScorer, "basic scoring" do
let(:ts) { TennisScorer.new}


it "should start with a score of 0-0" do
ts.score.should == "0-0"
end

it "should be 15-0 if the server wins a point" do
ts.give_point_to(:server)
ts.score.should == "15-0"
end

it "should be 0-15 if the receiver wins a point" do
ts.give_point_to(:receiver)
ts.score.should == "0-15"
end

it "should be 15-15 after they both win a point" do
ts.give_point_to(:receiver)
ts.give_point_to(:server)
ts.score.should == "15-15"
end
end

As you can see you add context to your test and check if the solution is fine. Executing the tests echoes the expectations back and validates each test.

How slow is it?
Dave Thomas, author of the book Programing Ruby 1.9 version 3 wrote the following in his post regarding the re-factoring of the code of Twitter from Ruby to Scala:

At the kinds of volumes that Twitter handles (and with what I assume is a somewhat scary growth curve), Twitter needs to improve concurrency—it needs an environment/language with low memory overhead, incredible performance, and super-efficient threading. I don't know if Scala fits that particular bill, but I know that current Ruby implementations don't. It isn't what Ruby's intended to be. So the move away is just sound thinking. (I suspect it also took some courage.) I applaud Alex and the team for this.

Instead of defending Ruby when it's clearly not an appropriate solution, let's think about things the other way around.

The good folks at Twitter started off with Ruby because they wanted to get something running quickly, and they wanted to experiment. And Ruby gave them that. And, what's more, Ruby saw them through at least two rounds of phenomenal growth. Could they have done it in another language? Sure. But I suspect Ruby, despite the occasional headache, helped them get where they are now.


Conclusion
Finally, I would recommend learning Ruby and I would definitely keep doing more stuff with it. Although there are some characteristics similar to Groovy, Python, and others dynamic languages, it does has some nice different features. Also, the Ruby community is very vibrant/active. There are thousands of programmers building packages/APIs or gems.

The way to load the API is very similar to the "yum" command in Linux. Let say you need to do the following:
  1. Connect to a GMail account
  2. Check for e-mails that have attachments
  3. Do some type of business logic
  4. Send an e-mail with your results
I could create everything from zero, but instead I was able to find a nice little gem named "gmail". I just did "gem install gmail" and voila, I got the API! I also wanted to use MongoDB for a Ruby on Rails (RoR) project and a podcast explain to me how to do it.

Again, this is just the beginning but it was a really nice experience and remind me back of why I got into programming. The challenge and the unknown is what drive must of us on finding better solutions.

Wednesday, February 2, 2011

Persisting Tuning

Persisting tuning has been drilled into my head early in my career and I understand the term, "earlier is cheaper". At the beginning of my career I worked for a major bank on a data warehouse. I worked as a junior developer along side with some pretty solid data architects. I learned a lot about databases. Specially, that they are notorious for bottlenecks in applications. Later in my career, I worked on web development projects. I found love with the Spring framework and ORM (Hibernate and Ibatis). Here are my thought regarding a presentation done by Thomas Risberg, a senior consultant from SpringSource, he stated the following:
There is no silver bullet when it comes to persistence tuning. You have to keep an open mind an approach the problem systematically from a variety of angels
The presentation did not touch on "big data" and the NoSQL movement. However, there is still a lot of good stuff in it, specially if you are using Java, Spring, Hibernate, and an ODBC.

Currently, I have been working on an application that needed to support up to 200 messages per second. Below are a few strategies and process that I implemented to try to increase three major task:
  1. Performance: response time needs to be in millisecond
  2. Scalability: able to hold 200 message per second
  3. Availability: able to scale horizontally (not buying a bigger box, but getting a similar box)
DBA - Developer Relationship
When creating a database, you have two tasks as a DBA:
Operational DBA:
  • ongoing monitoring and maintenance of database system
  • keep the data safe and available
Development DBA:
  • plan and design new application database usage
  • tune queries
The Operational DBA role is the following:
  • Data volumes, row sizes
  • Growth estimates, update frequencies
  • Availability requirements
  • Testing/QA requirements
Development DBA role is the following:
  • Table design
  • Query tuning
  • Maintenance policies for purging/archiving

Database Design:

Database design can play a critical role in application performance

Guidelines:
  • Pick appropriate data types
  • Limit number of columns per table
  • Split large, infrequently used columns into a separate one-to-one table
  • Choose your index carefully - expensive to maintain. improves query
  • Normalize your data
  • Partition your data

Application Tuning

Balance performance and scalability concerns. For example, full table-lock could improve performance but hurt scalability

Improve concurrency
  • Keep your transactions short
  • Do bulk processing off-hours
Understand your database system's locking strategy
  • some lock only on writes
  • some locks on reads
  • some escalate row locks on table locks

Performance improvements
Limit the amount of data you pull into the application layer
  • Limit the number of rows
  • Select only the column you need

Tune your ORM and SQL

  • consider search criteria carefully
  • avoid NULLS in the where clause - NULLS aren't index
  • avoid LIKE beginning with % since index might not be used

Spring Data Access Configuration:

Pick a transaction management that fits your application needs
  • Favor a local resource DataSourceTransactionManager
  • Using a container managed DataSource can help during Application Server support calls
  • JtaTransactionManager using XA transactions is more expensive. Warning: XA transaction sometimes needed whenJMS and database access used together
Why XA are more expensive? Because of its setup and its overhead
Setup includes:
  • run a transaction coordinator
  • XA JDBC driver
  • XA Message Broker
  • put the XA recovery log to a reliable log storage
XA has a significant run time overhead
  • two phase commit
  • state tracking
  • recovery
  • on restart: complete pending commits/rollbacks --> read the "reliable recovery log"

General transaction points:
  • Keep your transactions short to avoid contention
  • Always specify the read-only flag where appropriate, in particular for ORM transactions
  • Avoid SERIALIZABLE unless you absolutely need it

JDBC Tuning
Connection to the database is slow - use a third-party connection pool like DBCP, C3PO or native one like oracle.jdbc.pool.OracleDataSource. Never use DriverManagerDataSource

Improve availability by Configuration the DataSource to survive a database restart. Specify a strategy to test that connectionare alive
  • JDBC 4.0 isValid()
  • simple query
  • metadata lookup
Consider a clustered high-availability solution like Oracle RAC


Use Prepared Statements
  • Use prepared statements with placeholder for parameters - like select id, name from customers where age > ? and state = ?
  • Prepared statements allow the database to reuse access plans
  • Prepared statements can be cached and reused by the connection pool to improve performance
  • The JdbcTemplate can be configured using setFetchSize
  • Larger fetchSize let you lower the number of network roundtrips necessary when retrieving large results
Favor query() methods with custom RowMapper vs. queryForList().
queryForList() uses a ColumnMapRowMapper which generates a lot of expensive HashMaps

private JdbcTemplate jdbcTemplate;

public void setDataSource(DataSource dataSource) {
this.jdbcTemplate = new JdbcTemplate(dataSource);
}

public List> getList() {
return this.jdbcTemplate.queryForList("select * from mytable");

Using column index instead of column label does give a slight speed advantage but it makes code harder to read and maintain

public Actor findActor(String specialty, int age) {

String sql = "select id, first_name, last_name from T_ACTOR" +
" where specialty = ? and age = ?";

RowMapper mapper = new RowMapper() {
public Actor mapRow(ResultSet rs, int rowNum) throws SQLException {
Actor actor = new Actor();
actor.setId(rs.getLong("id"));
actor.setFirstName(rs.getString("first_name"));
actor.setLastName(rs.getString("last_name"));
return actor;
}
};


// notice the wrapping up of the argumenta in an array
return (Actor) jdbcTemplate.queryForObject(sql, new


ORM Tuning
  • Don't load more data than needed - use lazy loading
  • Can however result in many roundtrips to database - n+1 problem
  • Avoid excessive number of database roundtrips by selective use of eager loading
  • Consider caching frequently used data that's rarely updated
  • Learn specific fetching and caching options for your ORM product
Determine a good fetch strategy (Hibernate)
  • use "select" fetch mode for relationships needed sometimes - result in 1 or 2 queries
  • use "join" fetch mode for relationships needed all the time 0 limit to single collection for one entity since the results for multiple joins could be huge
  • Fetch mode can be specified statically in the ORM mapping or dynamically for each query
  • Capture generated SQL to determine if your strategy generates expected queries
  • Use to prefetch a number of proxies and/or collections
  • Caching allows you to avoid database access for already loaded entities

Enable shared (second-level) cache for entities that are read-only or are modified infrequently
  • Include data when not critical it's kept up-to-date
  • Data that's not shared with applications
  • Choose an appropriate cache policy including expiration policy and concurrency strategy (read-write, read-only ...)

Consider enabling query cache for query that is repeated frequently and where the referenced table aren't updated very often

More on Hibernate: "Working with Hibernate with Spring 2.5" with Rossen Stoyanchev


Bulk Operations
  • Perform bulk operations in database layer if possible
  • SQL statements - update, delete
  • Stored procedures
  • Native data load tools
  • From the application layer - use batch operations
  • JdbcTemplate - batchUpdate
  • SimpleJdbcInsert - executeBatch
  • SpringBatch - BatchSqlUpdateItemWriter
  • Hibernate - set hibernate jdbc.batch_size and flush and clear session after each batch

SQL Tuning
SQL is the biggest bottleneck when it comes to performance problems
  • Capture the problem SQL
  • Run EXPLAIN
  • Make adjustments
  • ANALYZE
  • Tweak optimizer
  • Add index
  • Repeat until adequate performance

Capture SQL Statements
  • Use JPA - add to LocalContainerEntityManagerFactoryBean
  • Using Hibernate - add to LocalSessionFactoryBean
  • Alternative using Hibernate with Log4J

Database Specific Tools:
- MySQL has a Slow Query Log

---log-slow-queries and --log-queries-not-using-index
Analyze Tables and Indexes
  • Use ANALYZE to provide statistics for the optimizer (Oracle)
  • Other database use similar commands
  • Learn how the optimizer works for specific database
Summary
  • Capture your SQL and run Explain/Autotrace on the slowest statements
  • Review your DataSource and Transaction configurations
  • Work with your DBAs to tune applications and database

Back at it!

I have not been able to update my blog. Towards the end of last year, my daughter, Juliana Olivas, was born. Therefore, I'm now trying to change the world one diaper at a time. Also, I'm trying to stay awake and keeping up with my current job at Up-Mobile. I am hoping to start updating my blog this week.