Wednesday, February 2, 2011

Persisting Tuning

Persisting tuning has been drilled into my head early in my career and I understand the term, "earlier is cheaper". At the beginning of my career I worked for a major bank on a data warehouse. I worked as a junior developer along side with some pretty solid data architects. I learned a lot about databases. Specially, that they are notorious for bottlenecks in applications. Later in my career, I worked on web development projects. I found love with the Spring framework and ORM (Hibernate and Ibatis). Here are my thought regarding a presentation done by Thomas Risberg, a senior consultant from SpringSource, he stated the following:
There is no silver bullet when it comes to persistence tuning. You have to keep an open mind an approach the problem systematically from a variety of angels
The presentation did not touch on "big data" and the NoSQL movement. However, there is still a lot of good stuff in it, specially if you are using Java, Spring, Hibernate, and an ODBC.

Currently, I have been working on an application that needed to support up to 200 messages per second. Below are a few strategies and process that I implemented to try to increase three major task:
  1. Performance: response time needs to be in millisecond
  2. Scalability: able to hold 200 message per second
  3. Availability: able to scale horizontally (not buying a bigger box, but getting a similar box)
DBA - Developer Relationship
When creating a database, you have two tasks as a DBA:
Operational DBA:
  • ongoing monitoring and maintenance of database system
  • keep the data safe and available
Development DBA:
  • plan and design new application database usage
  • tune queries
The Operational DBA role is the following:
  • Data volumes, row sizes
  • Growth estimates, update frequencies
  • Availability requirements
  • Testing/QA requirements
Development DBA role is the following:
  • Table design
  • Query tuning
  • Maintenance policies for purging/archiving

Database Design:

Database design can play a critical role in application performance

  • Pick appropriate data types
  • Limit number of columns per table
  • Split large, infrequently used columns into a separate one-to-one table
  • Choose your index carefully - expensive to maintain. improves query
  • Normalize your data
  • Partition your data

Application Tuning

Balance performance and scalability concerns. For example, full table-lock could improve performance but hurt scalability

Improve concurrency
  • Keep your transactions short
  • Do bulk processing off-hours
Understand your database system's locking strategy
  • some lock only on writes
  • some locks on reads
  • some escalate row locks on table locks

Performance improvements
Limit the amount of data you pull into the application layer
  • Limit the number of rows
  • Select only the column you need

Tune your ORM and SQL

  • consider search criteria carefully
  • avoid NULLS in the where clause - NULLS aren't index
  • avoid LIKE beginning with % since index might not be used

Spring Data Access Configuration:

Pick a transaction management that fits your application needs
  • Favor a local resource DataSourceTransactionManager
  • Using a container managed DataSource can help during Application Server support calls
  • JtaTransactionManager using XA transactions is more expensive. Warning: XA transaction sometimes needed whenJMS and database access used together
Why XA are more expensive? Because of its setup and its overhead
Setup includes:
  • run a transaction coordinator
  • XA JDBC driver
  • XA Message Broker
  • put the XA recovery log to a reliable log storage
XA has a significant run time overhead
  • two phase commit
  • state tracking
  • recovery
  • on restart: complete pending commits/rollbacks --> read the "reliable recovery log"

General transaction points:
  • Keep your transactions short to avoid contention
  • Always specify the read-only flag where appropriate, in particular for ORM transactions
  • Avoid SERIALIZABLE unless you absolutely need it

JDBC Tuning
Connection to the database is slow - use a third-party connection pool like DBCP, C3PO or native one like oracle.jdbc.pool.OracleDataSource. Never use DriverManagerDataSource

Improve availability by Configuration the DataSource to survive a database restart. Specify a strategy to test that connectionare alive
  • JDBC 4.0 isValid()
  • simple query
  • metadata lookup
Consider a clustered high-availability solution like Oracle RAC

Use Prepared Statements
  • Use prepared statements with placeholder for parameters - like select id, name from customers where age > ? and state = ?
  • Prepared statements allow the database to reuse access plans
  • Prepared statements can be cached and reused by the connection pool to improve performance
  • The JdbcTemplate can be configured using setFetchSize
  • Larger fetchSize let you lower the number of network roundtrips necessary when retrieving large results
Favor query() methods with custom RowMapper vs. queryForList().
queryForList() uses a ColumnMapRowMapper which generates a lot of expensive HashMaps

private JdbcTemplate jdbcTemplate;

public void setDataSource(DataSource dataSource) {
this.jdbcTemplate = new JdbcTemplate(dataSource);

public List> getList() {
return this.jdbcTemplate.queryForList("select * from mytable");

Using column index instead of column label does give a slight speed advantage but it makes code harder to read and maintain

public Actor findActor(String specialty, int age) {

String sql = "select id, first_name, last_name from T_ACTOR" +
" where specialty = ? and age = ?";

RowMapper mapper = new RowMapper() {
public Actor mapRow(ResultSet rs, int rowNum) throws SQLException {
Actor actor = new Actor();
return actor;

// notice the wrapping up of the argumenta in an array
return (Actor) jdbcTemplate.queryForObject(sql, new

ORM Tuning
  • Don't load more data than needed - use lazy loading
  • Can however result in many roundtrips to database - n+1 problem
  • Avoid excessive number of database roundtrips by selective use of eager loading
  • Consider caching frequently used data that's rarely updated
  • Learn specific fetching and caching options for your ORM product
Determine a good fetch strategy (Hibernate)
  • use "select" fetch mode for relationships needed sometimes - result in 1 or 2 queries
  • use "join" fetch mode for relationships needed all the time 0 limit to single collection for one entity since the results for multiple joins could be huge
  • Fetch mode can be specified statically in the ORM mapping or dynamically for each query
  • Capture generated SQL to determine if your strategy generates expected queries
  • Use to prefetch a number of proxies and/or collections
  • Caching allows you to avoid database access for already loaded entities

Enable shared (second-level) cache for entities that are read-only or are modified infrequently
  • Include data when not critical it's kept up-to-date
  • Data that's not shared with applications
  • Choose an appropriate cache policy including expiration policy and concurrency strategy (read-write, read-only ...)

Consider enabling query cache for query that is repeated frequently and where the referenced table aren't updated very often

More on Hibernate: "Working with Hibernate with Spring 2.5" with Rossen Stoyanchev

Bulk Operations
  • Perform bulk operations in database layer if possible
  • SQL statements - update, delete
  • Stored procedures
  • Native data load tools
  • From the application layer - use batch operations
  • JdbcTemplate - batchUpdate
  • SimpleJdbcInsert - executeBatch
  • SpringBatch - BatchSqlUpdateItemWriter
  • Hibernate - set hibernate jdbc.batch_size and flush and clear session after each batch

SQL Tuning
SQL is the biggest bottleneck when it comes to performance problems
  • Capture the problem SQL
  • Make adjustments
  • Tweak optimizer
  • Add index
  • Repeat until adequate performance

Capture SQL Statements
  • Use JPA - add to LocalContainerEntityManagerFactoryBean
  • Using Hibernate - add to LocalSessionFactoryBean
  • Alternative using Hibernate with Log4J

Database Specific Tools:
- MySQL has a Slow Query Log

---log-slow-queries and --log-queries-not-using-index
Analyze Tables and Indexes
  • Use ANALYZE to provide statistics for the optimizer (Oracle)
  • Other database use similar commands
  • Learn how the optimizer works for specific database
  • Capture your SQL and run Explain/Autotrace on the slowest statements
  • Review your DataSource and Transaction configurations
  • Work with your DBAs to tune applications and database

No comments:

Post a Comment