Persisting tuning has been drilled into my head early in my career and I understand the term, "earlier is cheaper". At the beginning of my career I worked for a major bank on a data warehouse. I worked as a junior developer along side with some pretty solid data architects. I learned a lot about databases. Specially, that they are notorious for bottlenecks in applications. Later in my career, I worked on web development projects. I found love with the Spring framework and ORM (Hibernate and Ibatis). Here are my thought regarding a presentation done by Thomas Risberg, a senior consultant from SpringSource, he stated the following:
There is no silver bullet when it comes to persistence tuning. You have to keep an open mind an approach the problem systematically from a variety of angelsThe presentation did not touch on "big data" and the NoSQL movement. However, there is still a lot of good stuff in it, specially if you are using Java, Spring, Hibernate, and an ODBC.
Currently, I have been working on an application that needed to support up to 200 messages per second. Below are a few strategies and process that I implemented to try to increase three major task:
When creating a database, you have two tasks as a DBA:
Operational DBA:
Database Design:
Database design can play a critical role in application performance
Guidelines:
Application Tuning
Balance performance and scalability concerns. For example, full table-lock could improve performance but hurt scalability
- Performance: response time needs to be in millisecond
- Scalability: able to hold 200 message per second
- Availability: able to scale horizontally (not buying a bigger box, but getting a similar box)
When creating a database, you have two tasks as a DBA:
Operational DBA:
- ongoing monitoring and maintenance of database system
- keep the data safe and available
- plan and design new application database usage
- tune queries
- Data volumes, row sizes
- Growth estimates, update frequencies
- Availability requirements
- Testing/QA requirements
- Table design
- Query tuning
- Maintenance policies for purging/archiving
Database design can play a critical role in application performance
Guidelines:
- Pick appropriate data types
- Limit number of columns per table
- Split large, infrequently used columns into a separate one-to-one table
- Choose your index carefully - expensive to maintain. improves query
- Normalize your data
- Partition your data
Balance performance and scalability concerns. For example, full table-lock could improve performance but hurt scalability
Improve concurrency
Performance improvements
Limit the amount of data you pull into the application layer
Tune your ORM and SQL
Spring Data Access Configuration:
Pick a transaction management that fits your application needs
JDBC Tuning
- Keep your transactions short
- Do bulk processing off-hours
- some lock only on writes
- some locks on reads
- some escalate row locks on table locks
Performance improvements
Limit the amount of data you pull into the application layer
- Limit the number of rows
- Select only the column you need
- consider search criteria carefully
- avoid NULLS in the where clause - NULLS aren't index
- avoid LIKE beginning with % since index might not be used
Pick a transaction management that fits your application needs
- Favor a local resource DataSourceTransactionManager
- Using a container managed DataSource can help during Application Server support calls
- JtaTransactionManager using XA transactions is more expensive. Warning: XA transaction sometimes needed whenJMS and database access used together
Why XA are more expensive? Because of its setup and its overhead
Setup includes:
- run a transaction coordinator
- XA JDBC driver
- XA Message Broker
- put the XA recovery log to a reliable log storage
XA has a significant run time overhead
- two phase commit
- state tracking
- recovery
- on restart: complete pending commits/rollbacks --> read the "reliable recovery log"
General transaction points:
- Keep your transactions short to avoid contention
- Always specify the read-only flag where appropriate, in particular for ORM transactions
- Avoid SERIALIZABLE unless you absolutely need it
Connection to the database is slow - use a third-party connection pool like DBCP, C3PO or native one like oracle.jdbc.pool.OracleDataSource. Never use DriverManagerDataSource
Improve availability by Configuration the DataSource to survive a database restart. Specify a strategy to test that connectionare alive
Improve availability by Configuration the DataSource to survive a database restart. Specify a strategy to test that connectionare alive
- JDBC 4.0 isValid()
- simple query
- metadata lookup
Consider a clustered high-availability solution like Oracle RAC
Use Prepared Statements
- Use prepared statements with placeholder for parameters - like select id, name from customers where age > ? and state = ?
- Prepared statements allow the database to reuse access plans
- Prepared statements can be cached and reused by the connection pool to improve performance
- The JdbcTemplate can be configured using setFetchSize
- Larger fetchSize let you lower the number of network roundtrips necessary when retrieving large results
Favor query() methods with custom RowMapper vs. queryForList().
queryForList() uses a ColumnMapRowMapper which generates a lot of expensive HashMaps
Using column index instead of column label does give a slight speed advantage but it makes code harder to read and maintain
ORM Tuning
Enable shared (second-level) cache for entities that are read-only or are modified infrequently
Consider enabling query cache for query that is repeated frequently and where the referenced table aren't updated very often
private JdbcTemplate jdbcTemplate;
public void setDataSource(DataSource dataSource) {
this.jdbcTemplate = new JdbcTemplate(dataSource);
}
public List
Using column index instead of column label does give a slight speed advantage but it makes code harder to read and maintain
public Actor findActor(String specialty, int age) {
String sql = "select id, first_name, last_name from T_ACTOR" +
" where specialty = ? and age = ?";
RowMappermapper = new RowMapper () {
public Actor mapRow(ResultSet rs, int rowNum) throws SQLException {
Actor actor = new Actor();
actor.setId(rs.getLong("id"));
actor.setFirstName(rs.getString("first_name"));
actor.setLastName(rs.getString("last_name"));
return actor;
}
};
// notice the wrapping up of the argumenta in an array
return (Actor) jdbcTemplate.queryForObject(sql, new
ORM Tuning
- Don't load more data than needed - use lazy loading
- Can however result in many roundtrips to database - n+1 problem
- Avoid excessive number of database roundtrips by selective use of eager loading
- Consider caching frequently used data that's rarely updated
- Learn specific fetching and caching options for your ORM product
Determine a good fetch strategy (Hibernate)
- use "select" fetch mode for relationships needed sometimes - result in 1 or 2 queries
- use "join" fetch mode for relationships needed all the time 0 limit to single collection for one entity since the results for multiple joins could be huge
- Fetch mode can be specified statically in the ORM mapping or dynamically for each query
- Capture generated SQL to determine if your strategy generates expected queries
- Use
to prefetch a number of proxies and/or collections - Caching allows you to avoid database access for already loaded entities
Enable shared (second-level) cache for entities that are read-only or are modified infrequently
- Include data when not critical it's kept up-to-date
- Data that's not shared with applications
- Choose an appropriate cache policy including expiration policy and concurrency strategy (read-write, read-only ...)
Consider enabling query cache for query that is repeated frequently and where the referenced table aren't updated very often
Bulk Operations
- Perform bulk operations in database layer if possible
- SQL statements - update, delete
- Stored procedures
- Native data load tools
- From the application layer - use batch operations
- JdbcTemplate - batchUpdate
- SimpleJdbcInsert - executeBatch
- SpringBatch - BatchSqlUpdateItemWriter
- Hibernate - set hibernate jdbc.batch_size and flush and clear session after each batch
SQL Tuning
SQL is the biggest bottleneck when it comes to performance problems
- Capture the problem SQL
- Run EXPLAIN
- Make adjustments
- ANALYZE
- Tweak optimizer
- Add index
- Repeat until adequate performance
Capture SQL Statements
- Use JPA - add to LocalContainerEntityManagerFactoryBean
- Using Hibernate - add to LocalSessionFactoryBean
- Alternative using Hibernate with Log4J
Database Specific Tools:
- MySQL has a Slow Query Log
Analyze Tables and Indexes
---log-slow-queries and --log-queries-not-using-index
- Use ANALYZE to provide statistics for the optimizer (Oracle)
- Other database use similar commands
- Learn how the optimizer works for specific database
- Capture your SQL and run Explain/Autotrace on the slowest statements
- Review your DataSource and Transaction configurations
- Work with your DBAs to tune applications and database
No comments:
Post a Comment