Friday, February 13, 2015

Java SE 8 for the Really Impatient Book Review

Last year I started working with Java 8, but knew very little about it.  I needed a way to dive into the new features and best practices while working on a project that was going to production with a hard deadline.  I decided to grab Java SE 8 for the Really Impatient by Cay S. Horstmann.  I put tabs on so many great topics/ideas that a picture is well worth it.



One of the things that I picked up from the book is composing Optional-valued functions with flatMap.
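A minimal sketch of that composition (inverse and squareRoot are illustrative functions of my own; the point is that an empty Optional at any step short-circuits the whole chain):

```java
import java.util.Optional;

public class ComposeOptionals {

    static Optional<Double> inverse(Double x) {
        return x == 0 ? Optional.empty() : Optional.of(1 / x);
    }

    static Optional<Double> squareRoot(Double x) {
        return x < 0 ? Optional.empty() : Optional.of(Math.sqrt(x));
    }

    public static void main(String[] args) {
        // Compose the two Optional-valued functions with flatMap;
        // an empty result at any step makes the whole chain empty.
        Optional<Double> result = Optional.of(4.0)
                .flatMap(ComposeOptionals::inverse)     // Optional[0.25]
                .flatMap(ComposeOptionals::squareRoot); // Optional[0.5]
        System.out.println(result);
    }
}
```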

The chapter that I focused on was the one about streams, which in my opinion are the best feature of Java 8.  Here I learned how to transform a stream into a Map result and some of the issues that you have to keep in mind.  For example, if there is more than one element with the same key, the collector will throw an IllegalStateException.
You can override that behavior by supplying a third function argument that determines the value for the key, given the existing and the new value.  For example:
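A minimal sketch (the words are made up), keeping the existing value when two elements collide on the same key:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToMapDemo {
    public static void main(String[] args) {
        // "apple" and "avocado" collide on the key 'a'; without the third
        // argument, Collectors.toMap would throw an IllegalStateException.
        Map<Character, String> byFirstLetter = Stream.of("apple", "banana", "avocado")
                .collect(Collectors.toMap(
                        word -> word.charAt(0),                // key
                        word -> word,                          // value
                        (existing, replacement) -> existing)); // merge: keep existing

        System.out.println(byFirstLetter); // {a=apple, b=banana}
    }
}
```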
Get the book!  I'm sure you guys will enjoy it.

Friday, January 9, 2015

Java 8 Lambdas Book Review

O'Reilly's Java 8 Lambdas by Richard Warburton is a "must have" if you are starting with Java 8 or functional programming. The book covers the different features of Java 8 and how using lambdas makes you a better programmer. Here are some topics that caught my attention:

  • Streams - separate the "what" from "how" 
  • Patterns to refactor using lambdas 
  • Higher order functions 
  • Lambda expressions can be used to make many existing design patterns simpler and more readable, especially the command pattern 
  • Lambda-Enabled Concurrency 
  • Lambda-Enabled SOLID principles 


I recommended the book to my team and to my Meetup group - I'm the organizer of the Miami JVM Meetup. I use the book constantly while doing code reviews. Again, any developer, team leader, or architect who is starting to use Java 8 should get this book. I really enjoyed the ending, where I finally understood the Reactive Programming approach. As the author mentioned,
The critical design tool for software development is a mind well educated in design principles. It is not...technology. - Craig Larman

Monday, December 29, 2014

BDD in Action - Book Review

Finished reading Manning's BDD in Action (behavior-driven development) by John Ferguson Smart, which I found very insightful.  There are four things that struck me about this book:

  1. Don't write unit tests, write low-level specifications
  2. Favor outside-in development
  3. Learned about Spock framework
  4. There is a difference between a story and a feature


I used RSpec before but without a clear understanding of BDD, so I wrote unit tests (test scripts) rather than low-level specifications.  The book explains why BDD is important, along with detailed steps and examples.  In BDD we write behavioral specifications that then drive the software.  One of the key goals of BDD is to ensure that everyone has a clear understanding of what the project is trying to deliver, and of the underlying business objectives.  BDD is TDD but with better guidelines, or even a totally new approach to development.  This is why wording and semantics are important: the tests need to clearly explain the business behavior they're demonstrating.  It encourages people to write tests in terms of the expectations of the program's behavior in a given set of circumstances.

When writing user stories we were told to use this template:
As a <stakeholder> I want <something> so that I can <achieve some business goal>.

Once you have a story, then you have to explore the details by asking the users and other stakeholders for concrete examples.

In BDD the following notation is often used to express examples:
Given <a context>: describes the preconditions for the scenario and prepares the test environment.
When <something happens>: describes the action under test.
Then <you expect some outcome>: describes the expected outcome.

Example:
Story: Returns go to stock

In order to keep track of stock
As a store owner
I want to add items back to stock when they're returned

Scenario 1: Refunded items should be returned to stock
  Given a customer previously bought a black sweater from me
  And I currently have three black sweaters left in stock
  When he returns the sweater for a refund
  Then I should have four black sweaters in stock

Scenario 2: Replaced items should be returned to stock
  Given that a customer buys a blue garment
  And I have two blue garments in stock
  And three black garments in stock
  When he returns the garment for a replacement in black
  Then I should have three blue garments in stock
  And two black garments in stock

As the book Specification by Example mentions, instead of waiting for specifications to be expressed precisely for the first time in the implementation, successful teams illustrate specifications using examples.  The team works with the business users or domain experts to identify key examples that describe the functionality.  During this process, developers and testers often suggest additional examples that illustrate the edge cases or address areas of the system that are particularly problematic.  This flushes out functional gaps and inconsistencies and ensures that everyone involved has a shared understanding of what needs to be delivered, avoiding the rework that results from misinterpretation and translation.

Besides explaining the difference between unit tests and specifications, the book also talks about the difference between features and user stories.  They are NOT the same.  A feature is a piece of functionality that you deliver to the end users or to other stakeholders to support a capability that they need in order to achieve their business goals.  A user story is a planning tool that helps you flesh out the details of what you need to deliver for a particular feature.  You can have features without having stories.  As a matter of fact, a good practice is to summarize the "Given/When" sections of the scenario in the title and avoid including any expected outcomes.  Because scenarios are based on real business examples, the context and events are usually stable, but the expected outcome may change as the organization changes and evolves the way it does business.

Besides the language syntax, I discovered the Spock framework.  It lets you write concise and descriptive tests with less boilerplate code than would be needed using Java.  The syntax encourages people to write tests in terms of your expectations of the program's behavior in a given set of circumstances.

Example:
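Here is a minimal Spock specification sketch of my own (in Groovy, since Spock specs are written in Groovy); the "stack" under test is just java.util.ArrayDeque:

```groovy
import spock.lang.Specification

class StackSpec extends Specification {

    def "pushing an element puts it on top of the stack"() {
        given: "an empty stack"
        def stack = new ArrayDeque<String>()

        when: "an element is pushed"
        stack.push("java")

        then: "the stack is no longer empty and the element is on top"
        !stack.isEmpty()
        stack.peek() == "java"
    }
}
```

Notice how the given/when/then blocks read exactly like the scenario notation above.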


While I was reading this book, two quotes came to my head:

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live - Martin Golding.

Programs must be written for people to read, and only incidentally for machines to execute - Abelson and Sussman.

The other insightful thing that I learned is that BDD favors an outside-in development approach, which includes the following steps (a sketch follows the list):

  • Start with a high-level acceptance criterion that you want to implement
  • Automate the acceptance criterion as pending scenarios, breaking the acceptance criterion into smaller steps
  • Implement the acceptance criterion step definitions, imagining the code you'd like to have to make each step work
  • Use these step definitions to flesh out unit tests that specify how the application code will behave
  • Implement the application code, and refactor as required
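Here's what the step definitions for the "returns go to stock" scenario might look like with Cucumber-JVM - my choice for illustration, not necessarily the book's tooling. The Store class is a hypothetical in-memory stand-in for real application code, and the regexes assume the feature file writes quantities as digits (e.g. "3 black sweaters"):

```java
import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

import java.util.HashMap;
import java.util.Map;

import static org.junit.Assert.assertEquals;

public class ReturnsGoToStockSteps {

    // Hypothetical stand-in for the real application code.
    static class Store {
        private final Map<String, Integer> stock = new HashMap<>();
        void setStock(String item, int count) { stock.put(item, count); }
        void addToStock(String item, int count) { stock.merge(item, count, Integer::sum); }
        int getStock(String item) { return stock.getOrDefault(item, 0); }
    }

    private final Store store = new Store();

    @Given("^I currently have (\\d+) black sweaters left in stock$")
    public void iHaveBlackSweatersInStock(int count) {
        store.setStock("black sweater", count);
    }

    @When("^he returns the sweater for a refund$")
    public void heReturnsTheSweater() {
        store.addToStock("black sweater", 1);
    }

    @Then("^I should have (\\d+) black sweaters in stock$")
    public void iShouldHaveBlackSweaters(int expected) {
        assertEquals(expected, store.getStock("black sweater"));
    }
}
```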


There are many benefits of outside-in development, but the principal motivations are summarized here:

  • Outside-in code focuses on business value
  • Outside-in code encourages well-designed, easy-to-understand code
  • Outside-in code avoids waste


As I mentioned, I enjoyed the book and found it very insightful.  Some (if not all) of the ideas in the book have been around for decades.  I believe that this book is great for architects, programmers, testers, project managers, product owners, and scrum masters.

Monday, November 10, 2014

Validate Map Collections via Matchers

I was introduced to Hamcrest Matchers by the 3C team, and I am really liking it.  Today I stumbled on validating a Map.  Below is how I usually solved the problem, and then how I solved it using the syntactic sugar of Matchers.
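A sketch of both styles (the map and its entries are hypothetical):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.hasEntry;
import static org.hamcrest.Matchers.hasKey;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

public class MapValidationTest {

    private Map<String, Integer> scores() {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 10);
        scores.put("bob", 7);
        return scores;
    }

    // How I usually solved it: one assertion per check, generic failure messages.
    @Test
    public void validateWithPlainAsserts() {
        Map<String, Integer> scores = scores();
        assertTrue(scores.containsKey("alice"));
        assertEquals(Integer.valueOf(10), scores.get("alice"));
    }

    // With Hamcrest matchers: declarative, and failures describe the mismatch.
    @Test
    public void validateWithMatchers() {
        Map<String, Integer> scores = scores();
        assertThat(scores, hasKey("bob"));
        assertThat(scores, hasEntry("alice", 10));
    }
}
```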

Sunday, November 9, 2014

Lambdas and Java 8

Java 1.8 introduces the concept of streams, which are similar to iterators but let you describe a whole pipeline of operations over a collection.

Why Lambdas are good for you:
  • Form the basis of functional programming languages
  • Make parallel programming easier 
  • Write more compact code 
  • Richer data structure collections 
  • Develop cleaner APIs 
Lambda expression lifecycle - think of them as having a two-stage lifecycle:
  1. Convert the lambda expression to a function.
  2. Call the generated function.

Streams have two types of operations: intermediate and terminal.
Intermediate operations specify tasks to perform on the stream's elements and always result in a new stream.
filter: Results in a stream containing only the elements that satisfy a condition.
distinct: Results in a stream containing only the unique elements.
limit: Results in a stream with the specified number of elements from the beginning of the original stream.
map: Results in a stream in which each element of the original stream is mapped to a new value (possibly of a different type).
sorted: Results in a stream in which the elements are in sorted order. The new stream has the same number of elements as the original stream.
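A minimal sketch chaining several of these intermediate operations (the numbers are made up), finished with a terminal collect so the pipeline actually runs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IntermediateOpsDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 3, 8, 3, 1, 9, 8);

        List<Integer> result = numbers.stream()
                .distinct()                    // drop the duplicate 3 and 8
                .filter(n -> n > 2)            // keep elements greater than 2
                .map(n -> n * 10)              // transform each element
                .sorted()                      // natural ordering
                .limit(3)                      // take the first three
                .collect(Collectors.toList());

        System.out.println(result); // [30, 50, 80]
    }
}
```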


Terminal operations initiate processing of a stream pipeline's intermediate operations and produce results.
forEach: Performs processing on every element in a stream.
average: Calculates the average of the elements in a numeric stream.
count: Returns the number of elements in the stream.
max: Locates the largest value in a numeric stream.
min: Locates the smallest value in a numeric stream.
reduce: Reduces the elements of a collection to a single value using an associative accumulation function (e.g. a lambda that adds two elements; in Scala this is the fold/reduce family of operators).
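A short sketch of these terminal operations on a numeric stream (the values are made up):

```java
import java.util.stream.IntStream;

public class TerminalOpsDemo {
    public static void main(String[] args) {
        int[] values = {4, 7, 1, 9, 3};

        System.out.println(IntStream.of(values).count());                 // 5
        System.out.println(IntStream.of(values).average().getAsDouble()); // 4.8
        System.out.println(IntStream.of(values).max().getAsInt());        // 9
        System.out.println(IntStream.of(values).min().getAsInt());        // 1

        // reduce with an associative accumulator: the sum of all elements
        int sum = IntStream.of(values).reduce(0, (a, b) -> a + b);
        System.out.println(sum); // 24
    }
}
```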


Mutable reduction operations create a new container (such as a collection or a StringBuilder).
collect: Creates a new collection of elements containing the results of the stream's prior operations.
toArray: Creates an array containing the results of the stream's prior operations.
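A quick sketch of both (the words are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MutableReductionDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("delta", "alpha", "charlie");

        // collect: gather results into a new container (here a List)
        List<String> upper = words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper); // [DELTA, ALPHA, CHARLIE]

        // toArray: gather results into an array
        String[] sorted = words.stream().sorted().toArray(String[]::new);
        System.out.println(Arrays.toString(sorted)); // [alpha, charlie, delta]
    }
}
```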


Search operations
findFirst: Finds the first stream element based on the prior intermediate operations; immediately terminates processing of the stream pipeline once such an element is found.
findAny: Finds any stream element based on the prior intermediate operations; immediately terminates processing of the stream pipeline once such an element is found.
anyMatch: Determines whether any stream elements match a specified condition; immediately terminates processing of the stream pipeline if an element matches.
allMatch: Determines whether all of the elements in the stream match a specified condition.
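A short sketch of the search and match operations (the numbers are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SearchOpsDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(12, 7, 20, 3);

        // Short-circuits as soon as the first even number is found
        Optional<Integer> firstEven = numbers.stream()
                .filter(n -> n % 2 == 0)
                .findFirst();
        System.out.println(firstEven.get()); // 12

        System.out.println(numbers.stream().anyMatch(n -> n > 15));        // true
        System.out.println(numbers.stream().allMatch(n -> n > 0));         // true
        System.out.println(numbers.parallelStream().findAny().isPresent()); // true
    }
}
```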

Refactoring:

Streams have driven many changes in my code. Here are some examples of before and after. Before: I wanted to print out some of the results, so I leveraged Spring's CommandLineRunner. After: the same bean rewritten with a lambda and a stream pipeline. Another example is fetching a collection of records from a database and refactoring the processing into a single pipeline. Sketches of each follow.
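Below are sketches along the lines of that refactoring; the bean, the data, and the User class are hypothetical stand-ins, not my original project code. Before - an anonymous CommandLineRunner with an imperative loop:

```java
import java.util.Arrays;
import java.util.List;

import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PrintResultsConfig {

    @Bean
    public CommandLineRunner printResults() {
        return new CommandLineRunner() {
            @Override
            public void run(String... args) {
                List<String> results = Arrays.asList("amanda", "brian", "alex");
                for (String result : results) {
                    if (result.startsWith("a")) {
                        System.out.println(result.toUpperCase());
                    }
                }
            }
        };
    }
}
```

After - the same bean as a lambda plus a stream pipeline:

```java
// Drop-in replacement for the printResults() bean above.
@Bean
public CommandLineRunner printResults() {
    return args -> Arrays.asList("amanda", "brian", "alex").stream()
            .filter(result -> result.startsWith("a"))
            .map(String::toUpperCase)
            .forEach(System.out::println);
}
```

And the database example - refactoring imperative accumulation over fetched records into one declarative pipeline (again, User is a hypothetical record type):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ActiveUserEmails {

    static class User {
        private final String email;
        private final boolean active;
        User(String email, boolean active) { this.email = email; this.active = active; }
        String getEmail() { return email; }
        boolean isActive() { return active; }
    }

    // Before: imperative filtering, transforming, and sorting
    static List<String> before(List<User> users) {
        List<String> emails = new ArrayList<>();
        for (User user : users) {
            if (user.isActive()) {
                emails.add(user.getEmail());
            }
        }
        Collections.sort(emails);
        return emails;
    }

    // After: one declarative stream pipeline
    static List<String> after(List<User> users) {
        return users.stream()
                .filter(User::isActive)
                .map(User::getEmail)
                .sorted()
                .collect(Collectors.toList());
    }
}
```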

Thursday, October 16, 2014

Changing Java SDK in IntelliJ IDEA 13

We just migrated to Java 1.8.  On my personal computer, I installed JDK 1.8 and made sure that Maven was running fine using the latest Java version.  I'm using a Mac, so when I ran it in Terminal everything worked.  However, when I ran it in IntelliJ IDEA, it said that it was running Java 1.7.

To change it, go to the following menu: File, Project Structure, then click on Project.
Here is where you can set the SDK for your project.  Just change it to the right SDK and that's it.


Happy coding.

Wednesday, September 3, 2014

Learning Cassandra

I just finished reading Practical Cassandra.  I enjoyed this book, and it helped with my presentation at Rokk3r Labs in Miami Beach.  You can tell that Russell Bradberry and Eric Lubow spent some time thinking about this book.  I like that it's straight to the point for a developer, but it is also useful for sysadmins and managers.  I enjoyed the troubleshooting and "use cases" sections.

The book mentions "where Cassandra fits in".  This is a question that I constantly get when talking about Cassandra.  Many people want to know, "why not [NoSQL database of your choice]?".  The short answer is: if you want fast writes, multi-data-center support baked into your system, and a truly scalable setup with tons of metrics, then you should consider Cassandra.  However, I always follow my answer by saying that the best way to know whether Cassandra fits the role is to understand it.  When I started using it, I had to stop myself from thinking about everything I knew about data modeling with RDBMS.  Most of the stuff that we learned in RDBMS is actually an anti-pattern for Cassandra - normalization, building your model first, indexing on high-cardinality columns, leveraging joins.  Don't think of a relational table; think of a nested, sorted map data structure.

Tunable Consistency and Polyglot Databases

Many people don't understand that you can tune the consistency of Cassandra.  The following are the consistency levels that you can configure for reads and writes (a driver-level sketch follows the list):

  • ANY: is for writes only and ensures that the write will persist on any server in the cluster.
  • ONE: ensures that at least one server within the replica set will persist the write or respond to the read.
  • QUORUM: means the read/write will go to half of the nodes in the replica set plus one.
  • LOCAL_QUORUM: is just like "quorum" except that it only counts the nodes within the local data center.
  • EACH_QUORUM: is like "quorum" but ensures a quorum read/write on each of the data centers.
  • ALL: ensures that all nodes in a replica set will receive the read/write.
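For illustration, here's a sketch of setting a per-statement consistency level with the DataStax Java driver 2.x - my choice for the example; the book itself is driver-agnostic. The contact point, keyspace, and query are hypothetical:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TunableConsistencyDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo_keyspace")) {
            SimpleStatement select =
                    new SimpleStatement("SELECT * FROM users WHERE user_id = 42");
            // Tune consistency per statement: QUORUM = a majority of the replica set.
            select.setConsistencyLevel(ConsistencyLevel.QUORUM);
            ResultSet rows = session.execute(select);
            for (Row row : rows) {
                System.out.println(row);
            }
        }
    }
}
```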

One of the things Cassandra does not do is joins or ad-hoc queries.  This is something that Cassandra simply doesn't do, and other tools (Solr, ElasticSearch, etc.) do it better.  This is what people are calling Polyglot Data.

Gossip vs Snitch

Practical Cassandra helped me understand the difference between the "gossip" and "snitch" protocols.  This is something that I struggled with time and time again.  Gossip is the protocol that Cassandra uses to discover information about new nodes.  When bringing a new node into the cluster, you must specify a "seed node".  The seed nodes are a set of nodes that are used to give information about the cluster to newly joining nodes.  As you can imagine, the seed nodes should be stable and should point to other seed nodes.

The snitch protocol helps map IPs to racks and data centers.  It creates a topology by grouping nodes together to help determine where data is read from.  There are a few types of snitches: simple, dynamic, rack-inferring, EC2, and EC2 multi-region.


  • SimpleSnitch is recommended for a simple cluster (one data center, or one zone in a cloud architecture).
  • Dynamic snitch wraps over the SimpleSnitch and provides an additional adaptive layer for determining the best possible read location.
  • RackInferringSnitch works by assuming it knows the topology of your network, based on the octets in each node's IP address.
  • EC2Snitch is for Amazon Web Services (AWS)-based deployments where the cluster sits within a single region.
  • EC2MultiRegionSnitch is for AWS deployments where the Cassandra cluster spans multiple regions.

Node Layout

Prior to Cassandra 1.2, one token was assigned to each node.  Whenever a node ended up carrying a heavy load of data, it would be considered a "hot spot".  Most of the time you would just add another node to relieve the "hot spot", but then you had to rebalance the cluster.  Virtual nodes, or vnodes, give a Cassandra node the ability to be responsible for many token ranges.  Within a cluster, they can be noncontiguous and selected at random.  This provides a greater distribution of data than the non-vnode paradigm.

Performance

The performance chapter was also very interesting.  As a developer, it introduced me to common *nix tools like vmstat, iostat, dstat, htop, atop, and top.  All of these tools provide a picture of usage.  It also explained how instrumentation goes a long way.  Also, if one node becomes too slow to respond, the FailureDetector will remove it.

An easy optimization for Cassandra is putting your CommitLog directory on a separate drive from your data directories.  CommitLog segments are written every time a MemTable is flushed to disk.  You can configure this in cassandra.yaml by setting the data_file_directories and commitlog_directory properties.  For example:
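A minimal sketch of the relevant cassandra.yaml settings (the paths are hypothetical; the point is that they live on separate physical drives):

```yaml
# cassandra.yaml
data_file_directories:
    - /mnt/data/cassandra/data
commitlog_directory: /mnt/commitlog/cassandra
```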

Metrics

Cassandra goes out of its way to provide lots of metrics, and with all these metrics you can do capacity planning.  Once you start gathering them, you'll be able to spot trends and proactively add or remove nodes.  For example, you can monitor PendingTasks under the CompactionManager MBean to gauge the speed and volume with which you can ingest data; you will need to find a comfortable set of thresholds for your system.  Another example is monitoring high request latency, which can indicate that there is a bad disk or that your current read pattern is starting to slow down.
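As an illustration, here's a sketch that reads PendingTasks over JMX using the standard javax.management API. The host and port follow Cassandra's common default (7199), but verify the MBean name and port against your own cluster:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingTasksProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default; the host is hypothetical.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            Object pending = connection.getAttribute(compactionManager, "PendingTasks");
            System.out.println("Pending compaction tasks: " + pending);
        }
    }
}
```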

These are some of the metrics that you can get via JMX:


  • DB: monitors the data storage of Cassandra.  You can monitor the cache and the CommitLogs, or even information about the ColumnFamily.
  • Internal: these cover the state and statistics around the staged architecture (gossip information and hinted handoffs).
  • Metrics: these are client request metrics (timeouts and "unavailable" errors).
  • Net: these metrics monitor the network (failure detector, gossiper, messaging service, and streaming service).
  • Request: these are metrics about requests from clients (read, write, and replication).


There is still a lot that I need to learn about Cassandra, especially about the data model.  It's very tricky to start thinking in terms of your queries ("pre-optimized queries", as Nate McCall calls them).  All in all, the book covers the basics well.