— BEN BLOCK —

Case Study

Learning from Re-architecture Gone Bad

The Problem

Engineers would die for re-architecture projects. They love them, they dream about them, and they all want to dive into one. After all, who wouldn’t want to design and build his or her own system from the ground up? This is especially true, I’ve found, when the engineer in question wasn’t part of the legacy build that the re-architecture is intended to replace. The legacy system is always viewed as a drag on productivity. Oh, if only we had the chance to build everything from the ground up, life would be so grand!

Some time ago I got to sponsor a project that was, in fact, so grand. A new VP of Engineering I had hired convinced me, and then successfully sold the rest of the executive team, on the glories of a new environment. He won the mandate to lead the project and went away to plan the perfect new platform.

The Solution

He had it all planned out. He wanted to use Ruby because it was the hot new language. The legacy system was written in PHP, and we both thought it would be easy to hire Ruby devs; plus, our existing team would appreciate the opportunity to learn a new language. The main mandate from the executive team was that performance be the primary architectural consideration. The new platform had to be fast, lightning fast.

We ran some benchmarks and found that the typical Ruby environment was not going to be fast enough. In fact, it benchmarked poorly against our existing environment, which was plain old php-cgi.

So we needed something to juice things up. Enter the JVM. Our system administrator suggested we look at deploying our app into a JVM, which he thought we could tune to provide the absolute best performance for the end user. However, nobody wanted to write any Java code; we were all set on Ruby.

Fortunately (or so it seemed at the time), a new application server known as TorqueBox had just been released in a stable open distribution. It allowed Ruby apps to be deployed and run in the JVM. Our prayers were answered: we were going to build the fastest Ruby app ever.

The Result

Disaster. Well, at first, everything was fine. Our plan was to replace our existing website one page type at a time. The first page type took about 3 months to develop. We knew there would be a learning curve and that this would take some time. Unfortunately, once the page was built, everything went sideways. We started running stress tests against the new app, and it couldn’t handle even a fraction of the traffic we were going to need in production. To keep it from crashing, we needed the JVM’s heap size to run at 30 GB or so, more memory than many of our production web servers running good old PHP had. Weeks went by as we refactored the code, switched out DB drivers, and experimented with different caching schemes.

As our self-imposed deadline slipped, the team started to lose confidence in the architectural choice and quietly began to oppose the project altogether. I actually had developers coming to me saying we should go back to the legacy system.

We canceled the project and decided to re-architect the code base using most of the existing stack, upgrading everything, of course. The lesson for all of us was clear, however: don’t build on what you don’t know. We had a high-traffic web app that peaked at hundreds of requests per second, so there wasn’t much room for learning on the job or for growing pains. Bottom line: I’ll never again rebuild a platform in an environment my team and I don’t know inside out. Lesson learned.

Get in Touch

info@benblock.com

New York, New York

linkedin.com/in/bblock