Common Performance Testing Mistakes

Recently, everyone in my lab has been benchmarking their applications. Most of our work deals with throughput in distributed applications. I often see people making the same mistakes when running tests, so I decided to blog about them.

  1. Never test from the same machine: your testing program takes significant resources away from your application!
  2. Are you testing concurrency? (It depends on your goals, but most likely, YES.)
  3. If you are testing for real-world applications, network latency is a HUGE factor in performance and affects applications in many different scenarios. Some quick tips:
    1. Never test over a wireless network unless that is part of your tests!
    2. Make sure you are not hitting your network cap and that your packets are not being altered by your internet provider.
    3. If you are testing a cloud service, do not run the tests from inside the same service, since there may be no network/connection delay at all. This can be due to several factors, but it could simply be because the test tool is running on the same virtual machine as the application!
    4. If you are using a reverse proxy for spoon feeding (buffering responses so slow clients do not tie up the application), make sure you test both with and without it!
  4. Most importantly, calculate performance from measured values instead of approximations. Most of the time, approximations are NOT accurate, because they are extrapolated to higher loads.
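
To make tip 4 concrete, here is a minimal Python sketch that measures throughput at several concurrency levels instead of extrapolating from a single run. `measure_throughput` and `fake_request` are hypothetical names; in a real test, `call` would send a request to the application running on a *different* machine (tip 1).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_throughput(call: Callable[[], object], total_requests: int,
                       concurrency: int) -> float:
    """Issue `total_requests` calls using `concurrency` worker threads and
    return the measured throughput in requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every call to finish and re-raises any exception
        list(pool.map(lambda _: call(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def fake_request() -> None:
    # Hypothetical stand-in for a real network call to the system under test
    time.sleep(0.01)


if __name__ == "__main__":
    # Measure every concurrency level you care about; do not extrapolate
    for workers in (1, 4, 16):
        rps = measure_throughput(fake_request, total_requests=32, concurrency=workers)
        print(f"{workers:>2} workers: {rps:8.1f} req/s")
```

Real servers often do not scale linearly, so the figure measured at 16 workers can be well below 16 times the single-worker figure; that gap is exactly what an extrapolation hides.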

On the web, performance is very important. If you don’t think so, ask Google: http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html

Marissa started with a story about a user test they did. They asked a group of Google searchers how many search results they wanted to see. Users asked for more, more than the ten results Google normally shows. More is more, they said.

So, Marissa ran an experiment where Google increased the number of search results to thirty. Traffic and revenue from Google searchers in the experimental group dropped by 20%.

Ouch. Why? Why, when users had asked for this, did they seem to hate it?

After a bit of looking, Marissa explained that they found an uncontrolled variable. The page with 10 results took .4 seconds to generate. The page with 30 results took .9 seconds.

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

Pomodoro Productivity Tools

I have been using GTD for a couple of years and it works very well for me. To stay focused on tasks, I use a combination of tools. Most of them are not tied to my computer, because I found myself checking them all the time. Instead, they are on my mobile devices, where I can take a quick glance at them whenever I want.

I recommend the following tools:

Due for iPad, iPod touch or iPhone (needs iOS 4).

Pomodoro for Windows Phone 7. Here is a link to a description. The application is free, and you can easily find it in the Marketplace by searching for “pomodoro”.

PowerShell

PowerShell is an amazing framework designed to streamline and automate tasks. Its main benefit is that it is built on, and integrated with, the .NET Framework and provides first-class access to COM and WMI.

If you have done any significant work with Linux, you are most likely experienced with Bash scripts, and on DOS there are batch files; PowerShell, of course, has its own scripts as well.

PowerShell also has small commands called cmdlets. Compiled cmdlets are built into a DLL (a .NET assembly), which can then be loaded and used from other scripts. Cmdlets are a way of reusing and pipelining processes (think pipes-and-filters pattern) that pass objects instead of text!
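
PowerShell pipelines hand whole objects from one command to the next. As a rough analogue (Python, not PowerShell), this sketch chains filter stages over dictionaries, loosely mirroring `Get-Process | Where-Object ... | Sort-Object ... | Select-Object ...`; the stage helpers and the sample data are made up for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Stage = Callable[[Iterable[Any]], Iterable[Any]]


def where(predicate: Callable[[Record], bool]) -> Stage:
    # Filter stage: pass through only matching objects (like Where-Object)
    def stage(items: Iterable[Record]) -> Iterator[Record]:
        return (item for item in items if predicate(item))
    return stage


def sort_by(key: str) -> Stage:
    # Sorting must see every object before emitting any (like Sort-Object)
    def stage(items: Iterable[Record]) -> List[Record]:
        return sorted(items, key=lambda item: item[key])
    return stage


def select(field: str) -> Stage:
    # Projection stage: emit one property of each object
    def stage(items: Iterable[Record]) -> Iterator[Any]:
        return (item[field] for item in items)
    return stage


def pipeline(source: Iterable[Any], *stages: Stage) -> List[Any]:
    # Feed each stage's output into the next one, like the | operator
    items: Iterable[Any] = source
    for stage in stages:
        items = stage(items)
    return list(items)


processes = [
    {"name": "chrome", "cpu": 40.0},
    {"name": "notepad", "cpu": 0.1},
    {"name": "svchost", "cpu": 12.5},
]
print(pipeline(processes, where(lambda p: p["cpu"] > 1), sort_by("cpu"), select("name")))
# prints ['svchost', 'chrome']
```

Because every stage receives structured records, the filter compares `cpu` as a number rather than parsing a column out of text output.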

If you are used to Bash or MS-DOS, you will be happy to know that most of the typical commands have aliases, so you can use them right away!

  • List all files = ls/dir
  • Current directory = pwd
  • Change directory = cd/chdir
  • Read file = cat/type
  • Clear screen = clear/cls
  • Copy = cp/copy
  • Move = mv/move
  • Delete = rm/rmdir/del
  • Rename = mv/ren/rename
  • Write = echo
  • List processes = ps/tlist/tasklist
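
Each alias is simply another name for a cmdlet. This Python sketch models that lookup for the commands above, using the cmdlet names they resolve to in Windows PowerShell (you can confirm any of them with `Get-Alias`, e.g. `Get-Alias ls`). It is a simplified model: `tlist` and `tasklist` are left out because they are separate programs rather than aliases, and real command resolution also considers functions and external executables.

```python
from typing import Dict

# Built-in Windows PowerShell aliases for the commands listed above
ALIASES: Dict[str, str] = {
    "ls": "Get-ChildItem", "dir": "Get-ChildItem",
    "pwd": "Get-Location", "cd": "Set-Location", "chdir": "Set-Location",
    "cat": "Get-Content", "type": "Get-Content",
    "clear": "Clear-Host", "cls": "Clear-Host",
    "cp": "Copy-Item", "copy": "Copy-Item",
    "mv": "Move-Item", "move": "Move-Item",
    "rm": "Remove-Item", "rmdir": "Remove-Item", "del": "Remove-Item",
    "ren": "Rename-Item",
    "echo": "Write-Output",
    "ps": "Get-Process",
}


def resolve(command: str) -> str:
    """Return the cmdlet an alias points to; other names come back unchanged.
    PowerShell command names are case-insensitive, so the lookup is too."""
    return ALIASES.get(command.lower(), command)


print(resolve("LS"))           # prints Get-ChildItem
print(resolve("Get-Process"))  # not an alias, prints Get-Process
```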

PowerShell is great because if you have some experience with .NET and scripting, you are a pro right away. To conclude, here is my favorite script for parsing long log files, courtesy of PowerShell.com:

foreach ($line in (Get-Content "$env:windir\windowsupdate.log" -ReadCount 0 -Encoding UTF8)) {
    # Print the text after the last ': ' on lines reporting a successful install
    if ($line -like '*successfully installed*') {
        ($line -split ': ')[-1]
    }
}
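
For comparison, here is the same filter as a Python sketch: keep the lines that contain "successfully installed" (case-insensitively, as PowerShell’s `-like` is) and output the text after the last `": "`. The function name and the sample lines are invented for illustration; they are not real WindowsUpdate.log content.

```python
from typing import Iterable, Iterator


def installed_updates(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text after the last ': ' on every line reporting a successful install."""
    for line in lines:
        line = line.rstrip("\r\n")  # lines read from a file keep their newline
        # -like '*successfully installed*' is a case-insensitive wildcard match
        if "successfully installed" in line.lower():
            # ($line -split ': ')[-1] keeps the last field
            yield line.split(": ")[-1]


sample = [
    "2011-03-01 10:00:00 Agent: Installation started\n",
    "2011-03-01 10:05:00 Agent: Content successfully installed: Update A\n",
    "2011-03-01 10:06:00 Agent: Content Successfully Installed: Update B\n",
]
print(list(installed_updates(sample)))  # prints ['Update A', 'Update B']
```

To scan a real file, pass `open(path, encoding="utf-8")` as `lines`; unlike `-ReadCount 0`, which loads every line into one array, this streams the file line by line.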

Testing Code Contracts .NET 4.0

Why use Code Contracts?

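By providing precompiled code contract interfaces, you let other developers adhere not only to the signatures but also to the expected behavior. This is especially important because of the Liskov substitution principle: any implementation of an interface should be usable wherever another implementation is expected, which only works if they all honor the same contract.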
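
Code Contracts attaches contracts to an interface through a separate contract class (linked with the `ContractClass`/`ContractClassFor` attributes), so every implementation is checked against the same rules. As a language-neutral illustration (Python, not the Code Contracts API), this sketch puts a precondition and a postcondition in a base class and lets subclasses supply only the body; `Account`, `ContractError`, `GoodAccount` and `BuggyAccount` are hypothetical names.

```python
from abc import ABC, abstractmethod


class ContractError(Exception):
    """Raised when a precondition or postcondition is violated."""


class Account(ABC):
    """Contract shared by every implementation: deposits must be positive,
    and the balance must grow by exactly the deposited amount."""

    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> None:
        if amount <= 0:  # precondition
            raise ContractError("amount must be positive")
        before = self.balance
        self._deposit(amount)  # implementation-specific body
        if self.balance != before + amount:  # postcondition
            raise ContractError("balance must increase by amount")

    @abstractmethod
    def _deposit(self, amount: int) -> None: ...


class GoodAccount(Account):
    def _deposit(self, amount: int) -> None:
        self.balance += amount


class BuggyAccount(Account):
    def _deposit(self, amount: int) -> None:
        self.balance += amount - 1  # violates the postcondition
```

Because `deposit` enforces the contract for every subclass, a `BuggyAccount` cannot silently stand in for a `GoodAccount`, which is the substitution guarantee described above.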

Dino Esposito wrote a great article on the topic called “Code Contracts Preview: Interfaces”.

Testing

To verify that all the right contracts are in place, you can create a test project.

Preconditions can be tested by expecting the exceptions they throw:


        [TestMethod]
        [ExpectedException(typeof(ArgumentOutOfRangeException))]
        public void ModelNegativeBalance()
        {
            // The precondition on Balance should throw while the object is initialized
            Account acc = new Account()
            {
                AccountName = "NewAccount",
                // 0 or positive is expected
                Balance = -99,
                CreationDate = DateTime.Now
            };
        }
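
The same testing idea carries over to other languages. In Python’s `unittest`, for example, `assertRaises` plays the role of `[ExpectedException]`; this sketch assumes a hypothetical `Account` whose `balance` setter rejects negative values with `ValueError`.

```python
import unittest
from datetime import datetime


class Account:
    def __init__(self, account_name: str, balance: int, creation_date: datetime) -> None:
        self.account_name = account_name
        self.balance = balance  # goes through the validating setter below
        self.creation_date = creation_date

    @property
    def balance(self) -> int:
        return self._balance

    @balance.setter
    def balance(self, value: int) -> None:
        # Precondition: 0 or a positive balance is expected
        if value < 0:
            raise ValueError("balance must be 0 or positive")
        self._balance = value


class AccountTests(unittest.TestCase):
    def test_negative_balance_is_rejected(self) -> None:
        with self.assertRaises(ValueError):
            Account("NewAccount", -99, datetime.now())

    def test_zero_balance_is_accepted(self) -> None:
        self.assertEqual(Account("NewAccount", 0, datetime.now()).balance, 0)
```

Run it with `python -m unittest`. Unlike `[ExpectedException]`, the `with` block pins the expectation to the one statement that should fail, so an exception thrown elsewhere in the test cannot make it pass by accident.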

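Testing postconditions is a bit trickier, because the exception raised by a failed postcondition is not meant to be caught: the Code Contracts rewriter throws a private ContractException type that test code cannot reference. Therefore, the test has to compare the exception’s type name as a plain string.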


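    public static class TestHelpers
    {
        // Runtime name of the exception type injected by the Code Contracts rewriter;
        // it cannot be referenced directly, so tests compare names as strings
        public const string ContractExceptionName = "System.Diagnostics.Contracts.__ContractsRuntime+ContractException";
    }

    [TestClass]
    public class RepositoryTests
    {
        [TestMethod]
        public void InsertModelWithNoPartitionKeyDueToBadRepository()
        {
            var repo = SetBadRepo();
            Account acc = new Account()
            {
                AccountName = "NewAccount",
                Balance = 0,
                CreationDate = DateTime.Now
            };

            try
            {
                repo.InsertAccount(acc);
                // Without this, the test would pass even if no contract failed
                Assert.Fail("Expected the postcondition to fail.");
            }
            catch (AssertFailedException)
            {
                // Let the Assert.Fail above propagate unchanged
                throw;
            }
            catch (Exception ex)
            {
                Assert.AreEqual(TestHelpers.ContractExceptionName, ex.GetType().FullName);
            }
        }
    }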

Do not use $_SERVER['PHP_SELF']

I have often seen logos and links to a site’s home page use the following:

<?php echo $_SERVER['PHP_SELF']?>

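Sadly, this is very dangerous, because PHP_SELF can carry additional trailing data appended to the URL by the visitor. Echoed without escaping, that data opens the door to cross-site scripting, and it is especially an issue for SEO.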

For example:
It works fine for: /moo/daa/boo.php
But for /moo/daa/boo.php/mooo/daa.htm, the link points back to that same bogus URL, so the page can end up linking to itself under many different addresses.

According to some people, the behavior is also erratic across different server configurations.

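To prevent these problems, I recommend using __FILE__. Keep in mind that __FILE__ names the file the code is written in, so if this snippet lives in an included header, it yields the header’s file name rather than the requested page. The following solution works on PHP 4.0.2 and up: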

<?php echo basename(__FILE__) ?>
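
To see where the trailing data comes from, here is a simplified Python model (not PHP’s actual implementation) of how a CGI-style server typically splits a request that targets a PHP script: everything after the script path becomes PATH_INFO, and PHP_SELF is the script path plus that PATH_INFO. The function `split_request`, and the assumption that the script path is already known, are hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit


def split_request(request_uri: str, script_name: str) -> Dict[str, str]:
    """Simplified model of the server variables for a request to `script_name`."""
    path = urlsplit(request_uri).path  # drop any query string
    rest = path[len(script_name):]
    # The request must target the script itself, optionally followed by "/..."
    if not path.startswith(script_name) or (rest and not rest.startswith("/")):
        raise ValueError(f"{request_uri!r} does not target {script_name!r}")
    return {
        "SCRIPT_NAME": script_name,  # stable: always the script's own path
        "PATH_INFO": rest,  # whatever the visitor appended
        "PHP_SELF": script_name + rest,
    }


for uri in ("/moo/daa/boo.php", "/moo/daa/boo.php/mooo/daa.htm"):
    v = split_request(uri, "/moo/daa/boo.php")
    print(v["PHP_SELF"], "|", v["SCRIPT_NAME"])
```

A link built from PHP_SELF therefore reproduces whatever path the visitor (or a crawler following a bad link) requested, which is how one page ends up indexed under many URLs.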

Azure VS Heroku – FIGHT!

About two years ago I started playing with distributed code and large-scale architectures. Recently, most of my development time has been spent on Azure and Heroku. The following are my thoughts on when each of them is the better choice.

What I like about Azure:

  • Support for Python/Ruby/PHP/etc.
  • Lots of built-in options to fit storage needs: Blobs, SQL Azure, Table Storage and Queues
  • Great for working with interns/students who know .NET:
    • If the project does not compile, the IDE will not let them deploy!
    • Learning resources for them to read and use are growing amazingly fast
    • IntelliSense gets them started quickly
    • When a new intern arrives, we can use the nice built-in debugger to step through code and explain how things work

What I like about Heroku:

  • Free development environment
  • Git for deployment
  • RubyGems
  • Fast deployment and startup times (compared to Azure)

I make my choice based on the following questions:

Will interns/students have to work on this project? If interns are involved, I most likely choose Azure, because they are bound to make fewer mistakes. They have more resources every day, including MSDN, and they can also learn by poking around with IntelliSense.

What is the budget? If budget is an issue, Heroku is a great service. The fact is that Azure is rather expensive, even during the development stages: developing a big distributed architecture on Azure is not free. The good news is that Microsoft has listened to a lot of people and is currently running a few free-trial promotions in the USA.

How fast do middleware components need to be developed? Heroku. In my personal opinion, it is faster and cleaner to develop middleware components with Sinatra than with their .NET counterparts.

What libraries and languages are necessary? If it is a collaborative project where multiple languages and libraries are used, Azure has tools to accommodate the other developers. For example, the new Azure PHP tools are coming along nicely.

Conclusion

Those are just a few of the points I use to decide whether Azure or Heroku is the right service for a project. After all, both have great strengths, which should be leveraged to maximise returns. Use the right tool for the job.

What’s wrong with Captchas?

Captchas are those very annoying challenges where you have to type a set of letters and numbers to prove that you are a human rather than a robot. Why are they necessary? Because so many things have become automated, and some companies don’t want to provide service to robots or waste their resources on fake requests. They are similar to Turing tests, but still very different: here a computer, not a human, is the judge.

Existing captchas are generally in the form of text presented as images. (There are other methods, but this seems to be the most popular one at the moment.) This in itself presents a heck of a lot of problems. The biggest problem is that they’re not accessible to people who can’t see and depend on screen readers. The workaround has been to provide an audio version of the text, but have you tried listening to the sound clips? They’re read so fast and there’s so much background noise that I don’t know how you can figure out what they’re saying. I suppose that if you’re used to using screen readers (which can read at super high speeds), you may have fewer problems hearing the text.

Another big problem with the captchas is that even if you can see the image, you’re not always sure what the text says. There are algorithms in place that distort the text so that a computer can’t use image recognition on it to detect words. In attempting to fool a computer, it’s also fooling a human being. It can get quite frustrating and time consuming to type the words numerous times just to complete what should have been an easy task (like logging in).

Technology has become so advanced that it is no longer about the system trying to tell whether you are a human or a robot; instead, the burden has been put on us to prove that we’re not robots. The existing solutions are a good attempt at keeping robots away, but they are also keeping humans away. I know that there is research trying to resolve this, so that the test would be easy for humans but difficult for robots (Captcha.net and this paper from Towson University and the University of Notre Dame have some examples). It is an extremely difficult problem and may take a long time to solve completely.

Recently, I went to Hong Kong, and while I was there, Facebook required me to prove that I am me, since I was logging in from an “unusual” location. It showed me a picture of one of my Facebook friends and asked me who was in the picture. This process was repeated several times until the system decided that I really was who I said I was. I thought this was a really good way of proving our identity. Yes, it would be frustrating to go through this every time we log in, but I feel it is heading in a good direction. It’s a better solution than security questions whose answers we don’t always remember (even when we created them ourselves). On the other hand, it requires access to your personal information, which you don’t want every website to have. The solution is not perfect and does not work in all situations, but it certainly is unique.

All your clients should go on a diet

Over the last few months, my graduate supervisor and I have given a lot of demos to the major Canadian telecommunications companies. In the demos we show different devices interacting with each other (something people love at demos). One of the questions we get asked fairly often is: “How long did it take to build each one of these applications?” Our usual answer is less than a day, which usually shocks people.

How is this possible? Simple: we use thin clients and the cloud (or some robust, scalable servers). Moving the client logic to the cloud is what makes thin clients possible: they just make a few data calls (or have data pushed to them) to exchange information.

Of course, this requires extra effort up front when building the server logic, but it saves A LOT of work later.

Note, however, that client-side optimizations should not be overlooked. For example, client-side caching is a good idea whenever possible.
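
A thin client with client-side caching can be very small. This Python sketch assumes all business logic lives on the server and models the client as a `fetch` function plus a time-based cache; `ThinClient`, `fake_fetch` and the `/devices` resource are hypothetical, and in practice `fetch` would be an HTTP call to your cloud service.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ThinClient:
    """Client with no business logic: it only fetches data from the server
    and keeps each response in a local cache for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical server call, e.g. an HTTP GET
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, resource: str) -> Any:
        now = self._clock()
        entry = self._cache.get(resource)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh copy: skip the network round trip
        value = self._fetch(resource)  # miss or stale entry: ask the server
        self._cache[resource] = (now, value)
        return value


calls: List[str] = []


def fake_fetch(resource: str) -> dict:
    # Stand-in for the cloud service; records every round trip
    calls.append(resource)
    return {"resource": resource, "version": len(calls)}


client = ThinClient(fake_fetch, ttl=30.0)
client.get("/devices")
client.get("/devices")  # served from the cache
print(len(calls))  # prints 1
```

Keeping the cache in the client, rather than in every screen that needs data, means the rest of the client stays a thin layer over `get`.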

Don’t copy-and-paste code. Don’t use debug-driven development!

I see a common copy-and-paste trend, especially among junior developers. Many of them just copy random code from the internet and paste it into their projects. When I later ask them about their implementation, very few understand what they copied or the implications of that code!

When I inquired further, it was clear that many had used debug-driven development to get the pasted code to work. Debug-driven development is basically 90% random changes and 10% thinking. It is a popular trend among .NET developers using Visual Studio, because the IDE is so good that very little planning is required to get things working.

After seeing this trend for a few weeks, I started doing the following:

If someone asked me a question that I believed came from copy-and-paste or debug-driven development, I replied with these questions:

  1. Did you google the problem? (Some of them hadn’t!)
  2. Did you copy and paste code? Is that code giving you the error? If you did copy and paste, what does the code do?
  3. What are two different approaches to this implementation, and why did you choose this one?

After answering the questions above, they were able to solve the problems themselves. That makes them happy and it makes me happy!