Using Test Driven Development and College Algebra to Improve Your Software

Chris Breazeal, April 28, 2015

What did you do with your old College Algebra text book after the semester was over? Tear all the pages out? Throw it away? Burn it? Hopefully after reading this article you'll wish you had kept it!

As both a technologist and a College Algebra teacher, I'm always looking for and pleased when I find a single problem that is excellent for teaching both programmers and college algebra students at the same time. For programmers, demonstrating concepts such as Test Driven Development (TDD), and for college algebra students, the concepts being taught in class. I found such a question when reviewing an application to a Data Science boot camp. I've modified the problem slightly for clarity but the intent is the same.

Write a method that accepts a positive integer as a parameter and returns the sum of all positive integers up to but not including that parameter that are divisible by 3 or 5.

The academy hosting the boot camp was asking applicants to write a program, in the language of their choice, to solve the problem posed by the question as part of a "Test your skills" section.

If we use an example to illustrate, where the parameter is 1000, the total we're looking for is:


3 + 5 + 6 + 9 + 10 + 12 + 15 + 18 + 20 + 21 + ... + 995 + 996 + 999

A Loop Will Solve this Problem

I choose Java as my language and proceeded using Test Driven Development. I first created a test for an input of 2, saw the test run red, then did the simplest thing possible, i.e., change the method that calculates the sum to return 0, so that the test runs green. I slowly added tests and constantly re-factor everything to build up a suite of tests like the ones shown here:


private SumCalculator sumCalculator;
    
@Before
public void setUp() throws Exception {
    sumCalculator = new SumCalculator();
}

@Test
public void test2() {
    assertEquals("sum up to 2 is 0", 0, sumCalculator.sumUpTo(2));
}

@Test
public void test3() {
    assertEquals("sum up to 3 is 0", 0, sumCalculator.sumUpTo(3));
}

@Test
public void test4() {
    assertEquals("sum up to 4 is 3", 3, sumCalculator.sumUpTo(4));
}

The SumCalculator class contains a method sumUpTo, that takes an integer as input and returns the sum of all positive integers, up to but not including the input, that are divisible by 3 or 5.

When finished the entire set of tests were as follows and all tests ran green:

Test Case InputExpected Result
20
30
43
53
68
714
914
1023
1133
1545
1660
31225
1000233,168

The final method looked like this:


public int sumUpTo(int max) {
    int sum = 0;
    for (int i = 3; i < max; ++i ) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

The approach was to form a loop, iterating over the positive integers from 3 up to but not including max. We start at 3 since 1 and 2 will never be evenly divisible by either 3 or 5. A variable to hold the sum is created and initialized, and if a number is encountered that is evenly divisible by either 3 or 5, we add that number to the sum.

However this approach has problems. First is the potential for integer overflow. If the incoming value for max is large enough, the value accumulated by sum will overflow its integer buffer. I can fix that by simply changing sum and the return type to long.


public long sumUpTo(int max) {
    long sum = 0;
    for (int i = 3; i < max; ++i ) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

The second problem is that the method is horribly inefficient for large numbers. A quick timing test passing in 1000000000 for the max takes, on average, 5900 milliseconds for the method to complete.

For curiosity sake I timed the method again by passing in an even larger number, Integer.MAX_VALUE, which of course in Java is 231-1, or 2,147,483,647. This time the method took over 12 seconds to complete on average. Not good. Obviously the brute force approach I've applied by using a loop over the positive integers, while it produces the correct answer, is so slow that it makes the method unusable for sufficiently large numbers.

So how can the method be improved so that it's faster? And not just faster, but fast enough so that it's usable.

College Algebra to the Rescue

This is where College Algebra comes in. Let me introduce you to arithmetic sequences and series.

An arithmetic sequence is a sequence in which each term after the first differs from its predecessor by a fixed constant d, called the common difference. So the common difference is found by d = an+1 - an.

The nth term of an arithmetic sequence is given by an = a1 + (n - 1)d where a1 is the first term of the sequence and d is the common difference.

The sum of the first n terms of an arithmetic sequence is called a finite arithmetic series and is denoted by Sn and can be found by the following formula:

Sn =
n(a1 + an) / 2

where a1 is the first term of the sequence, an is the nth term, and n is the number of terms in the sequence.

The positive integers divisible by 3, up to a given number, form a finite arithmetic sequence, as do the positive integers divisible by 5. Thus, the sum of the first n terms of the positive integers that are divisible by 3 form a finite arithmetic series. Likewise, the sum of the first n terms of the positive integers that are divisible by 5 also form a finite arithmetic series.

That means we can use the above information to solve the problem, and hopefully in an efficient way.

But we have a major re-factoring job to do and the benefits of doing Test Driven Development will pay off nicely. It is invaluable to have a suite of tests in place that will ensure that our re-factor is correct.

So, to find the sum of the first n terms of an arithmetic sequence, I need the first term, a1, the last term, an, and the number of terms, n.

Let's take an example

Let's focus only on the sequence of the numbers that are divisible by 3 up to but not including 1000. That series looks like this:


3 + 6 + 9 + 12 + ... + 999

The above shows that a1 = 3, the common difference is 3, and an = 999.

How many terms are we adding in the example? If we look at the formula an = a1 + (n-1)d and solve it algebraically for n, we end up with

n =
an - a1 + d / d

Plugging in the numbers from the example yields

n =
999 - 3 + 3 / 3
= 333

The above formula can be reduced since a1 = d = 3, so we end up with a neat formula to find the number of terms in a finite arithmetic sequence:

n =
an / d

We'll use this later.

Also in our example it's pretty easy to see that the nth term is 999, but in general, to find the nth term, take the passed in parameter, subtract 1, divide by 3, truncate any fractional part, then multiply that result by 3.

Here's what that looks like in Java code, where d = 3:


int an = (max - 1) / d * d;
Since division and multiplication are of the same precedence, Java will work left to right, first performing the division, i.e., integer division of which will truncate the fractional part, then multiply that result by d to achieve the largest integer multiple of d that is less than max.

Now we're moving. Back to our formula for finding the sum of an arithmetic series, Sn:

Sn =
n(a1 + an) / 2

and substituting an/d for n results in

Sn =
an(a1 + an) / 2d

And since a1 = d

Sn =
an(d + an) / 2d

This is the formula that will be implemented in our code.

To show it in use, the sum we saw earlier


3 + 6 + 9 + 12 + ... + 999

is equal to

Sn =
999(3 + 999) / 2(3)
= 166833

Now we have to repeat for the finite arithmetic series formed by the integers divisible by 5.


5 + 10 + 15 + 20 + ... + 995

Sn =
995(5 + 995) / 2(5)
= 99500

So we should be able to add 166833 and 99500 to get the final answer, right? Wrong.

Putting it all together

In each sequence we have counted the multiples of 15, thereby counting them twice, so we need to subtract those numbers once to even things out.


15 + 30 + 45 + 60 + ... + 990
Sn =
990(15 + 990) / 2(15)
= 33165

Finally, when the parameter is 1000, we should get 166833 + 99500 - 33165 = 233168.

So, in the Java code, create a private method series to calculate the sum of a finite arithmetic series, and call that method from sumUpTo for each of the series we need to find:


public long sumUpTo(int max) {
    return series(max, 3) + series(max, 5) - series(max, 15);
}

private long series(int max, int d) {
    long an = (max - 1)/d * d;
    return an * (d + an) / (2*d);
}

To show that my algebra is sound and my new code accurately produces the correct answer, I need to verify it by running all of the tests that I constructed at the beginning of the exercise. Doing so, all tests run green so I'm confident the refactor is correct and the code is calculating correctly.

Now for the important part: how much did we improve the speed?

Re-running the timing tests specified previously and comparing to our first implementation, we get the following results:

max Using a loop Using a series
100,000 7 ms < 1 ms
10,000,000 61 ms < 1 ms
1,000,000,000 5900 ms < 1 ms
Integer.MAX_VALUE 12200 ms < 1 ms

Quite a drastic and amazing improvement! The re-factored code using arithmetic series concepts runs in less than 1 ms for any integer value passed in, including Integer.MAX_VALUE! How much time could be saved in an application that uses the method millions of times a day? It's a huge win!

Conclusion

With this exercise I not only got to experience the benefits of TDD by using it to drive out a solution but also create a suite of tests that protect the code from re-factors. I also got the satisfaction of applying my College Algebra knowledge to re-factor inefficient code into an amazingly fast solution. It's very fun to see an elegant algorithm completely outperform such a brute force solution.

If you were a member of the Data Science academy responsible for evaluating applications, would you favor an applicant who turned in the brute force loop or the arithmetic series implementation? As for me, I know which I prefer.

So, next time you encounter an arithmetic problem that you're tempted to implement a brute force solution, pull out your old College Algebra text book and see if you can find a more elegant solution.

Reach Chris at

chris@breazeal.com
Chris @ Linkedin
Tweet