On my current journey to learn Go and the craft of software development, I am rewriting a CLI tool from Bash to Go. As usual, a tool that was supposed to be temporary became permanent and, as a result, grew too big to stay in Bash. I was lacking ideas for projects, so I just decided to go with it.

One of the core functionalities of this tool is to get a list of systems from an API. It then searches that list for the given system name(s) and presents relevant information about them.

While at it, I thought it might be worth exploring making the search concurrent. Not that performance matters for this particular tool, but I saw an opportunity to step out of my comfort zone and experiment with something that still confuses me. It could potentially improve the tool, even if it is uncertain whether the performance gain would be significant enough to justify the implementation. At the end of the day, the worst that could happen is that the code gets thrown away and I still learn something.

The first test

The search is implemented as a simple linear search function. Therefore, I decided to take the easiest and most obvious approach, making the least amount of changes to the code in order to call the search function concurrently and compare it to the initial code.

For the first attempt, the goal was to test two functions that perform the same task: one non-concurrent and one concurrent. I used a sample list with 65 systems loaded from a JSON file.

As a self-proclaimed lazy noob, instead of completely rewriting the original function, I created a new one based on it and made only the changes I needed.

// Initial Code
// searchName does a linear scan over systems and returns a pointer to
// the first System whose Name matches, or nil if there is no match.
func searchName(name string, systems []System) *System {
	for _, s := range systems {
		if s.Name == name {
			return &s
		}
	}

	return nil
}

// SystemsByName loads all systems from fileName and looks up each
// requested name sequentially, failing on the first name not found.
func SystemsByName(fileName string, names []string) (systems []System, err error) {
	allSystems, err := AllSystems(fileName)
	if err != nil {
		return nil, err
	}

	for _, name := range names {
		if system := searchName(name, allSystems); system != nil {
			systems = append(systems, *system)
		} else {
			return systems, fmt.Errorf("%w: %s", ErrorNameNotFound, name)
		}
	}

	return systems, nil
}

// The new function
func GoSystemsByName(fileName string, names []string) (systems []System, err error) {
	allSystems, err := AllSystems(fileName)
	if err != nil {
		return nil, err
	}

	// Buffer both channels to len(names) so no goroutine ever blocks on send.
	systemsChan := make(chan System, len(names))
	errorChan := make(chan error, len(names))
	var wg sync.WaitGroup

	for _, name := range names {
		wg.Add(1)
		// name is passed as an argument to avoid capturing the loop variable
		// (needed before Go 1.22); allSystems is only read, so sharing it
		// across goroutines is safe.
		go func(name string) {
			defer wg.Done()

			if system := searchName(name, allSystems); system != nil {
				systemsChan <- *system
			} else {
				errorChan <- fmt.Errorf("%w: %s", ErrorNameNotFound, name)
			}
		}(name)
	}

	// Close both channels once all searches are done so the range below ends.
	go func() {
		wg.Wait()
		close(systemsChan)
		close(errorChan)
	}()

	// Results arrive in completion order, not in the order of names.
	for system := range systemsChan {
		systems = append(systems, system)
	}

	// Surface the first error, if any goroutine reported one.
	if len(errorChan) > 0 {
		return systems, <-errorChan
	}

	return systems, nil
}

For the benchmark, I randomly searched for some names that I knew existed in the list, as well as one name that didn’t exist.

// Benchmarking ("systems.json" stands in here for the 65-system sample file)
func BenchmarkSystemsByName(b *testing.B) {
	names := []string{"randomNameX", "randomNameY", "randomNameW", "randomNameZ", "foo"}
	for i := 0; i < b.N; i++ {
		_, _ = SystemsByName("systems.json", names)
	}
}

func BenchmarkGoSystemsByName(b *testing.B) {
	names := []string{"randomNameA", "randomNameY", "randomNameB", "randomNameG", "foo"}
	for i := 0; i < b.N; i++ {
		_, _ = GoSystemsByName("systems.json", names)
	}
}

The results:

$go test -bench=.
goos: linux
goarch: amd64
pkg: toolbox/pkg/govcm
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkSystemsByName-8            2815            395575 ns/op
BenchmarkGoSystemsByName-8          2424            491929 ns/op
PASS
ok      toolbox/pkg/govcm       2.409s

Non-concurrent function: 2815 operations, with an average of 395575 ns per operation.

Concurrent function: 2424 operations, with an average of 491929 ns per operation.

Well… This was unexpected! The non-concurrent version of the search performed better than the concurrent one. But… this doesn’t seem right. The concurrent search should be faster, right? What am I doing wrong?!

Driven by curiosity, I decided to dig a bit deeper and explore the next most obvious thing.

Moar Data!

Maybe it’s just my code that sucks, but for the sake of simplicity, let’s assume it isn’t. After all, while writing the function I added some prints and actually saw it working as intended, and it passed all the unit tests from the original code! When has that ever gone wrong, riiiiight? Let’s move on…

The next most obvious thing is to test it with more data and try to look for a change in the benchmark values. From my understanding, concurrency also adds some overhead, so it may not provide any benefits for smaller amounts of data.
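To get a feel for that overhead, here is a minimal, hypothetical micro-benchmark (not part of the tool) comparing a direct function call with the same call wrapped in a goroutine that we then wait for:

package main

import (
	"sync"
	"testing"
)

var sink int // package-level sink so the compiler cannot elide the work

// trivialWork stands in for a small unit of work, like one name comparison loop.
func trivialWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// Baseline: call the function directly.
func BenchmarkDirectCall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = trivialWork(100)
	}
}

// Same work, wrapped in a goroutine plus the synchronization needed to
// wait for it; the spawn/sync cost dominates when the work is cheap.
func BenchmarkGoroutineCall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			sink = trivialWork(100)
		}()
		wg.Wait()
	}
}

For cheap work like this, the cost of spawning and synchronizing the goroutine can easily exceed the work itself.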

After reading a bit more about benchmarking in Go and discovering benchstat, I changed my benchmarking setup slightly so that everything is tested the same way, which makes the results easier to analyze. The benchmarks now only call SystemsByName(), and the concurrent and non-concurrent implementations share that name, with one of them always commented out depending on which I want to test.
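For reference, benchstat is part of the Go x/perf tools and compares saved benchmark outputs:

$go install golang.org/x/perf/cmd/benchstat@latest
$benchstat test_default.txt test_concurrent_default.txt

Note that benchstat wants several samples per benchmark (e.g. running go test with -count=6 or higher) before it will claim statistical significance; the footnotes in the results further down are exactly that complaint.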

For the test cases, I decided to benchmark against datasets with 100, 1000, 10000, 100000, and 1000000 entries. I will search for the first element in the list, an element in the middle of the list, the last element of the list, and a non-existing element.

package main

import "testing"

const (
	file100     = "data/systems_100.json"
	file1000    = "data/systems_1000.json"
	file10000   = "data/systems_10000.json"
	file100000  = "data/systems_100000.json"
	file1000000 = "data/systems_1000000.json"
)

// Benchmarking
func BenchmarkNameSearch(b *testing.B) {
	b.Run("100 Systems", func(b *testing.B) {
		names := []string{"system0", "system50", "system99", "foo"}
		for i := 0; i < b.N; i++ {
			_, _ = SystemsByName(file100, names)
		}
	})

	b.Run("1000 Systems", func(b *testing.B) {
		names := []string{"system0", "system500", "system999", "foo"}
		for i := 0; i < b.N; i++ {
			_, _ = SystemsByName(file1000, names)
		}
	})

	b.Run("10000 Systems", func(b *testing.B) {
		names := []string{"system0", "system5000", "system9999", "foo"}
		for i := 0; i < b.N; i++ {
			_, _ = SystemsByName(file10000, names)
		}
	})

	b.Run("100000 Systems", func(b *testing.B) {
		names := []string{"system0", "system50000", "system99999", "foo"}
		for i := 0; i < b.N; i++ {
			_, _ = SystemsByName(file100000, names)
		}
	})

	b.Run("1000000 Systems", func(b *testing.B) {
		names := []string{"system0", "system500000", "system999999", "foo"}
		for i := 0; i < b.N; i++ {
			_, _ = SystemsByName(file1000000, names)
		}
	})
}

Now that I have determined how I want to benchmark, I just need the actual data to run the benchmarks against. So I wrote a quick and simple Python script to generate multiple JSON files with different numbers of entries. The data also resembles what the original API returns.

#!/usr/bin/env python3
# Usage: python3 <script> <number_of_entries>

import json
import random
import sys
import uuid

total = int(sys.argv[1])
file = f"systems_{total}.json"
regions = ["eu-north-1", "eu-west-1", "eu-west-2", "us-west-1"]
states = ["STATE", "PLAN_FAILED", "RESUME_FAILED", "PAUSED", "STORAGE_CREATION_FAILED", "FOO_FAILED"]
auxStates = ["PLANING", "RESUMING", "PAUSING", "TO_BE_PLANED", "TO_BE_PAUSED"]
types = ["team", "lab"]
versions = ["1.2.3-gec414e3c24", "2.3.2-bde194c123a", "3.4.0-foobarbaz"]
items = []

# Build one entry per system, with randomized attributes.
for i in range(total):
    item = {
        "id": str(uuid.uuid4()),
        "name": f"system{i}",
        "region": random.choice(regions),
        "state": random.choice(states),
        "auxState": random.choice(auxStates),
        "apiEndpoint": f"https://system{i}.mydomain.com",
        "systemType": random.choice(types),
        "systemVersion": random.choice(versions),
    }
    items.append(item)

data = {"items": items}

with open(file, "w") as f:
    json.dump(data, f, indent=2)
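With that, the five datasets can be generated in one go, something along these lines (gen_systems.py is just an illustrative name for wherever you saved the script):

$for n in 100 1000 10000 100000 1000000; do python3 gen_systems.py $n; done
$mkdir -p data && mv systems_*.json data/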

Now that I have the data, it’s time to run the tests. This time, I will also run them with different settings: default, 20 seconds, 30 seconds, 20 times, and 30 times. It is important to note that the test environment is not perfect, but neither is the world, and I am not looking for super detailed results. All I want to find are some hints about whether it’s worth using concurrency on linear searches and, if so, at what point.
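These settings map onto go test’s -benchtime flag, so the runs presumably looked something like this (the output file names beyond test_default.txt are illustrative):

$go test -bench=. > test_default.txt
$go test -bench=. -benchtime=20s > test_20s.txt
$go test -bench=. -benchtime=30s > test_30s.txt
$go test -bench=. -benchtime=20x > test_20x.txt
$go test -bench=. -benchtime=30x > test_30x.txt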

Benchstat

I will not provide all the results here, but the results were consistent across all the different benchmark settings.

$benchstat test_default.txt test_concurrent_default.txt
goos: linux
goarch: amd64
pkg: concurrency
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                              test_default.txt      test_concurrent_default.txt      
                                   sec/op          sec/op     vs base                
NameSearch/100_Systems-8           248.9µ ±  ¹   262.5µ ±  ¹       ~ (p=1.000 n=1) ²
NameSearch/1000_Systems-8          2.445m ±  ¹   2.604m ±  ¹       ~ (p=1.000 n=1) ²
NameSearch/10000_Systems-8         26.01m ±  ¹   26.33m ±  ¹       ~ (p=1.000 n=1) ²
NameSearch/100000_Systems-8        259.4m ±  ¹   259.0m ±  ¹       ~ (p=1.000 n=1) ²
NameSearch/1000000_Systems-8        2.553 ±  ¹    2.534 ±  ¹       ~ (p=1.000 n=1) ²
geomean                            25.36m         25.97m        +2.42%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Even though the benchmark results indicate that the non-concurrent search is generally more efficient than the concurrent search, it is possible to observe that an inversion starts to occur after reaching 100,000 items. At this point, the concurrent search becomes slightly more efficient, although the difference is not significant.

However, just because concurrency is widely available and relatively easy to implement does not mean we should always use it. We should be mindful of when and how to use it, because it may actually harm performance.

Wait, Karen, wait, before you start throwing rocks… when I say performance, I don’t just mean code performance. Your team’s performance may also suffer: when you opt for a concurrent solution, your team now has to deal with the added complexity when things go wrong, without any real benefit. One could argue that when dealing with data sets larger than 10k items, it may be better to study and learn about other search algorithms instead; the sketch below hints at one.
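For instance, when the same list is searched over and over, indexing it once in a map turns every subsequent lookup into a constant-time operation. A minimal, hypothetical sketch, not taken from the tool’s code:

// buildIndex creates a name -> System lookup table in a single pass
// over the list (O(n) once, instead of O(n) per search).
func buildIndex(systems []System) map[string]System {
	index := make(map[string]System, len(systems))
	for _, s := range systems {
		index[s.Name] = s
	}
	return index
}

// Each lookup is then O(1):
//
//	index := buildIndex(allSystems)
//	if system, ok := index[name]; ok {
//		systems = append(systems, system)
//	}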

Perhaps that’s a story for another time!

If you’ve made it this far in this article, congratulations! I hope it was at least useful. To me, these results still don’t make a whole lot of sense, but I guess it’s time to study a bit more.

Are my results far off from reality? Am I completely wrong? What am I missing? Feel free to reach out to me and let me know what you think!

The code and test results can be found here.