You never know what rabbits you’ll get out of the magician’s hat. This time we’ve got two of them.

Meet the twins: Beanie and Bunnet.

About two years ago, in my previous blog posts, I outlined the mechanisms by which you can automate the serialization of Pydantic data objects into MongoDB. Fortunately, I didn’t have to implement the object-document mapping (ODM) library all the way to the end. In the meantime, someone has built exactly that, ready for all of us to use!

Beanie - alongside its buddy, Bunnet - is exactly the library I envisioned back then when I sketched mechanisms for connecting Pydantic objects with MongoDB. Or maybe the library already existed at the time and just didn’t catch my eye. Well, never mind. It only means I bear no responsibility for what crawled out of the magical hat.

Bunnet provides a layer for data objects based on Pydantic, which allows them to be stored in and loaded from the database in a simple, synchronous way. Bunnet also has a twin, Beanie, which is an asynchronous version of the same library. But should you use Bunnet or Beanie?

When is synchronous better?

There are many use cases for synchronous processing. In data processing, as in many other things, it is often the case that you can’t start the next step until the previous one is finished. For example, you can’t start tying your shoelaces until you’ve put your shoes on. Of course you can try, but then the walk will have to wait for another day.

Beyond getting dressed, you may also need to convert raw data from some measuring device into a reduced format that can be more easily displayed in graphs. This, too, may have to be done sequentially, step by step, as the data is refined into its final form.

This was also the case in a playful example from a blog post of mine a couple of years ago, where log data from washing machines is processed. The same pattern applies to almost anything, for example processing data from devices that measure electricity consumption.

In this case, the background process can continue to convert the measurement data to different derivatives in its own isolated container. A synchronous operating model can be suitable for this, when there is one CPU per container and each container processes a separate dataset.

What about when there are many measuring devices?

When the data of each device forms its own sequential processing pipeline, the system can be made truly scalable by running each job in its own container. Each container then processes one conversion iteration for one measuring device at a time.

For example, each iteration might process the latest data that has entered the database since the previous one. When more devices are added to the cluster, you simply scale up the number of containers. This is really easy with Kubernetes, which seems made for exactly this type of work. Each container processes one data source at a time, and only one container processes one measuring device at a time.

Sequential but asynchronous code

It’s easy to confuse the concepts of sequential vs. concurrent and synchronous vs. asynchronous. The thing is, synchronous code can only be executed sequentially, but asynchronous code can be executed both sequentially and concurrently. It can be forced to run sequentially by placing the await keyword before each call. Such asynchronous code works almost the same way as synchronous code. The difference, however, is that it doesn’t block the execution of the event loop within the same program. And that’s a big difference!
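A small, self-contained sketch of this, using asyncio.sleep as a stand-in for an I/O-bound operation such as a database call:

```python
import asyncio


async def step(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound operation, e.g. a database call.
    await asyncio.sleep(delay)
    return name


async def main() -> list[str]:
    # Awaiting each call before starting the next forces sequential order,
    # just like synchronous code -- but the event loop is never blocked,
    # so other tasks in the same program could still make progress.
    results = []
    results.append(await step("shoes", 0.01))
    results.append(await step("laces", 0.01))
    return results


print(asyncio.run(main()))  # ['shoes', 'laces'] -- always in this order
```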

Or is it?

If the runtime environment only has one CPU, it cannot be magically made to run faster by writing concurrent code with asynchronous functions. One CPU can still only do one thing at a time. If you observe anything else, it’s just an illusion created by ultra-rapid context switching.

When your container only has one CPU, there is no point in making the code error-prone and difficult to understand by writing “concurrent” code. The end result is no faster whether you execute two things quickly in succession or slowly “in parallel”. One CPU does not become two. It is therefore clearer and more honest to run synchronous code than to pretend to multitask. If you need real multitasking, use a second CPU: start another container and run synchronous code there too.

What about I/O Tasks?

Even if there is only one CPU in use, concurrent execution can still speed things up when the process also involves writing to the database, because while waiting for a database write to complete, the CPU can be made to do something else. Still, as with any operation where data cannot be written until it has been processed, you have to think carefully about whether concurrent execution really provides enough benefit to be worth sacrificing, for example, the readability of the code.

However, if your code heavily leans on database interaction, asynchronous execution may actually provide significant benefit.
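This is where concurrency genuinely pays off: independent I/O waits can overlap. A minimal sketch, again with asyncio.sleep standing in for MongoDB writes:

```python
import asyncio
import time


async def fake_db_write(doc_id: int) -> int:
    # Stand-in for waiting on a MongoDB write acknowledgement.
    await asyncio.sleep(0.1)
    return doc_id


async def main() -> float:
    start = time.perf_counter()
    # Three independent writes overlap: the total wait is roughly
    # 0.1 s, not 0.3 s, because the CPU isn't needed while waiting.
    await asyncio.gather(*(fake_db_write(i) for i in range(3)))
    return time.perf_counter() - start


print(f"{asyncio.run(main()):.2f}s")  # roughly 0.1s, not 0.3s
```

The speed-up comes purely from overlapping waits, not from extra CPU power, which is exactly why it only helps I/O-heavy code.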

But then again, even if you can make the code do something else while waiting for a database operation to complete, there is no point in doing something else in the meantime if there is nothing to do, that is, if the next operation depends on the new state in the database. A bit like the shoelaces.

Container-based concurrency

When parallel processing is already implemented at the container management level, there is no particular need to mess things up by duplicating parallelism within individual containers. Synchronous code is simply easier to understand, and so is sequential execution, in real life too. When you go for a walk, the order is clear: first you put on your coat, then your hat, then your shoes, tie your laces and finally put on your gloves.

Asynchronous mess

Have you ever seen that movie where one hero, Donald Duck (not the other Donald), tries to cook, starts one thing after another in a terrible rush, and soon there is a horrific mess in the kitchen? Asynchronous code can be exactly like this at its worst. You never know when an error will strike, since there is no guarantee of the order in which operations start and finish!

Meet the twins: Beanie and Bunnet

Is your use case a good fit for asynchronous processing? Or are you inclined toward the synchronous approach? You decide! Luckily, the synchronous library Bunnet has a twin named Beanie. If you need asynchronous processing, such as handling API calls, choose Beanie. If you do background processing, consider Bunnet.

I’ll be playing with my new pets Bunnet and Beanie and maybe even share my experiences with you.

Let’s see!