Unravelling `Async for` Loops (snarky.ca)
50 points by genericlemon24 | 2021-09-13 | 40 comments




I love for await and async for but I can't seem to figure out whether it's going to be easy to understand for beginning programmers. Any thoughts?

If you're looking for alternate ways of designing an asynchronous coroutine API, Zig (https://ziglang.org/) and Go (https://golang.org/) have different implementations which might be worth reading into. Both avoid special syntax in the common beginner case, until you know enough to correctly write concurrent code (then you need async in Zig, and goroutines in Go). :)

as someone who has written a lot of python but doesn't really work in the backend/services world, async itself felt like a huge mess to figure out when I was exposed to it while mucking around in FastAPI internals to prove out an idea. This was mainly because of the infectious nature of async: the callback function I was passing in had to be synchronous, yet it needed to call async functions, and the entire thing was already running in an async event loop. That meant I had to add a dependency just to call a damn function: https://pypi.org/project/nest-asyncio/.
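A minimal sketch of the situation described above (the function names are illustrative, not from FastAPI): a synchronous callback, invoked from inside a running event loop, cannot use `asyncio.run()` to call back into async code — which is the hole nest-asyncio patches.

```python
import asyncio

async def fetch_value():
    return 42

def sync_callback():
    # We're a plain sync function, but we need an async result.
    coro = fetch_value()
    try:
        # Inside a running event loop, asyncio.run() refuses to
        # start a nested loop.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"blocked: {exc}"

async def main():
    # An async framework calls our synchronous callback...
    return sync_callback()

print(asyncio.run(main()))
```

Running this prints the `RuntimeError` text rather than `42`, which is exactly why a nested-loop shim ends up as a dependency.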

Rant over. `for await` and `async for` seem to me like a fairly natural extension once you understand async. I think the problem is that the first time you run into async you are forced to fully grasp it (and potentially rewrite a ton of your code) to move forward, which runs counter to the rest of the Python learning experience.

[1] http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...


The infectious nature is crazy. It also jumps across library and project boundaries through dependencies. Thanks to async, most libraries in Rust now force the addition of tokio, which in turn converts more libraries to async. There should be some kind of compatibility layer that lets non-async functions call async ones without jumping through hoops. As it stands, async is more infectious than the T-virus and Covid.

The infectious nature is just how things work with types. Imagine if you had to annotate a function with IO every time it does something that reaches outside the program. You would either design things differently so that IO usage is minimized, or you would make everything infected by IO.

With Async, people are still figuring things out.

I've encountered a bunch of code bases that are polluted with async when a better design could have removed it completely.


I don’t really see how “infectious” async can be solved in a reasonable way. The function is async; can we really pretend it’s not? If it must be used in a sync context then it can be turned into a blocking call explicitly. This is not possible in JS, so I can see the argument there, although they are now making the top level async too!
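In Python, the explicit blocking call mentioned above is `asyncio.run()`; a minimal sketch (the `fetch` helper is illustrative):

```python
import asyncio

async def fetch():
    await asyncio.sleep(0)  # stand-in for real I/O
    return "data"

def fetch_blocking():
    # Explicit sync/async bridge: block the calling thread until the
    # coroutine finishes. Fine as long as no event loop is already
    # running in this thread.
    return asyncio.run(fetch())

print(fetch_blocking())
```

This works at the outermost sync layer; it is only inside an already-running loop (the nest-asyncio case above) that the bridge fails.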

Well, "easy" is a relative term, but I would say with good certainty that it is going to be a barrier for a lot of people, because it forms an inner platform with the host language: https://infogalactic.com/info/Inner-platform_effect

That is, there's the way to run one statement, then the next in the synchronously-colored portion of the language:

    oneStatement()
    thenTheNext()
then there's the way to run one statement, then the next in the inner-platform asynchronously-colored portion of the language:

    await oneStatement().andThen(theNext)
Or whatever spelling you locally like. One way to write the synchronously-colored for loop, one way to write the asynchronously-colored for loop, different ways to branch on error, different ways to handle exceptions, etc.

This does not, on its own, mean "async" is bad. It is a complicated decision with many pieces. But this particular aspect of it means that there will always be a barrier to a programmer encountering this for the first time, simply by virtue of adding a new dimension they have to constantly account for to constructs that they're still getting a handle on.

(Having languages that don't color this code differently has its own tradeoffs, and its own challenges. One of which is that, precisely because there isn't a separate color in the language itself, it's much easier to blunder into problems without even realizing it! People posting "why is this code broken?" to /r/golang, getting the answer "it's a race condition", and then the poster replying back with basically "what's a race condition?" is at least an every-couple-of-months sort of thing. I don't think there's an easy way to introduce concurrency to new programmers, honestly.)


In my opinion, async is such a mistake because it exposes a complicated abstraction to developers that could (and should!) be totally hidden by the runtime. Async "feels" cool and powerful because you get to learn about the intricacies of concurrency via coroutines, but at the end of the day, you just produce a tangled cobweb of difficult-to-follow code that is absolutely infested with explicit references to the concurrency. And it's so easy to get wrong.

Golang gets it right with goroutines. Languages should have extremely light and performant concurrency that efficiently maps to OS threads. And the developer should only care about synchronization, not personally accounting for every possible context switch.


The problem is that completely hiding it in the runtime has to be baked into the language's design, or at the least you have to have designed your language in a way that makes it easy (Haskell is a good example here: because functions are pure, you don't have to worry about implicit concurrent modifications of the same data).

Python, as a language, has a lot of reliance on global context and a lot of dynamic functionality. Imports happen at runtime, not compile time. You can modify the attributes of a class. You can reassign a function. And so forth. You can do nonsense like this, where a isn't even defined until the second-to-last line:

    $ python3
    >>> def f():
    ...  yield 5
    ...  yield a
    ... 
    >>> x = f()
    >>> next(x)
    5
    >>> a = 10
    >>> next(x)
    10
All of this needs some answer when writing concurrent / parallel code; the answer of "Don't worry about it" just gets you racy and unreliable code. The answer of "Make it impossible" changes the language (which is a fine answer, but you're better off starting from scratch like Go than adapting Python). So the remaining answer is "Make the programmer explicitly acknowledge concurrency."

https://glyph.twistedmatrix.com/2014/02/unyielding.html (which predates Python having proper async-await support by a few years, and references the "Tulip" project that became asyncio) makes a good argument that "Don't worry about it" doesn't work and "Make the programmer explicitly acknowledge concurrency" is fine in practice.

Anecdotally, my (limited) experience writing concurrent Python code in Trio has required no learning of the intricacies of coroutines and the implementation and also relatively easy-to-follow code, but yes, with explicit references to concurrency. Also, anecdotally, since my performance problem is just "I would like to parallelize multiple HTTP requests" and not "I am an OS scheduler and need to optimize every cycle" (in which case I wouldn't be writing Python), I'm happier writing code that accounts for context switches but in turn never has to care about locking than trying to properly manage locks and lock ordering and think about types of locks and all that.
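The "parallelize multiple HTTP requests" case above can be sketched with `asyncio.gather` (the commenter uses Trio, whose nursery API differs; the `fetch` helper here is a hypothetical stand-in for a real HTTP call):

```python
import asyncio

async def fetch(url):
    # hypothetical stand-in for a real HTTP request
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def main():
    urls = ["https://a.example", "https://b.example", "https://c.example"]
    # The three "requests" run concurrently, but context switches
    # happen only at awaits -- so no locks are needed around any
    # shared state touched between them.
    return await asyncio.gather(*(fetch(u) for u in urls))

print(asyncio.run(main()))
```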


I mostly don't disagree with you. The only point I really disagree on is that Python should have stuck with its limited threads instead of enabling this kind of "safe" concurrency. I think they are letting the genie out of the bottle with async, and it's going to make it very easy for people who don't understand what they're doing to write even more complex code that nobody wants to touch, because they don't understand it either.

Yes, you can do very magical things with Python, and no, they're not very safe, but at least that level of magic requires some level of skill, and with that skill comes a responsibility for putting up warning signs and providing guidance. But most of the time, people should not be doing magic, and if they are, it should be hidden so completely that it appears no magic is happening at all.

Anecdotally, my experience with working with engineers who are trying to understand new async code, has been dismal. I used to think "oh, these engineers just don't want to put in the work to understand what is happening in the codebase." But it's actually hard to explain what is happening to other engineers who have not been "initiated" in async. So they end up writing broken solutions and then feeling frustrated, because there's a resistance to the nature of the abstraction. Now you could argue that they should just hunker down and learn more, but personally I think if new abstractions do not come with a natural intuition, then they are not good abstractions.


Explicit yielding of execution is often very important, for example, UI code or any case where exact OS thread matters.

I personally prefer the async model over synchronization because for me, it's a lot simpler to manage the async/await than having to synchronize. I find golang to be more difficult than javascript in this aspect. Golang has better concurrency though.

If you don't want to synchronize, you need to run your app on a single OS thread. That's the case for node.js. You can configure the golang runtime to run on a single thread as well; I suppose that would remove any need for synchronization too.
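Python's asyncio illustrates the single-threaded case: a lock-free read-modify-write on shared state is deterministic, because the cooperative scheduler only switches tasks at an `await` (a sketch; with preemptive threads this same pattern could race):

```python
import asyncio

counter = 0

async def bump(n):
    global counter
    for _ in range(n):
        # No lock needed: the single-threaded event loop cannot
        # switch to another task between these lines, because we
        # only yield control at `await`.
        counter += 1
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(bump(1000), bump(1000))

asyncio.run(main())
print(counter)  # deterministically 2000
```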

There are plenty of languages with both async model and multicore runtime support. For example Kotlin on JVM. You still need to synchronize on shared resources (or use other patterns like message passing).

Async does not magically save you from synchronization. It's just a quirk of the node.js implementation.


NodeJS has more than a single thread; network IO, notably, runs on another thread. That's the beauty of it.

Why would network IO be on another thread? libuv gives node non blocking network IO (or so was my understanding).

Unsure about network (if it's true that it's on a separate thread, perhaps it's because network operations take cpu time even when readied asynchronously--e.g. even after you select/[e]poll to determine what events to trigger next, the actual read/write/sendfile/whatever ops do take a little time). However, filesystem IO is performed in threads in libuv on many platforms, due to the unavailability of stable async IO capabilities.

> not personally accounting for every possible context switch

In Python land, explicit is better than implicit; this is cooperative multitasking, after all.

It's also worth reading through https://vorpus.org/blog/notes-on-structured-concurrency-or-g...


Goroutines make sense for Go, but it is a bit much to say goroutines > async. Nothing is free, and in this case FFI hurts, and boy does it hurt in Golang. Python thrives on C code.

> not personally accounting for every possible context switch.

Context switches remain expensive.

I'm kind of on the fence about coroutines. They can sometimes simplify writing performant IO-dependent code, or can make it harder to debug. It feels that they've not reached their "final form", nor has the right set of structured programming norms developed around them yet.


I think the problem is they are really good for writing proxies like nginx, linkerd, and haproxy, which everyone uses, so you have these really high-profile, high-performance siren songs. But those have a crucial design advantage: a tight loop making a small number of decisions before punting the work off to the OS.

Then people start trying to apply coroutines to a bunch of arbitrarily complicated and unbounded business logic things and everything blows up.


IMO it's a beautiful abstraction. You don't need to think about threads, just that your code will continue later when the result is available. In threaded environments like .NET you can also create a CPU-bound task that will run in the background, yes in a thread pool, but you don't really need to know the specifics: it's something you create while releasing the current thread (e.g. the UI thread). For some IO tasks there's not even a waiting thread, so in that case a thread would be the wrong abstraction.

If you want to unravel async loops without leaks etc. I think you may want to steal this function:

https://github.com/Qbix/Platform/blob/01604218d06ed158c921ab...

(There are many other cool things in that js file btw)


I went through the asyncio docs multiple times and could just not get it. Only after working with it and having to debug and test things did parts of it make sense. A very difficult API to work with, imo.

This reminds me of something I’d like to see in JavaScript: Loops that accept `await` in their body without pausing the whole loop.

This is already possible with:

  await Promise.all(arr.map(async item => {
    await item()
  }))
… but it requires turning the iterable into an array first and it requires calling a function.

Is anyone working on a proposal for this?

  async for (const item of iterable)
    await item()

A little more verbose than your example but https://github.com/tc39/proposal-async-do-expressions would help with this.

Thanks! I think it wouldn’t await the whole loop though

  for (const item of iterable)
    async do {
      await item()
    }
This would be equivalent to

  Promise.all(arr.map(async item => {
    await item()
  }))
Notice the lack of await before Promise.all

There is already this:

    for await (const bar of foo) { }
Useful if you have an array of promises you want to handle sequentially.

`for await` is for async iterables, which is different. The loop body will still block the loop. Compare it to the all/map example I wrote.
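For comparison, Python's `async for` (the subject of the article) has the same property: the body blocks the loop, and getting concurrency means collecting the coroutines and awaiting them together. A sketch, with illustrative `items`/`handle` helpers:

```python
import asyncio
import time

async def items():
    for i in range(3):
        yield i

async def handle(i):
    await asyncio.sleep(0.05)  # pretend this is real work
    return i

async def sequential():
    # `async for` (like JS `for await`) finishes the body before
    # pulling the next item
    out = []
    async for i in items():
        out.append(await handle(i))
    return out

async def concurrent():
    # collect the per-item coroutines first, then await them all at once
    tasks = [handle(i) async for i in items()]
    return await asyncio.gather(*tasks)

t0 = time.perf_counter()
seq = asyncio.run(sequential())
seq_time = time.perf_counter() - t0

t0 = time.perf_counter()
conc = asyncio.run(concurrent())
conc_time = time.perf_counter() - t0

print(seq, conc, seq_time > conc_time)
```

The sequential version takes roughly three times the per-item delay; the concurrent one roughly one.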

True, it is sequential and not parallel. I think this syntax with an async generator could be made to do what you seek.

Async generators are async iterables. The loop calls .next() on the iterator; that returns a promise, and the promise resolves with either a new item to loop with or with `done`, which stops the loop.

After reading the comment again, isn’t the first example equal to:

    await Promise.all(
      arr.map(item => item())
    );
The async keyword is just syntactic sugar for immediately returning a Promise for whatever the function eventually returns, so the async-await inside the map is extraneous if item() is already async.

That’s just an example piece of code. If I used your code there wouldn’t be an `await` in the loop and people wouldn’t understand.

Yeah, but the pattern of awaiting in an async parallel loop in general reduces to what I wrote, does it not?

The only case where you would need to await inside the mapper function is when you would further process some asynchronously returned value. This seems like it would add bloat and do unnecessary sync processing inside the async function. It is highly advisable to separate the concerns using function composition, e.g.

    await Promise.all(
      arr.map(async item => handleItem(
        await item()
      ))
    );
I’m all for enabling ”await arr.map()” with implied Promise.all(), heh. There actually was a proposal for that under the ”await*” syntax in 2014.

One might be tempted to use the thenable interface (don’t!):

    await Promise.all(
      arr.map(item => item()
          .then(handleItem)
      )
    );
Seems like it would work, but it actually only waits for the promises returned from item() and not the .then() part! This introduces a race condition, where the last promised items to get resolved may have their then-callbacks run asynchronously after Promise.all() has already resolved, and other code has executed.

> the pattern of awaiting in an async parallel loop in general reduces to what I wrote, does it not?

No.

    async item => {await item()} // Promise<void>
    item => item() // Promise<ReturnValue<typeof item>>
In general that's how you use maps (by returning something), but here the intent was not to map an array, it's just how to implement an "async for".

> The only case where you would need to await inside the mapper function is when you would further process some asynchronously returned value

Which is what map is for. Also, once again, that was just how the async loop can be implemented. If anything you're saying that it's "a misuse" of map and thus we need an "async for"

    >  arr.map(async item => handleItem(
    >    await item()
    >  ))
That doesn't make any sense, there's still an `await` in the function’s body, so you're still handling it — and there's nothing wrong with that.

> await arr.map()

That's actually a good QoL improvement, but it doesn't address either of the points I'm talking about (only works on arrays + it requires a function)

> but it actually only waits for the promises returned from item() and not the .then() part

Wrong. The mapper function returns a Promise that resolves with the return value of handleItem. `await a.then(b).then(c)` awaits the 3 promises in row.

To achieve that race condition, you should not return nor await the return value of `then` inside the map method:

    await Promise.all(
      arr.map(item => {
        const p = item();
        p.then(handleItem); // "Race condition"
        return p;
      })
    );

I didn’t give any examples with block function bodies, of course they are different from what I expressed. You changed the semantics and claim wrongness :/

Your words:

> One might be tempted to use the thenable interface (don’t!):

The code that followed was fine and it had no race condition.

> Seems like it would work, but it actually only waits for the promises returned from item() and not the .then() part!

This is the wrong part. The blockness is irrelevant.


I like Python. We use it as the primary language on an industrial agricultural automation device (raspberry pi form factor, running on Debian). Python’s been great for development, and the self hosted nature has made rapid in-field development/debugging awesome.

My belief is that async is going to complicate things. I do not foresee myself using it. This is just not a problem point I’ve wanted to solve enough that I’ve wanted the two color problem. And I find coroutines quickly difficult to follow.

I’ve been learning Elixir these last few months. When I need to solve the kinds of problems async supposedly address, I find the basic fundamental nature of Erlang/Elixir so much better suited to this.


JS recently got ‘async for’ too, as `for await...of`.

Oh god, Python has completely lost its way. Seriously hope I never have to work with this crap. Async/await is a toxic pollution of any language's syntax and really seems like the wrong model anyway. Erlang/Elixir/Go's concurrency model is so much more intuitive and doesn't require these hideous things to be bolted onto the syntax.
