There isn't too much activity on this book [1] but I definitely think that more documentation about async programming in Rust is needed. Just recently I wanted to do something in async Rust and it's just such a PITA. I've been writing Rust for 3-4 years now, and async throws me back to those first days when I didn't know how to cope with the error messages. Hopefully async/await syntax will improve this experience, but even then I think that documentation is needed. The futures crate is severely underdocumented. I'd love to have an example snippet next to each combinator etc.
Hi, you might be interested in a crate I wrote called desync: https://docs.rs/desync/0.3.0/desync/ - it provides a very simple yet expressive API for performing asynchronous operations and has full support for the futures crate. It can be learned really quickly.
Desync takes a slightly different approach to asynchronous programming: instead of being based around the idea of scheduling operations on threads, and then synchronising data across those threads, it's based on the idea of scheduling operations on data.
There are only two basic operations: 'desync' runs an operation on some data in the background, and 'sync' runs one synchronously.
All operations are run in order and 'sync' returns a value so it's a way to retrieve data from an asynchronous operation. It's sort of like setting up some threads with some data protected by a mutex and sending results between them using mpsc channels, except without the need to build any of the scaffolding. ('sync' also makes borrowing data from one task to use in another effortless)
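For anyone who wants to see the shape of that idea without pulling in the crate, here is a minimal std-only sketch. To be clear, this is not how desync is implemented: `DataQueue` and everything inside it are made-up names for illustration, and this version requires `'static` closures, so it doesn't show the borrowing convenience mentioned above. It only shows "operations scheduled on data, run in order, with 'sync' as the way to get a value back".

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for the idea described above; NOT the desync crate.
// One worker thread owns the data and runs queued operations in order.
type Job<T> = Box<dyn FnOnce(&mut T) + Send>;

struct DataQueue<T> {
    jobs: Sender<Job<T>>,
}

impl<T: Send + 'static> DataQueue<T> {
    fn new(mut data: T) -> Self {
        let (jobs, inbox) = channel::<Job<T>>();
        // The worker exits once the DataQueue (the only Sender) is dropped.
        thread::spawn(move || {
            for job in inbox {
                job(&mut data);
            }
        });
        DataQueue { jobs }
    }

    /// Queue `op` to run in the background and return immediately.
    fn desync(&self, op: impl FnOnce(&mut T) + Send + 'static) {
        self.jobs.send(Box::new(op)).expect("worker thread is gone");
    }

    /// Queue `op` behind everything already scheduled and wait for its result.
    fn sync<R: Send + 'static>(
        &self,
        op: impl FnOnce(&mut T) -> R + Send + 'static,
    ) -> R {
        let (reply, result) = channel();
        self.desync(move |data| {
            let _ = reply.send(op(data));
        });
        result.recv().expect("worker thread is gone")
    }
}

fn main() {
    let counter = DataQueue::new(0u32);
    counter.desync(|n| *n += 40); // runs in the background
    counter.desync(|n| *n += 2); // runs after the first operation
    let value = counter.sync(|n| *n); // waits for both, then reads
    println!("{value}"); // prints 42
}
```

No mutex and no hand-written channel plumbing at the call site, which I think is the point being made above.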
> What is going on with futures 0.3? Why is everyone still using 0.1?
Futures 0.3 uses nightly-only features that are landing in Rust within (hopefully) the next two releases. Namely, Futures 0.3 is a way to experiment with async programming using the async/await syntax.
> Why is everyone still using 0.1?
Futures 0.3 is still in flux—but settling down in recent weeks—and is relying on nightly-only features. Futures 0.1 is used heavily in Hyper and Tokio, but we intend to move to Futures 0.3/std::future::Future when they're available on stable or shortly thereafter.
(The Tokio and Hyper projects take backwards compatibility _extremely_ seriously.)
(Disclaimer: I help maintain Tokio/Hyper, but am nowhere near as prolific as the main authors.)
It's the open source approach to upgrading. All the cool kids are focused on version N+1, which doesn't work yet. The users still on version N don't get support any more because only losers use version N.
You see this pattern frequently in open source. The Python 3 debacle spent five years in that state.
Commercial products tend to avoid this. Sales of version N go way down before version N+1 is generating revenue. Overall revenue drops during the transition. That's not good.
Note that this isn't what's happening here; tokio is explicitly supporting "version N" in your terminology. Which is why your parent is asking why people still seem to be using the "old" version.
(Also, there's a compatibility layer, so even the people that want to play with the shiny new N + 1 can do so, even though it's not explicitly directly supported.)
At least 0.2 has been yanked. I started learning futures when 0.2 was already de facto dead (which wasn't documented anywhere, but luckily people at the local Meetup knew), and I kept wondering why no packages were using it, and a lot of compatibility errors cropped up.
I think I'm correct in saying you don't need tokio for async but it seems all non-toy code uses it. Are there any alternatives to tokio out there for writing real async code or is the idea to build everything on it? As if it was std, but it's not... right?
There are systems that for various reasons can't use Tokio. In my own projects, I know there are people who need things to be generic across any executor implementation, as they can't use Tokio.
Tokio itself is great though, so if you have no strong reason not to use it, I'd recommend it.
This approach to async programming feels like a much more leaky abstraction than the 'it's basically semaphores' stuff for m:n threads. Though being able to do so much as a library is nice.
How does async translate calls to other async functions? Is refactoring into smaller async functions less efficient? If not, how does it deal with (possibly indirect) recursive function calls? Does it give up or select a loop breaker?
And what is the purpose of the pingpong between executor->Waker->push onto executor?
I am also still unsure what the approach to multithreading might be. Multiple executors with work stealing or one dispatch executor with worker threads or something else still?
From the second blog post I actually found https://github.com/tokio-rs/tokio/pull/660 which switched tokio from 1 reactor+worker threads to n reactors with work stealing.
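On the executor -> Waker -> executor ping-pong asked about above: as far as I understand it, the waker is just the way a pending future tells the executor "poll me again", and for a simple executor that means putting the task back on the run queue. Here is a toy single-threaded sketch using the `std::task` API as it was later stabilized. `Task`, `RunQueue`, `spawn`, `run` and `YieldOnce` are all names I made up; this is not Tokio's design.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type RunQueue = Arc<Mutex<VecDeque<Arc<Task>>>>;

/// A spawned future plus a handle to the run queue it belongs to.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    run_queue: RunQueue,
}

impl Wake for Task {
    // The "ping-pong": waking a task pushes it back onto the run queue.
    fn wake(self: Arc<Self>) {
        let queue = self.run_queue.clone();
        queue.lock().unwrap().push_back(self);
    }
}

/// Pending on the first poll (after asking to be woken), ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn spawn(queue: &RunQueue, fut: impl Future<Output = ()> + Send + 'static) {
    let fut: BoxFuture = Box::pin(fut);
    let task = Arc::new(Task {
        future: Mutex::new(Some(fut)),
        run_queue: queue.clone(),
    });
    queue.lock().unwrap().push_back(task);
}

/// Polls tasks until the run queue is empty; returns the number of polls.
fn run(queue: &RunQueue) -> usize {
    let mut polls = 0;
    loop {
        // Lock only to pop, so wake() can push while a task is being polled.
        let next = queue.lock().unwrap().pop_front();
        let task = match next {
            Some(task) => task,
            None => return polls,
        };
        let waker = Waker::from(task.clone());
        let mut cx = Context::from_waker(&waker);
        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            polls += 1;
            if fut.as_mut().poll(&mut cx).is_pending() {
                *slot = Some(fut); // not finished: keep it for the next wake-up
            }
        }
    }
}

fn main() {
    let queue: RunQueue = Arc::new(Mutex::new(VecDeque::new()));
    for name in ["a", "b"] {
        spawn(&queue, async move {
            println!("{name}: step 1");
            YieldOnce(false).await;
            println!("{name}: step 2");
        });
    }
    println!("{} polls", run(&queue)); // each task is polled twice: 4 polls
}
```

A multithreaded executor would have several threads popping from queues like this one, which is where the work-stealing question comes in.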
Not to oversimplify, but when you say code complexity, are you referring to the code you read? Like, the dev UX?
If so, I'd argue that long term once async/await have landed properly, the code largely looks and behaves the same. With that said, I've not even used it yet, because I've got no clue when this is landing enough that I can reasonably use it.. and I'm on Nightly lol.
> 3. Async pollution, a function that uses async must be async too?
Coming from JS, that's a non-problem in Rust. You can easily make a function blocking by creating an event loop and resolving the future you get from another function in it. So when I refactor my code to be async, I start by making a single function async, and then move the event loop from function to function, until as much of the code is async as I want.
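A minimal sketch of that "block on the future inside a normal function" trick, using only std and the async/await syntax as it was later stabilized. `block_on` here is a hand-rolled helper (real code would more likely use the one from an executor crate), and `fetch_len` is a made-up stand-in for real async work.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hand-rolled helper: drive `fut` to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // sleep until wake() is called
        }
    }
}

// Hypothetical async function, standing in for real async IO.
async fn fetch_len(input: &str) -> usize {
    input.len()
}

// Blocking wrapper: callers of this function never see a future.
fn fetch_len_blocking(input: &str) -> usize {
    block_on(fetch_len(input))
}

fn main() {
    println!("{}", fetch_len_blocking("hello")); // prints 5
}
```

Moving the `block_on` call one level up the call stack is the "moving the event loop from function to function" step.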
To add to that, not too long ago I was wishing Rust was more like Go on the Async front. Where the scheduler was more built into the language, and I didn't have to use "ugly" async/await stuff everywhere.
In hindsight, I prefer async/await. My reason is primarily that like your example points out, it really lets me be in full control over the scheduled behavior. I could even take non-io work and make it "async". Ie, some long processing application takes a break every million iterations to let other tasks steal some work. That's just cool!
Arguably a similar thing could be designed in Go if every million iterations you used some type of IO primitive, like sending some data over a channel, but the behavior of Rust's model is more fine grained.
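Roughly what "take a break every N iterations" could look like, as a sketch with std only. `YieldNow`, `sum_to` and `run_counting_yields` are names I made up, and the driver below just busy-polls and counts the yields; a real executor would use each `Pending` to run some other task.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Pending exactly once, so the executor gets a chance to run something else.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// CPU-bound loop (no IO at all) that yields every `every` iterations.
async fn sum_to(n: u64, every: u64) -> u64 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
        if i % every == 0 {
            YieldNow { yielded: false }.await;
        }
    }
    total
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy driver: busy-polls `fut`, returns its output and the number of yields.
fn run_counting_yields<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut yields = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            Poll::Pending => yields += 1, // other tasks would run here
        }
    }
}

fn main() {
    let (total, yields) = run_counting_yields(sum_to(3_000_000, 1_000_000));
    println!("total = {total}, yielded {yields} times");
}
```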
Disclaimer: My understanding of Futures is limited.
> 1. Make sure, manually(?), that all things are async / non-blocking.
You'd have to make sure any IO you do is using Futures - ie, use a package to provide async IO primitives for disk and network access. You would also need to use the appropriate await syntax on any call to a method that returns a future - that would require a bit of overhead to know, but at least the compiler has your back on that.
> 2. Implementing Future.poll / wrapping types in Future?
In most cases I don't think you'd have a use case to implement a Future - would you? Ie, main IO calls are the big ones for wasting threads - and libraries like mio/hyper/etc provide your IO primitives.
> 3. Async pollution, a function that uses async must be async too?
Yea, my understanding is that this is definitely an issue. I am already planning on using `async` tags on basically all my functions, because everything I use is bound to IO in one form or another.
On the bright side, I believe (don't quote me!) that you can drop the ugly `fn foo() -> Futures<Item=Result<A,B>>` wrapping, since I believe `async fn foo() -> Result<A,B>` does the same thing... again, the syntax is not finalized haha.
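For reference, here is how that looks with the `async fn` syntax as it was eventually stabilized (Rust 1.39) and `std::future::Future`, which has a single `Output` type instead of the futures 0.1 `Item`/`Error` pair. `parse_port` and `poll_once` are made-up names, and the second function is only a rough hand-written equivalent of the first, not literal compiler output.

```rust
use std::future::Future;
use std::num::ParseIntError;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// With `async fn`, the signature names the eventual value, not the future.
async fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port = s.trim().parse::<u16>()?; // `?` works as in synchronous code
    Ok(port)
}

// A roughly equivalent signature with the future type spelled out.
fn parse_port_explicit(
    s: &str,
) -> impl Future<Output = Result<u16, ParseIntError>> + '_ {
    async move { s.trim().parse::<u16>() }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls once; enough here because neither future ever waits on anything.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}

fn main() {
    println!("{:?}", poll_once(parse_port("8080"))); // Some(Ok(8080))
    println!("{}", poll_once(parse_port_explicit("http")).unwrap().is_err()); // true
}
```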
> 4. Setup some scheduler that maintain how many concurrent async operations one thread has?
If you're using Async I'd imagine you'd already have chosen a scheduler. I believe Tokio will be the de facto choice - though Rayon might be involved here, not sure.
> 5. More verbose error-messages / stack-traces?
Errors themselves would be unaffected, if you're talking normal error values - remember those are just values in Rust, like Go, so not much special there.
Though as you said, I imagine if you dump a trace it would look different, no idea.
None of this post was meant to counter you in any way. I just hoped to provide a bit of clarity on the tiny things I can contribute to. I hope I helped more than hurt. Have a nice day :)
Hyper gets 7,013,819. It's async. Iron gets 109,815, and is synchronous. That's roughly 64x. Iron uses hyper under the hood, so that should be a good comparison.
async isn't only about performance, but has other advantages, like reduced resource consumption. In addition to that async io also gives you better control over how to cancel io reads and writes on systems where the IO is not interruptible.
But you are correct, if you don't have a specific need, async is generally harder than using threads for concurrency. Ideally the async/await work in Rust is going to make that trade-off less extreme than it is today, which may mean more people will feel comfortable using it as it should reduce boilerplate.
You can think of a task as being a thread, but it has one single allocation that’s the exact possible stack size. No more, no less. This uses less memory than spinning up a thread with the default stack size. Yes, you could use the proper APIs and get the correct size too, but you have to figure that size out by hand for each thread. It just implicitly happens with tasks.
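A small way to see this for yourself, as a sketch with made-up function names: the size of the future is known at compile time, and it covers whatever has to live across an `.await`. Compare that with the 2 MiB default stack that Rust's std gives a spawned thread.

```rust
use std::mem::size_of_val;

// Hypothetical suspension point; a real task would await some IO here.
async fn tick() {}

// Hypothetical task body.
async fn handle_request() -> usize {
    let buf = [0u8; 1024]; // used after the await, so it is stored in the future
    tick().await;
    buf.iter().filter(|b| **b == 0).count()
}

fn main() {
    // Creating the future runs nothing; it is just a value with a fixed size.
    let task = handle_request();
    println!("task state: {} bytes", size_of_val(&task));
    println!("default thread stack: {} bytes", 2 * 1024 * 1024);
}
```

The exact number depends on the compiler version, so I won't quote one, but it has to be at least the 1024 bytes of `buf` and should be nowhere near a thread stack.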
(I also misinterpreted the context as I read the top level comment as async vs a single thread with blocking code, but after rereading it that makes more sense.)
It's all good; it's one of the things that's specific to our implementation. Other forms may or may not do this, but I'm pretty sure that it's novel to at least Rust, and maybe C++; there's some discussion that I think it can do this in some circumstances as well.
Hm, maybe I misunderstand what you're getting at; you're talking about one thread per core, not one thread per unit of work? Sure, if you only have that few threads, then it's not that big of a difference, but if you want to spin up a few hundred thousand of them...
> you're talking about one thread per core, not one thread per unit of work?
Yes, a thread pool, consisting of one thread per core/computing unit. The units of work are then scheduled between the threads. Units of work here being some kind of IO, e.g. servicing HTTP requests.
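Something like this, as a minimal std-only sketch. `run_on_pool` and `Job` are names made up for illustration, and the "units of work" are plain closures instead of HTTP requests.

```rust
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use std::thread;

// A unit of work; a stand-in for e.g. servicing one HTTP request.
type Job = Box<dyn FnOnce() -> u64 + Send>;

/// Runs `jobs` on one worker thread per available core.
/// Results are returned in the order the jobs were given.
fn run_on_pool(jobs: Vec<Job>) -> Vec<u64> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let (job_tx, job_rx) = channel::<(usize, Job)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (result_tx, result_rx) = channel::<(usize, u64)>();
    let total = jobs.len();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is released before the job itself runs.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok((index, job)) => result_tx.send((index, job())).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for (index, job) in jobs.into_iter().enumerate() {
        job_tx.send((index, job)).unwrap();
    }
    drop(job_tx); // lets the workers exit once the queue is empty

    let mut results = vec![0; total];
    for (index, value) in result_rx.iter().take(total) {
        results[index] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}

fn main() {
    let jobs: Vec<Job> = (1..=8u64)
        .map(|n| Box::new(move || n * n) as Job)
        .collect();
    println!("{:?}", run_on_pool(jobs)); // [1, 4, 9, 16, 25, 36, 49, 64]
}
```

Note that if a job blocks on IO, its worker thread is stuck for the duration, which matters for the rest of this subthread.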
> but if you want to spin up a few hundred thousand of them...
Hm. Thought there was a limit for work that can be done concurrently by the CPU, based on the number of cores/hyper-threads available. Found this on threads and IO performance [1], it seems to make the same point.
What kind of workload is common to spread over so many threads (on the same machine)? Does the OS switch efficiently between hundreds of threads on regular CPUs? Genuinely interested.
Yeah okay. That’s a way to do things too, of course. I have some stuff to say but I’m about to go get some dinner, I’ll reply for real later. (And I’m sure others in this thread have opinions too)
Okay so, there are lots of ways to do this kind of stuff. A threadpool is a pretty classic one. Apache being the poster child here in an HTTP server context.
> Thought there was a limit for work that can be done concurrently by the CPU,
Right. But in an IO bound scenario, the CPU isn't doing work; it's waiting on IO. So, because threads are generally heavy, you don't want a ton of them, taking up memory, doing nothing.
But, when you have lightweight threads, you can spin up one per connection. This ends up being simpler, and you don't have the large memory usage. This is what nginx does, in a sense. It still has a worker per core, but each of those workers can handle thousands of requests simultaneously, because it's all non-blocking.
That limit to concurrent work is exactly why non-blocking architectures are so important, and task systems fit into them really nicely.
Excellently said, Steve. This is a great thing to know in this context, “Latency numbers every programmer should know”: https://gist.github.com/jboner/2841832
A fun example is the slow loris attack. You send your requests character by character with 20+ seconds between packets. On mobile, 20-second gaps can legitimately happen, so if the server doesn't use a running window to check timeouts, it can't kill the connection.
Now do this 1000 times: you use barely any resources, but the server uses 1GB in stack allocations, which might be the maximum, and no one else can connect.
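To make the timeout point concrete, here is a small sketch of my reading of "running window": a per-read idle timeout is reset by every byte, while a deadline on the whole request is not. The `Limits` struct, `should_drop` and all the numbers are assumptions for illustration, and the 1 MiB per thread stack is just what the 1000 connections / 1GB figure above implies.

```rust
use std::time::Duration;

/// Hypothetical limits for receiving one request's headers.
struct Limits {
    idle: Duration,  // maximum pause between two packets
    total: Duration, // maximum time for the whole request
}

/// `gaps` are the pauses before each received packet.
/// Returns true if the server should drop the connection.
fn should_drop(gaps: &[Duration], limits: &Limits, check_total: bool) -> bool {
    let mut elapsed = Duration::ZERO;
    for gap in gaps {
        if *gap > limits.idle {
            return true; // a single pause was too long
        }
        elapsed += *gap;
        if check_total && elapsed > limits.total {
            return true; // each pause was fine, but the request as a whole is too slow
        }
    }
    false
}

fn main() {
    let limits = Limits {
        idle: Duration::from_secs(30),
        total: Duration::from_secs(60),
    };
    // Ten packets, 20 seconds apart: every gap is under the idle timeout.
    let slow_loris = vec![Duration::from_secs(20); 10];

    println!("idle timeout only: drop = {}", should_drop(&slow_loris, &limits, false));
    println!("with total deadline: drop = {}", should_drop(&slow_loris, &limits, true));

    // Thread-per-connection cost, assuming 1 MiB of stack per thread.
    let connections: u64 = 1000;
    println!("stack memory held: {} MiB", connections * 1);
}
```

With one task per connection instead of one thread, each stalled connection would only hold its own small state while it waits.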