Things you can’t do in Rust (and what to do instead) (blog.logrocket.com)
303 points by weinzierl | karma 23117 | avg karma 5.55 | 2021-05-15 10:27:44+00:00 | 291 comments




I’ve read quite a lot of the logrocket posts recently and the writing is always a satisfying read. Logrocket and fasterthanli.me are my two favourite resources for rust opinion stuff. Opinionated but fair rather than ideological.

For my first brush with rust i decided i’d do some graph processing - a topic i’ve historically written buggy code for and i wondered if all the hype around rust’s compiler might help. Experienced rust people know where this is going but suffice to say i WOULD recommend this for someone new to rust if you want rapid exposure to what the borrow checker can and can’t do for you, for a rapid introduction to Rc & Arc, along with Box etc. I think i got a quicker introduction to rust by picking a hard use case.


Logrocket finds writers on other sites like Medium and approaches them and pays them reasonable rates (2-500$ per post iirc) to write for them.

Source: I have written for them in the past.


Aha! That’s quite a neat approach, i wouldn't know what logrocket does otherwise but i feel pretty aware of their product these days.

It's a great way to get decent content, I think DigitalOcean and Linode (still?) do that. Great marketing tactics to boost website traffic :-).

I understand the message they're trying to convey, but a better title would perhaps be "things you can't do without unsafe Rust".

There's nothing inherently bad or wrong about unsafe Rust. It just leaves it up to you to manage invariants like lifetimes and aliasing constraints – just like in C.


Well, titles are hard.

All of those things are possible in Rust. But they're not easy or approachable.


What I appreciate coming from D is fine grained control of mutability. In Rust it is all or nothing. You cannot have a mutable container of immutable items. In D a string is a mutable array of immutable characters, for example. You can append things but not change the existing parts.

> In D a string is a mutable array of immutable characters, for example. You can append things but not change the existing parts.

Nothing precludes designing a Rust string that way. You're still designing the mutation interface of the collection, and having a `&mut` to a structure doesn't really give you anything if the structure doesn't allow mutations. Hell, you could actually design your own pseudo-string, which derefs to an immutable string, and only provides for a select number of mutations.

But, given Rust's rules around mutation, what useful property would that yield exactly? I think that's one of the most interesting effects of Rust's strict ownership, very much an "if a tree falls" situation: if you mutate a string in place but there's no way for anyone else to observe that mutation, does it matter?

I can imagine that "append-only strings" would be useful in a world of multiple-ownership where you could delay COW copies without affecting co-owners (though even that seems… somewhat limited), but that's not really a thing in Rust. Given the co-ownership would be not one but two layers of external wrappers (an Rc/Arc pointer, and some sort of internal mutability shim), it's not like you could not flatten the entire thing and build thread-safe append-only COW pseudo-strings.
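
For anyone wanting to see the "pseudo-string which derefs to an immutable string" idea concretely, here is a minimal sketch. The name `AppendOnly` and its methods are made up for illustration and do not come from any crate. It implements `Deref` but deliberately not `DerefMut`, so even a `&mut AppendOnly` only offers appending.

```rust
use std::ops::Deref;

/// Hypothetical wrapper: owns a String but only exposes appending.
#[derive(Debug, Default)]
struct AppendOnly {
    inner: String,
}

impl AppendOnly {
    fn new() -> Self {
        Self::default()
    }

    /// The only mutation offered: add text at the end.
    fn push_str(&mut self, s: &str) {
        self.inner.push_str(s);
    }
}

// Read access goes through `&str`. There is no `DerefMut` impl, so callers
// cannot reach `&mut str` or `&mut String` and edit the existing bytes.
impl Deref for AppendOnly {
    type Target = str;

    fn deref(&self) -> &str {
        &self.inner
    }
}

fn main() {
    let mut s = AppendOnly::new();
    s.push_str("foo");
    s.push_str(" bar");
    // All read-only `str` methods are available through Deref.
    assert!(s.starts_with("foo"));
    println!("{} ({} bytes)", &*s, s.len());
    // s.make_ascii_uppercase(); // would not compile: needs `&mut str`
}
```

Whether this buys anything under single ownership is exactly the "if a tree falls" question raised above.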


Rust can have mixed mutability - see Interior Mutability (Cell/RefCell/Atomic/Mutex).

But in Rust the distinction is actually about shared vs exclusive access. This is the aspect that really matters for safety. Access is a property of the code, not data, so it can be dynamic from data's perspective (the same data can be borrowed exclusively or not at different times).

When you have exclusive access, immutability becomes irrelevant, because nobody else can observe the mutation (so e.g. exclusive reference to a Mutex can skip locking!)
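
A small sketch of both points, assuming a made-up `Counter` type and helper function names. `Cell` gives a field that is mutable through a shared reference, and `Mutex::get_mut` (a real std method) reaches the data without locking when you hold `&mut Mutex`.

```rust
use std::cell::Cell;
use std::sync::Mutex;

// Hypothetical struct with mixed mutability.
struct Counter {
    hits: Cell<u32>, // mutable even through a shared `&Counter`
    name: String,    // only mutable through an exclusive `&mut Counter`
}

// Takes a shared reference, yet can still update `hits`.
fn record_hit(c: &Counter) -> u32 {
    c.hits.set(c.hits.get() + 1);
    c.hits.get()
}

// Exclusive access: nobody else can hold the lock, so no locking is needed.
fn bump_exclusive(m: &mut Mutex<i32>) -> i32 {
    let v = m.get_mut().unwrap();
    *v += 1;
    *v
}

// Shared access: we have to take the lock.
fn bump_shared(m: &Mutex<i32>) -> i32 {
    let mut guard = m.lock().unwrap();
    *guard += 1;
    *guard
}

fn main() {
    let c = Counter { hits: Cell::new(0), name: String::from("requests") };
    record_hit(&c);
    record_hit(&c);
    println!("{}: {}", c.name, c.hits.get());

    let mut m = Mutex::new(0);
    bump_exclusive(&mut m);
    bump_shared(&m);
    println!("mutex value: {}", *m.lock().unwrap());
}
```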


I like rust a lot, but coming from C++ and python there are times inheritance saves so much time enabling quickly creating small variations in classes. In rust you have to duplicate all the base logic, or abstract it to another common type to use composition. Which is fine but it’s painful at times knowing that time sink lies ahead. I understand everyone hates OO now but I still miss it at times when it would save me a lot of effort.

In my programming career, I had several situations where a base class made a lot of sense, with a dozen derived classes only implementing specific functionality. They were mostly algorithm-related.

OO makes a lot of sense in hierarchic structures.


It's a good practice to make the base class abstract.

With that in mind, this pattern cleanly maps onto a trait with default implementations for methods plus several structs that implement the trait.


To be fair, a trait gets you 98% there. It can have default implementations for most functions, you just override the ones you want to change. You'll still need to declare the underlying struct though, so a bit of extra code.
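
A minimal sketch of that pattern. The `Shape` trait and the two structs are invented for illustration. The "base class" logic lives in a default method, and each struct supplies only the parts that vary or overrides the default.

```rust
// Hypothetical "abstract base": required methods plus a default one.
trait Shape {
    fn name(&self) -> String;
    fn area(&self) -> f64;

    // Shared logic, written once; implementors may override it.
    fn describe(&self) -> String {
        format!("{} with area {:.1}", self.name(), self.area())
    }
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn name(&self) -> String {
        String::from("square")
    }
    fn area(&self) -> f64 {
        self.0 * self.0
    }
    // `describe` is inherited from the default implementation.
}

impl Shape for Circle {
    fn name(&self) -> String {
        String::from("circle")
    }
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
    // Override the default for this one type.
    fn describe(&self) -> String {
        format!("circle of radius {:.1}", self.0)
    }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Circle(1.0))];
    for s in &shapes {
        println!("{}", s.describe());
    }
}
```

What a trait cannot do is carry fields, which is where the "declare the underlying struct" overhead comes from.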

Unconstrained templates can also sometimes be a massive time saver.

Can you imagine a non-inheritance solution to this problem? If you used this way of solving things your whole life then I understand that changing paradigms is a challenge.

Since there was no answer, and no concrete example where inheritance was needed, I'll just provide one possible solution: you can store a function pointer that implements the desired varying functionality as a part of the struct whose implementation you want to slightly change. That function can accept self as the first argument like any method.

Performance-wise this is the same as a vtable generated by c++ and doesn't require creating more entities.
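
Roughly what that could look like, as a sketch. `Greeter`, `greet_fn`, `polite` and `casual` are hypothetical names. Each call goes through one function pointer stored in the struct, which is the same kind of indirection a virtual call pays.

```rust
// Hypothetical struct whose behaviour varies per instance.
struct Greeter {
    name: String,
    // The varying part: a plain function pointer taking the struct itself.
    greet_fn: fn(&Greeter) -> String,
}

impl Greeter {
    // Shared logic lives here; only the pointed-to function differs.
    fn greet(&self) -> String {
        (self.greet_fn)(self)
    }
}

fn polite(g: &Greeter) -> String {
    format!("Good day, {}.", g.name)
}

fn casual(g: &Greeter) -> String {
    format!("hey {}", g.name)
}

fn main() {
    let a = Greeter { name: String::from("Ada"), greet_fn: polite };
    let b = Greeter { name: String::from("Bob"), greet_fn: casual };
    println!("{}", a.greet());
    println!("{}", b.greet());
}
```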


If you can use C++20, I'd encourage you to look into using concepts. Since I've started using them, it feels like a much more natural choice than class hierarchies when one needs to enforce type constraints on interfaces.

Instead of making a parent class that has a few key methods you need (e.g. send, recv) with the signatures (e.g. takes a pointer, returns a number), one can encode the use of those functions in a concept (e.g. a "socket" concept). This decouples the implementation from the use in the method, with the one big advantage being that it is easily testable -> no more need to carefully craft type hierarchies to carefully be able to substitute some mock object when you can just directly call it!

OO hierarchies certainly still have their place (even though so many conference talks seem to be about getting rid of them), but I'm glad I can relegate them to a dark corner of my toolbox until I absolutely need them.

In fact, come to think of it C++ concepts are basically Rust traits!


Yea for sure, it’s a lot like swift being protocol oriented too. It’s a major paradigm shift I have to enforce in my head manually still at this point tho

Note that you can do Swift's protocol oriented programming / protocol extensions in Rust, too:

  trait Test {
      fn value(&self) -> i32;
  }

  impl dyn Test {
      fn multiplied(&self) -> i32 {
          self.value() * 2
      }
  }

What does `impl dyn` mean, exactly? That this only works for trait objects?

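Yes. It adds the method to the trait object type `dyn Test` itself, so it is callable on a value of any type implementing the trait once that value is behind a `dyn Test` pointer, with dispatch going through the vtable. Without `dyn`, an `impl` block defines a method specific to one particular type implementing the trait (a concrete implementation, distinct to that specific impl).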
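
To make the trait-object point concrete, here is a sketch that reuses the `Test` trait from above and adds a made-up implementor `Three` and helper `doubled_sum`. The method is found on `Box<dyn Test>` but not on the concrete type.

```rust
trait Test {
    fn value(&self) -> i32;
}

// Inherent method on the trait object type `dyn Test`.
impl dyn Test {
    fn multiplied(&self) -> i32 {
        self.value() * 2
    }
}

// Hypothetical implementor, for illustration only.
struct Three;

impl Test for Three {
    fn value(&self) -> i32 {
        3
    }
}

// Works on any mix of implementors, as long as they are trait objects.
fn doubled_sum(items: &[Box<dyn Test>]) -> i32 {
    items.iter().map(|t| t.multiplied()).sum()
}

fn main() {
    let boxed: Box<dyn Test> = Box::new(Three);
    println!("{}", boxed.multiplied());
    println!("{}", doubled_sum(&[Box::new(Three), Box::new(Three)]));
    // Three.multiplied(); // would not compile: `Three` is not `dyn Test`
}
```

If you want the method on every implementor directly, a default method in the trait or a blanket `impl<T: Test>` on an extension trait is the usual route.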

I often find the opposite to be true. Where I have to spend a lot of time clearly defining a hierarchical structure, which after implementation needs to restructured. Or fighting against an already defined hierarchy, usually unchangeable and made by someone else.

Rust could have inheritance - perhaps limit it to single parent

I've started to learn rust for the 2nd time. The first time was really frustrating. In my defense though, I'm pretty sure the compiler inferred a lot less back then (maybe 5-6 years ago now?)

This time, it's going much better and I've managed to build something non-trivial. I quickly ran into the self-referencing issue. A simple structure to hold a [normalized] phrase and a list of words within that phrase:

    struct Input<'a>{
      words: Vec<&'a str>,
      phrase: String,
    }
I ended up asking in https://users.rust-lang.org/ where people are super fast, friendly and helpful. I understand that it isn't allowed, but I still don't quite understand why it isn't. It seems completely straightforward and unambiguous to me.

Something else I'm surprised the article didn't mention is the combination of `Box<dyn X + Sync + Send>` which is a mouthful - both in terms of having to declare a type with this, and in terms of everything it implies/encapsulates.


(Not sure if this is the official explanation, but makes sense to me)

The issue is that you have references to the String (it's borrowed). That means it can't change, which isn't/can't be expressed in this struct. Imagine clearing the String: Suddenly all references in words are invalid which means you're referencing something invalid, very much not memory safe and a big rust no-go


Yeah, I tried picking rust up a few years ago and ran into a few weird issues that would block me from making good progress. I've now used it to create a parser that reads data and converts it to other formats (for my bespoke needs) and it's so fast it's crazy. I had far fewer issues this time, CLion integration with Rust is really helpful as well. The tooling around rust is now great (rustup, cargo, etc).

I just tried that in nightly and it seems to work? Not sure what the issue with that is.

It works, but it isn't what they want. The compiler won't allow borrowing the str slices from the owned String field.


This specific code compiles, you just can't create an instance of this struct and store references to `phrase` in `words`.

The "why is this an issue" has already been mentioned ("what if `self.phrase.clear()` is called? that creates dangling references in self.words"), but how to solve it is relatively straightforward:

    struct Input {
        words: Vec<(usize, usize)>,
        phrase: String,
    }

    impl Input { 
        fn word(&self, i: usize) -> &str {
            // `words` is a Vec field, so index it rather than call it.
            let (first, last) = self.words[i];
            &self.phrase[first..last]
        }
    }
Now your Input type doesn't even need to be generic over the reference 'a anymore (not that it mattered before because that wouldn't work).

Now even if self.phrase is resized, you never get memory unsafety, because the `self.phrase[...]` does a bounds check and panics on out-of-bounds. So the worst you can get is a "logic error" in your program that shows up with a really nice backtrace that's super easy to debug, instead of a miscompiled program (due to UB) that's often very painful to debug.

So sure, you don't get "self referential struct with references" in Rust. But so what?

The fix for when you want this is always the same: use offsets like I did above. Offsets are much better than any kind of pointer, because they allow you to just "memcpy" the struct around (the address of a field changes on move and copy, but the offset to that field from the beginning of the struct always remains constant).
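
A fuller sketch of the offset approach, to show that moving the struct needs no fix-ups. The `new` constructor and its whitespace-splitting logic are my own assumption and not part of the snippet above.

```rust
#[derive(Debug)]
struct Input {
    words: Vec<(usize, usize)>, // byte offsets (start, end) into `phrase`
    phrase: String,
}

impl Input {
    // Hypothetical constructor: records the byte range of each word.
    fn new(phrase: String) -> Self {
        let mut words = Vec::new();
        let mut start = None;
        for (i, c) in phrase.char_indices() {
            match (c.is_whitespace(), start) {
                (false, None) => start = Some(i),
                (true, Some(s)) => {
                    words.push((s, i));
                    start = None;
                }
                _ => {}
            }
        }
        if let Some(s) = start {
            words.push((s, phrase.len()));
        }
        Input { words, phrase }
    }

    fn word(&self, i: usize) -> &str {
        let (first, last) = self.words[i];
        // Bounds-checked: a stale offset panics instead of reading bad memory.
        &self.phrase[first..last]
    }
}

fn main() {
    let input = Input::new(String::from("foo bar baz"));
    // Moving the struct (here onto the heap) leaves the offsets valid.
    let moved = Box::new(input);
    println!("{} / {:?}", moved.word(1), moved);
}
```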

If you were to use a pointer to a field, you'd need to correct that pointer during copies and moves. C++ allows this, and this "feature" has many undesirable consequences for Rust:

- moves in Rust are O(size_object), moves in C++ can be of any algorithmic complexity, so any algorithm that moves has to take this into account

- rust collections have one code path, C++ collections have two code-paths: one for objects that are "simple" to move, and one for objects that are "hard" to move

- move constructors that throw in C++ interact with exception safety in complicated ways

- probably many many more interactions that I'm missing here

There might be ways to improve the ergonomics of self-referential structs in Rust, but given that offsets are simple and always work correctly without many downsides, whatever improvement one considers for Rust would probably not be worth it if it has any of C++ downsides.


> If you were to use a pointer to a field, you'd need to correct that pointer during copies and moves.

The "transfer" crate allows for this, though not in a way that's directly compatible with C++ move. (The latter has been discussed somewhat in https://mcyoung.xyz/2021/04/26/move-ctors/ but that's only a first look at the problem area, not aiming for 100% accuracy just yet.)


Another bonus of the offset approach: it can be smaller. The size of a &str will always be two words (pointer and length), but if you're sure that your strings will never be longer than 4 gigabytes, and you're willing to put up with the necessary integer casts, you can store a (u32, u32), which is half the size of a (usize, usize) or a &str when your target uses 64-bit words. And if you're certain that your strings are never bigger than 65535 bytes, you can go even further and use a (u16, u16), and so on.
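
A sketch of the compact variant, with the length limit checked at construction instead of assumed. `CompactInput` and its constructor are hypothetical. The subtraction on pointer addresses is only used to recover each word's byte offset inside `phrase`.

```rust
use std::mem::size_of;

// Hypothetical compact variant: offsets stored as u32 instead of usize.
struct CompactInput {
    words: Vec<(u32, u32)>,
    phrase: String,
}

impl CompactInput {
    /// Returns None when the phrase is too long for u32 offsets,
    /// so the size limit is enforced rather than assumed.
    fn new(phrase: String) -> Option<Self> {
        if phrase.len() > u32::MAX as usize {
            return None;
        }
        let base = phrase.as_ptr() as usize;
        let words: Vec<(u32, u32)> = phrase
            .split_whitespace()
            .map(|w| {
                // Byte offset of this word within `phrase`.
                let start = w.as_ptr() as usize - base;
                (start as u32, (start + w.len()) as u32)
            })
            .collect();
        Some(CompactInput { words, phrase })
    }

    fn word(&self, i: usize) -> &str {
        let (first, last) = self.words[i];
        &self.phrase[first as usize..last as usize]
    }
}

fn main() {
    let input = CompactInput::new(String::from("foo bar")).unwrap();
    println!("{}", input.word(1));
    println!(
        "(u32, u32): {} bytes, &str: {} bytes",
        size_of::<(u32, u32)>(),
        size_of::<&str>()
    );
}
```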

If you’re applying arbitrary invariants that aren’t codified in the type system, you may as well just use raw pointers.

Offsets cause bugs when you do things like assume your strings are of a particular length; that’s why they’re a bad choice.

Use a crate that provides a safe abstraction, don’t roll it by hand yourself, and the safe abstraction can be implemented in whatever way you want; but it’s not safe if it doesn’t enforce the invariants.


> If you’re applying arbitrary invariants that aren’t codified in the type system, you may as well just use raw pointers.

This isn’t really true, as the character of bug produced is quite different between the two cases. If you make a mistake with offsets, you’ll get a runtime panic that tells you exactly where the problem occurred. If you make the same mistake with raw pointers, however, you’ll get corrupted memory somewhere which won’t necessarily show up as a problem until long after the erroneous code is executed.


Who honestly cares? I sure don’t.

Yes, there is a distinction, yes one is worse than the other, but if you got a panic, your application terminated.

You’re screwed either way, web server or desktop app; fortunately there’s a simple solution: use whatever means you like to write a safe abstraction that enforces invariants.

I’m not advocating particularly for unsafe code; but avoiding unsafe and writing code that blatantly panics at the drop of a hat is ridiculous.

Invariants should be enforced by the type system.

I don’t care how; if you’re not doing that and just assuming your strings are shorter than 256 characters, you’re writing code that is unsound.


> You’re screwed either way,

Not really.

If the application terminates, it is infinitely easier to catch this during testing.

If the application silently continues, that might be an exploitable security vulnerability, might result in corruption of pretty much anything (data-bases, loss of user data, ...).

Pretty much everybody with an exposed web-server would prefer it to terminate with a nice debug message over the alternatives that using raw-pointers would cause.

> you’re writing code that is unsound.

No you are not, code using offsets is perfectly sound because it does not exhibit undefined behavior (which is how "soundness" is defined in Rust).

Your suggestion of using unsafe instead of offsets here would indeed be unsound. Don't do that.

> but avoiding unsafe and writing code that blatantly panics at the drop of a hat is ridiculous.

This is the most absurd thing I've read all day. `assert` is a tool used in _many_ standard library APIs to ensure soundness.

> I don’t care how; if you’re not doing that and just assuming your strings are shorter than 256 characters, you’re writing code that is unsound.

I really have no idea why you sound so angry, but the claim that someone anywhere in this thread is "assuming your strings are shorter than 256 characters" is completely made up. Nobody has suggested that.

The offset optimization that the OP proposed is trivial to implement correctly even without bound checks (just put an assert on the methods that take &mut self to ensure that the string doesn't grow beyond that).


> If you’re applying arbitrary invariants that aren’t codified in the type system, you may as well just use raw pointers.

You have to correct raw pointers on move and copy.

Offsets remain correct on move and copy.


How would you adjust this approach if the words needed to be keys? For example, if words were a map from a word to its count, and you could get the count for a &str?

I've run into that exact same problem before too. My solution would be just to have a Vec<Range<usize>> instead. Then you can generate the substrings on the fly when you need to use them.
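
One way the map question could be handled, as a sketch with hypothetical names. Store only ranges in the struct and build the word-to-count map on demand, so the `&str` keys borrow from the struct for just as long as the map is alive.

```rust
use std::collections::HashMap;
use std::ops::Range;

struct Input {
    words: Vec<Range<usize>>, // byte ranges into `phrase`
    phrase: String,
}

impl Input {
    // Hypothetical constructor splitting on whitespace.
    fn new(phrase: String) -> Self {
        let base = phrase.as_ptr() as usize;
        let words: Vec<Range<usize>> = phrase
            .split_whitespace()
            .map(|w| {
                let start = w.as_ptr() as usize - base;
                start..start + w.len()
            })
            .collect();
        Input { words, phrase }
    }

    // The map borrows from `self`, so it cannot outlive the struct or
    // coexist with a mutation of `phrase`; the borrow checker enforces that.
    fn counts(&self) -> HashMap<&str, usize> {
        let mut counts = HashMap::new();
        for r in &self.words {
            *counts.entry(&self.phrase[r.clone()]).or_insert(0) += 1;
        }
        counts
    }
}

fn main() {
    let input = Input::new(String::from("a b a c a"));
    let counts = input.counts();
    println!("a appears {:?} times", counts.get("a"));
}
```

The trade-off is that the map is rebuilt whenever you need it; caching it inside the struct would bring back the self-reference problem.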

Let's assume for a moment self-referential references were allowed. Let's further say we have such a struct containing self-referential references and we move it.

Moving actually moves things in memory, including the thing those references reference. This means the references now point at the old and now incorrect location. Accessing them would trigger undefined behavior.

Self-referencing is only acceptable, if the struct is guaranteed not to move. Rust does provide a mechanism to provide such a guarantee with std::pin but outside of async code, you probably don't want to be using it.

std::pin: https://doc.rust-lang.org/std/pin/index.html

If you're curious about pinning and have a few hours to spare, https://www.youtube.com/watch?v=DkMwYxfSYNQ is a great video that explains it in detail.


Self-reference is allowed, but it prevents moving.

  fn main() {
      // ok
      let mut inp = Input { words: vec![], phrase: String::from("foo bar") };
      inp.words.push(&inp.phrase[0..3]);
      inp.words.push(&inp.phrase[4..7]);
      println!("{:?}", inp);

      // this would give "cannot move out of `inp` because it is borrowed"
      //let x = inp;
  }

It also prevents borrowing `inp` or `inp.phrase` as mutable which makes it useless for most cases.

> I'm pretty sure the compiler inferred a lot less back then (maybe 5-6 years ago now?)

If it was post Rust 1.0 then it did not. The more likely case is the combination of NLL and match ergonomics, which has made a lot of code way more comfortable, respectively by relaxing local borrows and avoiding a lot of minutia in pattern matches.


I understood the “inference” comment as “the Rust compiler is more helpful now”. Which it is. @ekuber and gang have put a lot of effort into better error messages over the years and continue to do so.

Match ergonomics was one of the greatest improvements to the language, I’d personally put it almost on par with NLL.

For anyone who’s interested:

In Rust, you tend to do a lot of pattern matching and destructuring. But you also often only have a reference to your data, because copies can be expensive.

Combining pattern matching and references was always possible, but confusing: you had to place lots of “ref” annotations in various places, which I never got right on the first try. But now, you can magically destructure e.g. a reference to a tuple into a tuple of references. You don’t have to think about it, it simply works. Combined with Rust’s algebraic types, this allows you to write very elegant, almost fp-like code that’s also zero copy.
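
A short illustration of the difference, with made-up function names. The first two functions rely on match ergonomics; the last one spells out the older `ref` style for comparison.

```rust
// Iterating `&[(String, u32)]` yields `&(String, u32)`; the tuple pattern
// binds `name: &String` and `n: &u32` without any `ref` or `&` annotations.
fn sum_pairs(pairs: &[(String, u32)]) -> u32 {
    let mut total = 0;
    for (_name, n) in pairs {
        total += *n;
    }
    total
}

// Matching on a reference: `s` is bound as `&String`, nothing is moved.
fn first_len(opt: &Option<String>) -> usize {
    match opt {
        Some(s) => s.len(),
        None => 0,
    }
}

// The same function in the pre-ergonomics style.
fn first_len_explicit(opt: &Option<String>) -> usize {
    match *opt {
        Some(ref s) => s.len(),
        None => 0,
    }
}

fn main() {
    let pairs = vec![(String::from("a"), 1), (String::from("b"), 2)];
    println!("{}", sum_pairs(&pairs));
    let name = Some(String::from("abc"));
    println!("{} {}", first_len(&name), first_len_explicit(&name));
}
```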


Those match "ergonomics" really tripped me up when first trying to learn Rust.

I was trying to figure out references and deref, and I kept thinking I "got it" and trying to make sure by writing examples that should and shouldn't work.

However, because of all the deref "ergonomics" magically spread around things like match, it was really hard to confirm whether I had the right mental model or not. Lots of things that surely shouldn't work somehow worked, and I couldn't explain why. I kept having to go to forums and people just told me "oh, this is a special magic case we added". Very frustrating.


Yes, while "match ergonomics" made many things "just work", the magic it implies also obscures the language, and code, a lot.

I remain of two minds about it.

If you want to make sure you have the right model you can always set edition to 2015 tho. You will also lose NLL and a bunch of other conveniences, but match ergonomics will be disabled.


NLL is in Rust 2015 these days. (As of Rust 1.36.)

> If you want to make sure you have the right model you can always set edition to 2015 tho.

That's a really interesting idea. I wonder if that would work well for teaching languages in general. Like teaching Java 1 instead of 16 where the language was much simpler.


I can totally see how this kind of convenience might hinder learning. For me, the fact that “&a” does so much magic type conversion stuff had a similar effect. One thing I sometimes find helpful is to run an IDE (like VS Code with the Rust Analyzer plugin) and hover over every single variable to see its type.

Out of curiosity, since I expect to give a short Rust introduction in the near future, what was your language background when learning Rust? How familiar were you with C-style pointers and/or C++-style references?


A similar trick I learned from Jon Gjengset is to assert something has the unit type and read the error.

    let foo: () = complex_thingy();

> Out of curiosity, since I expect to give a short Rust introduction in the near future, what was your language background when learning Rust?

I have a background in lots of high level languages, like Python and Haskell, but never thought too much about how the types were represented.

But prior to learning Rust I spent some time writing C (I've never done C++), so I guess I came in with a simple idea of how pointers work.


Works for me:

  #[derive(Debug)]
  struct Input<'a>{
      words: Vec<&'a str>,
      phrase: &'a str,
  }

  fn main() {
    let s="foo bar baz";
    let input = Input {
        words: s.split_whitespace().collect(),
        phrase: s,
    };
    dbg!(input);
  }

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.15s
     Running `target/debug/playground`
  [src/main.rs:13] input = Input {
    words: [
        "foo",
        "bar",
        "baz",
    ],
    phrase: "foo bar baz",
  }

Wouldn't a Box<&'a str> put the reference itself, not the contents, in the heap?

Input isn’t self-referential in your example, because both of its fields are referencing the same data owned by the main function.

For it to be self-referential, Input needs to take ownership of the string in phrase, and words needs to then be referencing phrase.

You can do this with Pin, but it’s not simple.


Yep, it can be done with Pin and unsafe only, AFAIK. In theory, Rust can be extended to have relative references, but it's much simpler to use arrays and indexes and ranges instead.

With Pin:

  use std::pin::Pin;
  use std::marker::PhantomPinned;

  #[derive(Debug)]
  struct Input<'a>{
      words: Vec<&'a str>,
      phrase: Box<str>,
      _marker: PhantomPinned,
  }

  fn test<'a>(s: String) -> Pin<Box<Input <'a>>> {
    let bs = s.into_boxed_str();

    let mut input = Box::pin( Input {
        phrase: bs,
        words: Vec::new(),
        _marker: PhantomPinned,
    });

    unsafe { 
        let phrase: *const Box<str> = &input.as_ref().phrase; 
        let words = phrase.as_ref().unwrap().split_whitespace().collect();
        input.as_mut().get_unchecked_mut().words = words;
    }

    input
  }

  fn main() {
    let s="foo bar baz".to_string();
    dbg!(test(s));
    
  }

Another aspect is that in Rust, all structs have "move semantics" in that you can copy the raw bytes of a struct to another location in memory without anything breaking. This is very nice in general but does rule out creating structs in safe code which contain pointers to themselves.

I’d just like to say, don’t feel bad.

The responses you’ve received aren’t particularly kind and as a rust user, I’m sorry for that.

It’s entirely fair to say you don’t understand why it’s not possible to have self referential structs, because it is possible;

If you have raw pointers you can have self referential types... so why not refs?

The issue here is that, like references passed to closures, the type checker isn’t sophisticated enough to track the references and ensure things are correct.

You could very plausibly have a struct in the form described above where moving ‘phrase’ is forbidden until the vec is cleared.

You can literally do that inside a function.

So... the problem here is the rust compiler, and that it’s difficult to implement.


I'm watching with both amazement and amusement how Rust is undergoing the exact same process we went through with C++ in the late '80s and early '90s, pushing the belief that the same language should be used for both low-level and high-level programming. Back then, just as it is now, that idea was sold by programmers who are on the more capable end of the spectrum and who also enjoy spending a lot of time thinking about clever ways to express programs in a rich programming language. Then, as now, they were people who hadn't had much contact with the average software developer and with the realities of software mass-production. The few of them who had, also tried convincing the industry that the system should change, and that software should be developed by small elite groups who are masters of the discipline as well as discipline (I'm not talking about the people who like Rust, or C++, for that matter, for low-level programming, but about those who try to sell low-level programming for application development), but, of course, the economics always win.

Those who go down that path will experience the same disillusionment we did, and end up with applications that end up costing so much more to maintain if only because of the cost of the maintainers they require. Software costs work better when low-level and high-level programming are kept separate. The former is costlier but tends to be both smaller in size and in prevalence, while the latter is the opposite.

Of course, the number of people who make that mistake of mixing high- and low-level programming won't be nearly as high now as it was back then -- many of us have learned our lesson -- but it would still be interesting to watch this unfold again, even at a small scale. And, thirty years from now, some will be telling us why, while using Rust for high-level programming was indeed, a costly mistake, Muju is completely different, and this time a low-level language really and truly is appropriate for application programming.


Every generation must go through these cycles. Visual programming, no-code solutions, thin clients, thick clients, static typing, duck typing, functional programming, etc.

Time is a flat circle.


> more capable end of the spectrum

Actually rust is a great language for programmers on the less capable end of the spectrum.

While it allows clever abstractions, it generally discourages them and recommends writing simple code.

The borrow checker might seem like a clever abstraction, but it's just a built-in code analysis tool you should use anyway when using C or C++ or well any language, and it generally nudges you to better structured code.

> Then, as now, they were people who hadn't had much contact with the average software developer and with the realities of software mass-production.

Except that more than just a few decisions in the rust design process were strongly influenced by the feedback of average developers, including people who just got started with programming.

Wrt. C++ I can completely agree, it's a completely over engineered language with endless gotchas which silently sneak into your code without you noticing, potentially causing UB if compiled with some compiler at some optimization level for some target. That makes such bugs completely bonkers to track down.

On the other hand in (safe) rust many of the problems users have come from the language helping you to not run into any of this and other bugs.

And all "hidden" complexity rust has is normally designed in a way so that you don't need to know it to use rust or even things related to it as long as you don't want to create some low level primitives which involve unsafe code.


That is exactly what we thought about C++. We, too, believed that C++ fixed all those problems with both low-level programming in C, and the "messy" programming of the "application generators" of the time; it lets you clearly express intent and yet helps you avoid shooting yourself in the foot as happens in C. Anyone who truly believes that Rust is a great language for the average applications developer, clearly has not met one.

I think you have a point that yes, programming languages depend on the domain a program is in. If the program only handles up to 10k customers, it can be programmed in javascript and can indeed keep its entire database in RAM. sqlite even has a mode for this. A LOT of custom software is in this "small" domain. However, when you get to millions or billions, maybe you have to optimize and create a backend. What should this backend be written in? Previously the answer might have been C++. But now it's Rust.

Personally, being a Rust enthusiast, I think that Rust's benefits don't just unleash when you think about performance, but also when you think about large codebases. Such codebases are usually the scene for refactorings. In a dynamic language like js, these are extremely hard to pull off. In Rust, the compiler won't stop yelling at you until you have cleanly finished the refactor. I love refactoring Rust codebases, even ones I didn't write, while in C++ or other languages it's extremely annoying.

And note that while C++ enthusiasts might have believed that everything should be C++, not everyone followed them. Many people still stayed with C. Many still do even with Rust around. But there is a small group of people for whom Rust is such a great improvement that they take the portability cost and switch to Rust. Linus is heavily anti C++ for example, but he's open to adding Rust to the kernel.


> However, when you get to millions or billions, maybe you have to optimize and create a backend. What should this backend be written in? Previously the answer might have been C++. But now it's Rust.

Neither Facebook, nor Instagram or Slack are primarily built on C++ or Rust though. And neither are many other products that I can't as confidently remember off the top of my head.


Facebook originally did compile their PHP to C++ though. https://en.wikipedia.org/wiki/HipHop_for_PHP

Only later did they introduce Hack and HHVM, which compiles a PHP dialect in a just in time fashion. My point is that once you reach scale, and your computer bill becomes significant enough, you'll really want to use efficient languages.


> However, when you get to millions or billions, maybe you have to optimize and create a backend. What should this backend be written in? Previously the answer might have been C++. But now it's Rust.

Now it's Rust based on what? Uber backend is almost entirely built in Go. And so are many other services handling billion plus requests a day written in other languages.

Perhaps Rust advocates know something that Uber engineering does not.


Go usage proves my point. It's statically compiled. In fact, Google wanted a language in the same performance class as C++ but that was easier to teach to college graduates. Rust is also such a language. Furthermore, when Uber got started, Rust wasn't even available.

Rust being easier to teach than C++ is a very low bar to clear.

And for the record, when Uber got started Go wasn't available either.


> Anyone who truly believes that Rust is a great language for the average applications developer, clearly has not met one.

What's a better general-purpose language for this mythical "average" developer? Java/C#? Rust has the same level of memory safety, and adds data-race safety that these languages do not have out of the box. Go is in the exact same boat, btw - memory safe for sequential code, but concurrent code is tricky since unsafe shared access is ubiquitous in real-world Go code. Haskell, Agda and Idris could provide more safety but are not approachable to the "average" dev.
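To make the data-race point concrete, here is a minimal sketch (the function name `parallel_count` is made up for illustration). Shared mutable state has to go through something like `Arc<Mutex<_>>`; swapping in `Rc<RefCell<_>>` would be rejected at compile time because `Rc` is not `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from `threads` threads, `per_thread` times each.
/// (Hypothetical helper, only here to illustrate the point above.)
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock must be taken before the value can be touched.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling `parallel_count(4, 1000)` always yields 4000, since every increment happens under the lock.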


> Haskell, Agda and Idris could provide more safety but are not approachable to the "average" dev.

You (rightly, I think) place Idris (and Agda, although that's more a proof assistant than a PL) beyond the reach of the average developer, and also, apparently, Haskell -- arguably a simpler language than Rust, but never mind -- but not Rust. And that's my point and, I guess, our point of disagreement. If you think that, on that spectrum between, say, Python and Idris, Rust is at a good level for the average application developer to the point where maintaining high-level applications written in Rust would ever be cost effective, then I think you are out of touch with the realities of the software industry and the economic forces that shape it. I thought the exact same thing as you twenty five years ago: C++ was obviously the right language for the average developer -- even more so than, I don't know, Delphi or Visual Basic -- wasn't it?

> What's a better general-purpose language for this mythical "average" developer?

I don't believe any low-level language will be cost-effective for application development for the foreseeable future, if ever, and the average developer is anything but mythical (I mean, embodied as a single person, perhaps, but I'm talking about teams).

> Rust has the same level of memory safety

It doesn't (due to much more prevalent reliance on unsafe as well as FFI), but that's very much beside the point because, as you acknowledge, safety is not what it's about.


simplicity is important, but i believe it is not the most important metric for a PL and have no problem with complex PLs. the question is: is it worth the trouble? c++'s failure is not the complexity but c++ being one of the least orthogonal languages out there. in c++ you are able to compose two great things/ideas (other than my only options being c and c++, this is the biggest reason i prefer c++ over c) yet the end result is not more but less.

> Haskell -- arguably a simpler language than Rust

From a PL research point of view maybe.

But from a POV of what is simpler to use for an average or new programmer, Haskell is far worse as far as I can tell.

> due to much more prevalent reliance on unsafe as well as FFI

Where? In embedded programming? Or programming from primitives (which I would argue an average programmer never should do, there are existing libraries)?

In some use-cases you always end up with a FFI, but that is also true for other languages.

Besides that usage of unsafe is both strongly discouraged and uncommon.

There are a lot of Rust use cases where using unsafe code is never necessary, to the point where people simply ban any direct usage of unsafe code in their project.
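For what such a ban can look like, here is a small sketch using the built-in `unsafe_code` lint. Projects often put `#![forbid(unsafe_code)]` at the crate root; the sketch applies it to a single module (the module and function names are invented) so it stays self-contained.

```rust
// `forbid` is stricter than `deny`: code inside cannot opt back in with `allow`.
#[forbid(unsafe_code)]
mod app {
    /// Ordinary safe code compiles as usual.
    pub fn first_byte(data: &[u8]) -> Option<u8> {
        data.first().copied()
    }

    // Uncommenting the function below makes the build fail, because it
    // contains an `unsafe` block inside a module that forbids them:
    //
    // pub fn first_byte_unchecked(data: &[u8]) -> u8 {
    //     unsafe { *data.get_unchecked(0) }
    // }
}
```

Note this only restricts the code you write yourself; dependencies can still use `unsafe` internally.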


> I thought the exact same thing as you twenty five years ago

Mr Pressler, would you mind reminding everyone how old you were “25 years ago” to put this statement into perspective?

Could you please stop pretending you're some old hat who's seen this before in 90s or even 80s[1], and trying to give your wisdom to the young generation, this is ridiculous.

[1]: in your initial comment in this thread : > we went through with C++ in the late '80s and early '90s,


To put it simply - Rust empowers a master like dtolnay to create something as powerful, zero cost, fast and easy to use as serde … while also enabling a noob like me to write simple, relatively bug free code.

As a bonus I can pull in serde with one line and use it without worrying about shooting myself in the foot.


I observe that the last sentence of your first paragraph is strongly at variance with the Rust leadership's self-image.

(See for example the "Empowerment, empowerment everywhere" RustConf 2020 keynote.)


"Software should be developed by small elite groups" is pretty much the opposite of the Rust developers' goal. I do agree with you that it's a bad goal to have, which is why I'm optimistic about Rust's success.

didn't lisp prove the opposite? you can go lower than c and higher than any language we currently have in the same language?

Yes, but where is Lisp's market share? It's mostly an academic language these days.

If anything, Lisp's current adoption proves that aiming to tackle both ends of the spectrum in a single language might not be a good idea.


do adoption or market share matter in this context? if you can have the best of both worlds why would you want to deal with another language?

There are downsides to using the most powerful option at all times.

https://en.wikipedia.org/wiki/Rule_of_least_power

Also, Lisp is the most powerful at constructing programs but that means it's the least powerful in the field of making sure your program actually does what you wanted it to - that's what types, borrow checking, proof assistants and so on are for.

(Some people think Lisp is declarative because it's homoiconic but that's not true - you can't find out what a Lisp macro does without executing it.)


I do mostly embedded programming. Performance and ability to access memory directly is critical! But I like taking advantage of modern tooling, and abstractions like pattern matching, traits etc, and clear syntax. Changing languages through an FFI barrier would add complications.

There's something to this. Rust was built to do low-level programming well, and as a result it's quite hard. It's fundamentally more work to do things than in a GC'd language of similar modernity. That's a perfectly reasonable tradeoff.

But somehow, Rust got cool, and now a large fraction of its community wants to use it to write web services. So there's a lot of gravity dragging the ecosystem, and even the standard library and the language, in that direction. Even though it is a poor choice for almost all web services.

One day, the shine will wear off, and it will be easy to see that it's a poor choice for almost all web services, and we will feel some regret over the amount of time and energy that went into building in that direction.


Agreed - I built a rust frontend framework, and tried to use it on the backend for a while. Ended up switching back to Python and Javascript for web - I use rust for embedded mostly now.

Like Uber, some do manage to avoid the pitfall of trying to shoehorn hard languages everywhere. I recently learned their microservice fleet is almost entirely in Go [1]. Could also have been C#, Java, Python or Node but the point is it is not Rust or C++. Probably for the reasons you stated.

[1] https://news.ycombinator.com/item?id=27120689


For companies like Uber, Rust would probably be a good fit if they started today: at this scale you spend more time optimizing some otherwise simple logic in order to scale up without wasting too many resources. Go has a decent optimization story, thanks to the escape analysis diagnostic which makes it quite straightforward to avoid many allocations (much more easily than in C# for instance, even if it's also doable in C# indeed), but Rust takes it a step further, and you'll have better performance to begin with, which will drastically reduce your scaling effort.

For instance, look at what Cloudflare ended up doing to get Go working as they wanted: https://blog.cloudflare.com/go-dont-collect-my-garbage/

Rust will never be a competitor to Python or PHP (but it can enhance them, because making a native extension in Rust is way easier than in C, especially for people with a web back-end background and little C experience), but Go is losing a lot of appeal now that Rust is here and ready for prime time.


You have some unsubstantiated claims. Firstly, Go focuses on infrastructure; it is actually the bootloader of the cloud-native wave. Rust is great but still, it can be replaced by C, C++ or Zig when it becomes stable. It doesn't have its own ecosystem, nor is there any ROI coming out of it to suggest that it will replace Go tomorrow. On the other hand, when it comes to modern DevOps, Go became as irreplaceable as Python and Bash. Using Rust won't translate into better performance without effort. Because of its focus on soundness, I would say that you need to put in even more effort.

> On the other hand, when it comes to modern DevOps, Go became as irreplaceable as Python and Bash

Most devops don't use Go and I argue that for the ones who do, Rust is a big improvement. Sure Go is being used to make Docker and k8s, but with this reasoning Haskell is also irreplaceable since Pandoc is made with it… Ironically enough, the mere existence of docker negates the main asset of Go in a devops context: statically linked binaries. Without containers, it's a huge selling point, but as soon as you use docker it brings nothing to the table anymore.

> Using Rust won't translate into better performance without effort. Because of its focus on soundness, I would say that you need to put in even more effort.

Saying so reveals that you've probably never tried it. The biggest performance issues I've been facing were memory allocations. With Go, things are “magically” allocated or not depending on the compiler's mood of the day (or the compiler version in practice) and you need to diagnose these by looking through the escape analysis log, seeing where the compiler cannot elide the allocation and refactoring until it works. With Rust, you just don't allocate memory unless you're using a `Box` or a heap-allocated collection (a `Vec` or a `HashMap` for instance).

In both cases you can have an implementation with good performance, but with Go you need tedious tuning, while Rust gives you explicit control of the allocations. That's an enormous productivity difference.
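One way to check the "no hidden allocations" claim for yourself is a counting allocator. This is only a sketch: `CountingAlloc`, `allocations_during` and the two `sum_*` functions are names invented here, and the counter is per-thread so that other threads don't disturb the measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

thread_local! {
    // Number of heap allocations made by the current thread.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Wraps the system allocator and counts calls to `alloc`.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` avoids panicking if the thread-local is unavailable.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns how many allocations this thread made meanwhile.
fn allocations_during<R>(f: impl FnOnce() -> R) -> usize {
    let before = ALLOCS.with(|c| c.get());
    black_box(f());
    ALLOCS.with(|c| c.get()) - before
}

/// A fixed-size array lives on the stack: no allocator call.
fn sum_on_stack() -> u64 {
    let values = [1u64, 2, 3, 4];
    black_box(values).iter().sum()
}

/// A `Vec` is a heap collection: the allocation is visible in the type.
fn sum_on_heap() -> u64 {
    let values: Vec<u64> = vec![1, 2, 3, 4];
    black_box(values).iter().sum()
}
```

With this in place, `allocations_during(sum_on_stack)` reports zero while `allocations_during(sum_on_heap)` reports at least one, without having to read any compiler log.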


> it will be easy to see that it's a poor choice for almost all web services

Time will tell. I used to think this way, but don't anymore.

That doesn't mean it's right for the majority of them, but I think it's workable for a very surprising number of them.


"workable for a very surprising number of them" seems substantially the same as "a poor choice for almost all" to me!

Time will tell :)

Well, I have been writing both high and low level code in Rust, and I must admit I've been happily falling into that "trap". The nice thing about Rust for high level software is that it lets us specify interfaces in a way that makes them actually usable without delving into the low level details.

A lot of things that are hand waved away in C++ (who owns this? How long does this need to be alive? Can I mutate this?) are completely explicit in a Rust interface. And Rust's type system is strong enough to express a lot of rules that no one would want to write down in C++ (because overriding them is just one cast away), allowing us to write hard-to-misuse libraries.
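As a small illustration of those three questions being answered in the signature, consider this sketch (all function names are made up):

```rust
/// Takes ownership: the caller hands `names` over and cannot use it afterwards.
fn into_sorted(mut names: Vec<String>) -> Vec<String> {
    names.sort();
    names
}

/// Shared borrow: read-only access. The returned `&str` borrows from `names`,
/// so the compiler will not let it outlive the slice it came from.
fn longest(names: &[String]) -> Option<&str> {
    names.iter().max_by_key(|n| n.len()).map(|n| n.as_str())
}

/// Exclusive borrow: the function may mutate, and nobody else can read
/// or write `names` while the call is in progress.
fn shout(names: &mut [String]) {
    for name in names.iter_mut() {
        *name = name.to_uppercase();
    }
}
```

Who owns it, how long it must live, and whether it can be mutated are all visible without reading the function bodies.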

Coding on top of such libraries is a really fun experience. That's why I call Rust an all level language.


Different languages appeal to different people, and there is no doubt some individual programmers will find Rust appealing even for application programming, just as some found C++ appealing for similar purposes. But a CIO of, say, an insurance company who decides that internal applications will be written in Rust would be making a very imprudent financial decision.

Rust is not at all a high-level language in any sense and will not replace the places Kotlin, Go, TypeScript etc. occupy. If I wanted to write a super-fast server with some time constraints, I would choose Go without thinking twice. Go is fast enough. Rust is only needed when you need things to be super-duper fast and memory efficient. Those two criteria are the only reasons to use Rust.

why do high-level languages almost always run inside an interpreter/vm? (except swift/go)

is it a good or a bad thing?


I have to give it to logrocket's articles, using Rust for a marketing push and being on the frontpage of HN.

Seems like I should make a Rust article just for it to soar to HN's frontpage, just for developer marketing of my product that is unrelated to Rust.

Bravo.


Author here. I blog about Rust as a hobby (and have been doing so since 2015 on my personal blog), and LogRocket is so nice to pay me to have my content hosted on their blog. And yes, their product is only tangentially related (if you use Rust on the frontend with WASM).

It's probably a little jarring to see this many Rust articles to most people, but it's a very exciting time in the Rust community. Not only has the Rust Foundation recently been established, but the crate ecosystem is now starting to finally include higher-level software like game engines and SDKs. Rust has actually entered production, a mere 7 years after it was conceptualized and prototyped. Now that it's starting to mature, Rust is starting to look like a great choice for a high-performance native development stack.

I worked at a company that had a developer blog with articles like this.

It was fun, but our conversion rate from the blog was zero. Maybe there was some downstream brand recognition effects, but we treated it like a fun thing for developers to do to share some knowledge.

The only real downside was that a couple people got hooked on the idea of using the blog for thinly veiled self-promotion, so it required some clear expectation setting. It worked best when posts were treated as group efforts by the team instead of individuals posting to the company blog with everyone else as silent editors.


Looks like this is only the second one: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... That's not excessive.

Rust posts have gotten excessive of course, but that's different.


> Rust only ever allows one owner per data, so this will at least require a Rc or Arc to work. But even this becomes cumbersome quickly, not to mention the overhead from reference counts.

This takes 50 LOC, gives you a doubly-linked list that's memory safe, thread safe, and that has pretty much the same efficiency as a doubly-linked list using raw pointers. (Feel free to prove me wrong here, but I've written one, and I couldn't measure the difference on x86, i'd expect ARM to be even better because it has weaker atomics).

Compared with a Vec<T> or a HashMap<K, V>, what dominates the performance of a doubly-linked list is the pointer indirection to access the object, and the cost of that is pretty much the same whether you are using Arc<T>/Rc<T> or a raw pointer.

Also, a doubly linked list only makes much sense for relatively large objects and when you want O(1) splice, so whether you store 64-bit or 128-bit wide pointers doesn't matter at all, because the objects are big, and O(1) splice just modifies 4 pointers...
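For anyone wondering what such a list looks like, here is a rough sketch. It is not the code I measured: it uses the single-threaded `Rc`/`RefCell`/`Weak` combination, and a thread-safe variant would presumably swap in `Arc` plus a lock. The type and method names are invented for the example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Link<T> = Option<Rc<RefCell<Node<T>>>>;

struct Node<T> {
    value: T,
    next: Link<T>,                        // strong: a node owns its successor
    prev: Option<Weak<RefCell<Node<T>>>>, // weak: avoids a reference cycle
}

pub struct DList<T> {
    head: Link<T>,
    tail: Link<T>,
}

impl<T> DList<T> {
    pub fn new() -> Self {
        DList { head: None, tail: None }
    }

    pub fn push_back(&mut self, value: T) {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: None }));
        match self.tail.take() {
            Some(old_tail) => {
                node.borrow_mut().prev = Some(Rc::downgrade(&old_tail));
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => self.head = Some(Rc::clone(&node)),
        }
        self.tail = Some(node);
    }

    pub fn pop_front(&mut self) -> Option<T> {
        let old_head = self.head.take()?;
        let next = old_head.borrow_mut().next.take();
        match next {
            Some(next) => {
                next.borrow_mut().prev = None;
                self.head = Some(next);
            }
            None => self.tail = None, // list is now empty
        }
        // Exactly one strong reference is left, so unwrapping succeeds.
        Rc::try_unwrap(old_head).ok().map(|cell| cell.into_inner().value)
    }

    pub fn pop_back(&mut self) -> Option<T> {
        let old_tail = self.tail.take()?;
        let prev = old_tail.borrow_mut().prev.take().and_then(|weak| weak.upgrade());
        match prev {
            Some(prev) => {
                prev.borrow_mut().next = None; // drops the strong ref to old_tail
                self.tail = Some(prev);
            }
            None => self.head = None, // list is now empty
        }
        Rc::try_unwrap(old_tail).ok().map(|cell| cell.into_inner().value)
    }
}

impl<T> Drop for DList<T> {
    // Pop iteratively so that dropping a long list does not recurse deeply.
    fn drop(&mut self) {
        while self.pop_front().is_some() {}
    }
}
```

The forward links own the nodes and the backward links are weak, which is what keeps the structure free of leaks without any `unsafe`.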


Yeah, the article is weird about "But even this becomes cumbersome quickly, not to mention the overhead from reference counts."

You pick the linked list because of better scalability as the data size grows bigger, you choose them precisely because you think the constant overhead vs contiguous lists is worth it.

The most common example is an object that wants to remove itself from a big list.

E.g. you have an OS with 10000 processes. 1000 processes die per second and they already know their position in the linked list so they remove themselves in O(1). An array would require leaving the slot empty and swapping in the last element (unordered list) O(1) or shifting entries to fill the gap (ordered list) O(n). The former breaks ordering, the latter leads to O(nm) removal costs where n is the number of live processes and m the number of dying processes. Paying reference counting costs is not a problem.
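The two array options map directly onto `Vec::swap_remove` and `Vec::remove` in Rust. A tiny sketch (the wrapper names are just for illustration):

```rust
/// O(1): moves the last element into the hole, so ordering is lost.
fn remove_unordered(pids: &mut Vec<u32>, index: usize) -> u32 {
    pids.swap_remove(index)
}

/// O(n): shifts every later element left by one, so ordering is kept.
fn remove_ordered(pids: &mut Vec<u32>, index: usize) -> u32 {
    pids.remove(index)
}
```

Removing index 1 from `[10, 20, 30, 40]` leaves `[10, 40, 30]` with the first function and `[10, 30, 40]` with the second.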


A linked list is still a pretty terrible choice. You'd probably be better off with a Vec with tombstones and some kind of free list.

It really depends on what you are doing. If inserts into the middle are common, readjusting a vector gets really expensive.

Theoretically expensive, but modern hardware is optimized for this case. Random access lookups and cache misses from the linked list will tank performance, even in most cases where a linked list is theoretically equivalent or preferred.

If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list. Besides, linked lists aren't that bad if the nodes are close enough in memory (read: consecutive), or they are hot enough to already be in cache. But as always, measure instead of guessing.

> If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list.

FWIW the original suggestion mentioned tombstoning so you'd have a Vec<Option<T>> instead of a `Vec<T>`, and "removing" an item would just set the `Option` to `None`.

Of course this would increase iteration cost (as you'd have to skip the `None`), and you'd probably want to compact the vec once in a while.
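A minimal sketch of that idea, with an invented `Tombstoned` type. Note that compaction shifts elements, so any indices handed out earlier stop being valid afterwards.

```rust
/// A Vec where removal leaves a `None` tombstone instead of shifting.
pub struct Tombstoned<T> {
    slots: Vec<Option<T>>,
    live: usize,
}

impl<T> Tombstoned<T> {
    pub fn new() -> Self {
        Tombstoned { slots: Vec::new(), live: 0 }
    }

    /// Appends a value and returns its slot index.
    pub fn push(&mut self, value: T) -> usize {
        self.slots.push(Some(value));
        self.live += 1;
        self.slots.len() - 1
    }

    /// O(1) removal: the slot becomes a tombstone, nothing moves.
    pub fn remove(&mut self, index: usize) -> Option<T> {
        let removed = self.slots.get_mut(index)?.take();
        if removed.is_some() {
            self.live -= 1;
        }
        removed
    }

    /// Iteration has to skip over the tombstones.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.slots.iter().flatten()
    }

    /// Drops all tombstones in one pass; invalidates previously returned indices.
    pub fn compact(&mut self) {
        self.slots.retain(Option::is_some);
    }

    pub fn live(&self) -> usize {
        self.live
    }

    pub fn capacity_used(&self) -> usize {
        self.slots.len()
    }
}
```

A reasonable policy might be to call `compact` once tombstones exceed some fraction of the slots, but that threshold is a tuning decision.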


I probably shouldn't even be in this thread but isn't that creeping back toward garbage collection?

Sure, it's just a mark and sweep, but manually implemented (which is fine).

It’s completely normal to have this sort of thing in low-level code bases. Both C++ and Rust have refcounting smart pointers in their standard libraries, and bespoke refcounting schemes are regular occurrences in C codebases (the Linux kernel has tons of refcounting, so much so that it has a utility file for reference counters: https://github.com/torvalds/linux/blob/master/include/linux/...)

thanks for the response! I'm a big fan of doubly linked lists, so it was interesting to see that there was extra work to get them going. I'm certainly going to read up on this "ownership" thing. I get the impression that it's there for really good reasons, and I'm not going to argue with experts who are protecting users.

I have a question for you, would you say that the increase of difficulties with doubly linked lists are incidental in nature, or is there something inherent with doubly linked lists that impact security? Like I said, I like the doubly linked lists for certain things, and the more I know about the impact of using them the better!

I certainly will be reading up on this!


Or a Vec<T> for storage and a Vec<usize> for tombstoning. Lots of ways to solve this, and my experience is that you can beat a linked list approach with the 'Vec plus Vec' approach for a lot of data sizes/ operations.
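Roughly what I mean, as a sketch. To keep it free of `unsafe` and of a `Default` bound, the storage here is a `Vec<Option<T>>` rather than a bare `Vec<T>`, which is a simplification; `SlotVec` and its methods are names invented for the example.

```rust
/// Storage Vec plus a Vec<usize> of freed slots that get reused on insert.
pub struct SlotVec<T> {
    items: Vec<Option<T>>,
    free: Vec<usize>,
}

impl<T> SlotVec<T> {
    pub fn new() -> Self {
        SlotVec { items: Vec::new(), free: Vec::new() }
    }

    /// Reuses a freed slot if there is one, otherwise grows the storage.
    pub fn insert(&mut self, value: T) -> usize {
        if let Some(index) = self.free.pop() {
            self.items[index] = Some(value);
            index
        } else {
            self.items.push(Some(value));
            self.items.len() - 1
        }
    }

    /// O(1): empties the slot and remembers it for reuse.
    pub fn remove(&mut self, index: usize) -> Option<T> {
        let removed = self.items.get_mut(index)?.take();
        if removed.is_some() {
            self.free.push(index);
        }
        removed
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        self.items.get(index)?.as_ref()
    }
}
```

Indices stay stable for live elements, and the storage never shifts, so both insert and remove are O(1).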

Or a custom Vec that has a lower maximum bound, at which point you can start doing things like integer encoding/ pointer swizzling.


> If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list.

It takes almost no time at all to do that, depending on the size of the elements. But you have a starting assumption there that the vector must always be stored sorted. Challenge that assumption. If insertion is perfectly balanced with traversal it may be true, but that's also rarely the case. It's fairly trivial to instead track if the vector is already sorted or not & sort when sorted access is actually needed.
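The "track whether it's sorted" idea is only a few lines. A sketch, with `LazySorted` as a made-up name:

```rust
/// Appends in O(1) amortized and only sorts when sorted access is requested.
pub struct LazySorted<T: Ord> {
    items: Vec<T>,
    is_sorted: bool,
}

impl<T: Ord> LazySorted<T> {
    pub fn new() -> Self {
        // An empty vector is trivially sorted.
        LazySorted { items: Vec::new(), is_sorted: true }
    }

    /// Cheap insert: push to the end and mark the contents as dirty.
    pub fn insert(&mut self, value: T) {
        self.items.push(value);
        self.is_sorted = false;
    }

    /// Sorts at most once per batch of inserts, then hands out a sorted slice.
    pub fn sorted(&mut self) -> &[T] {
        if !self.is_sorted {
            self.items.sort_unstable();
            self.is_sorted = true;
        }
        &self.items
    }
}
```

Whether this wins depends on the ratio of inserts to sorted reads: many inserts per read amortize the sort nicely, while strict alternation does not.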

> Besides, linked lists aren't that bad if the nodes are close enough in memory (read: consecutive)

Which doesn't really happen outside of a controlled benchmark. If the linked list is stored in a single consecutive allocation, then resizing has observable side effects (eg, pointers to elements become invalidated). You can do things like allocate chunks at a time so that some elements are consecutive, but you won't have all elements consecutive or close in memory.


>> you need to copy 500 of them

> It takes almost no time at all to do that,

Uh, 500 loads and dependent stores, at minimum, in the best case hitting L1 cache, which is 3 cycles. So you are talking about 1500 cycles to....avoid a single cache line miss due to chasing a linked list pointer? A miss to main memory is 100-200 cycles, maximum. More likely you are going to L2/L1 which is 12-50 cycles. So no, in no circumstances would I expect copying 500 elements to be faster than chasing a single pointer.

> Challenge that assumption.

You don't get to pick the application behavior. If it needs to do a lot of inserts and traversals, it just does.

>> are close enough in memory (read: consecutive)

> Which doesn't really happen outside of a controlled benchmark.

There is no supporting information for this statement at all. Of course a list created all at once using a bump-pointer allocator is going to be consecutive. And a (moving) garbage collector is going to dynamically reorganize the nodes of a list, generally in a breadth-first way, depending on the marking algorithm. That can result in the nodes of the list being laid out consecutively after compacting, no matter where they were originally organized.

Memory behavior of programs is complicated. I find it surprising that we're having a conversation that we can't or shouldn't use linked lists now because vectors are universally better, or we can lazy sort or some other crazy workaround. I'm not sure what motivates this whole line of argumentation. Sometimes lists are just the best damn thing.


> 500 loads and dependent stores, at minimum, in the best case hitting L1 cache, which is 3 cycles. So you are talking about 1500 cycles

This is absolutely not how modern CPUs work, and I think you misunderstood where the dependencies are. All the load/store pairs are independent from each other, which means they can be executed in parallel. Which means that this code is throughput limited. Modern CPUs tend to have at least 2 load/store ports, so we're talking a throughput of one copy per cycle, or 500 cycles for the entire operation (plus warm-up time).

Furthermore, this is a pure memmove in many cases, which means a real memmove implementation that has been optimised using vector instructions can be used. Now you're talking about moving 32 bytes per cycle, or 4 array entries if they're pointer-sized, which brings us down to 125 cycles plus warm-up. Which is on the order of a miss to memory...
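In Rust terms the shift is a single `copy_within` call, which is documented to behave like `memmove` (overlapping ranges are fine). A sketch of insert-by-shifting, with an invented function name; `Vec::insert` does essentially this for you already.

```rust
/// Inserts `value` at `index` by shifting the tail one slot to the right.
fn insert_shifting(v: &mut Vec<u64>, index: usize, value: u64) {
    assert!(index <= v.len(), "index out of bounds");
    v.push(0); // grow by one; the placeholder is overwritten below
    let last = v.len() - 1;
    // Source and destination overlap; copy_within handles that correctly.
    v.copy_within(index..last, index + 1);
    v[index] = value;
}
```

The whole tail moves as one block copy rather than as an element-by-element loop, which is what makes the vectorized path possible.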


Great, now we're talking! I realized in the background that yes, there aren't dependencies between the copies...if the regions of memory don't overlap. But they do overlap, especially if you are just moving them over one element, so the simple analysis doesn't hold. But you're right that it's just a memmove, so there is an optimized vector implementation (which, incidentally, is probably going to have relatively large setup costs for 500 elements).

And oops, now your vectors are 1 million entries.

I appreciate the discussion. It's a bit of a rathole for something that you shouldn't be optimizing if you can completely avoid it by using the right data structure for your needs.


> And oops, now your vectors are 1 million entries.

As soon as you traverse that linked list of 1 million entries, oops.

In fact, here, I put together a quick benchmark. It inserts in the middle of the list and then traverses it (since, after all, ordering is irrelevant if there's no traversal).

https://paste.ofcode.org/7Buz2Mua9xna55TC3uFaQp

Test system is a Ryzen 3700x. I tried to avoid all amounts of compiler optimizations and believe I succeeded, but by all means happy to take a review of the above. Would love to see you do a similar test in whatever runtime you want, especially one with that juicy super-amazing compacting GC that makes linked-lists so fast.
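If anyone wants to reproduce the shape of this in Rust, a rough analogue might look like the sketch below. To be clear, this is not the code behind the paste link and it did not produce the numbers quoted here; the function names are invented, and since stable `std::collections::LinkedList` has no mid-list insert, the sketch walks to the midpoint with `split_off` and re-links with `append`.

```rust
use std::collections::LinkedList;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// One round for Vec: insert in the middle (shifts the tail), then traverse.
fn vec_round(v: &mut Vec<i32>, value: i32) -> i64 {
    let mid = v.len() / 2;
    v.insert(mid, value);
    v.iter().map(|&x| i64::from(x)).sum()
}

/// One round for LinkedList: walk to the middle, link in a node, then traverse.
fn list_round(list: &mut LinkedList<i32>, value: i32) -> i64 {
    let mid = list.len() / 2;
    let mut tail = list.split_off(mid); // pointer-chases to the midpoint
    list.push_back(value);
    list.append(&mut tail); // O(1) re-link
    list.iter().map(|&x| i64::from(x)).sum()
}

/// Returns the average time per round as (vec, linked list).
fn time_rounds(n: i32, rounds: u32) -> (Duration, Duration) {
    assert!(rounds > 0, "need at least one round");
    let mut v: Vec<i32> = (0..n).collect();
    let mut list: LinkedList<i32> = (0..n).collect();

    let start = Instant::now();
    for i in 0..rounds {
        black_box(vec_round(&mut v, i as i32));
    }
    let vec_time = start.elapsed() / rounds;

    let start = Instant::now();
    for i in 0..rounds {
        black_box(list_round(&mut list, i as i32));
    }
    let list_time = start.elapsed() / rounds;

    (vec_time, list_time)
}
```

Build with `--release` before reading anything into the timings. Also note the list is built in one go here, so its nodes may end up close together in memory, which probably flatters the linked list compared to a long-running program.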

At 100 elements vector wins (40ns vs 160ns). At 1000 elements vector wins (~300ns vs. ~1000ns). At 10,000 elements vector wins (~2700ns vs. 12,000ns). At 100,000 elements guess what? Vector still wins (29us vs. 130us).

You are either vastly underestimating how slow linked-list traversal is or vastly overestimating how expensive memmove is.

But you did say 1 million entries so let's give that a shot:

    Vector size 1000000  took 286,123ns

(Vector at 1 million is only twice as slow as a linked list at 100,000, linked lists aren't off to a strong start here...)

    Linked-list size 1000000  took 2,670,542ns

Oh my god it's a bloodbath. Vector wins by 10x. Still. At 1 million entries the O(N) insert structure is still faster than the O(1) insert one. And the gap is getting bigger!

And this is basically the best case for linked lists, I'll point out: inserting in the middle happens exactly as often as traversal. If traversal is rare, obviously a lazy sort would crush these vector results, and if traversal is common the linked list results get that much worse.

Now really these results shouldn't actually be that surprising if you really dig into it. After all, the vector version of this is storing (and therefore traversing) 4MB of data. That's all 1 million ints really is, it's not very big. Now sure you can store bigger things, but much bigger and you're probably storing a vector of pointers instead which only bumps that to 8MB of data. The linked list version, on the other hand, is storing 2 pointers + an int for each node. That's 20MB of data total. So resizing the vector involves a read of 2MB of data, a write of 2MB of data, and a traversal of 4MB of data - 8MB total. Simply traversing the linked list is hitting 20MB of data - over double the amount of memory bandwidth utilized. That's a big difference. In fact on my system at 1 million entries that happened to be the difference between the vector version comfortably fitting in L3 with loads of room to spare (especially since the resize made part of it nice & hot) vs. the linked-list version not coming close to fitting (specs on this CPU claim 32MB of L3, but it's really 16MBx2 - hitting the far L3 isn't much slower than hitting main memory). So more data to hit and it's pointer chasing? Modern CPUs just really hate that. Like a lot.
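The per-element sizes are easy to sanity-check. The sketch below uses a hypothetical Rust node layout, not the node type from the benchmark. On a typical 64-bit target, alignment padding rounds the 20 bytes of "2 pointers + an int" up to 24, and that is before any allocator bookkeeping, so if anything the 20MB estimate is generous to the linked list.

```rust
use std::mem::size_of;

/// Hypothetical doubly-linked node holding a 4-byte int.
#[allow(dead_code)]
struct ListNode {
    prev: *mut ListNode,
    next: *mut ListNode,
    value: i32,
}

/// Bytes per element as (vector of i32, linked-list node).
fn bytes_per_element() -> (usize, usize) {
    (size_of::<i32>(), size_of::<ListNode>())
}
```

On a 64-bit target this returns `(4, 24)`: two 8-byte pointers, a 4-byte int, and 4 bytes of padding to keep the node 8-byte aligned.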

Since I don't intend to cheat at happening to have sufficient L3 for the use case you laid out (combined with the benchmark naturally keeping this hot), I also tried adjusting it. I changed the containers to hold intptr's instead, and hit it with 10 million entries. Neither one comes close to fitting in L3 now. Still 80MB of data for the vector vs. 240MB for the linked list but maybe not having that copy and with both of them thrashing L3 the linked list might finally show something to redeem it.

    Vector size 10,000,000  took 3,680,111ns
    Linked-list size 10,000,000  took 26,460,333ns

That's a big fucking oof right there.

> I appreciate the discussion. It's a bit of a rathole for something that you shouldn't be optimizing if you can completely avoid it by using the right data structure for your needs.

And a linked list is rarely the right data structure, so yes you should avoid trying to optimize for that without measuring.


I think one important variable to tune would be the element size (since supposedly linked list shines more with bigger data). Curious to see what the numbers are at 100b/1kb

My guess is that the linked list impl remains almost the same while vec sees a linear ish slowdown.


That's true. And there is seriously optimized software out there using linked lists (e.g. in the Linux kernel).

However, with bigger data in my experience a very good solution is often to have a vector of pointers to the actual data. This loses some locality of course, but during traversal the fact that you don't have dependencies between iterations can still be a win.
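In Rust that would be something like a `Vec<Box<Payload>>`. A small sketch, where the `Payload` type and helper names are made up for illustration:

```rust
/// A "big" element: an id plus 1000 bytes of data we rarely touch.
#[allow(dead_code)]
struct Payload {
    id: u64,
    data: [u8; 1000],
}

/// Builds a vector of pointers to separately allocated payloads.
fn build(n: u64) -> Vec<Box<Payload>> {
    (0..n)
        .map(|id| Box::new(Payload { id, data: [0; 1000] }))
        .collect()
}

/// Traversal: each pointer load is independent of the previous element,
/// unlike a linked list where the next address lives inside the current node.
fn sum_ids(items: &[Box<Payload>]) -> u64 {
    items.iter().map(|p| p.id).sum()
}

/// Address of the first payload. It does not change when the Vec grows,
/// because only the 8-byte pointers are moved, never the payloads.
fn first_addr(items: &[Box<Payload>]) -> *const Payload {
    &*items[0] as *const Payload
}
```

Inserting into the middle of such a vector shifts pointers only, so the cost no longer depends on the payload size.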


Oh, absolutely! Linked lists do have a quite useful strength in that they are agnostic to data size and, potentially more importantly, that pointers to the data are not invalidated when a linked list changes size as they are in vector. Both those properties combined make linked lists good data structures for holding large data as more of an allocator than a list type usage. Where it's not the O(1) inserts-anywhere that matters so much as the O(data-never-moves) that does.

But since you asked I adjusted the benchmark to instead contain a 100byte & 1kb payload (just an array of ints). And in the loop I just use the first number of the array in the summation. I assumed an iteration isn't accessing the entire payload but say instead an ID or whatever.

I also added a vector that is pointers to the data instead of the data itself (always a new allocation, none of the pointers are the same)

For 100 bytes:

    Vector; payload sizeof=100 of count 100 took 142ns
    Vector-of-pointers; payload sizeof=8 of count 100 took 150ns
    Linked-list; payload sizeof=100 of count 100 took 180ns
    
    Vector; payload sizeof=100 of count 10,000 took 18,796ns
    Vector-of-pointers; payload sizeof=8 of count 10,000 took 8,670ns
    Linked-list; payload sizeof=100 of count 10,000 took 32,853ns

Gap shrunk, but possibly not by as much as you would have expected.

For 1kb:

    Vector; payload sizeof=1000 of count 100 took 1137ns
    Vector-of-pointers; payload sizeof=8 of count 100 took 549ns
    Linked-list; payload sizeof=1000 of count 100 took 387ns
    
    Vector; payload sizeof=1000 of count 10,000 took 120,709ns
    Vector-of-pointers; payload sizeof=8 of count 10,000 took 10,681ns
    Linked-list; payload sizeof=1000 of count 10,000 took 105,927ns

Linked-list finally manages to pass up vector, but then vector of pointers runs away from linked-lists in the 1kb test.

Think about it like this: Traversal speeds of linked lists are bottlenecked by memory latency. Traversal speeds of vectors are bottlenecked by memory bandwidth. Only one of those metrics has been improving over the years.

Code: https://paste.ofcode.org/hYTf6PYPJegWLReSGQyUBg


That's interesting data. I guess I nerd-sniped you.

One use case where I think linked lists would be far superior would be merging two sorted linked lists. That can be done completely in-place with a linked list, consuming no intermediate memory. With vectors, you end up with an intermediate copy, either of one of the lists, or both. If you take the result of merging two lists, then merge it with a third list, then take that and merge it with a fourth list, etc, you can't really do this with a vector-of-lists representation that might come to mind.
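To show what I mean by in-place, here is a sketch of the merge on a singly-linked list in safe Rust. The `Node` type and helpers are invented for the example. No new nodes are allocated: existing nodes are unlinked from the inputs and relinked onto the output. It says nothing about speed, only about memory.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

type List = Option<Box<Node>>;

/// Merges two sorted lists by relinking their nodes; allocates nothing.
fn merge(mut a: List, mut b: List) -> List {
    let mut head: List = None;
    let mut tail = &mut head; // always points at the empty slot at the end

    while a.is_some() && b.is_some() {
        let take_a = a.as_ref().unwrap().value <= b.as_ref().unwrap().value;
        let src = if take_a { &mut a } else { &mut b };
        // Unlink the front node of the chosen list...
        let mut node = src.take().unwrap();
        *src = node.next.take();
        // ...and link it onto the end of the output.
        tail = &mut tail.insert(node).next;
    }

    // One input is exhausted; splice on whatever remains of the other in O(1).
    *tail = a.or(b);
    head
}

/// Helper: builds a list from a slice (this is where allocation happens).
fn from_slice(values: &[i32]) -> List {
    let mut head: List = None;
    for &value in values.iter().rev() {
        head = Some(Box::new(Node { value, next: head }));
    }
    head
}

/// Helper: collects the values for inspection.
fn to_vec(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.value);
        cur = &node.next;
    }
    out
}
```

Merging the result with a third and fourth list works the same way, since the output is just another `List`. One caveat: the default drop of a `Box` chain is recursive, so very long lists would want an iterative `Drop`.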


> One use case where I think linked lists would be far superior would be merging two sorted linked lists.

Benchmark it & prove it. I expect you're still vastly underestimating how slow it is to traverse a linked list.

To visit a node in a linked list you must first load the node before it. Every iteration is a dependent load. You get no pipelining here, so every node lookup is ~100ns. Meanwhile for vectors each index is independent, so they are nice & pipelined. Yeah you're copying, but you've got vastly more memory bandwidth than memory latency.

Also to zipper merge 2 linked lists means updating 2 * N pointers. That's essentially a copy of the entire list right there, depending on the size of the contained object.


There are situations where you can have a linked list perform faster than a vector. It's good to consider linked lists if you have the time to think about your solution and profile multiple options... but the point is that memory latency is a huge hurdle to overcome.

For context: A cold memory fetch is ~200 cycles. That's around 10 incorrectly-predicted branches. That's around 100 correctly-predicted branches. You need to be skipping a lot of work to justify injecting 100s of cycles of lag inside a loop of some sort.

(And, of course, those situations would need to be the common, relevant case. Don't follow a 10x performance speed-up in a one-off situation at a time no-one cares about to justify a 5x slowdown in a critical, hot loop.)

If you have big objects, then this might also start to become a concern. However, in those cases, a vector of pointers (or structs of pointers + metadata) would appear as a new challenger. In that case, you would only take the latency hit if you actually need to go to the object (and those objects could be held in contiguous storage, etc. etc. etc.).

Chances are that, unless you have data for your specific case showing otherwise, vector will probably win. It should be the default unless you have a reason not to, and evidence to back it up.


Heh. I guess I have an interview question to update.

One of my quick warm-up questions is asking about what data structure to choose if inserting an element after an element you already have a reference to is a common operation that you want to optimize for.

Of course, I don't consider there to be a right or wrong answer, but just make sure that people can intelligently discuss it. Some people start off with vec or the like, and then we discuss how fast that would be compared to a linked list. I generally stop after we get to the O(n) vs. O(1) comparison.

But maybe if we have some extra time I should also talk about the tradeoff with actually traversing the list.

After all, I suppose this is why things like gap buffers are popular for text editors, rather than linked lists.


> So you are talking about 1500 cycles to....avoid a single cache line miss due to chasing a linked list pointer?

No I'm talking about 1500 cycles to avoid 1000 cache line misses. Ordering only matters if you're traversing, and traversing a linked list sure as shit isn't a single cache line miss as I'm sure you know.

> And a (moving) garbage collector is going to dynamically reorganize the nodes of a list, generally in a breadth-first way, depending on the marking algorithm.

We're talking about Rust & C++. There isn't a moving GC.

> You don't get to pick the application behavior. If it needs to a do a lot of inserts and traversals, it just does.

If it's doing that linked lists perform terribly. Traversals of linked lists that suddenly become a million long are insanely slow. That's dependent loads and cache misses all the way through. And even if your hypothetical GC works flawlessly giving you maximum compactness, you're still blowing likely half your throughput & cache size on pointers to neighboring nodes. Even more if it's a doubly-linked list.

> Sometimes lists are just the best damn thing.

It's potentially possible, sure, as I even acknowledged but you seem to have skipped. No, the argument is that they are rarely the right thing, so the focus on them being difficult in Rust is irrelevant.

If you actually know doubly linked lists are what you need then Rust's unsafe is almost certainly not a deterrent.


Nobody here is talking about vectors of integers. From the first post, this is all assuming that the objects in the vector are "big".

A modern CPU has 200Gb/s of bandwidth. If the objects in your array are 200 Gb in size each, then swapping the position of two elements in a vector is going to take O(seconds) at memcpy speeds, while doing this with a linked list is going to take ~1 * 10^(-11) seconds.

That's a performance difference of 11 orders of magnitude...

You can make the objects in the array smaller (e.g. 200Mb each instead of 200Gb each) and your two-element swap would still be 10^8 times faster with a doubly linked list.

---

In the parallel threads, people seem to just be arguing that a vector is always faster than a list, without understanding "why" this is the case, and making claims about "what the hardware does" without doing basic rule-of-thumb calculations to verify their claims.


the good news is that a vector of any object can be reduced to a vector of integers by `Box`ing, and then you only incur a miss when you access the data, but not during every step you take to find the data
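A minimal sketch of that idea, assuming a made-up `Big` type with an arbitrary 4 KiB payload: the vector holds one pointer per element, so a swap moves two pointers rather than two payloads.

```rust
/// A deliberately large element; the 4 KiB size is an arbitrary choice.
struct Big {
    id: u32,
    payload: [u8; 4096],
}

fn main() {
    let mut items: Vec<Box<Big>> = (0..4)
        .map(|id| Box::new(Big { id, payload: [0; 4096] }))
        .collect();

    // Swapping moves two pointers inside the vector, not two payloads.
    items.swap(0, 3);

    // Each vector slot is pointer-sized, regardless of how big `Big` is.
    assert_eq!(std::mem::size_of::<Box<Big>>(), std::mem::size_of::<usize>());

    // Only here, when the data is actually read, is the indirection followed.
    for item in &items {
        println!("id {} with {} payload bytes", item.id, item.payload.len());
    }
}
```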

> A modern CPU has 200Gb/s of bandwidth. If the objects in your array are 200 Gb in size each, then swapping the position of two elements in a vector is going to take O(seconds) at memcpy speeds,

Not necessarily. If those items are aligned to a memory-page boundary, memcpy will probably do a page table update instead of actually moving the values in RAM.


If objects know where they are, why do you need sorted ordering? If you need sorted ordering, it sounds like you're (regularly!) iterating over it. But iterating over a linked list is incredibly slow.

This is where Big-O just really doesn't have a strong relationship with performance. Iterating over a linked-list is O(N), same as a vec. But in the real world it's nowhere close to as fast.
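If you want to check this on your own machine, here is a rough timing sketch using only the standard library. The element count is arbitrary, and a freshly built `LinkedList` tends to have its nodes allocated close together, so this probably understates the gap you would see with a list that has been churned for a while. Treat it as a starting point, not a proper benchmark.

```rust
use std::collections::LinkedList;
use std::time::Instant;

fn sum_vec(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn sum_list(l: &LinkedList<u64>) -> u64 {
    l.iter().sum()
}

fn main() {
    let n: u64 = 1_000_000; // arbitrary size for illustration
    let v: Vec<u64> = (0..n).collect();
    let l: LinkedList<u64> = (0..n).collect();

    let start = Instant::now();
    let vec_sum = sum_vec(&v);
    let vec_time = start.elapsed();

    let start = Instant::now();
    let list_sum = sum_list(&l);
    let list_time = start.elapsed();

    // Same O(N) work, same answer; only the memory access pattern differs.
    assert_eq!(vec_sum, list_sum);
    println!("vec: {:?}, linked list: {:?}", vec_time, list_time);
}
```

Build with optimizations (`--release` or `-O`) before reading anything into the numbers.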

So then if you're not regularly iterating over it a bag implemented with a vec is likely a better option. Then just have a bool that tracks if it's sorted or not, and in the (relatively) rare case you need sorted traversal just sort() it first.
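Roughly what I mean, as a sketch. The `Bag` type and its method names are made up for illustration, and it stores plain `i32`s to keep things short.

```rust
/// Unordered bag backed by a Vec, with a flag tracking whether it is sorted.
struct Bag {
    items: Vec<i32>,
    sorted: bool,
}

impl Bag {
    fn new() -> Self {
        // An empty bag is trivially sorted.
        Bag { items: Vec::new(), sorted: true }
    }

    /// O(1) amortized insert; order is no longer guaranteed afterwards.
    fn insert(&mut self, x: i32) {
        self.items.push(x);
        self.sorted = false;
    }

    /// O(1) removal: the last element is moved into the hole.
    /// Panics if `idx` is out of bounds, like `Vec::swap_remove`.
    fn remove_at(&mut self, idx: usize) -> i32 {
        self.sorted = false;
        self.items.swap_remove(idx)
    }

    /// Sort lazily, only in the (hopefully rare) case sorted traversal is needed.
    fn sorted_items(&mut self) -> &[i32] {
        if !self.sorted {
            self.items.sort_unstable();
            self.sorted = true;
        }
        &self.items
    }
}

fn main() {
    let mut bag = Bag::new();
    for x in [5, 1, 3] {
        bag.insert(x);
    }
    let removed = bag.remove_at(0);
    println!("removed {}, remaining sorted: {:?}", removed, bag.sorted_items());
}
```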


Ordered is not the same thing as sorted. If you do swap-removals, there's no way to sort the array back into the original order unless you also keep track of insertion time, which adds memory overhead.

Storing pointers in a linked list adds memory overhead, too. Insertion time in nanoseconds is only 8 bytes, the same as a single pointer on 64-bit. So that's half the memory overhead of a doubly linked list.

You can get the best of both worlds - fast iteration and fast insert speed - by using a tree. Essentially use a b-tree as a list, and in internal nodes count the number of items contained in each child. Inserts traverse down the tree to find the insert position, insert there and then traverse back up the tree updating indexes. Each leaf should contain a small array of items, and when inserting you can memmove the existing items to make space. Tuning the size of that array lets you balance the cost of memcpy vs the cost of pointer indirection & allocations on insert.

The whole construction ends up extremely efficient in practice for this kind of thing, since you get log(n) indexed lookup and log(n) insert time. I don’t know why they aren’t used more.
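To make the leaf part concrete, here is a heavily simplified sketch: a single level of small arrays ("chunks") whose lengths play the role of the per-child counts. It is not a real b-tree, since finding the right chunk walks the chunk list linearly instead of descending a tree, so it does not give the log(n) bounds above. `ChunkedList` and `CHUNK_CAP` are names and values I made up for illustration.

```rust
/// Maximum items per chunk; an arbitrary tuning knob for this sketch.
const CHUNK_CAP: usize = 64;

struct ChunkedList<T> {
    chunks: Vec<Vec<T>>,
    len: usize,
}

impl<T> ChunkedList<T> {
    fn new() -> Self {
        ChunkedList { chunks: Vec::new(), len: 0 }
    }

    /// Insert `item` so that it ends up at position `index`.
    fn insert(&mut self, index: usize, item: T) {
        assert!(index <= self.len, "index out of bounds");
        if self.chunks.is_empty() {
            self.chunks.push(Vec::with_capacity(CHUNK_CAP));
        }
        // Use the per-chunk counts to locate the chunk that covers `index`.
        let mut c = 0;
        let mut offset = index;
        while c + 1 < self.chunks.len() && offset > self.chunks[c].len() {
            offset -= self.chunks[c].len();
            c += 1;
        }
        // Shifts at most CHUNK_CAP items within one small array.
        self.chunks[c].insert(offset, item);
        self.len += 1;
        // Split a chunk that has grown past the cap into two halves.
        if self.chunks[c].len() > CHUNK_CAP {
            let tail = self.chunks[c].split_off(CHUNK_CAP / 2);
            self.chunks.insert(c + 1, tail);
        }
    }

    /// Iterate in list order: one pointer chase per chunk, not per item.
    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.chunks.iter().flatten()
    }
}

fn main() {
    let mut list = ChunkedList::new();
    for i in 0..1000 {
        list.insert(i / 2, i); // always insert somewhere in the middle
    }
    println!("{} items in {} chunks", list.len, list.chunks.len());
    println!("first item: {:?}", list.iter().next());
}
```

A real implementation would put internal nodes with child counts above the chunks, which is what turns the linear chunk search into a logarithmic one.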


Trees don't give you fast iteration, though. You still end up with lots of pointer chases during iteration. The lookup is algorithmically fast, but it's even faster on the more CPU-friendly hashset/hashmap.

You can do what you describe without the tree by having your linked list contain small arrays of items instead of individual items, but you then lose the property that you can stably point at an item in the list across inserts. And both then suffer from the problem that finding the element to remove (or insert at) is no longer O(1). Not as bad on the tree, of course, but you're into the realm where you really need to benchmark a specific implementation of the idea & outline the specific constraints to see if it really beats just the good ol' vector and/or hashmap.


B-trees[1] are pretty fast to iterate through as long as the branching factor is fairly large. If the branching factor is 100, you would only need to chase one pointer every 100 nodes (and in practice, even larger branching factors are common).

They wouldn't be used so heavily in filesystems and databases if they were slow to iterate through.

[1] https://en.wikipedia.org/wiki/B%2B_tree


Hashmaps don't help at all if what you want is a list - because the items aren't ordered. The point isn't to use a hashmap with incrementing integer keys or something wacky like that. It's to use a b-tree (or a skip list) with all your list items stored in their desired order in the tree. In reality iterating through a b-tree is fast because the depth is small (~5-10), and all the parents stay in CPU cache during the iteration.

Iterating through a b-tree with a large branching factor (and multiple items at each leaf) is significantly faster than iterating through a linked list in practice, because main memory lookups dominate the linked list performance. With a b-tree you only need to do a main memory read after iterating every ~100 items (or whatever the leaf size is). With a linked list as you're describing you need to do it every item. Inserting is still O(1) - the branch size at each leaf is a constant factor. And it's a small constant factor (memcpy is faster than you think it is.)

You can make linked lists fast for arbitrary insertions by storing forward pointers that skip more than 1 element, and by storing multiple elements at each leaf. If you do you end up with a skip list - which is a data structure with basically the same performance characteristics as a b-tree but a bit less performance stability.

Linked lists with a single item per node are basically never the appropriate data structure to use for any application. The only time I've seen them used in a sensible way was through clever use of intrusive lists in a physics engine, to avoid allocations & smooth out any spikes in allocator load from reallocating vectors. Even then the justification was a bit dubious. Oh, and I've seen them used in JS where it's much harder to implement high performance, complex data structures.

Anyway, I highly recommend anyone who's interested actually benchmark some different scenarios. It's remarkable how much of a difference good data structures make. The big insights I've had are that memcpy is way faster than I think it should be, and memory reads are way slower than I think. Vectors do perform well in practice, even with arbitrary inserts, because of how fast memcpy is. But if you're inserting at arbitrary positions, that gets beaten out by well-written b-trees quite quickly (usually at around 10k items, in my experience).


I appreciate this comment, as there were (semi) recently some comments/posts about how linked lists in rust are hard - but this makes it sound more like "a linked list in rust looks a bit different (and is safer) than a linked list in c" (or: "rust is actually its own language, work with the grain").

Rant disclaimer. Proceed at your own risk.

I want to like Rust, if only because anything has to be better than C++. C++ is godawful and it sucks that all these decades later it’s what serious big software is written in.

But I haven’t found the explanation of Rust and its trade-offs and philosophy written by someone who can write serious Haskell and Lisp and C++ and is like “Rust. This is the way. Here’s how and why.”

I’m sure there are much better blog posts than this one that shed light on how parametric as opposed to ad-hoc polymorphism is natural in Rust with no runtime cost. Or how Rust’s nested angle-bracket hell to get a pointer to a piece of memory is actually a deep and profound algebra that exposes std::move for the fraud it actually is or whatever.

Where do I find the explanation of why Rust is less awful than C++ written by someone who has written a lot of C++ out of necessity, doesn’t take non-browser JS seriously, doesn’t think 8 boxes need kube, doesn’t stand to make consulting revenue off having been involved in Cargo, and generally uses C++/Python because they’ve got work to do?

Where do I go to see the light as someone who has some idea how brilliant Eich is and also knows how ridiculous it is to use JavaScript when you’ve got an alternative better than Lua? (I’ve written a JavaScript compiler, I know how unfortunate node.js and Electron are).

I want to be sold! Sell me!


Sounds like you might like Zig. It is not perfect either, but many things are done the proper way there. The only thing I really like about Rust is that it tries to solve many problems at their roots. It failed in my opinion, but at least it tried to.

What’s the link that takes me down the Zig rabbit hole?

The Road to Zig presentation:

https://youtu.be/Gv2I7qTux7g

C but with the problems fixed.


Queued. Thank you!

It's the only language existing that is memory safe and provides zero cost abstraction to the CPU capabilities (i.e. no mandatory GC, values don't need to be boxed, ability to mutate memory).

The only limitation that may not eventually be removed is the lack of dependent types (although you can kind of emulate them by lifting terms to the type level and using lifetimes to represent variables).

C++ is not memory safe, Lisp has no type system and Haskell has GC, all values are boxed in a closure and it cannot mutate memory safely, so none of these languages are even in contention.


C++ compiled by clang-12, with clang-tidy and cppcheck turned all the way up, with everything owned by a std::unique_ptr, run through ASAN, UBSAN, TSAN etc (which seems to build about as fast as Rust) lets what slip by that Rust catches?

I’m not being sarcastic, in all earnestness educate me!


Does that warn if you do:

vector<int> v{1,2,3,4}; for (auto&& i: v) v.push_back(i);


In my little toy build I set up no, it doesn't warn me that smashing a data structure while iterating over it at the same time gives UB. I'm 50/50 on if there's a flag for that, don't actually know.

Again, in the spirit of becoming more educated, what's the equivalent Rust code? I have `rustc`/`rustup` on my box so I can run that too.


This should be about equivalent. It does not compile, as expected. https://play.rust-lang.org/?version=stable&mode=debug&editio...

Another fun feature of rustc is you can execute this command: rustc --explain E0502

It gives you a small example of what the error is, explains it, and tells you how to fix it.

This is a very nice feature for new people (like myself).


That's definitely useful, and I'm not completely mystified by the notion that simultaneous borrowing of something as both mutable and immutable is something that Rust watches for.

In C++ you (usually) do this with `const`, which is fairly low friction, very statically analyzable, and I'm a little unclear what bug hardcore borrow-checker static analysis stops me or my colleagues from making that `const` can't catch?


> In C++ you (usually) do this with `const`, which is fairly low friction, very statically analyzable, and I'm a little unclear what bug hardcore borrow-checker static analysis stops me or my colleagues from making that `const` can't catch?

Rust catches the bug where you forget to use `const`, because the C++ compiler doesn't force you to use it. Your argument boils down to "I don't need Rust's borrow checker, I just have to remember to use a dozen tricks, follow various patterns and guidelines, run several static analyzers and runtime checks, and I'm almost there".

This is not meant as an attack against you. I used to program in C++, and I'm mostly using Rust now. The borrow checker frees up those brain resources which had to keep all those tricks, patterns, and guidelines in mind.


Const in C++ is entirely unlike immutability in Rust. Even ignoring the existence of “const_cast”, C++ const is vastly weaker than Rust immutability in several extremely fundamental ways that should be immediately obvious if you’re a C++ professional with even a surface level introductory understanding of Rust. At this point you honestly should just take some time to learn a bit of Rust, and it will likely all make sense. If not, it’s possible you’re a relatively amateur C++ developer (no shame in that) lacking experience of the many ways in which C++ const falls very short.

But to directly answer your question, suppose I have a “struct FloatArrayView { float* data; size_t size; };” Of course this is a toy example, but humor me for now and consider the following:

1. Without excessive custom effort (e.g. without manually unwrapping/wrapping the internal pointer), how do I pass it to a function in C++ such that the data pointer within becomes const from the perspective of that function (i.e. you’ll get a compile error if you try to write to it?) Hint: It’s very complex to do this in C++, vs trivially easy in Rust. And no, passing as const or const reference to “FloatArrayView” does NOT work. The only solution in C++ for complex composite types operating with “by-reference semantics” is ugly, complex, and horribly error prone to maintain correctly. For a more concrete example, consider how “unique_ptr” works with the constness of the value it holds. A “const unique_ptr<T>” is NOT a unique pointer to const T. A “unique_ptr<const T>” is, but the relationship and safe conversions between these are not simple or easy to implement or even use in many cases. It gets even worse when you need to implement a “reference semantics” type like this that is inappropriate to be a templated type, or contains a variety of internal references unrelated to the template arguments.

2. Now suppose I solve #1 and pass this into some class method, which then stores a copy of the const reference for later. But I as the caller have no way of knowing this. So after that method returns, I later proceed to modify the data via my non-const reference (which is a completely valid operation), but this violates the previous method’s assumption that the data was immutable (will never change). This creates an incredibly dangerous situation where later reads from that stored const reference to data (that was assumed to be immutable) are actually going to yield unpredictably changing results. Const is not immutable. Question: How do you make it so C++ guarantees that passed reference is truly immutable and will never be mutated so long as the reference exists, and enforce this guarantee at compile time? In Rust this is easy (in fact, it’s the default). In C++, it is impossible. The closest you can get in C++ are runtime checks, but that’s nowhere near as good as compile time checks.
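A minimal sketch of the Rust side of #2, with a hypothetical `Reader` type standing in for "some class that stores the reference for later". While `reader` is still in use, the commented-out write is rejected at compile time; once its last use has passed, the same write is accepted.

```rust
/// Hypothetical type that keeps a shared borrow of the data for later reads.
struct Reader<'a> {
    data: &'a [f32],
}

impl<'a> Reader<'a> {
    fn first(&self) -> f32 {
        self.data[0]
    }
}

fn read_then_write() -> (f32, f32) {
    let mut samples = vec![1.0_f32, 2.0, 3.0];

    let reader = Reader { data: &samples };
    // samples[0] = 9.0; // rejected at compile time: `samples` is still borrowed by `reader`
    let before = reader.first();

    // `reader` is never used again, so the shared borrow has ended
    // and mutation is allowed from here on.
    samples[0] = 9.0;
    (before, samples[0])
}

fn main() {
    let (before, after) = read_then_write();
    println!("read {} through the reader, then wrote {}", before, after);
}
```

The lifetime parameter on `Reader` is what tells the caller that the reference is being held on to, which is the piece of information the C++ caller in #2 has no way of knowing.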

Edit: Removed a bunch of perhaps unnecessary extra C++ trivia which I’ll save for later :)


This is actually funny because unlike C++, Rust doesn't have true immutable types. Instead Rust restricts mutability in two ways. One is immutable bindings, and the other is shared references. Your `unique_ptr` example is how it ought to work actually. That way you have the ability to parameterize by mutability, which Rust cannot.

To be more clear, in Rust, the `mut` you see in bindings (like `let mut blah = ...`) is wildly different from the `mut` in `&mut T`. There was discussion on whether the latter should be renamed to `uniq` (because `mut` is misleading). In any event, Rust lacks `const` types. There is no `const T` like in C++. So something like this is impossible,

  std::vector<const int> vec;
You can't have just the elements of a collection to be immutable like the above example in Rust.

Secondly, a value of a type `&T` can be mutated internally (aka interior mutability). A function taking a `&T` type and mutating it behind the scenes is very wacky. AFAIK, interior mutability exists only to trick the borrow checker using the `*Cell` types (which use unsafe internally BTW).
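For anyone who hasn't seen it, a minimal sketch of what that looks like with `std::cell::Cell`. The `Counter` type and `touch` function are made up for illustration.

```rust
use std::cell::Cell;

struct Counter {
    hits: Cell<u32>,
}

/// Takes only a shared reference, yet updates the counter through `Cell`.
fn touch(c: &Counter) {
    c.hits.set(c.hits.get() + 1);
}

fn main() {
    // Note: the binding is not even declared `mut`.
    let c = Counter { hits: Cell::new(0) };
    touch(&c);
    touch(&c);
    println!("hits = {}", c.hits.get());
}
```

The mutability is at least visible in the field's type, so a `&T` can only be changed this way if `T` opted in somewhere.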


Here’s a simple point for you to consider. Let’s assume you and your team are 10x devs with perfect knowledge and use all the right tools. What happens when you guys leave or someone who isn’t up to your standard contributes, or is a junior dev?

Imagine all the best tools and all the best and memory safe features all rolled up to one called Rust.

Lastly, with C++ there are many situations where the memory unsafe actions are not caught until they are triggered dynamically. So like others have mentioned, you’d have to have the best test suite combined with the best of fuzzing etc.

Or you can just use Rust where nearly everything is caught at compile time with useful errors and when something funky does happen at runtime it will just panic before you cross into UB/ unsafe land


I've read your comments in this thread, and with all due respect, I think what you're missing is a very simple truth: Rust is safe by default. C++ is not. This is Rust's core value proposition. It is precisely the thing that permits AND encourages designing safe abstractions even if it internally uses `unsafe`. Rust's value is in allowing you to compose safe code without fear.

If this doesn't seem like a game changer to you, then you probably don't grok its scope and impact. I don't really know how to give that to you either in HN comments. I think you maybe need a synchronous dialogue with someone.


It is a pity that none of the replies to this comment has actually clarified your doubts...

Assuming the absence of `const_cast`, C++'s `const` ensures that data of the particular type doesn't get modified. That's it. Rust, on the other hand, tracks and restricts aliasing. This means you can't get both a constant and non-constant reference to an object or anything owned by it (more precisely anything transitively reachable from it). This is helpful in preventing issues with invalid pointers (like iterator invalidation). For instance, if you have already borrowed an iterator (which is essentially a reference) you can't issue mutable operations on the vector (issuing mutable operations entails a mutable borrow), since those could potentially reallocate the backing memory of the vector. This property is also useful for preventing multiple writes to the same location in a multi-threaded application, making programs adhere to the single-writer principle by construction.

The downsides are that you might sometimes need to contort your code or your program architecture or perform mental/type gymnastics to write some trivially safe code or even use unsafe (or else lose performance) to satisfy the borrow checker.


How would one write e.g. this in Rust, then?

   #include <vector>
   int main()
   {
     std::vector<int> foo{1,2,3,4,5};
     foo.reserve(foo.size()* 2);
     for(auto it = foo.begin(), end = foo.end(); it < end; ++it)
       foo.push_back(*it);
   }

The shortest, most elegant example I could come up with that actually worked (since the example above really shouldn't work in Rust in the first place):

    fn main() {
        let mut v = vec![1, 2, 3, 4, 5];
        let new_size = v.len()*2;
        v = v.into_iter().cycle().take(new_size).collect();
    
        println!("{:?}", v);
    }
https://play.rust-lang.org/?version=stable&mode=debug&editio...

There are a couple of options:

- Repeating the vector using the built-in method `Vec::repeat`. This allocates a new vector, but `reserve` is likely to do the same, just implicitly.

https://play.rust-lang.org/?version=stable&mode=debug&editio...

- Using a traditional for loop. There are no lifetime issues, because a new reference is acquired on each iteration. The reserve is not required, but I included it to match your C++ version. https://play.rust-lang.org/?version=stable&mode=debug&editio...

- Creating an array of slices and then concatenating them into a new Vec using the concat method on slices: https://play.rust-lang.org/?version=stable&mode=debug&editio...
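Since the playground links are truncated here, a sketch of what the three options could look like. This is my own reconstruction with made-up function names, not necessarily the code behind the links.

```rust
/// Option 1: `repeat` (a slice method, reachable from Vec) builds a new Vec.
fn double_with_repeat(v: &[i32]) -> Vec<i32> {
    v.repeat(2)
}

/// Option 2: a traditional index loop that appends in place.
fn double_in_place(v: &mut Vec<i32>) {
    let n = v.len();
    v.reserve(n); // not required; mirrors the C++ version
    for i in 0..n {
        let x = v[i]; // the borrow for indexing ends with this statement
        v.push(x);
    }
}

/// Option 3: concatenate an array of slices into a new Vec.
fn double_with_concat(v: &[i32]) -> Vec<i32> {
    [v, v].concat()
}

fn main() {
    let mut foo = vec![1, 2, 3, 4, 5];
    println!("{:?}", double_with_repeat(&foo));
    println!("{:?}", double_with_concat(&foo));
    double_in_place(&mut foo);
    println!("{:?}", foo);
}
```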


Other comments have answered this specific question, and I think it might be interesting to look at a similar-looking question that's actually more problematic for Rust. What I'll ask is, what's the Rust equivalent of this:

    #include <vector>

    void do_stuff(int &a, int &b) {
      // stuff
    }

    int main() {
      int my_array[2] = {42, 99};
      do_stuff(my_array[0], my_array[1]);
    }
That is, how do we take two non-aliasing mutable references into the same array/vector/view/span at the same time. (To be clear, none of the following applies to shared/const references. Those are allowed to alias, and this example will just work.) Notably, the direct translation doesn't compile:

    fn main() {
        let mut my_array = [42, 99];
        do_stuff(&mut my_array[0], &mut my_array[1]);
    }
Here's the error:

    error[E0499]: cannot borrow `my_array[_]` as mutable more than once at a time
     --> src/main.rs:7:32
      |                                                                 
    7 |     do_stuff(&mut my_array[0], &mut my_array[1]);
      |     -------- ----------------  ^^^^^^^^^^^^^^^^ second mutable borrow occurs here
      |     |        |
      |     |        first mutable borrow occurs here
      |     first borrow later used by call
      |
      = help: consider using `.split_at_mut(position)` or similar method to obtain two mutable non-overlapping sub-slices
The issue here is partly that the compiler doesn't understand how indexing works. If it understood that my_array[0] and my_array[1] were disjoint objects, it could maybe deduce that this code was legal. But then the same error would come up again if one of the indexes was non-const, so adding compiler smarts here wouldn't help the general case.

Getting multiple mutable references into a single container is tricky in Rust, because you (usually) have to statically guarantee that they don't alias, and how to do that depends on what container you're using. The suggestion in the error message is correct here, and `split_at_mut` is one of our options. Using it would look like this:

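    fn do_stuff(a: &mut i32, b: &mut i32) {
        // stuff
    }

    fn main() {
        let mut my_array = [42, 99];
        // Split into two non-overlapping sub-slices: [42] and [99].
        let (first_slice, second_slice) = my_array.split_at_mut(1);
        do_stuff(&mut first_slice[0], &mut second_slice[0]);
    }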
However, other containers like HashMap don't have `split_at_mut`, and taking two mutable references into e.g. a `HashMap<i32, i32>` would require a different approach. Refactoring our code to hold i32 keys instead of references would be the best option in that case, though it might mean paying for extra lookups. If we couldn't do that, we might have to resort to `Rc<RefCell<i32>>` or trickery with iterators. (This comment is too long already, and I'll spare the gory details.)
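A minimal sketch of the "hold keys instead of references" refactoring for the `HashMap<i32, i32>` case. The `transfer` function is a hypothetical stand-in for `do_stuff`; the point is only that each mutable lookup lasts for a single statement, so the two never overlap.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `do_stuff`: moves `amount` from one entry to another.
/// Returns false, and changes nothing, if either key is missing.
fn transfer(map: &mut HashMap<i32, i32>, from: i32, to: i32, amount: i32) -> bool {
    if !map.contains_key(&from) || !map.contains_key(&to) {
        return false;
    }
    // Each `get_mut` borrows the map mutably only until the end of its statement.
    // The unwraps cannot fail because both keys were checked above.
    *map.get_mut(&from).unwrap() -= amount;
    *map.get_mut(&to).unwrap() += amount;
    true
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1, 42);
    map.insert(2, 99);
    let ok = transfer(&mut map, 1, 2, 2);
    println!("ok = {}, map[1] = {}, map[2] = {}", ok, map[&1], map[&2]);
}
```

The cost is the extra lookups mentioned above: four hash lookups here instead of two.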

At a high level, Rust's attitude towards multiple-mutable-references situations is leaning in the direction of "don't do that". There are definitely ways to do it (assuming what you're doing is in fact sound, and you're not actually trying to break the aliasing rule), but many of those ways are advanced and/or not-zero-cost, and in extremis it can require unsafe code. Life in Rust is a lot easier when you refactor things to avoid needing to do this, for example with an entity-component-system sort of architecture.


Use a crate that provides safe functions implemented with unsafe code to do that, like https://docs.rs/splitmut/0.2.1/splitmut/

Neat! I bet we could add a macro to that crate to make it work with any (static) number of references. A variant of this using a HashSet to check arbitrary collections of keys might be cool too.
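For plain slices with runtime indices, you can get quite far without any unsafe by building on `split_at_mut`. A sketch with a made-up helper name, `pair_mut`, which is not part of the splitmut crate as far as I know:

```rust
/// Hypothetical helper: mutable references to two distinct elements,
/// or None if the indices are equal or out of bounds.
fn pair_mut<T>(slice: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= slice.len() || j >= slice.len() {
        return None;
    }
    if i < j {
        // `left` covers ..j and `right` starts at j, so they cannot overlap.
        let (left, right) = slice.split_at_mut(j);
        Some((&mut left[i], &mut right[0]))
    } else {
        let (left, right) = slice.split_at_mut(i);
        Some((&mut right[0], &mut left[j]))
    }
}

fn main() {
    let mut my_array = [42, 99, 7];
    if let Some((a, b)) = pair_mut(&mut my_array, 2, 0) {
        std::mem::swap(a, b);
    }
    println!("{:?}", my_array);
}
```

The non-aliasing check moves from compile time to a runtime `i == j` test, which is the "not-zero-cost" trade-off mentioned earlier in the thread.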

let mut v = vec![1,2,3,4]; for i in v.iter() { v.push(i);};

And that doesn't work, because the v.iter() is sugar for Vec::iter(&v), while the v.push(i) is sugar for Vec::push(&mut v, i), so the loop holds a shared borrow of `v` while the body asks for a mutable one. Separately, Vec::iter(&v) gives you `i: &i32`, and `Vec<T>::push(&mut self, T)` is the signature, so there's no automatic conversion from `&T` to `T`: you'd need `v.push(*i)` (fine for ints and other `Copy` types) or `.push(i.clone())` otherwise, or use .into_iter() to consume the Vec and get ownership of the entries while iterating. Try on https://play.rust-lang.org


So I don't have the same intuition for desugaring `vec!` that I do for desugaring `for (auto&& blah : blarg)`, but in either case if you desugar it the problem becomes a lot more clear. The Rust borrow checker errors I'm sure become second nature just like the C++ template instantiation ones do, but that is faint praise. To get some inscrutable C++ template instantiation error you have to mess with templates, and that's for people who know what they're doing. In Rust it seems like the borrow checker isn't for pros, it's C++-template complexity that you need to get basic shit done.

C++ is actually a pretty gradually-typed language, and I'm in general a fan of gradual typing. I don't mind that some people prefer BDSM-style typing, but IMHO that goes with GC e.g. Haskell a lot better than it does with trying to print something out or copy it or whatever.


It's not the same as C++ template errors. This is something that will directly cause a segfault in your code, that the Rust compiler is able to catch; AFAIK no C++ compiler would be able to catch that.

The problem here is that you’re trying to iterate over `v` and modify it at the same time. The usual fix is to first decide the changes that should be made, and then apply them after the loop:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

Alternatively, you can loop over indices instead of values, which doesn’t require the loop to maintain a reference to `v` across iterations:

https://play.rust-lang.org/?version=stable&mode=debug&editio...
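In case the links don't load, a sketch of both approaches. The rule being applied (append a doubled copy of every even element) and the function names are arbitrary choices for illustration, and this is not necessarily what the linked playgrounds contain.

```rust
/// Approach 1: decide on the changes first, then apply them after the loop.
fn append_doubled_evens(v: &mut Vec<i32>) {
    // Read-only pass: the iterator borrows `v` only until `collect` returns.
    let pending: Vec<i32> = v
        .iter()
        .filter(|x| **x % 2 == 0)
        .map(|x| x * 2)
        .collect();
    // No iterator is alive any more, so mutating `v` is allowed.
    v.extend(pending);
}

/// Approach 2: loop over indices, so no reference to `v` lives across iterations.
fn append_doubled_evens_indexed(v: &mut Vec<i32>) {
    let n = v.len(); // fixed up front so newly pushed items are not revisited
    for i in 0..n {
        if v[i] % 2 == 0 {
            let doubled = v[i] * 2;
            v.push(doubled);
        }
    }
}

fn main() {
    let mut a = vec![1, 2, 3, 4];
    let mut b = a.clone();
    append_doubled_evens(&mut a);
    append_doubled_evens_indexed(&mut b);
    println!("{:?} {:?}", a, b);
}
```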


just code in assembly then.

Batteries included with safe defaults is a BIG feature.

Legacy C/C++ code is littered with so many memory related bugs that big companies are funding people to rewrite essential unix utils in Rust.


No argument that legacy C/C++ code is Swiss cheese on 1980s memory bugs in a lot of cases.

Modern C++ vs. modern Rust. Why is modern Rust better? Legacy C++ can be transformed into modern C++ in a semi-mechanical, semi-you-employ-someone-good-at-emacs way, at a cost in time, money, and risk way less than starting from an empty editor.


There are, for me, two problems with ASAN and friends:

1) it requires a later execution, rather than your code failing to even compile.

2) it requires excellent and complete tests, probably with quality fuzzing, as often the corner cases (like hitting a buffer size limit) are where problems occur.

For me, the argument for Rust is exactly that it let me (mostly) get rid of ASAN and UBSAN and valgrind and friends, which I consider required for C++.

The other big advantage of Rust is package management, which I feel C and C++ are awful at, particularly if you want to release for Linux, Mac and Windows.


The package management point is completely on point. CMake blows and mixing in autotools makes it worse.

I think I consider the need to run under ASAN in a configuration that gets code coverage a feature rather than a bug. It's stupid monkey-brain stuff, but the forcing function of needing to be able to drive your critical path at will in my experience leads to better outcomes.


Composability bites you badly here. You need that ASAN run, with full coverage, for the entire C++ system, any time any part of it changes. Even if the changed element seems irrelevant and clearly safe, the way C++ is defined means that doesn't guarantee it didn't make the whole system unsafe.

But since Rust's Memory Safety guarantees apply to components, the composition of those components also has Memory Safety by definition.

If you have one guy writing incredibly scary to-the-metal unsafe Rust to squeeze out 1% more performance from a system that's costing 85% of your company's burn rate, you can put that tiny component through ASAN hell, looking for any possible cracks in its Memory Safety with all sorts of crazy input states.

But the sixteen other people in the team writing stuff like yet-another REST JSON handler entirely in Safe Rust don't need that effort and when you compose all these pieces together to build the actual product, you don't need ASAN again, you checked that the scary bit was safe and you're done.
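A toy sketch of that split, assuming a made-up `middle` function: the `unsafe` block is confined to one small function whose own check upholds the invariant, and everything calling it is ordinary safe Rust. Whether skipping the bounds check is ever worth it here is beside the point; it just shows where the audit boundary sits.

```rust
/// Safe API: callers cannot trigger undefined behaviour through this function.
fn middle(xs: &[u32]) -> Option<u32> {
    if xs.is_empty() {
        return None;
    }
    let mid = xs.len() / 2;
    // SAFETY: `xs` is non-empty, so `mid = len / 2` is strictly less than `len`.
    Some(unsafe { *xs.get_unchecked(mid) })
}

fn main() {
    // Everything out here is safe code; only `middle` needs the scary review.
    println!("{:?}", middle(&[10, 20, 30]));
    println!("{:?}", middle(&[]));
}
```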


Oh God, CMake and dependencies are at least 80% of why I decided to learn Rust. I decided to give a shot to C++ back in November/December for a project I'm working on and it was ridiculously painful to do something like set up a simple library with unit tests using CMake, especially coming from writing Java in my day job where it's trivial to do this sort of thing. Starting from scratch with Rust about a month and a half ago, I've been more productive in that month and a half than I was in five months of banging against CMake and trying to figure out how to bring in external dependencies.

It's not that Rust catches these issues like a sanitizer or linter would, it's that they can't exist by design. The compiler is aware of potential type-, lifetime- and ownership safety issues, and you don't need to run any code to figure out if your code is memory safe or not; static analysis is enough.

Also, even if Rust only caught by itself what Clang and half a dozen analyzers would catch, you also get a really nice language as a bonus. There is much more to Rust than just safety.


So, I'm not really sure how the distinction between `rustc` and the government-issue `clang` chain really matters unless one is a lot faster, and they're both really slow.

The by-design thing I get for like something with a port open, but as a default? It's been a few months but the last time I was trying to hack a Rust project printing stuff out hit the borrow checker. I can live without that.


You've mentioned the printing issue a couple of times in this thread, and it sounds pretty weird to me as someone who has used Rust for a couple of years. Without knowing the specifics I can only assure you that if you know the basics of the language, you'll have no issues printing whatever you want whenever you want.

Rust is admittedly a language you can't just jump into without spending some time learning the fundamentals. If you try to do anything non-trivial without grokking the basics of ownership and borrowing you are bound to run into frustrating issues.


He probably tried to do dbg!(foo), which takes foo by ownership and consumes it. Doing dbg!(&foo) will accomplish the same thing (printing foo along with the line number and file name), but because it only uses a reference to the foo, it won’t consume the original. This is perfectly sound and logical once you know the explanation, but it is pretty weird to a newcomer.
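
For anyone following along, a minimal sketch of the difference. The `Config` type and `consume` function are made up for illustration and have nothing to do with the actual project:

```rust
#[derive(Debug)]
struct Config {
    name: String,
}

// Takes ownership of its argument, like any by-value function.
fn consume(cfg: Config) -> usize {
    cfg.name.len()
}

fn main() {
    let cfg = Config { name: String::from("demo") };

    // dbg!(&cfg) only borrows, so `cfg` is still usable afterwards.
    dbg!(&cfg);

    // dbg!(cfg) moves the value into the macro, but the macro returns it,
    // so rebinding the result keeps the value alive.
    let cfg = dbg!(cfg);

    // Without the rebinding above, this call would be rejected as a
    // use of a moved value (error E0382).
    let len = consume(cfg);
    println!("name length: {}", len);
}
```

So the value is not lost when you forget the `&`; you just have to take it back from the macro's return value.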

The project was Alacritty and my checkout is on a laptop I don’t have on hand at the moment so unfortunately I can’t gist my patch, but I’ll have that box tomorrow and I’ll try to remember to do so. I was attempting to use whatever all their other logging was using. ‘&’ was the first thing I tried because I’ve done the tutorials and then some, but after that got the borrow checker arguing about my new line of code as well as two others I didn’t touch, I rotated through all the other sigils brute force, IIRC it was a combination of commenting out another line and ‘’ that got me a log line. Now maybe that’s on Alacritty because maybe they do ownership in a weird way. But even in Haskell, even as a novice on stackoverflow, ‘unsafePerformIO’ will get you a line printed to STDERR, and it’s not 50-80s to build 50-ish kloc every time.

I like learning programming languages, I like working on compilers, this stuff is basically my only hobby.

But as the TF2 people are learning the hard way, if a complete novice can’t get a log line in without understanding a bunch of novel context, PyTorch is taking your SO out to dinner pretty soon.

Now if Rust could build a million lines of code in 1-5s like it should be able to (and like it seems that e.g. Jai can) I wouldn’t care. I needed to get stuff done in C++ bad enough that I learned template metaprogramming, which I wouldn’t wish on an enemy. If Rust made my builds 50-100x faster in low optimization settings, I’d write it on clay tablets.

But what I actually get is a comparably slow build, a comparably “!$&@>>>>”-heavy syntax, yet another failure to do macros that impress a Lisper in a curly brace language, and now the static analyzer’s usually good advice is mandatory and can’t be turned off to do a quick experiment.

I always have to watch my cynicism because I’m a bit autistic and I offend way more often than I mean to, but if I’ve got 1MM lines of C++, I’ve got a pain in the ass on my hands. If I start porting any meaningful part of that over to Rust, now I’ve got two giant pains in the ass.


Initial builds are indeed quite slow, especially when a project has a lot of dependencies. However, in most cases further builds are not that bad. I have never built Alacritty from source so I don't really know if there are any specific issues with it, but from my experience incremental builds take just a few seconds in medium sized projects.

Additionally during development a full build is usually not necessary all the time; oftentimes it's just enough to check the code for errors with `cargo check`, which is usually significantly quicker than building a full debug binary.

Edit:

I tested building Alacritty from source on my computer (Ryzen 5600X, Windows 10). The initial build (consisting of 130 crates) took 34 seconds. Building again after adding a single print statement took 6 seconds, and `cargo check` took about half that at 3 seconds. It's not ideal by any means, but IMO it's also not unreasonable for 25K lines of code, plus those ~130 dependencies.


> It's not that Rust catches these issues like a sanitizer or linter would, it's that they can't exist by design.

Not saying this is true about Rust, but I find these approaches dangerous because they can move bugs off to a different meta level where they're harder to detect.

For instance, some people make safe C dialects where signed integer overflow wraps around or variables are auto initialized to 0. Well, what if you didn't want to overflow at all or 0 was the wrong value? Now you can't find the bug because the program has been redefined to do the wrong thing instead of simply crashing.


I get how those concerns would apply to a C dialect, but not really to Rust. Since Rust is an entirely new language you shouldn't expect it to work exactly like C, unlike you would expect of a C-derived language.

Most of Rust's safety features are handled at compile time, and they are usually handled explicitly, not implicitly. A few examples about initialization:

* An instance of a struct is created by giving an initial value to each field. Leaving out any field is a compile error. Since structs are always completely initialized, you can't accidentally pass references to partially initialized structs like you could in a C++ or Java constructor.

* All local variables must be initialized. Accessing a variable before initialization is a compile time error.

* Structs have no implicit default value. The default value can be defined by implementing a trait, either by yourself or by deriving it with a macro (which would default integers to 0 and so on). Default values are only used by a fixed set of functions in the standard library, and never implicitly.
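
A small sketch of the three points above. The `Settings` struct and both helper functions are invented for the example:

```rust
#[derive(Debug, Default, PartialEq)]
struct Settings {
    retries: u32,
    verbose: bool,
    label: String,
}

fn make_settings() -> Settings {
    // Every field has to be listed; leaving out `label` is a compile error (E0063).
    Settings {
        retries: 3,
        verbose: true,
        label: String::from("primary"),
    }
}

fn explicit_default() -> Settings {
    // Defaults are opt-in: this compiles only because of #[derive(Default)],
    // and the defaults are applied only because we ask for them here.
    Settings {
        retries: 5,
        ..Default::default()
    }
}

fn main() {
    let x: i32;
    // println!("{}", x); // rejected: `x` is used before being initialized (E0381)
    x = 7;
    println!("{} {:?} {:?}", x, make_settings(), explicit_default());
}
```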

The overflowing case is one of the only cases where Rust is a bit more implicit. Integer overflow is checked at run time in debug builds (so over/underflow causes a panic) and defined to wrap around in release builds. However, the standard library also defines methods on numbers which explicitly use wrapping, saturating or checked (returning an Option that is None on overflow) semantics.
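
To illustrate the explicit variants, here is a short sketch using `u8`. The `add_all_ways` helper is just a name I made up; the methods it calls are the standard ones on integer types:

```rust
// Returns the (wrapping, saturating, checked, overflowing) results of a + b.
fn add_all_ways(a: u8, b: u8) -> (u8, u8, Option<u8>, (u8, bool)) {
    (
        a.wrapping_add(b),    // wraps around modulo 256
        a.saturating_add(b),  // clamps at u8::MAX
        a.checked_add(b),     // None if the sum does not fit
        a.overflowing_add(b), // wrapped value plus an overflow flag
    )
}

fn main() {
    println!("{:?}", add_all_ways(255, 1));  // (0, 255, None, (0, true))
    println!("{:?}", add_all_ways(200, 55)); // (255, 255, Some(255), (255, false))
}
```

These behave the same in debug and release builds, so the choice of overflow semantics is visible at the call site.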


What you've described is a way of catching some memory-safety problems in C++ codebases. There's no easy way to catch them all. Even Chromium is riddled with memory-safety issues that result in serious security vulnerabilities. [0][1] We don't have a practical way of writing large and complex C++ codebases that are entirely free of memory-safety issues.

I don't know enough about Rust to comment on whether it does a better job than C++ on reference cycles, but my suspicion is that it does.

[0] https://www.chromium.org/Home/chromium-security/memory-safet...

[1] Related discussion: https://news.ycombinator.com/item?id=26861273


Chromium is a *way* legacy codebase, I think WebKit goes back to like Konqueror or something? Chromium is a very weird example to cite for modern C++ vs. modern Rust memory safety.

AFAIK avionics software is still largely written in Ada, because it won't let you fuck up meters vs feet type stuff. And if someone said: "Rust has a slam-dunk niche: we're going to crank static analysis past helpful to downright intrusive because sshd simply can't buffer overflow", I'd be like, yeah, ok.

But at the time I stopped using it, Alacritty couldn't handle meta keys on Big Sur, and I wanted to fix that, so I spent a weekend or two that I really couldn't spare trying to unfuck it, but between `print` not being obvious (because someone had already borrowed the thing I wanted to print out) and the build being slower than C++ I timed out.


OTOH Chrome has one of the best teams in the world working on it, funded by one of the richest companies in the world, with the best tools. And they take security very seriously.

If they can’t get it right, who can?


A good point, but I imagine benreesman's counterpoint would be that things might be different if Chromium were written entirely in modern C++, strictly following modern best practices.

My suspicion is that this is too optimistic, but I can't really substantiate it.

What would be a good security-sensitive modern C++ codebase, ideally from a high-profile source like Google, to compare against?


Doesn't the definition of modern C++ change every few years? This might not be possible if any such project takes more than two years to write.

Cool that you use all those tools. Wouldn't it be cool to be able to depend on libraries knowing they all use those settings? And you can depend on libraries by putting a single line in a toml file.

I mean C++ dependency management is so bad there is a concept of a header only library so you just have to copy the file into your project with no support for managing versions.

C++ dependency management is so bad people use the OS installed dependencies because they struggle to set up a build environment otherwise.

What I'm saying is that C++ is pretty good and toe to toe C++ with all the tools you describe and Rust are fairly comparable. But with Rust the whole ecosystem uses it so the network effects make things much better. And it's snowballing.


You make a really good point about C++ headers. Textual inclusion of globs of bytes into translation units is 1970s legacy stuff that totally fucks up reasonable build times, and for all the talk about C++20 modules they haven't delivered yet.

I actually think this is one of the things that Rust could crush C++ on, because build times are becoming the whole show on big C++ projects.

It's super weird to me that C++, with this whacky 1970s constraint that totally fucks up modularity, still builds neck-and-neck or better with Rust.

In principle Rust could do *way* better on this, and that is a feature that might get me to consider Rust seriously, no matter how stupid I think the borrow checker is. But Rust still compiles dog slow...


I find if you're aware about how you define your modules, incremental compiles are usually pretty quick. Yes a complete build can take a while but tools like sccache[0] can help with that in CI pipelines and when getting a new dev environment up.

[0]: https://github.com/mozilla/sccache


>It's super weird to me that C++, with this whacky 1970s constraint that totally fucks up modularity, still builds neck-and-neck or better with Rust.

Not so fast! The compilation and linking is neck and neck with Rust so you can have an ELF in the same amount of time. But then you have to run all the C++ tools you mentioned to have parity with what the Rust compiler is doing.

> for all the talk about C++20 modules they haven't delivered yet.

Even when they're delivered, will all the libraries you depend on use them?

>In principle Rust could do way better on this, and that is a feature that might get me to consider Rust seriously, no matter how stupid I think the borrow checker is. But Rust still compiles dog slow...

The team that built the Borrow Checker deserves the Turing Award. It is an outstanding bit of computer science and engineering. I'm surprised that you think it's stupid.

It used to be dog slow but it's gotten very fast in the past two years. Another achievement.

>In principle Rust could do way better on this, and that is a feature that might get me to consider Rust seriously

Modules? Your take away from all this is that modules are the killer feature of Rust and that the Borrow Checker is stupid?

I call shenanigans. You are an elaborate troll. Well done for getting two responses out of me.


I have the opposite problem with Rust dependency management: it's too easy, so you get deep, widely branching dependency trees by differing authors with differing licenses. From a corporate use perspective this is a legal nightmare. From a security standpoint, this is a trust nightmare.

In fairness, you have those problems no matter how you include other people's code. At least Cargo as far as I'm aware uses a SAT-solver to deliver you a working dep graph unlike `pip` or `npm` or any of that nonsense. (as long as I remember correctly that Carl Lerche wrote that part. Sidebar: wicked smart guy).

You do, but clunky dependency management meant fewer, less granular dependencies.

For licenses there's cargo-deny https://github.com/EmbarkStudios/cargo-deny

On the security standpoint and a wide net of crates.. that's a problem, same as with npm/yarn/pip/whatever and I agree on that. That one bugs me as well. Difficult to audit.


Cargo-deny is accurate, but it accepts no legal or financial risk if it was in some case wrong and you infected a codebase with gplv3, and thus corporate lawyers everywhere ignore it.

Edit: this post keeps getting up and down voted. I suspect it has to do with the phrase infected by gplv3. I like the license fine, but being realistic many businesses treat it like radioactive waste, so that has to be a consideration.


Isn't that a complete drag? Needing to install a dozen different tools just to do something basic as maintain security? There is no way you can hold every C++ project up to this standard. If you asked me to write software in C++ I would forget to use half the tools you listed and your list isn't even complete to begin with!

Of course this is not an answer to your comment.

Even if you use std::unique_ptr you can still run into use after free bugs. std::unique_ptr is just heap allocation with a "stack allocation style" lifetime. You can still run into use after free by putting a reference to stack allocated data (this applies to std::unique_ptr as well) into a struct or global variable that outlives the stack allocated data. std::unique_ptr isn't really meant to solve use after free, it exists to ensure that heap allocated data is properly freed eventually. It saves you the discipline to match every new with a delete but nothing more.

This is something that you can only solve with a borrow checker. If you were to add the ability to detect these types of problems in C++ then you would have essentially added a borrow checker to C++.


This stack would catch _most_ problems, but not necessarily at compile time. It's unsound, because it's still possible to write data races. And it's possible that two components, each on their own correct, combine to make something that isn't correct.

Rust's borrow checker makes it impossible to write code containing a data race. It does this by only accepting a subset of possible safe programs, but this is (or at least may be) a reasonable trade-off due to the guarantee you get.
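
A minimal sketch of what that looks like in practice, assuming a plain shared counter (the `parallel_count` function is made up for the example). The only way to get the counter into several threads is through types that make the sharing and the locking explicit:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads, Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // The value can only be reached through the lock guard.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000)); // 4000
}
```

Handing the spawned closures a bare `&mut usize`, or an `Rc<Cell<usize>>`, does not compile, which is the "subset of safe programs" trade-off in action: the unsynchronized version is rejected even in cases where it might have happened to work.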

It's not that you can't write safe code in C++, but you're always at the mercy of your validation to try to have confidence that you've actually managed to do what you set out to do. Rust obviously doesn't preclude bugs altogether, but it does enforce that you _can't_ write a certain class of previously-prevalent bugs.

I think this blog post from Mozilla does a good job of showing what that means in practice: https://blog.rust-lang.org/2017/11/14/Fearless-Concurrency-I...


> It does this by only accepting a subset of possible safe programs, but this is (or at least may be) a reasonable trade-off due to the guarantee you get.

I think this is correct but would expand on the nature of the tradeoff. Rust only accepts a subset of safe programs, and can express any program in the abstract, but that subset may not contain the optimal safe program. In some cases, differences in practical performance between the expressible safe program and the optimal safe program can be quite large.

Modern C++ cuts a tradeoff along a different dimension: it will accept unsafe rubbish but the optimal safe program is also expressible in virtually all cases. C++ has large niches where it can express optimal safe programs, and where this optimality is important, that Rust cannot. Obviously it is incumbent on the programmer to write safe code in C++, but the flexible type system is more capable of enforcing safety than I think people expect, particularly in recent versions of C++.


Agreed.

But there's optimal and there's "optimal" -- by restricting the developer, we open up more opportunities for the compiler to optimise the code (for all that Rust isn't necessarily able to take advantage of these opportunities yet) and a judicious use of `unsafe` may even help us to build an abstraction that gives us the optimal program anyway. Crucially, code that uses unsafe is still no less safe than ordinary C++ code.
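
As a rough sketch of that kind of abstraction: a hypothetical `pair_mut` helper (my own name, not a standard library function) that hands out two mutable references into one slice. The borrow checker cannot prove this is fine on its own, so the function checks the invariant at run time and keeps the `unsafe` to one line:

```rust
/// Returns mutable references to two distinct elements of a slice,
/// or None if the indices are equal or out of bounds.
fn pair_mut<T>(slice: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= slice.len() || j >= slice.len() {
        return None;
    }
    let ptr = slice.as_mut_ptr();
    // SAFETY: both indices are in bounds and distinct, so the two references
    // never alias, and both are tied to the lifetime of the `slice` borrow.
    unsafe { Some((&mut *ptr.add(i), &mut *ptr.add(j))) }
}

fn main() {
    let mut values = [10, 20, 30];
    if let Some((first, last)) = pair_mut(&mut values, 0, 2) {
        std::mem::swap(first, last);
    }
    println!("{:?}", values); // [30, 20, 10]
}
```

Callers only ever see the safe signature, so the part that needs careful review is the few lines inside the function.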

From my days as a compiler engineer, I definitely prefer to write clear code and fix the compiler rather than try to convince the compiler to produce the right result by mangling code. And to build my control logic as an abstraction, even if it's only going to be used in one place, so it can be verified separately from the business logic it drives.


>impossible to write code containing a data race

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=Rust+data+r...


OK, I'll grant I should have added scare quotes to "impossible".

A quick check suggests that (as would be expected) they're caused by `unsafe` blocks. Specifically, by the developer claiming that a value was safe to send to a different thread while it actually wasn't. One thing to note is that these bugs neutralised the compiler's validation in these specific cases -- allowing, rather than causing, races. Using unsafe means the developer needs to ensure they maintain the invariants themselves within the unsafe block, while in C++ one has to maintain these invariants in _all_ code.
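
For context, the validation in question is the `Send` marker. A minimal sketch, with `sum_on_other_thread` being a made-up function for the example:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Arc<Vec<i32>> is Send, so it may be moved into another thread.
fn sum_on_other_thread(data: Arc<Vec<i32>>) -> i32 {
    thread::spawn(move || data.iter().sum::<i32>())
        .join()
        .unwrap()
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);
    println!("{}", sum_on_other_thread(Arc::clone(&shared))); // 6

    // Rc uses a non-atomic reference count, so it is not Send.
    let local = Rc::new(vec![1, 2, 3]);
    // thread::spawn(move || local.len()); // rejected: `Rc<Vec<i32>>` cannot be sent between threads safely
    println!("{}", local.len()); // 3
}
```

Writing `unsafe impl Send` for a type that wraps something like `Rc` is the sort of claim that switches this check off, which, as far as I can tell, is the pattern behind the bugs I looked at.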


ASan, UBSan, etc. are absolutely life-changing, no doubt about it. But there's a pretty big difference between catching things statically, and catching them at runtime. If you have an issue that, say, only triggers on odd-numbered Tuesdays on Windows 7, your test suite is probably going to miss it, and ASan isn't going to be able to help you.

Here's an example from one of Herb Sutter's talks, which I refer to frequently. He describes an issue that can come up with shared_ptr in reentrant/callbacky code, and the punchline is that you have to be careful never to dereference a shared_ptr that might be aliased: https://youtu.be/xnqTKD8uD64?t=1380

I also just wrote a giant example of my own :) in reply to your toplevel comment. https://news.ycombinator.com/item?id=27168368. TSan will be pretty good at catching some of the simple variants of this, like where you totally forget to take a lock. But it's going to have a hard time with cases where you "stash" a reference longer than you should, because later uses of the reference might appear safe for a while yet become unsafe down the road with unrelated code or timing changes.

Another issue with ASan and UBSan, is that every application has to run them (and maintain its own test suite comprehensive enough to make them useful). With static checks, a memory-safe library is memory-safe for everyone, even for applications that aren't particularly careful with their tests. The library author is able to express their safety requirements through the language, and all callers have to respect those requirements all the time.


Everything owned by unique_ptr is not realistic, sometimes you need shared state. Rust provides a safer alternative to references if you don't want to go all shared_ptr. Even Rc becomes better than shared_ptr due to the single mutating owner limit, especially for multithreading. Even with clang-tidy turned to 11 there are still many other UB-landmines lurking, not only related to memory ownership, and you'd be fighting the compiler even more over contradicting requirements or opinions.

All the tooling you've just listed adds build complexity, on top of a build system that is already an abomination of makefiles, CMakeLists and lack of package manager. Let's not forget all the header files duplicating half of your implementation.


My favourite three compared to years of Haskell experience are:

- runtime predictability. Our async high-load programs are already complex, and having a much more predictable runtime (no GC) makes things feel much more straightforward

- lack of bracket need. You get destructors from your types! No more async-exception or other accidental leaks

- no exceptions. The “?” syntax is truly genius, Holy Grail of Haskell error handling coming to life
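
A small sketch of the last two points together. The `Guard` type and `parse_sum` function are invented for the example; the guard stands in for whatever resource you would otherwise wrap in `bracket`:

```rust
use std::cell::RefCell;
use std::num::ParseIntError;

// Records when it is dropped, standing in for "release the resource".
struct Guard<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("released");
    }
}

fn parse_sum(a: &str, b: &str, log: &RefCell<Vec<&'static str>>) -> Result<i32, ParseIntError> {
    let _guard = Guard { log };
    log.borrow_mut().push("acquired");
    // `?` returns early with the error; `_guard` is still dropped on that path.
    let x: i32 = a.parse()?;
    let y: i32 = b.parse()?;
    Ok(x + y)
}

fn main() {
    let log = RefCell::new(Vec::new());
    println!("{:?}", parse_sum("2", "3", &log));
    println!("{:?}", parse_sum("2", "oops", &log).is_err());
    println!("{:?}", log.borrow());
}
```

The cleanup runs on both the success path and the early-return path without any explicit handler at the call site.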


> Where do I find the explanation of why Rust is less awful than C++ written by someone who has written a lot of C++ out of necessity,

The O’Reilly book by Jim Blandy and Jason Orendorff. It is a really fantastic book on a programming language, the best I’ve ever read. The new edition is due this year it seems.

https://www.oreilly.com/library/view/programming-rust-2nd/97...


Ordered. Thank you!

Seconded, this book is so good in demonstrating Rust’s advantages as well as low level programming concepts with its beautiful visuals.

Yeah, I also like it better than the official Rust book. My favorite programming language book is still K&R's C book, though.

K&R is excellent. One of the things that makes it stand out would be almost irrelevant today, it has an excellent hand-made index. If you're wondering about, say, arithmetic conversions, the index gets you straight to the page which lists exactly what you needed. Wondering when to pick enum over #define? Again, the index has your back.

Today online you've got full text search, maybe you find two, three hits that are irrelevant, but it's so cheap you barely care. And in a book today the index is probably auto-generated (hiring somebody to write an index is not a thing these days) and so it's almost useless with dozens of irrelevant entries, but hey, like I said, full text search, so who cares?

Because C is such a small language and it was essentially finished when the book was written, they get to do a pretty complete survey while also teaching you, so you read the book once, now you understand C pretty well. The Rust book is much better than C++ books I tried, but because Rust is still immature there are big sections that are being rewritten or have already been rewritten, and of course the whole book can't be reordered and started over each time, so overall it's uneven.

I am reluctant to buy a printed Rust book because of that immaturity. My (second edition) K&R is still a pretty good survey of the language. Are there things it doesn't cover? Yes. But few of them are fundamental, whereas I feel like if I bought a Rust book today, in five years it's a historical curiosity like my Stroustrup, except hopefully better written. I still consult my K&R a few times a year, I don't even know where the books I own on other languages (including the long obsolete Stroustrup) are, I might not have unpacked them after moving years ago.


If you take a look at the Rust book from five years ago and today, it's actually 40 pages slimmer (540 pages for the first version, 500 for the second) because most of the examples for some more advanced features Just Work without needing those features explicitly, and some other things got simplified. None of that stuff is wrong, and is still useful at times, but is used less often.

The core feel of Rust is very much the same today as it was then. Just things got smoother overall. We'll see in ten more years.

(This is on my mind, as we'll eventually be releasing an updated book for Rust 2021, at some point. Probably early 2022. It will grow in size again, mostly due to adding stuff about async/await, which is a major new addition to the language, but only needed in certain circumstances.)


Honest question: what views does one have to hold to not take non-browser JS seriously, and at the same time use Python? And what's unfortunate about node.js compared to it?

Python has (more than one actually) trivially seamless FFI stories down to native (read: C++) code. I personally like `pybind11`, but there are others. Python is basically a convenient way to invoke numpy or Tensorflow or Torch or whatever. It's got numpy/scipy/sklearn/BLAS bindings etc., and is for better or worse how you do scientific computing in the mid-to-upper-mid range these days (the HPC folks have their own whole story).

It's a pain in the ass in its own way, but you're kinda stuck with it if you want to do mainstream ML.

As a language it's dumber in some ways than JS, definitely slower in some cases, but also way less weird. Prototype chains and stuff are a cool demo that BE knew about Self and shit in the week he had to write JS, but anyone who claims they like that has Stockholm syndrome worse than someone who thinks Ruby metaclasses make sense.


I've been using python for many years, but these days I would prefer to use JS (or better, TS) even for backend. I find it more enjoyable to use than Python, and faster. Sure it has its warts, but if you write idiomatic code they are not much of a problem (e.g. I don't think there is a single class in the code I write, so prototype chains are a non-issue)

I can say that if someone is sold on Python, you’re not going to convince them to move to Rust but you might convince them to move to Go. Why? Go provides better performance than Python but without the language and keyword complexity of Rust. There’s less language to learn.

That said, the more interesting question is TypeScript vs Rust. I can see how the complexity of C++ naturally leads folks to Rust. But I’ve had a hard time choosing Rust over TypeScript when trying to implement a basic service.

It’s absolutely true that Node.js web servers aren’t optimized for multi-threaded data access times and the Node.js event loop can get in the way of responsive performance.

But at the same time once you build, for example, an HTTP 1.1 web server for Node that is similar to what you’d build using semi-unsafe code in Rust and ships by default in Go standard library, it’s hard to say that JS is any less efficient than Rust when both are generally written to use async/green thread runtimes, make calls using a pool of Postgres or database connections, and either have PG return JSON or otherwise reshape and assign the results to a data model.

Given more computing power, Rust obviously wins, but if you’ve a single thread to work with, and a small buffer so after each web request is served, GC can do its thing… where are the tangible Rust benefits besides maximum speed?

I suppose I’m cheating by suggesting you could use Actix to power your Node application, but the problem I’m facing is that Rust is really complicated to learn and there are few frameworks that make it easier. The amount of published and shared knowledge there is around Go, PHP, TypeScript, Python and Ruby is enormous. Yes, Rust is clearly easier to learn than all the warts of C++, but it feels like Rust hasn’t gained enough traction to convert TypeScript fans, for example, or at least those building APIs and front ends.

I’d really like to be convinced to use Rust because of how easy it is compared to TypeScript but I haven’t seen anyone actually suggest that in practice. Go? Yes, it’s easier to learn than to understand both JS and TS and you can put up with the lack of generics by having more verbose code compared to TypeScript. But Rust? Is it just the case that these other languages are more mature such that all the blog posts have been written?

Also note that the hardest problem when getting started in any new language is package and framework selection. Unless you follow a book that has already picked a handful of packages such that you’re happily falling into the pit of success, I recognize that all languages (except maybe Rails or Spring) force the user to make hard choices upfront about their dependencies. But the only book I’ve seen that presents a coherent narrative for Rust APIs is Zero2Prod[1] and I’d feel more confident if there was at least one other similar set of learning materials.

1. https://www.lpalmieri.com/ for the book published as blog posts, https://www.zero2prod.com/ to buy.


For me, it's the overall experience. Ie: auto-generated docs for libraries, nice package manager, built-in linter and formatter, easy-to-install toolchain[s], high-level coding patterns etc.

That's a fair-sounding point. Java had this (but also a bunch of other problems).

For one thing, it's the only non-research programming language that can guarantee at compile time that your program is free from data races.

It seems like in your case the easiest path might be to learn how the language works and see it for yourself. Otherwise, I think these posts are roughly in the genre you want:

* http://dtrace.org/blogs/bmc/2018/09/18/falling-in-love-with-...

* http://dtrace.org/blogs/bmc/2020/10/11/rust-after-the-honeym...

* https://gregoryszorc.com/blog/2021/04/13/rust-is-for-profess...


Bookmarked. Thank you!

That last article is a great comprehensive summary of Rust's features. Will be passing that around, thanks.

When my Rust code successfully compiles, it usually works. Frankly in terms of user experience it's selling point #1. I could talk about the borrow checker, trait-based generics, native build system and dependency management, performance, relative ease of writing multithreaded code etc... But thinking about it the thing I like the most about coding in Rust is how hard I can lean onto the compiler and type system to spot mistakes and let me know about them.

C++ is not even close in that respect. Duck typed generics, lack of borrow checker, lack of sync/send markers for multithreaded code, easy-to-abuse overloading (that's even abused in the standard library, so there's no escaping it), legacy cruft that springs up in unexpected places, non hygienic macros, exceptions and frankly I could go on for quite a while.

I think for some of us Rust is effectively the promised land. I learned some Lisp and Haskell but I could never really get into them seriously because despite some cool language features I just don't like paying the performance costs of a heavy runtime, dynamic typing and garbage collection. So instead it was C and C++ for better or worse. But then Rust came out and said "hey, how about having your cake and eating it too?" and I never looked back.

But of course many people don't share the same objectives. If Haskell and Common Lisp are what you consider to be your baseline I can definitely see how Rust's very strict type system would be seen as an annoyance, especially if you don't mind paying some runtime performance cost for the sake of simplicity and speed of development.


You seem like someone who could write the blog post I'm looking for.

I appreciate the encouragement! Maybe I will, but first I need to write a custom blogging system. In Rust of course.

> When my Rust code successfully compiles, it usually works.

I think it was on a previous HN story where Rust programmers claimed that to get around borrow checker errors they just clone things. Now, the code may compile and technically be correct, but it hardly exemplifies the attention to detail normally required by a systems engineer.


I admit that I sometimes do this myself, but it's usually a last resort when I deem that a more elegant solution would probably end up being a lot more complicated and probably not a whole lot more efficient since the cloned data is only a few bytes. I use Rust on embedded devices with weak CPUs and only a hundred megs of RAM so I tend to be careful with cloning. Dealing with a lot of cloned data is also a good recipe to end up with inconsistent internal state.

It may be a bit of a No True Scotsman but in my experience hanging in Rust IRC channels and similar places, the devs who tend to clone and slap `Rc<RefCell<_>>` all over the place tend to be those who come from garbage collected languages, simply because they're really not used to thinking about ownership.

People who come from C, C++ and other non-GC languages are effectively already used to dealing with ownership and lifetimes, it's just that in those languages it's up to the developer to keep track of them. In terms of overall architecture I don't find that I write Rust much differently from C++, I use RAII most of the time and when that doesn't work out I'll carefully choose between cloning or some smart pointer/container.

IMO Reference counting and cloning should be the exception, not the rule, and I'd go as far as saying that if you find yourself cloning and boxing data all over the place in your Rust application you're either working on a very atypical application or you're doing it somewhat wrong.


> I don't find that I write Rust much differently from C++

Same here, though I do find myself deviating sometimes. Specifically, there are cases in C++ where I could write code that makes fewer copies or handles data more efficiently, but I don't because the approach would be more error prone. In Rust these riskier approaches simply result in more compiler errors instead of crashes or other runtime bugs.


As a Rust proponent I’d agree with this take 100%.

There are a lot of programmers who approach new languages by trying to write in the style of their previous language. That’s not really optimal, but it can get you up and running quickly producing passable results for a wide variety of (previous, next) language pairs.

Rust is not one of these languages. If you don’t try to “get” Rust and continue working across the grain, you’re going to have a rough time of things. Rust brings along an entirely new concept of explicit ownership that’s implicit in other languages, and if you simply try to push forward without internalizing those restrictions (and understanding their implications on how to structure your programs), the Rust compiler will never stop pushing back against you.

While there are legitimate reasons to need Rc<RefCell<_>>, to Box things, and to clone(), the vast majority of Rust code doesn’t need to fall back on them. If you find yourself instinctively and repeatedly reaching for these as a quick workaround for borrow checker complaints, it’s going to be much better in both the long and medium term to try and understand what the compiler is trying to tell you about your overall approach to design.

This is, I think, the biggest area where Rust evangelists can make some big strides. There are a lot of program designs that compile just fine in other languages but that Rust rejects (or pushes back against) for Very Good Reasons. And getting new developers over that hurdle where they understand how to internalize enough about ownership upfront that they can come up with approaches that work with the grain is a hard problem, and I’m not sure how to solve it. There’s a moment where something clicks and Rust goes from being a fight against the borrow checker to something where your designs just work out of the gate and I don’t know how to get that click to happen quicker.


> IMO Reference counting and cloning should be the exception, not the rule, and I'd go as far as saying that if you find yourself cloning and boxing data all over the place in your Rust application you're either working on a very atypical application or you're doing it somewhat wrong.

I strongly disagree. For example, I'm working on a project that shells out to a command line program to get a graph structure, and then computes things based on that structure. I want the code that traverses the graph to be efficient, but a handful of clones is trivial compared to opening a subprocess. In my case the clones are in the constant part of the big O.

Sometimes clones can let you have a more efficient algorithm as well. Reference counting is pervasive in async primitives, and (for very specific io-bound usecases) async can sometimes be more efficient.

It comes down to knowing your domain, algorithms, data, constraints, etc. and picking the right choice. I don't think you can say Arc is always a sign of a mediocre programmer.

I've never programmed embedded, but it makes sense that you've got strong memory constraints. One thing I like about Rust though is that there's a strong culture that just because you have to walk uphill through the snow on a tiny device doesn't mean you get to look down on those of us who get to run our code on a server.


Long time C++ user here.

Precisely right (for me!): It is easier to get to compile because it is easier for me to express and easier to debug type / syntax problems. Once compiled, it has a very high probability of just working.

I'll add that the exercise of learning rust has distilled the lessons learned in writing "good" C and C++ into a language that has few other ways of doing things. It's like someone is guiding me through good multi-threaded, memory management, and type design practices in C++ every time I work on a Rust project.

This experience absolutely translates back when I'm working on C++ and C.

Rust has huge potential if it can keep its libraries "upstream" of other languages. C is king of providing a one-implementation-to-many-languages (through API wrapping). If Rust can accomplish that, it'll be my language of choice. As much as I love the Unix philosophy of many interacting programs, the reality is for most systems that dynamic or static linking is how things interact.


Presumably you've read enough to know that memory safety and UB are Rust's biggest selling points? If that hasn't convinced you already, I'm not sure what would. Just keep using C++.

(I use C++ and I love it.)


I have forlorn hopes that Rust will fix build times and I'll tolerate the abusive relationship with the borrow checker if it can move the needle on that.

Fighting with the borrow checker is just a phase you have to get through, it will click. The need to write anything that can't be done without unsafety is rare (probably never unless you are writing a data structure library).

> UB

Undefined behavior? As in, you can confine it to `unsafe`?


Unsafe Code isn't allowed to have Undefined Behaviour either.

What unsafe does, is it says to the compiler:

"Hey, I know you're just a dumb machine, and you couldn't prove this code obeys all the rules that keep Rust safe, but I am very smart and I promise it actually is safe". You're also encouraged to explain yourself (to other humans, not the compiler, this isn't Wuffs) so that anybody else maintaining the code can see why exactly you believed this was OK.

Over the years, some code which was once unsafe in Rust no longer needed an unsafe block, because Rust's compiler got smarter, "Oh, I see why this is OK".

But if your unsafe code has Undefined Behaviour that's actually a serious bug, it's not what unsafe is for at all.
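
A minimal sketch of what that promise looks like in practice. The function name `sum_first` is made up for illustration; `get_unchecked` is the standard slice method that skips the bounds check, so the caller has to uphold the bound, and the `SAFETY` comment is the explanation for other humans:

```rust
/// Sums the first `n` elements of `values` without a bounds check per access.
/// (`sum_first` is a hypothetical helper, not something from the thread.)
fn sum_first(values: &[u32], n: usize) -> u32 {
    // The one check that makes the unsafe block below sound.
    assert!(n <= values.len(), "n must not exceed the slice length");

    let mut total = 0;
    for i in 0..n {
        // SAFETY: i < n and n <= values.len() (asserted above),
        // so `i` is always a valid index into `values`.
        total += unsafe { *values.get_unchecked(i) };
    }
    total
}

fn main() {
    println!("{}", sum_first(&[1, 2, 3, 4], 3));
}
```

If the `assert!` were removed, the code would still compile, but an out-of-range `n` would be Undefined Behaviour: a bug in the unsafe block, not something `unsafe` licenses.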


That's what I thought, or at least suspected, so not sure why "UB" was listed as a Rust selling point.

I am not the poster in question, but I assume that they meant Rust avoids Undefined Behaviour and this is indeed a notable selling point, compared to low-level languages.

The column for the Rust language on this page (https://scattered-thoughts.net/writing/how-safe-is-zig/), with its rows of "compile time" for various memory issues, is what is selling me on Rust. I am spending time with it, hoping to trade some difficult learning time now, and some tedious development and compile time later, for less painful debug time on some kinds of issues. The problem is that the pay-off is at the end :-).

So what I like about your comment is you’re clearly more pragmatist than zealot or ideologue.

So I don’t know Lisp or Haskell. I never learned in the college years when I might’ve had time and now I just don’t want to pay the years long learning curve. I’ve got shit to do.

But the big problem (for me) is I’ve been completely turned off dynamic typing. Python did this to me and I honestly don’t think I’ll ever go back. I refer to dynamic typing as “unit tests for spelling mistakes”. YMMV.

Another issue is I like simple grammars and relatively opinionated languages. I’m not the biggest fan of GC either. Years of tuning GC pauses of JVM have soured me on the boundless benefits of GC.

I give this as context to say that I like Go. You can learn Go in a really short amount of time. There’s usually only one way to do things in Go. That’s good. Less surprise and less arguments in teams.

An issue I have with C++ (and this applies to Lisp and Haskell too) is it allows people to be too clever for their own good. Look at 200 lines of Lisp and there might be a C compiler buried in there. Who knows? That might demonstrate the power of the language but is usually counterproductive to a team or project.

So memory safety is the big selling point of Rust. You need to buy into that and why things like static typing are good even though you might struggle with the type system and borrow checker at first.

It's better to fight the compiler than be surprised by the runtime (IMHO).

But Rust of course is not issue free, and it suffers from some long-standing and unfortunate design choices made early in its life that are going to be hard to correct. Build times are a big one. See this [1] for more.

For the last few years I’ve written Hack (aka PHP) professionally. One thing one comes to really appreciate is cooperative async-await, which is pervasive. Writing C/C++/Java has taught me that if you ever manually spawn a thread you’re going to have a bad time, so you should do anything in your power to avoid this. Go is one flavor of alternative. Hack is another. I’m not sure I’m a big fan of Rust’s.

Ultimately, with Rust or any other language for that matter, you need to learn and buy into the idioms, otherwise you’re going to have a bad time.

[1]: https://pingcap.com/blog/rust-compilation-model-calamity


I had an awful time when I started using Rust because I had a strong (object oriented) view of how to structure my code. At one point where nothing seemed to work and I didn't manage to get anything to compile I just was like: "ok, I give up Rust, I will do it your way".

From then on everything worked surprisingly easy.

The 3 main things that make Rust great (besides the usual selling points) are IMO:

- the library manager/dependency management is great

- Even if you are never gonna use Rust again, some of the lessons you have to learn when learning Rust are universally useful. Thinking about ownership and race conditions the Rust way will certainly help you tackle some particularly hard issues with multithreaded code in other languages as well.

- The amount of things you can take as a given within Rust code is very high. If it compiles it usually works. If it doesn't it gives helpful indications why. These indications help you to understand your code better.


After years of dynamically typed / interpreted languages, I decided I wanted to learn a statically typed language and so I set out to trial both Go and Rust. Rust was a painful experience. I kept trying to pass around simple types and constantly hit compiler failures.

I then set up Go and got started. Fell in love with it, I was immediately 10x more productive than I was with rust. I also found reading other people's code far easier as well, where rust would have all this (for me) difficult syntax.

I want to like rust, but I honestly don't think I am smart enough.


Coming from dynamically typed high level languages, you're learning two things at the same time when you're learning rust:

1) Low level programming 2) Typed programming

Both of which are difficult.

I would suggest learning one of Haskell or Scala and one of C or C++ first, and then trying Rust again.


As I said, I am getting on great with go now.

Rust is sometimes a painful experience because it forces you to deal upfront with an important concept other languages play fast and loose with.

You can absolutely get productive more quickly in other languages as a result. But to me, developers who take the time to get over that hurdle by and large become dramatically more productive in the long run, and that’s before you take into account the fact that their code will have near-optimal performance and be virtually bulletproof.

There’s a reason we’re regularly seeing best-in-class general purpose tools and libraries coming out of the Rust ecosystem.


The thing for me was: I realized the type system is not something to fight with, but something that you can utilize to always have certain guarantees about what is going on, which can make it extremely easy to reason about code, especially after your program reaches a certain size.

But to profit from this you have to reach a level where you are beyond fighting with the ownership model and with data types.

For small/quick scripting stuff I still go to python, because some problems can just be solved faster with it. I don't think there is anything wrong with that. It really depends on the scale of programs and the needed reliability IMO.


Thanks, I don't know that I am fighting it, it's more that I just can't figure it out. Maybe I need to try again. I read the rust book last time, but large parts of it went over my head.

For a post that started with "Rant Disclaimer", this has generated kilobytes of useful, insightful responses, and I'm going to be looking for spare time everywhere I can find it to follow up on all the great resources. I do intend to reply to everyone's comments, but the 20 of you are outrunning the 1 of me on that. If it takes me a few hours to reply thoughtfully to your thoughtful replies please bear with me.

While some people complain about the Rust community being evangelical, you'll find this is common. We've built a culture of helping answer questions, sometimes to the point of being too enthusiastic about it. If you ever need help, jumping into the forums or various chat platforms will usually get you a courteous answer in minutes.

We aren't perfect, but this is one of the aspects of the Rust community I'm most proud of.


To be completely honest, I think you’ve had a bigger impact than most in making the community that way.

You were a huge boon to the Ruby/Rails community, and you’ve taken that experience and been instrumental in building a helpful, enthusiastic, and friendly community around Rust. One of these days I hope our paths cross so I can buy you a drink.


Aw shucks, thank you! That’d be great. Someday!

Maybe Rust is not for your project, it is not a universal language, far from it. It is a language designed to make a web browser, and some projects have similar requirements, which makes Rust good for these projects too, but for some others, well maybe the much hated C++ is just better.

There is a good reason why several programming languages exist.


Rust gives you C++ runtime performance with compile-time memory and data race safety. That is about it. Fewer warts syntactically, just by being newer.

Personally, I became interested in Rust through early writing about Servo, a project driven by people who were frustrated by writing a lot of C++ out of necessity (in the Firefox codebase), which reminds me of what you're looking for here. Maybe you can dig up some of that writing, though it may be pretty out of date now.

You basically describe me. I've been doing C/C++ for like 20 years and usually for the right reasons (game dev, voip, embedded, desktop apps, proxy servers). When C++ is not needed, I choose Python.

I am super impressed by Rust. As of about a year ago I try to reach for it in any situation I would have otherwise reached for C++.

What stands out to me more than any particular feature is how competent the Rust creators are in designing for performance. The performance of C++ almost feels like an accident, because it was designed when computers were slower, and because for a while it was the only game in town (even sloppy C++ often performs better than high-level languages). With Rust, performance is an ethos. It is very apparent from the language design and standard library design that these people know what they are doing.


Here's one of my go-to examples. This is spawning 10 threads, each of which appends some characters to a shared string. First the C++ version:

    shared_ptr<pair<mutex, string>> my_pair =
        make_shared<pair<mutex, string>>();
    vector<thread> thread_handles;
    for (int i = 0; i < 10; i++) {
      thread thread_handle([=] {
        lock_guard<mutex> guard(my_pair->first);
        my_pair->second += "some characters";
      });
      thread_handles.push_back(std::move(thread_handle));
    }
    for (auto &thread_handle : thread_handles) {
      thread_handle.join();
    }

And now the exact same code in Rust:

    let my_string: Arc<Mutex<String>> =
        Arc::new(Mutex::new(String::new()));
    let mut thread_handles = Vec::new();
    for _ in 0..10 {
        let arc_clone = my_string.clone();
        let thread_handle = thread::spawn(move || {
            let mut guard: MutexGuard<String> =
                arc_clone.lock().unwrap();
            guard.push_str("some characters");
        });
        thread_handles.push(thread_handle);
    }
    for thread_handle in thread_handles {
        thread_handle.join().unwrap();
    }

===== Highlighting some visible differences between these two examples =====

- C++ doesn't actually need shared_ptr here. It would be happy to access a string and mutex on the caller's stack, and the only reason I didn't do it that way here was to keep the behavior as close as possible to the Rust code. However, Rust requires the reference-counted smart pointer (Arc = "atomic reference counted"), to avoid holding direct references to objects on the caller's stack that might hypothetically not live long enough. Basically, Rust doesn't understand that the join loop makes it safe.* If you want to access caller stack variables from threads in Rust, there are ways to do it (see Rayon or Crossbeam), but the basic std::thread::spawn won't let you.

- Mutex in Rust is a container of one element, kind of like shared_ptr and Arc are. The MutexGuard that you get from it is also a smart pointer to the contained element. You dereference it to access the String on the inside, here implicitly with the `.` operator. This is a big part of Rust's safety story: it's syntactically impossible to access the String without locking.

- Because C++ copy constructors are implicit, moving `my_pair` into the closure invokes its copy constructor and bumps its reference count. But the equivalent syntax in Rust does a bitwise move, which doesn't run any type-specific code. So to get the effect of a type-specific copy constructor in Rust we use the explicit `.clone()` method, which bumps the reference count of our Arc. (If we had forgotten this and tried to move the original, it would be a compiler error, because Rust moves are destructive, and the compiler knows the for-loop will move more than once.)
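
To make the last bullet concrete, here is a small sketch that watches the reference count directly. `count_trace` is a made-up helper name; `Arc::strong_count` and `Arc::clone` are the standard library calls:

```rust
use std::sync::Arc;

/// Records the strong count after each step (hypothetical helper).
fn count_trace() -> Vec<usize> {
    let mut trace = Vec::new();

    let original = Arc::new(String::from("shared"));
    trace.push(Arc::strong_count(&original)); // one owner so far

    // Explicit clone: runs Arc's type-specific code and bumps the count.
    let copy = Arc::clone(&original);
    trace.push(Arc::strong_count(&original));

    // Plain move: a bitwise move, no count change. `original` is unusable
    // after this line; touching it again would be a compile error.
    let moved = original;
    trace.push(Arc::strong_count(&moved));

    // Dropping one handle decrements the count.
    drop(copy);
    trace.push(Arc::strong_count(&moved));

    trace
}

fn main() {
    println!("{:?}", count_trace());
}
```

`my_string.clone()` in the thread example above is the method-call spelling of the same `Arc::clone`.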

===== My thoughts about this =====

- This Rust code is genuinely difficult for beginners to write, no doubt about it. You really have to familiarize yourself with all the relevant library types before you can get threading code to compile.

- That said, there are so many mistakes we could make in either case, and all of them are compiler errors in Rust. For example we could forget to use a lock entirely, initialize our lock guard inappropriately, or accidentally take a shared/read lock when we meant to take a unique/write lock. Or perhaps(*), we might throw/panic before completing all the joins. Those mistakes will compile in C++ and fail TSan (or abort) at runtime, but they won't compile in Rust.

- I think the scariest mistake here might be accidentally keeping a pointer to the shared string past the point where you unlock. For example, maybe we pass the string to some method that implicitly takes a string_view of it, and stashes that view for later. Rust catches this too! It knows that references to the string borrow the MutexGuard, and it will not let them live past the point where the MutexGuard is destructed.

- I think it's interesting to ask why Mutex isn't a container in other languages. (Or at least, in other languages that support generic containers.) I think the previous bullet is the answer. It's nice to get a syntactic guarantee that you've locked the Mutex before you touch its contents, but when there's nothing stopping you from keeping a reference too long, you can't actually make the strong guarantees that you want to at compile time.


Here's a runnable version of the Rust code, with some type annotations removed: https://play.rust-lang.org/?version=stable&mode=debug&editio...

And here are a few broken versions to show the compiler errors you see when you make mistakes:

- Not using Mutex at all: https://play.rust-lang.org/?version=stable&mode=debug&editio...

- Replacing Mutex with RwLock, and then taking a read lock instead of a write lock: https://play.rust-lang.org/?version=stable&mode=debug&editio...

- My favorite, trying to retain a reference past unlocking: https://play.rust-lang.org/?version=stable&mode=debug&editio...

That last example is the most condensed picture of "what Rust is all about" that I know of. Rust itself has no idea what a Mutex (or a thread) is. Mutex is just some library type. But it's able to use the borrow checker to enforce a high-level guarantee, that the contained object is never accessed without holding the lock. This guarantee is checked at compile time, with no runtime overhead. Ownership, borrowing, and lifetimes are about more than just memory safety: library authors can use them to express invariants.
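
For readers who can't open the playground links, a rough sketch of the same idea (not the linked code; `snapshot` is a hypothetical name). The compiling part copies the data out while the guard is alive; the commented-out part is the "retain a reference past unlocking" mistake, which I'd expect the borrow checker to reject because the reference borrows from the guard:

```rust
use std::sync::Mutex;

/// Copies the protected string out while the lock is held.
fn snapshot(shared: &Mutex<String>) -> String {
    let guard = shared.lock().unwrap();
    // Cloning produces an owned String that no longer borrows the guard.
    guard.clone()
} // guard dropped here, which unlocks the mutex

fn main() {
    let shared = Mutex::new(String::from("some characters"));
    println!("{}", snapshot(&shared));

    // The mistake, left commented out because it should not compile:
    //
    // let leaked: &String = {
    //     let guard = shared.lock().unwrap();
    //     &*guard // error: `guard` does not live long enough
    // };
}
```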


You mention this, but I want to clarify that you only need reference counting due to using only std, not due to limitations in the language.

Scoped threads are available in crossbeam, and they're pretty great, as sometimes it's convenient to borrow something from the stack.

Here's how I'd write that with scoped threads:

    let my_string = Mutex::new(String::new());
    crossbeam::scope(|s| {
        for _ in 0..10 {
            s.spawn(|_| {
                let mut guard = my_string.lock().unwrap();
                guard.push_str("some characters");
            });
        }
    }).unwrap();

Yeah Crossbeam is awesome. And the pre-1.0 story about a similar interface that existed in std, until it was found to be unsound, is totally fascinating: https://github.com/rust-lang/rust/issues/24292. (Prior to that, std::mem::forget was marked unsafe!)

Another detail that's worth clarifying for folks reading along: The API in this example will automatically join all threads at the end of the crossbeam::scope. (That is indeed what makes it safe.) So if your goal is to fire off a background thread that will outlive the current function, you usually need to go back to std::thread::spawn and live with its 'static requirement.
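
A note for anyone reading this later: a scoped-thread API returned to std as `std::thread::scope` (stable since Rust 1.63, as far as I know), so on newer toolchains the same shape works without an external crate. A minimal sketch, with `append_from_threads` as a made-up function name:

```rust
use std::sync::Mutex;
use std::thread;

/// Spawns `n` scoped threads that each append to a string on this stack frame.
fn append_from_threads(n: usize) -> String {
    let my_string = Mutex::new(String::new());

    // Every thread spawned on `s` is joined before `scope` returns,
    // which is why borrowing `my_string` from the stack is allowed.
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                let mut guard = my_string.lock().unwrap();
                guard.push_str("ab");
            });
        }
    });

    // No other references remain, so take the String back out of the Mutex.
    my_string.into_inner().unwrap()
}

fn main() {
    println!("{}", append_from_threads(10).len());
}
```

No Arc and no clone are needed here, because the closure only borrows the Mutex.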


I used to write a lot of C and C++ professionally, and have long thought that C is 100 times harder than is suggested by the apparent care taken writing much of the C code we've seen.

I've also done a lot of professional Lisp, since you asked. :)

I've only done a tiny bit of Rust thus far, and am not yet ready to say it's the holy grail of systems-y software development. But Rust has enough good and interesting ideas, and what now looks like a good community process for its evolution, and I've seen enough examples of people doing good work using Rust-- that I can imagine Rust turning out very well, and I'm willing to invest, and find out. (Ideally, at some point getting paid to do so, rather than with a slow trickle of weekend projects.)

I'll acknowledge a potential conflict of interest here. Compared to C, the difficulty of Rust seems much more immediate, and harder to fake, or to naively blow past. When you have to think and learn a lot just to get nontrivial code to compile, that seems like a higher barrier to entry for new practitioners. Which means, besides the usual tech industry dynamic of always getting new keywords on one's resume, and boosting the keywords that differentiate oneself... there's the aspect of Rust simply having a better competitive moat than, say, is had by yet another slightly different Web framework. That's not an attraction for me, I don't think, but it's difficult not to realize, when we're surrounded by that dynamic. I think there are better reasons to be interested in Rust, but I'll try to look out for, and be honest with myself about, potential conflicts of interest along the way.


I still couldn't get or understand an answer to this question: https://stackoverflow.com/questions/64705654/why-i-get-tempo...

Why can I put a temporary inside a single expression, but if I bind it first just to move it inside, I'm not allowed?

This works:

    use lol_html::{element, HtmlRewriter, Settings};
    
    let mut output = vec![];
    
    {
        let mut rewriter = HtmlRewriter::try_new(
            Settings {
                element_content_handlers: vec![
                    // Rewrite insecure hyperlinks
                    element!("a[href]", |el| {
                        let href = el
                            .get_attribute("href")
                            .unwrap()
                            .replace("http:", "https:");
    
                        el.set_attribute("href", &href).unwrap();
    
                        Ok(())
                    })
                ],
                ..Settings::default()
            },
            |c: &[u8]| output.extend_from_slice(c)
        ).unwrap();
    
        rewriter.write(b"<div><a href=").unwrap();
        rewriter.write(b"http://example.com>").unwrap();
        rewriter.write(b"</a></div>").unwrap();
        rewriter.end().unwrap();
    }
    
    assert_eq!(
        String::from_utf8(output).unwrap(),
        r#"<div><a href="https://example.com"></a></div>"#
    );

With the temporary vec moved outside it errors with "temporary value dropped while borrowed" on the "let handlers" line.

    use lol_html::{element, HtmlRewriter, Settings};
    
    let mut output = vec![];
    
    {
        let handlers = vec![
                    // Rewrite insecure hyperlinks
                    element!("a[href]", |el| {
                        let href = el
                            .get_attribute("href")
                            .unwrap()
                            .replace("http:", "https:");
    
                        el.set_attribute("href", &href).unwrap();
    
                        Ok(())
                    }) // this element is deemed temporary
                ];

        let mut rewriter = HtmlRewriter::try_new(
            Settings {
                element_content_handlers: handlers,
                ..Settings::default()
            },
            |c: &[u8]| output.extend_from_slice(c)
        ).unwrap();
    
        rewriter.write(b"<div><a href=").unwrap();
        rewriter.write(b"http://example.com>").unwrap();
        rewriter.write(b"</a></div>").unwrap();
        rewriter.end().unwrap();
    }
    
    assert_eq!(
        String::from_utf8(output).unwrap(),
        r#"<div><a href="https://example.com"></a></div>"#
    );

You got a pretty good answer on stackoverflow, despite a not very good question. You could try a follow up question on the rust discord, but I'd suggest massively decreasing the size of the code dump, and talking about why the proposed solution doesn't work.

I agree that the question is not very good. It is hard to make a smaller code dump, when I'm at this stage of learning the language, because I don't know what matters.

Answers do not help me to understand the difference and do not clear the confusion. What they suggested was suggested by the compiler.

It does work, when I have a call like that:

  func(vec![element!(...)]);

But if I put the vec outside it will complain about temporary element!(...):

  let v = vec![element!(...)];
  func(v);

Answers and the compiler suggest the following:

  let e = element!(...);
  let v = vec![e];
  func(v);

This, however, is problematic if you have many elements in the vec. It also doesn't tell me why the difference matters. If I can pass a temporary to a function, why can't I bind the temporary to the vec? The why is important to me. I would like to make the second form work.

I don't deride the people who answered me. I asked for clarification in a few comments, but I got either no follow-up or a follow-up that still doesn't answer the question. It is a good solution, but I don't understand why I have to go this way and why the obvious one for me doesn't work.


I apologize. I skimmed your post and the response and fit it into some stereotypes. Your question is actually a lot more interesting than I thought. I couldn't figure it out reading the docs on my phone.

When I first tried to reproduce your issue, I got that `try_new` didn't exist. It's been removed in the latest version of lol_html. Replacing it with `new`, your issue didn't reproduce. I was able to reproduce with v0.2.0, though. Since the issue had to do with code generated by macros, I tried `cargo expand` (something you need to install, see [1]).

Here's what `let handlers = ...` expanded to in v0.2.0:

    let handlers = <[_]>::into_vec(box [(
        &"a[href]".parse::<::lol_html::Selector>().unwrap(),
        ::lol_html::ElementContentHandlers::default().element(|el| {
            let href = el.get_attribute("href").unwrap().replace("http:", "https:");
            el.set_attribute("href", &href).unwrap();
            Ok(())
        }),
    )]);

and here's what it expands to in v0.3.0:

    let handlers = <[_]>::into_vec(box [(
        ::std::borrow::Cow::Owned("a[href]".parse::<::lol_html::Selector>().unwrap()),
        ::lol_html::ElementContentHandlers::default().element(|el| {
            let href = el.get_attribute("href").unwrap().replace("http:", "https:");
            el.set_attribute("href", &href).unwrap();
            Ok(())
        }),
    )]);

Ignore the first line, it's how the macro vec! expands. The second line shows the difference in what the versions generate. The first takes a borrow of the result of parse, the second takes a Cow::Owned of it. (Cow stands for copy on write, but it's more generally useful for anything where you want to be generic over either the borrowed or owned version of something.)
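
A small standalone illustration of that "borrowed or owned" idea, unrelated to lol_html's internals. `force_https` is a hypothetical function; `Cow` is `std::borrow::Cow`:

```rust
use std::borrow::Cow;

/// Returns the input untouched (borrowed) unless it needs rewriting (owned).
fn force_https(url: &str) -> Cow<'_, str> {
    if url.starts_with("http:") {
        // We had to build a new String, so the Cow owns it.
        Cow::Owned(url.replacen("http:", "https:", 1))
    } else {
        // Nothing to change: hand back the caller's data, no allocation.
        Cow::Borrowed(url)
    }
}

fn main() {
    for url in ["http://example.com", "https://example.com"] {
        let fixed = force_https(url);
        let kind = match &fixed {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{fixed} ({kind})");
    }
}
```

The caller treats both cases as a string; only the owned variant carries its own data and therefore doesn't depend on a temporary staying alive.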

So the short answer is the macro used to expand to something that wasn't owned, and now it does. As for why it worked without a separate assignment, that's because Rust automatically created a temporary variable for you.

> When using a value expression in most place expression contexts, a temporary unnamed memory location is created initialized to that value and the expression evaluates to that location instead, except if promoted to a static

https://doc.rust-lang.org/reference/expressions.html#tempora...

Initially rust created multiple temporaries for you, all valid for the same-ish scope, the scope of the call to try_new. When you break out the vector to its own assignment the temporary created for element! is only valid for the scope of the vector assignment.
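
The same effect can be reproduced without lol_html. In this sketch `borrowed!` is a hypothetical stand-in for the v0.2.0 `element!`: all it mimics is "expands to a reference to a temporary". `total_len` plays the role of the function consuming the vec:

```rust
// Hypothetical macro: expands to a reference to a freshly created temporary.
macro_rules! borrowed {
    ($s:expr) => {
        &String::from($s)
    };
}

fn total_len(items: Vec<&String>) -> usize {
    items.iter().map(|s| s.len()).sum()
}

/// Works: the temporary String lives until the end of the whole `let`
/// statement, which is long enough for the call to finish.
fn inline_call() -> usize {
    let n = total_len(vec![borrowed!("a[href]")]);
    n
}

/// Works: the String is a named local, so it outlives the vec.
fn bound_first() -> usize {
    let selector = String::from("a[href]");
    let handlers = vec![&selector];
    total_len(handlers)
}

// Expected to be rejected with "temporary value dropped while borrowed":
// the temporary dies at the end of the `let handlers` statement, but
// `handlers` still holds a reference to it on the next line.
//
// fn split_out() -> usize {
//     let handlers = vec![borrowed!("a[href]")];
//     total_len(handlers)
// }

fn main() {
    println!("{} {}", inline_call(), bound_first());
}
```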

I took a look at the git blame[2] for the element! macro in lol_html, and they made the change because someone opened an issue with essentially your problem. So I'd say this is a bug in a leaky abstraction, not an issue with your understanding of rust.

[1]: https://github.com/dtolnay/cargo-expand [2]: https://github.com/cloudflare/lol-html/commit/e0eaf6c4234af8...


No offense taken. The code dump is pretty unfriendly.

I dabbled with it more around version lol_html 0.2 and afterwards not that much. Partially, because I felt dumb. Still I feel like I need to look into documentation almost for every dot in the code.

Thank you, especially for your thorough investigation! I thought that the macro could do something funky, but didn't know about `cargo expand`. That is a very useful thing to know. Is there a good write up about such intricacies like Cow::Owned? I would think that the problem is probably something that people may have from time to time. Was there a way to wrap every element in something to make it work with the old version?


To be honest, lol_html looks like the work of either someone new to Rust who got excited by the opportunity to do a lot of premature optimization, or someone with very, very intense perf needs. I strongly recommend working with owned data whenever possible and cloning freely.

It's also macro heavy for something that doesn't seem like it should need to be. It's normal to need to constantly check the docs with macros, since they're their own language.

EDIT: I realized it's by cloudflare. The architecture makes sense in terms of their needs.


> You could surely argue that even in the OO world, inheritance has a bad reputation and practitioners usually favor composition if they can.

You could indeed argue that, if you’d never been part of a community writing code in an object oriented language and your opinions about actual OOP practice were formed by reading HN.


This language just does not look like much fun at all.

Looks can be deceiving. You'll have to try it to see if you like it.

And if not, that's totally OK.


Shouldn't the Java in the first example explicitly state it implements the interface?

Oh, right! Thank you for bringing it up, I'll fix this.

There are some examples which (IMHO) are not the best.

First, silently changing global variables is a tempting yet very bad idea for software engineering. Yes, we are used to them, but they are unsafe. You can easily create a mutable variable in main, then explicitly change it with a function. So no features are lost, but clarity and explicitness are enforced.
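A minimal sketch of what I mean, with a hypothetical `bump` function (the names are mine, not the article's):

```rust
// The state is owned by the caller and handed over explicitly,
// so every function that can change it says so in its signature.
fn bump(counter: &mut u32) {
    *counter += 1;
}

fn run() -> u32 {
    let mut counter = 1;
    bump(&mut counter);
    counter
}

fn main() {
    println!("{}", run());
}
```

No `static mut`, no `unsafe`, and the call site shows `&mut counter`, which makes the mutation visible to the reader.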

Second, I don't find initialization with zeros to be a huge problem. I got surprised it got mentioned at all.


While I was interested in the article, the mistakes in the code examples are so frequent that, by the end, I was wondering if the array example had an error. I couldn't really make sense of the suggestion there.

These issues would easily be captured in a first draft or with a simple review of the examples.


I wish the author had gone a bit more in depth on the doubly linked lists and recursive data structures. Those are enormous pain points when trying to do conventional low level programming in Rust, and doing it right the first time might help avoid the pitfall of 'build 90% of what you want, realize you need to redo everything because there's an ownership issue you need to hack around, give up on Rust.'

`Learn Rust With Entirely Too Many Linked Lists`[0] is a great tutorial on the subject.

[0]: https://rust-unofficial.github.io/too-many-lists/
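To give a flavor of the ownership puzzle, here is one common safe shape for a doubly linked node: strong `Rc` pointers forward, `Weak` pointers backward, so the two directions don't form a reference cycle. This is a minimal sketch of mine with hypothetical names (`Node`, `link_two`, `walk_forward_then_back`); I'm not claiming it is what the article or the tutorial recommends.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // owning link
    prev: Option<Weak<RefCell<Node>>>, // non-owning back link
}

// Builds a two-node list: first <-> second.
fn link_two(a: i32, b: i32) -> (Rc<RefCell<Node>>, Rc<RefCell<Node>>) {
    let first = Rc::new(RefCell::new(Node { value: a, next: None, prev: None }));
    let second = Rc::new(RefCell::new(Node {
        value: b,
        next: None,
        prev: Some(Rc::downgrade(&first)),
    }));
    first.borrow_mut().next = Some(Rc::clone(&second));
    (first, second)
}

// Follows `next` from the first node, then `prev` back again.
fn walk_forward_then_back(first: &Rc<RefCell<Node>>) -> (i32, i32) {
    let next = first.borrow().next.clone().expect("first node has a successor");
    let forward = next.borrow().value;
    let prev = next
        .borrow()
        .prev
        .as_ref()
        .and_then(Weak::upgrade)
        .expect("predecessor is still alive");
    let back = prev.borrow().value;
    (forward, back)
}

fn main() {
    let (first, _second) = link_two(1, 2);
    println!("{:?}", walk_forward_then_back(&first));
}
```

Even at two nodes you can see the ceremony (`borrow`, `borrow_mut`, `upgrade`), which is a good part of why people call this cumbersome.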


I'm somewhat skeptical of the article - I think it's not entirely accurate.

This is not a good example for a global counter:

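    static mut DATA_RACE_COUNTER: u32 = 1;

    fn main() {
        // Reading a `static mut` needs `unsafe` too, so copy the value out first.
        let before = unsafe { DATA_RACE_COUNTER };
        print!("{}", before);
        // I solemnly swear that I'm up to no good, and also single threaded.
        unsafe {
            DATA_RACE_COUNTER = 2;
        }
        let after = unsafe { DATA_RACE_COUNTER };
        print!("{}", after);
    }

because there are safe APIs for this: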

    use std::sync::atomic::{AtomicU32, Ordering};

    static DATA_RACE_COUNTER: AtomicU32 = AtomicU32::new(1);

    fn main() {
        print!("{:?}", DATA_RACE_COUNTER);
        DATA_RACE_COUNTER.store(2, Ordering::Relaxed);
        print!("{:?}", DATA_RACE_COUNTER);
    }

Two examples are actually presented about global mutable state, and they're both unfit.
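To show that the atomic version holds up when it is not single threaded, here is a minimal sketch of a global counter bumped from several threads. `HITS` and `count_from_threads` are hypothetical names of mine, not from the article.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

static HITS: AtomicU32 = AtomicU32::new(0);

// Spawns `threads` workers that each increment the global `per_thread` times,
// then returns the final value. No `unsafe` anywhere.
fn count_from_threads(threads: u32, per_thread: u32) -> u32 {
    HITS.store(0, Ordering::Relaxed); // reset so repeated calls are comparable
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..per_thread {
                    HITS.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    HITS.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count_from_threads(4, 1000));
}
```

`Relaxed` is enough here because each increment is a single atomic read-modify-write and the `join` calls order the final load after all of them.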

Also this:

    [the linked list] will at least require a Rc or Arc to work.
    But even this becomes cumbersome quickly, not to mention the overhead from reference counts.

AFAIK accessing an object via Rc has no overhead per se (Rcs introduce overhead when they're manipulated, i.e. cloned or dropped). Even if it had, it should be verified that the overhead has a measurable performance impact (which is something that can't be generalized).
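A minimal sketch of where the counting happens (`rc_counts` is a hypothetical helper of mine):

```rust
use std::rc::Rc;

// Returns the strong count before cloning, while a clone exists, and after dropping it.
fn rc_counts() -> (usize, usize, usize) {
    let first = Rc::new(vec![1, 2, 3]);
    let before = Rc::strong_count(&first);

    let second = Rc::clone(&first); // the count is incremented here
    let during = Rc::strong_count(&first);

    // Reading through either handle is a plain dereference; the count is not touched.
    let _sum: i32 = first.iter().sum::<i32>() + second.iter().sum::<i32>();

    drop(second); // the count is decremented here
    let after = Rc::strong_count(&first);

    (before, during, after)
}

fn main() {
    println!("{:?}", rc_counts());
}
```

The cost that remains on every `Rc` is space: the heap allocation carries a strong and a weak counter next to the value.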

The point about array initialization is fair - custom initialization requires unsafe code (or a crate) - but it must be considered that it's a performance optimization.
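For readers on toolchains newer than this thread: `std::array::from_fn` (stabilized in Rust 1.63) gives a safe standard-library route for per-element initialization, including for element types that are not `Copy`. A minimal sketch, with a hypothetical `labels` helper of mine:

```rust
// Builds an array of non-Copy elements, each computed from its index,
// without `unsafe` and without an external crate.
fn labels() -> [String; 4] {
    std::array::from_fn(|i| format!("item-{i}"))
}

fn main() {
    println!("{:?}", labels());
}
```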


Not all platforms have atomics; you'll mostly encounter this on embedded systems. That said, use them if you can!

Regarding Rc, the overhead is more in memory than time, which usually has a non-linear relation to performance, so any overhead is really hard to guess or estimate.


> Arguably the most-asked-about missing feature coming from object-oriented languages is inheritance. Why wouldn’t Rust let a struct inherit from another?

This concern faded away as I used Rust a lot. It turns out that inheritance of implementation isn't nearly as useful as it's cracked up to be. And it carries a lot of baggage.

The example hints at this, but unfortunately without driving the point home.

Rust's enums (used to illustrate static dispatch) are amazing, and one of the best (of many) language features. Enums can contain data of their own (as enum structs or enum tuples), opening up many more possibilities than the example might imply.
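A minimal sketch of what that looks like, using a hypothetical `Shape` type of mine rather than the article's example:

```rust
// One type, three variants: a struct-like variant, a tuple-like variant,
// and a plain one with no data.
enum Shape {
    Circle { radius: f64 },
    Rect(f64, f64),
    Empty,
}

// `match` must cover every variant, so adding a new one is a compile error
// here until it is handled.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect(width, height) => width * height,
        Shape::Empty => 0.0,
    }
}

fn main() {
    let shapes = [Shape::Circle { radius: 1.0 }, Shape::Rect(2.0, 3.0), Shape::Empty];
    for shape in &shapes {
        println!("{}", area(shape));
    }
}
```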

On two large projects, I've had no need for inherit-the-implementation style OO. And it's just fine.


> Rust's enums (used to illustrate static dispatch) are amazing, and one of the best (of many) language features. Enums can contain data of their own (as enum structs or enum tuples), opening up many more possibilities than the example might imply.

Algebraic data types are table stakes for any programming language in my book. I don't know why it's taken OOP programmers so long to come around to them (and most still haven't). It bothers me that Rust calls them "enums", even though "enum" already means something different to most programmers and we already have several terms that describe the feature that Rust has (coproducts, variants, sum types, discriminated unions, tagged unions, and, more generally, algebraic data types). This makes talking about programming languages more difficult than it needs to be, since when someone says "enums" I now have to ask what programming language they are referring to, since "enum" now means different things in different languages.


I think because if you're building with a language that's extremely interface oriented you probably don't want to think "if it's this type, do x, if it's that type do y" manually. Instead that would be another job for another interface. But I think that's more a symptom of having a powerful hammer than actually designing well, and sum types solve a different problem than interfaces.

I always thought of Rust's enums as simply a generalization of other languages' enum concepts, not a completely different thing.

Yes, that is what they are, but a generalization of a thing is not the same as the specific thing, and in this case the generalization already had other names (since it has been around for many decades in many other languages). It would have been preferable to use one of the existing names instead of adding a new one and thereby complicating the story.

Swift also has sum types called "enums". I thought it was overly cute, the same way it has errors (not exceptions) but uses the C++ exception syntax for them. Seems to work though.

The naming of enums was a bit of a stumbling block for me in my journey learning Rust, but once I got past that, I'm doing fine. I'm finding the language wonderfully expressive, and while I still find myself changing fn foo(bar: Baz) to fn foo(bar: &Baz) a lot (and related issues) as I write, the compiler does a good job of explaining what I'm doing wrong. I have never encountered any compiler (or other application) with error messages as good as the ones Rust provides.
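For anyone who hasn't hit this yet, a minimal sketch of the by-value versus by-reference difference. `Baz`, `consume`, `inspect` and `demo` are hypothetical names for illustration.

```rust
struct Baz {
    name: String,
}

// Takes ownership: the caller can no longer use the value afterwards.
fn consume(bar: Baz) -> usize {
    bar.name.len()
}

// Borrows: the caller keeps the value.
fn inspect(bar: &Baz) -> usize {
    bar.name.len()
}

fn demo() -> (usize, usize) {
    let baz = Baz { name: String::from("baz") };
    let borrowed = inspect(&baz); // `baz` is still usable after this
    let moved = consume(baz);     // `baz` is moved here
    // inspect(&baz); // rejected: borrow of moved value (E0382)
    (borrowed, moved)
}

fn main() {
    println!("{:?}", demo());
}
```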

Agreed, I think calling them enums is an example of trying a little too hard to make the language seem comforting and familiar to C/C++ programmers at the expense of clarity.

> Finally, it’s possible to implement a trait for all classes that implement one of a number of other traits, but it requires specialization, which is a nightly feature for now (though there is a workaround available, even packed in a macro crate if you don’t want to write out all the boilerplate required).

This is false. You can easily write a blanket impl in stable Rust:

    impl<T> Foo for T where T: Bar + Baz {
        fn foo(&self) {
            self.bar().baz()
        }
    }

Specialization allows you to create multiple impls, depending on what traits a generic type implements. This is a very powerful feature that is available on nightly, but it is also easy to misuse and create unwanted implicit behavior.

    impl<T> Foo for T {
        default fn foo(&self) {
            // do nothing by default
        }
    }

    impl<T> Foo for T where T: Bar + Baz {
        fn foo(&self) {
            self.bar().baz()
        }
    }
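For completeness, a self-contained version of the stable blanket impl that compiles as-is. The trait and type names (`Named`, `Loud`, `Describe`, `Speaker`) are hypothetical ones of mine.

```rust
trait Named {
    fn name(&self) -> String;
}

trait Loud {
    fn volume(&self) -> u32;
}

trait Describe {
    fn describe(&self) -> String;
}

// Stable Rust: one blanket impl covering every type that implements both traits.
impl<T> Describe for T
where
    T: Named + Loud,
{
    fn describe(&self) -> String {
        format!("{} at volume {}", self.name(), self.volume())
    }
}

struct Speaker;

impl Named for Speaker {
    fn name(&self) -> String {
        String::from("speaker")
    }
}

impl Loud for Speaker {
    fn volume(&self) -> u32 {
        11
    }
}

fn main() {
    // `Speaker` never implements `Describe` directly; it gets it from the blanket impl.
    println!("{}", Speaker.describe());
}
```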

The author is referring to the case where you have multiple traits, and want to provide implementations like:

    impl<T: Cat> Animal for T { ... }
    impl<T: Dog> Animal for T { ... }

This is impossible without specialization, since there's no way to express the constraint that no type implements both `Dog` and `Cat`.

Having multiple blanket trait implementations like this can be used to emulate inheritance, whereas a single blanket implementation cannot.
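One way to stay on stable, sketched under my own assumptions (this is not necessarily the workaround or macro crate the article has in mind): route each blanket impl through its own wrapper type, so the two impls can never overlap. `AsCat`, `AsDog` and `Tom` are hypothetical names.

```rust
trait Cat {
    fn meow(&self) -> String;
}

trait Dog {
    fn bark(&self) -> String;
}

trait Animal {
    fn speak(&self) -> String;
}

// impl<T: Cat> Animal for T { ... }
// impl<T: Dog> Animal for T { ... }
// ^ rejected together: conflicting implementations of `Animal` (E0119)

// Wrapper types make the caller pick which view of `T` is meant.
struct AsCat<T>(T);
struct AsDog<T>(T);

impl<T: Cat> Animal for AsCat<T> {
    fn speak(&self) -> String {
        self.0.meow()
    }
}

impl<T: Dog> Animal for AsDog<T> {
    fn speak(&self) -> String {
        self.0.bark()
    }
}

struct Tom;

impl Cat for Tom {
    fn meow(&self) -> String {
        String::from("meow")
    }
}

fn main() {
    println!("{}", AsCat(Tom).speak());
}
```

The price is that the choice moves to the call site, which is exactly the implicit behavior that inheritance emulation was supposed to provide.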

