Hacker Read

awestroke | karma 2403 | avg karma 3.32 · 2020-08-22 12:55:39+00:00

The bindings are safe, not the C++ they bind to

fluffything | karma 2917 | avg karma 3.12 · 2020-08-22 13:32:17+00:00

If the C++ they bind to is not safe, then allowing these to be called from safe Rust is unsound.

alvarelle | karma 50 | avg karma 1.85 · 2020-08-22 16:04:04+00:00

The point is that the C++ code should be safe because the C++ programmer should not introduce UB in their C++ code. If the C++ code invokes UB, that is a bug in the C++ code, which should be found by reviewing the C++ code alone.

No need to write 'unsafe' because .cpp files are already known to need careful review.


masklinn | karma 65147 | avg karma 3.36 · 2020-08-22 15:13:18

> The point is that the C++ code should be safe because the C++ programmer should not introduce UB in their C++ code.

That's a misunderstanding of safety, and UB, and `unsafe`.

The C++ code could be unsafe when called with certain values which it is not normally called with. This is common. A safe Rust function is not allowed to behave that way: it would be unsound.

Furthermore, C++ has different notions of safety than Rust. C++ allows dangling and null pointers (whether raw or smart); it only forbids dereferencing them. Rust does not allow dangling or null pointers unless they're raw. You can have a null unique_ptr, but you cannot have an empty Box.
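The Box versus raw pointer distinction above can be sketched in a few lines of Rust. This is only an illustration: `NullableBox` and the helper functions are hypothetical names, not part of cxx or autocxx. It relies on the documented guarantee that `Option<Box<T>>` has the same size as a pointer for sized `T`.

```rust
use std::mem::size_of;
use std::ptr;

/// Raw pointers may be null (or dangling); only dereferencing them is unsafe.
pub fn raw_pointer_can_be_null() -> bool {
    let p: *const u32 = ptr::null();
    p.is_null()
}

/// Hypothetical alias: the closest safe-Rust analogue of a nullable
/// `std::unique_ptr<T>`. The "may be empty" state is part of the type.
pub type NullableBox<T> = Option<Box<T>>;

/// The caller is forced to handle the empty case before reading the value.
pub fn read_or_default(p: &NullableBox<u32>) -> u32 {
    match p {
        Some(b) => **b,
        None => 0,
    }
}

/// `None` is represented by the null pointer, so the Option costs no extra space.
pub fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u32>>>() == size_of::<*const u32>()
}
```

A plain `Box<u32>` has no empty state at all, which is why a binding that hands a possibly-null C++ pointer to Rust as a `Box` would be unsound.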


alvarelle | karma 50 | avg karma 1.85 · 2020-08-22 17:24:58

I believe I understand UB and unsafe correctly.

The cxx crate and the autocxx tool should make sure that the exposed C++ functions only take argument types which have well-defined semantics.

In your example, a Rust Box<T> maps to a rust::Box<T> in C++, which cannot be null. And a unique_ptr from C++ maps to a cxx::UniquePtr in Rust, which can be empty.

If somehow the C++ code puts a dangling or null pointer into a rust::Box, that is clearly a bug in the C++ code.


fluffything | karma 2917 | avg karma 3.12 · 2020-08-24 19:01:26+00:00

I agree with you that by controlling both sides of the FFI (the Rust and the C++ code) one can make sure that the types work.

The real problem is, however, that C++ lacks an "unsafe" keyword, so functions like:

    /// # Safety
    ///
    /// Must call `bar` after a sequence of calls to `foo`
    unsafe fn foo();
    fn bar();

just look like

    /// note: must call bar after a sequence of calls to foo
    void foo();
    void bar();

You can autogenerate "correct" C++ code from that Rust code (just drop the "unsafe"), but you cannot autogenerate safe Rust code from that C++ code unless you start parsing and understanding documentation comments (which could be possible, e.g., Chromium could annotate C++ APIs that should be unsafe in Rust).

To generate Rust from C++, it does not suffice to just "look at the types" like cxx and autocxx do. One also needs to _at least_ read all the API documentation comments, check if there are any invariants that must be preserved, and act accordingly.
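As a rough sketch of what a hand-written safe wrapper for the foo/bar pair could look like: the ordering rule from the doc comment is moved into the API shape, so callers cannot forget `bar`. Everything here is hypothetical. `raw_foo` and `raw_bar` are plain Rust stand-ins for the foreign functions (they only bump counters), and `FooSession` and `with_foo_session` are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counters so the stand-ins have an observable effect.
static FOO_CALLS: AtomicUsize = AtomicUsize::new(0);
static BAR_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the foreign `foo`.
///
/// # Safety
/// `raw_bar` must be called after a sequence of calls to `raw_foo`.
unsafe fn raw_foo() {
    FOO_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Stand-in for the foreign `bar`.
fn raw_bar() {
    BAR_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Can only be created inside `with_foo_session` (private field, no constructor).
pub struct FooSession {
    _private: (),
}

impl FooSession {
    pub fn foo(&mut self) {
        // SAFETY: the session is owned by `with_foo_session`, and dropping it
        // calls `raw_bar`, on normal return as well as on unwinding.
        unsafe { raw_foo() }
    }
}

impl Drop for FooSession {
    fn drop(&mut self) {
        raw_bar();
    }
}

/// Runs `f` with a session; `raw_bar` is called once `f` is done.
/// The caller only ever sees `&mut FooSession`, so it cannot leak the session.
pub fn with_foo_session<R>(f: impl FnOnce(&mut FooSession) -> R) -> R {
    let mut session = FooSession { _private: () };
    f(&mut session)
}

pub fn foo_calls() -> usize {
    FOO_CALLS.load(Ordering::SeqCst)
}

pub fn bar_calls() -> usize {
    BAR_CALLS.load(Ordering::SeqCst)
}
```

The closure-based entry point is a deliberate choice: a guard returned by value could be passed to `std::mem::forget`, which would skip `raw_bar`. Writing such a wrapper still requires someone to have read the C++ comment first, which is the manual step no generator performs.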

If the APIs are ok and can be wrapped mechanically, the actual wrapping can be made trivial with tools, but there is no tool today that will tell you whether this is the case.

That is, at the end of the day, if you need to expose 10k C++ APIs from Rust, you will still need to manually inspect those 10k C++ APIs, and _think_ about whether they are safe or not.

That's the time consuming part, and you actually want to only do this once, and write down why an API is safe or not, so that other programmers don't have to repeat this work every time you hit an FFI issue.

So IMO while cxx and autocxx are "ergonomic", they spare you only the easiest, least time-consuming portion of the work. autocxx also makes it easy for you to either not check, or not write down the result of the check, and this could end up creating a lot more work down the road.

---

Note that this is something one wants to do even when one trusts that the C++ code is correct. In the example above, the C++ APIs can be correct, but one can still trigger UB by using them incorrectly.


fluffything | karma 2917 | avg karma 3.12 · 2020-08-24 09:47:02

C++ code only needs to be safe according to C++ rules (not Rust rules). So it is possible for the C++ to be safe, and the corresponding Rust code to be unsafe, e.g.,

* int foo(); which returns an uninitialized int is OK according to C++ rules, but would need a MaybeUninit<c_int> according to Rust rules.

* int foo(); could throw an exception, causing UB in Rust, since Rust assumes that FFI declarations do not throw. Rust can only soundly declare `noexcept(true)` C++ functions, or C functions (since C cannot throw). Apparently, autocxx and the cxx crate ignore this and treat all C++ functions as if they never throw, giving them a safe API. That's unsound. (One can fix that on nightly Rust though).
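For the uninitialized-value case in the first bullet, a minimal sketch of the `MaybeUninit<c_int>` pattern follows. It uses an out-parameter variant rather than a return value, and `cpp_try_get` is a hypothetical stand-in written in Rust so the example runs on its own; a real binding would declare the C++ function instead.

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

/// Hypothetical stand-in for a C++ function that writes `*out` only on success.
///
/// # Safety
/// `out` must be valid for writing one `c_int`.
unsafe fn cpp_try_get(out: *mut c_int, succeed: bool) -> bool {
    if succeed {
        // SAFETY: the caller guarantees `out` is valid for writes.
        unsafe { out.write(42) };
    }
    succeed
}

/// Safe wrapper: the value is only treated as initialized after the
/// stand-in reports that it actually wrote it.
pub fn try_get(succeed: bool) -> Option<c_int> {
    let mut slot = MaybeUninit::<c_int>::uninit();
    // SAFETY: `slot.as_mut_ptr()` points to storage for exactly one `c_int`.
    let ok = unsafe { cpp_try_get(slot.as_mut_ptr(), succeed) };
    if ok {
        // SAFETY: `cpp_try_get` returned true, so it initialized the slot.
        Some(unsafe { slot.assume_init() })
    } else {
        None
    }
}
```

Typing the slot as a plain `c_int` and reading it on the failure path would be UB under Rust's rules, even if the equivalent C++ code is considered acceptable.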

Unsafety can also be introduced through ABI incompatibilities, but IIUC autocxx's use of rust-bindgen deals with that.


simias | karma 28508 | avg karma 6.15 · 2020-08-22 09:23:13

Unsafety is contagious. The whole point of "unsafe {}" is to create a well-defined interface between code that can rely on safety guarantees enforced by the compiler and code that needs to be manually checked by the developer.

Safe bindings to unsafe code need to enforce the invariants to make the calls safe, otherwise they are not safe.

Consider this code:

    fn int_to_string(e: &mut u32) -> &mut String {
        unsafe {
            &mut*(e as *mut u32 as *mut String)
        }
    }

    fn main() {
        let mut i = 42u32;
        
        let s = int_to_string(&mut i);
        
        s.push_str("Ayyy");
        
        println!("{}", s);
    }

This int_to_string function is marked safe and can be called from main without any unsafe block, yet if you run this code it will probably segfault. Or maybe it'll format your hard drive, who knows. That's because int_to_string is clearly unsound and broken.

If you just start tagging random, potentially unsound interfaces as safe, what's even the point?

And if you agree that the code in this example is bad and "int_to_string" should definitely not be considered safe, why would that change if I rewrote it to make "int_to_string" a C++ function called through FFI instead?
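By contrast, a safe binding that does enforce its invariants might look like the sketch below. The names are hypothetical: `raw_get` is a Rust stand-in for an unchecked foreign accessor, and `get` is the safe wrapper that checks the precondition before crossing into unsafe code.

```rust
/// Hypothetical stand-in for an unchecked foreign accessor.
///
/// # Safety
/// `data` must point to at least `index + 1` readable `u32` values.
unsafe fn raw_get(data: *const u32, index: usize) -> u32 {
    // SAFETY: guaranteed by the caller, see the contract above.
    unsafe { *data.add(index) }
}

/// Safe wrapper: every possible input is either rejected or upholds
/// the contract of `raw_get`, so no caller can trigger UB through it.
pub fn get(values: &[u32], index: usize) -> Option<u32> {
    if index < values.len() {
        // SAFETY: `index` is in bounds, so the pointer read stays inside `values`.
        Some(unsafe { raw_get(values.as_ptr(), index) })
    } else {
        None
    }
}
```

If the precondition cannot be checked at run time, the honest alternative is to keep the wrapper itself an `unsafe fn` and document the contract, rather than exposing it as safe.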
