Well, your Signal client performs SGX remote attestation before sending any contact data, to ensure that the server codebase matches a valid release. So if they're not running the published source, your client will refuse to share your contact information and social graph. Note that messages are e2e encrypted on the client side, so they don't enter into it.
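Roughly, that client-side gate looks like the following (a minimal sketch, not Signal's actual client code; KNOWN_GOOD_MRENCLAVE, get_attestation_quote, and query_enclave are hypothetical names):

```python
import hmac

# Measurement of the published release build, pinned in the client
# (placeholder value; a real client ships the actual enclave hash).
KNOWN_GOOD_MRENCLAVE = "aa" * 32

def attestation_ok(quote: dict) -> bool:
    # Assumes the quote's signature chain back to Intel was already verified.
    return hmac.compare_digest(quote["mrenclave"], KNOWN_GOOD_MRENCLAVE)

def discover_contacts(server, contact_hashes: list[bytes]) -> list[bytes]:
    quote = server.get_attestation_quote()  # hypothetical transport call
    if not attestation_ok(quote):
        raise RuntimeError("enclave doesn't match the published release; refusing to send contacts")
    return server.query_enclave(contact_hashes)  # hypothetical transport call
```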
That was the original argument. But Signal has now rolled out functionality (sharing of contact lists and other details with the server through Intel SGX) that does force one to trust the server.
> 3. Signal forces all TCP/IP metadata to one stack, which if combined with heuristical analysis, I strongly suspect it would be possible to work out which IPs communicate with which other IPs even without aid of SGX contained metadata.
I don't think you've thought about this properly. The fact that you've mistaken this (the Don't Stand Out principle) for a flaw in Signal - rather than a clear strength compared to federated systems - is a bad sign.
> 5. Signal has built their entire social graph on phone numbers
The Signal system doesn't maintain a social graph. That's another huge flaw of many alternatives, since of course an attacker would (and this has already happened) harvest the social graph from the system.
Suppose you turn up with a Federal Judge tomorrow at Signal's offices, demanding a list of the groups I'm in, the friends I know, the people I've communicated with. Moxie can't help you. You supply my telephone number. Moxie still can't help you. OK, you demand a list of all Signal groups. Well, here are all the new-style Signal groups. They have opaque IDs; their names, membership lists, and all other metadata are encrypted, with keys Signal does not have.
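Here's a toy model of why that subpoena comes up empty (an illustration only, not Signal's actual group protocol): the server holds nothing but an opaque ID and a ciphertext blob, and the key never leaves the members' devices.

```python
import os
from cryptography.fernet import Fernet

group_key = Fernet.generate_key()   # held only by group members
group_id = os.urandom(16).hex()     # opaque identifier, reveals nothing

record_on_server = {
    "id": group_id,
    "blob": Fernet(group_key).encrypt(b'{"name":"book club","members":["alice","bob"]}'),
}

print(record_on_server)  # everything the server could hand over: random-looking bytes
print(Fernet(group_key).decrypt(record_on_server["blob"]))  # only members can do this
```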
Now, if you do the same for Riot.im you get truckloads of interesting information about whichever user or users you were interested in, including where to look next for more information about other users they know or communicate with. Even better, the "privacy conscious" users will often, as you've recommended, not be on Riot but instead on a low population Matrix server they control, neatly isolating them so you needn't even bother gathering a "graph" at all.
> If x has me on the list and I have x on the list, how can our two Signal apps discover that? Without telling the Signal server, and without telling random third parties?
I see. I will include that disclaimer in all threads I comment on about Signal in the future. Thanks for explaining this to me.
I've been a fan of Signal since it was RedPhone and TextSecure and my professional relationship with Moxie is quite recent in the scheme of watching the rise of his projects. I apologize if my lack of awareness was offensive, it was unintended.
Edit: Just to be clear, I don't think you need to have the server open-sourced to trust the end-to-end encryption of the messages, but that's just one part of the overall trust model.
Why would I not be able to make inferences about the software the servers are running, if the chance that the letter is lying is low? I haven't read Signal's source code, and yet I believe with just as much confidence that they aren't logging extra information as I would if Keybase had sent the same NSA letter. To me, Signal's source code is effectively closed, and reading it wouldn't increase my belief. (Have you read all of their server's source code? If not, how do you justify your belief?)
The article on attested contact discovery states "Of course, what if that’s not the source code that’s actually running? After all, we could surreptitiously modify the service to log users’ contact discovery requests. Even if we have no motive to do that, someone who hacks the Signal service could potentially modify the code so that it logs user contact discovery requests, or (although unlikely given present law) some government agency could show up and require us to change the service so that it logs contact discovery requests.", which is exactly the point I'm making. They chose to solve it by signing code and ensuring that exactly that code is running (which seems to just move the trust to Intel; hopefully SGX never has bugs like https://github.com/lsds/spectre-attack-sgx or firmware issues, as noted in the Intel SGX security model document). That's fine, but an equally valid way to do this is to make the secure operation of the system not depend on what code the server is running.
Doing that has some tradeoffs: there's usually cryptographic overhead, or an algorithm you need may not even be possible (Signal disliked those tradeoffs for this specific algorithm), but for some algorithms it's entirely possible. For example, one can audit OpenSSL's codebase and determine, regardless of what the middleboxes or routers do, that the entire system is secure. Just replace OpenSSL with Keybase's client, and middleboxes with Keybase's servers, and do the auditing. Hence, open-sourcing the server is not necessary for security. Would it be great if more systems could be audited? Absolutely. Is it always necessary for security? Absolutely not.
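As a toy version of that trust model (an illustration using PyNaCl, not anyone's actual wire protocol): if the audited client encrypts to the recipient's key, the server's code is irrelevant to confidentiality.

```python
from nacl.public import PrivateKey, SealedBox

recipient = PrivateKey.generate()

# Client side (auditable): encrypt before anything touches the server.
ciphertext = SealedBox(recipient.public_key).encrypt(b"meet at noon")

# Server side (unaudited, possibly malicious): it can log everything it sees
# and still learns nothing about the plaintext.
server_log = [ciphertext]

# Only the recipient's private key recovers the message.
assert SealedBox(recipient).decrypt(server_log[0]) == b"meet at noon"
```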
Edit: Another quote from the article: "Since the enclave attests to the software that’s running remotely, and since the remote server and OS have no visibility into the enclave, the service learns nothing about the contents of the client request. It’s almost as if the client is executing the query locally on the client device." Indeed, open-sourcing the code running in the secure enclave is effectively open-sourcing more of the client.
> Say what you want about Secure Enclaves, we know of no better way to conceal social graphs.
I'm not following. Secure Enclaves have nothing to do with protecting the social graph of Signal users. They're used to store the contact list (and other things) in the "cloud" in a safe way – things that weren't even shared or stored anywhere by Signal before Secure Value Recovery was introduced.
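For context, Secure Value Recovery works along these rough lines (a simplified sketch, not Signal's implementation; the class and method names are invented): the enclave holds a random secret, combines it with the user's PIN to derive the backup key, and enforces a guess limit the host can't reset.

```python
import hashlib, hmac, os

class RecoveryEnclave:
    MAX_TRIES = 10

    def __init__(self):
        self.secret = os.urandom(32)      # never leaves the enclave
        self.tries_left = self.MAX_TRIES
        self.pin_tag = None

    def _tag(self, pin: str) -> bytes:
        return hmac.new(self.secret, pin.encode(), hashlib.sha256).digest()

    def _derive_backup_key(self, pin: str) -> bytes:
        # Low-entropy PIN + enclave secret = strong key for the encrypted backup.
        return hashlib.pbkdf2_hmac("sha256", pin.encode(), self.secret, 100_000)

    def enroll(self, pin: str) -> bytes:
        self.pin_tag = self._tag(pin)
        return self._derive_backup_key(pin)

    def recover(self, pin: str) -> bytes:
        if self.tries_left == 0:
            raise PermissionError("guess budget exhausted; secret destroyed")
        if not hmac.compare_digest(self._tag(pin), self.pin_tag):
            self.tries_left -= 1
            raise ValueError("wrong PIN")
        self.tries_left = self.MAX_TRIES
        return self._derive_backup_key(pin)
```

Note that the whole scheme stands or falls with the enclave: without the enclave-held secret and the guess counter, a short PIN is trivially brute-forceable.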
We should make a distinction between the server tampering with message content and message metadata. Message content is protected by well-scrutinized and auditable client code. However, there's nothing stopping a malicious server from logging a bunch of extra metadata on top of what they claim to log, which would be very interesting for nation states. And the extra-metadata scenario is the one being criticized, I think.
If you trust Intel SGX (or other secure enclaves) it is theoretically possible for the server to attest to the client that a particular hash of code is running. (Typically the reverse process is used, to attest to a server that a client is running whatever DRM code the company wants.)
Signal already uses SGX to implement contact search [1]. The actual algorithm is performed in plaintext in the enclave.
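To make "performed in plaintext in the enclave" concrete, here's a simplified model (not the real Contact Discovery Service; Fernet stands in for the encrypted session that attestation establishes):

```python
import json
from cryptography.fernet import Fernet

session_key = Fernet.generate_key()   # negotiated with the enclave during attestation
channel = Fernet(session_key)

def enclave_contact_search(sealed_request: bytes, registered: set[str]) -> bytes:
    # Plaintext exists only inside this function, i.e. inside enclave memory.
    contacts = json.loads(channel.decrypt(sealed_request))
    matches = [c for c in contacts if c in registered]
    return channel.encrypt(json.dumps(matches).encode())

# The untrusted host OS shuttles opaque bytes in both directions.
request = channel.encrypt(json.dumps(["+15551230001", "+15551230002"]).encode())
response = enclave_contact_search(request, registered={"+15551230001"})
print(json.loads(channel.decrypt(response)))  # ['+15551230001']
```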
Now, you might counter that SGX is full of holes, and I would agree with you.
Signal actually just wrote up a very detailed post about how they're using SGX to provide verification that the computer you're sending your data to is running a specific algorithm: https://signal.org/blog/private-contact-discovery/
> Despite this excellent explanation by wtbob, despite Moxie's blog posting, despite EFF, Snowden, Poitras, and Schneier supporting it, I distrust Whispersystems for insisting Signal must have access to the address book, refusing to work without it.
The mechanism I explained would permit Signal to work without uploading one's address book to the Signal servers. It would, of course, still require access to the address book itself in order to perform contact discovery. I personally don't mind allowing access to the address book itself; what I mind is uploading its contents.
>but not the contacts or social graph, neither many other relevant metadata [2].
Assuming you trust them (notice that all your links point to signal.org's own publications). Most privacy people are cautious/paranoid and assume that everything that can be collected is collected. Even assuming a lack of malicious intent, what's stopping the NSA from hacking into Signal's infrastructure and logging who's talking to whom, along with timestamps? That's not to say I don't trust Signal (it's the best mainstream solution right now), but it could do better at hiding metadata at the protocol level.
> The Signal project doesn't want non-official clients to connect to their network.
They don't encourage it, but they don't ban it either. Non-official clients absolutely exist for the network, some of which use signald, a backend service and abstraction layer for the Signal protocol which is neither unstable nor a hack:
I actually trust it more not to have backdoors. Signal is written by the same guys who sold to Facebook, and although both Moxie and Pavel are anarchists, I trust the guys who were actually ousted from their own country and who, yes, rolled their own encryption (with bounties to break it) rather than use an officially approved one.
Besides - Moxie is kind of against decentralization, he thinks that by centralizing trust in a certain entity, things can be better. I am not convinced that such an approach leads to a better model for me to have encrypted communications:
I like that Signal started using SGX though. It’s not ideal (now I have to trust Intel instead) but at least if you’re going to be running some code on a centralized service instead of byzantine consensus, let me know you aren’t backdooring it:
Quite a few reasons to be concerned. Moxie left Signal and at least one of the new board members has a US government background. The update methods allow issuing a compromised update to targeted users. And while Signal still claims that “in our model, the server knows nothing about users”, for a few years now users’ contacts lists (sweet, sweet metadata nearly as valuable as the messages) have been uploaded to servers, protected only by the notoriously vulnerable Intel SGX technology.
> Not sure if I'm 100% right here, but knowing all my contacts and when I communicate with whom is an awful much.
Signal actually doesn't know all your contacts - you can check the source code to confirm that it doesn't know about any contacts that you don't message using Signal, for example.
Signal also doesn't store most of the metadata that it could, so it really knows incredibly little about its users. It knows (for example) the last date it was able to talk to a particular device, but it doesn't store historical data for that, so if you received a message on Signal today, they no longer know that a message was delivered to you yesterday, or last month.
Of course, that second part all runs server-side, so you do have to trust Signal when they describe their internal architecture. But to be frank, who do you trust more with that metadata: Moxie Marlinspike, or the government that is essentially the "sixth eye" in the Five Eyes alliance[0]?
The very first line of their privacy policy reads:
"Signal is designed to never collect or store any sensitive information" which is a total lie. For someone like a human rights activist or a whistleblower a list of all their Signal contacts is absolutely "sensitive information". It really used to be true that they didn't collect and store anything, but it hasn't been the case now for years!
If this is the first time you're hearing about the data Signal is collecting and storing in the cloud that should tell you all you need to know about how much they can be trusted.
Yes, it’s documented: https://signal.org/blog/private-contact-discovery/
>(and provable)
As the post says, their non-SGX method requires you to trust the server: “This has meant that if you trust the Signal service to be running the published server source code, then the Signal service has no durable knowledge of a user’s social graph if it is hacked or subpoenaed.”
To eliminate that requirement, they developed an SGX-based method: “Since the enclave attests to the software that’s running remotely, and since the remote server and OS have no visibility into the enclave, the service learns nothing about the contents of the client request. It’s almost as if the client is executing the query locally on the client device.”
Of course, there are plenty of attacks on SGX (I’m not enough of a cryptographer to know how practical they are to apply to Signal’s methods or not); but at some level you are going to have to trust servers you don’t control, whether your system is federated or centralized. I’m mostly willing to give Moxie the benefit of the doubt here.