Beam: Distributed knowledge graph store written in Go by eBay (github.com)
106 points by ngaut | karma 5837 | avg karma 7.3 2019-05-04 19:14:34 | 30 comments




I wonder if the name is a coincidence, or if it's on purpose to get some traffic from people searching for Erlang's BEAM.

https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine...


I was wondering the same but for Apache Beam. I don't think it's to get traffic. Just coincidence.

... or maybe they chose an English word to make it harder to find.

Using two English words as the name would already be an improvement for the common web searcher... As an example, here are the names of two apps that do similar things: "Blue Iris" and "motion"... let me know which one you are able to find more quickly.

Don't like the opinion, or stressing too much? Relax.

I was 100% serious... I don't think that the word BEAM helps... And I probably need more stress, not less.

Paranoid much?

Ah yes, but not to be confused with Apache Beam.

And Erlang/OTP's BEAM. And Haskell's Beam. And probably a couple of other things called Beam…

And the Erlang virtual machine named BEAM.

This is very interesting. The triple store space is crying out for a good solution; a good _distributed_ solution is unheard of.

Is it common to combine triple store with event logging as this seems to do?

I may have misunderstood, but I think they're using the log as a write ahead log, which is a similar, but subtly different idea to event logging.

A Write Ahead Log is an implementation detail of a distributed system, whereby the system has a log of actions it applies against a state machine. Protocols, such as Raft, exist to keep the log in sync, and the state machine is kept in sync as a consequence. The log entries would be quite abstract and low-level, like "tuple inserted".

An Event Log, on the other hand, is an application level architecture technique. The log entries would be application-specific actions, like "user logged in".
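To make the distinction above concrete, here is a minimal Go sketch (all names are hypothetical; this is not Beam's code). The log holds low-level entries like "tuple inserted", and every replica that applies the same entries in the same order ends up with the same state. An application event like "user logged in" has to be translated into such entries first. Replication itself (Raft or otherwise) is not modelled.

```go
package main

import "fmt"

// Triple is a subject-predicate-object fact.
type Triple struct{ S, P, O string }

// LogEntry is a low-level, storage-oriented record such as "tuple inserted":
// it says what happened to the data, not why.
type LogEntry struct {
	Op     string // "insert" or "delete"
	Triple Triple
}

// StateMachine holds the state each replica rebuilds by applying the same
// log entries in the same order.
type StateMachine struct {
	facts map[Triple]bool
}

func NewStateMachine() *StateMachine {
	return &StateMachine{facts: make(map[Triple]bool)}
}

// Apply is deterministic, so replicas that apply the same log end up equal.
func (sm *StateMachine) Apply(e LogEntry) {
	switch e.Op {
	case "insert":
		sm.facts[e.Triple] = true
	case "delete":
		delete(sm.facts, e.Triple)
	}
}

// UserLoggedIn is an application-level event: it records something that
// happened in the business domain, not a storage operation.
type UserLoggedIn struct{ User, At string }

// toLogEntries is a hypothetical translation from one application event to
// the low-level entries a store would write ahead of applying them.
func toLogEntries(ev UserLoggedIn) []LogEntry {
	return []LogEntry{{Op: "insert", Triple: Triple{ev.User, "loggedInAt", ev.At}}}
}

func main() {
	entries := toLogEntries(UserLoggedIn{User: "alice", At: "2019-05-04"})
	a, b := NewStateMachine(), NewStateMachine()
	for _, e := range entries {
		a.Apply(e)
		b.Apply(e)
	}
	fmt.Println(len(a.facts) == len(b.facts)) // prints true
}
```

The point of the split: the state machine only ever sees "insert"/"delete", while the meaning of "user logged in" lives in application code.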


OBSERVATION: 7 out of 9 comments are bikeshedding the project's name.

And if we include yours (and mine), we're at 9 out of 11!

"Beam is designed to store large graphs that cannot fit on a single server."

What would be a good alternative for less huge datasets?


A single-node graph database? Neo4j is probably the most well-supported and standardized option: https://neo4j.com/


https://github.com/dgraph-io/dgraph is worth a look.

Although it can be distributed, setting it up is trivial enough to be appealing in single machine configurations.


I'm curious how Erlang's BEAM/OTP would handle storing large graphs across servers. (My understanding is that it can probably support that?)

Out of the box, IMO it'll be faffy and probably not worth it vs just using an existing solution. :digraph is great (having a graph structure available out of the box in the standard library is super nice for certain things), but it's 3 ETS tables, and ETS tables can't be replicated across nodes. Mnesia tables can (that's the point), so replicating how :digraph works using Mnesia is probably fairly doable. I assume this has been tried, but I haven't investigated it.
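For reference, a rough Go sketch of the three-table layout described above, with plain maps standing in for ETS tables: a vertex table, an edge table, and an in/out neighbour index. It is loosely modelled on how :digraph is organised, not a port of it, and every name is hypothetical. Adding one edge reads the vertex table and writes the edge table plus two neighbour entries, which is the kind of multi-table update you would have to keep consistent if the tables were replicated (for example as Mnesia tables updated inside a transaction).

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is an edge record: endpoints plus a label.
type Edge struct{ From, To, Label string }

// nKey indexes the neighbour table by direction and vertex.
type nKey struct {
	Dir    string // "out" or "in"
	Vertex string
}

// Digraph keeps three separate tables; Go maps stand in for ETS tables.
type Digraph struct {
	vertices   map[string]string // vertex -> label
	edges      map[int]Edge      // edge id -> edge record
	neighbours map[nKey][]int    // (direction, vertex) -> edge ids
	nextID     int
}

func NewDigraph() *Digraph {
	return &Digraph{
		vertices:   make(map[string]string),
		edges:      make(map[int]Edge),
		neighbours: make(map[nKey][]int),
	}
}

func (g *Digraph) AddVertex(v, label string) { g.vertices[v] = label }

// AddEdge checks both endpoints exist, then writes one edge record and two
// neighbour entries. Across nodes, these writes would need to be atomic.
func (g *Digraph) AddEdge(from, to, label string) (int, bool) {
	if _, ok := g.vertices[from]; !ok {
		return 0, false
	}
	if _, ok := g.vertices[to]; !ok {
		return 0, false
	}
	id := g.nextID
	g.nextID++
	g.edges[id] = Edge{from, to, label}
	g.neighbours[nKey{"out", from}] = append(g.neighbours[nKey{"out", from}], id)
	g.neighbours[nKey{"in", to}] = append(g.neighbours[nKey{"in", to}], id)
	return id, true
}

// neighboursOf returns the sorted vertices at the other end of v's edges.
func (g *Digraph) neighboursOf(dir, v string) []string {
	var out []string
	for _, id := range g.neighbours[nKey{dir, v}] {
		e := g.edges[id]
		if dir == "out" {
			out = append(out, e.To)
		} else {
			out = append(out, e.From)
		}
	}
	sort.Strings(out)
	return out
}

func (g *Digraph) OutNeighbours(v string) []string { return g.neighboursOf("out", v) }
func (g *Digraph) InNeighbours(v string) []string  { return g.neighboursOf("in", v) }

func main() {
	g := NewDigraph()
	g.AddVertex("a", "")
	g.AddVertex("b", "")
	g.AddEdge("a", "b", "knows")
	fmt.Println(g.OutNeighbours("a")) // prints [b]
}
```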

The fastest one would probably be RedisGraph. https://oss.redislabs.com/redisgraph/

https://github.com/amark/gun

It can also sync across machines and back up to S3, so only a subset of your data needs to be loaded on a machine at any given time.

The Internet Archive and others use it in production.



We used Jena for a project with 500M triples; the uncompressed file was about 80 GB. It mostly works fine.

Architectural decisions explained here: https://github.com/eBay/beam/blob/master/docs/central_log_ar...

The architecture is: materialized views from a central append log.
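A minimal Go sketch of that pattern, assuming an in-memory slice as the central log and two hypothetical views (one indexed by subject, one by predicate). Each view tracks its own offset, so views can lag independently and converge once they catch up. This only shows the shape of the idea; Beam's actual log and view servers are described in the linked doc.

```go
package main

import "fmt"

// Fact is one subject-predicate-object triple.
type Fact struct{ Subject, Predicate, Object string }

// Log is an in-memory stand-in for the central append-only log; a real
// deployment would use a durable, replicated log service instead.
type Log struct{ entries []Fact }

// Append adds a fact and returns its offset.
func (l *Log) Append(f Fact) int {
	l.entries = append(l.entries, f)
	return len(l.entries) - 1
}

// ReadFrom returns all entries at offsets >= from.
func (l *Log) ReadFrom(from int) []Fact {
	if from >= len(l.entries) {
		return nil
	}
	return l.entries[from:]
}

// View is a materialized view: an index derived purely from the log.
type View struct {
	key   func(Fact) string // how this view indexes facts
	next  int               // next log offset this view will apply
	index map[string][]Fact
}

func NewView(key func(Fact) string) *View {
	return &View{key: key, index: make(map[string][]Fact)}
}

// CatchUp applies, in log order, every entry the view has not seen yet.
func (v *View) CatchUp(l *Log) {
	for _, f := range l.ReadFrom(v.next) {
		k := v.key(f)
		v.index[k] = append(v.index[k], f)
		v.next++
	}
}

func main() {
	var l Log
	bySubject := NewView(func(f Fact) string { return f.Subject })
	byPredicate := NewView(func(f Fact) string { return f.Predicate })
	l.Append(Fact{"beam", "writtenIn", "Go"})
	l.Append(Fact{"beam", "writtenBy", "eBay"})
	bySubject.CatchUp(&l)
	byPredicate.CatchUp(&l)
	fmt.Println(len(bySubject.index["beam"]), len(byPredicate.index["writtenIn"])) // prints 2 1
}
```

Because the log is the only source of truth, a new kind of index can be added later by starting a fresh view at offset 0 and replaying.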

I don't see a reference to Apache BookKeeper[1][2] in the above. My recommendation to anyone who wants to build their own variant of this architecture is to build it on top of BookKeeper instead of rolling your own. It is designed for this precise workload [2] and addresses the write-throughput limitations of the OP's 'Beam'.

[edit: pull quote from [2]]:

"what makes BookKeeper unique is its ability to offer a short-tailed, low-latency, distributed scale-out storage solution. Although this is a CP system, its greater availability makes it almost a C(A)P system. It is an apt storage for immutable data."

[1]: https://bookkeeper.apache.org/

[2]: https://www.linux.com/news/event/apachecon/2017/4/high-perfo...


"some functionality is lacking before Beam could be used as a critical production data store, including deletion of facts"

It sounds like this needs a second central log to enable eviction of sensitive data before it can be seriously considered usable: one log for transactions and another for the data itself. Or at least that is the conclusion we reached when designing and building our own knowledge graph store (with a similar use of Kafka): https://juxt.pro/crux/docs/index.html#_unbundled
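One way to read the two-log suggestion, as a hedged Go sketch: the immutable transaction log stores only a content hash, and the content lives in a separate store keyed by that hash, so evicting sensitive data deletes the content without rewriting transaction history. A map stands in for the second log here; the names are hypothetical and this is not Crux's or Beam's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// TxEntry is what goes into the immutable transaction log: only a content
// hash, never the (possibly sensitive) data itself.
type TxEntry struct {
	TxID    int
	DocHash string
}

// Store pairs an append-only tx log with a separate document store from
// which content can be evicted.
type Store struct {
	txLog []TxEntry
	docs  map[string]string // content hash -> content; evictable
}

func NewStore() *Store { return &Store{docs: make(map[string]string)} }

func hashOf(doc string) string {
	sum := sha256.Sum256([]byte(doc))
	return hex.EncodeToString(sum[:])
}

// Put stores the content, then appends its hash to the tx log.
func (s *Store) Put(doc string) int {
	h := hashOf(doc)
	s.docs[h] = doc
	id := len(s.txLog)
	s.txLog = append(s.txLog, TxEntry{TxID: id, DocHash: h})
	return id
}

// Evict deletes the content but leaves the tx log untouched, so the order
// and existence of transactions are preserved.
func (s *Store) Evict(txID int) bool {
	if txID < 0 || txID >= len(s.txLog) {
		return false
	}
	delete(s.docs, s.txLog[txID].DocHash)
	return true
}

// Get returns a transaction's content and whether it is still present.
func (s *Store) Get(txID int) (string, bool) {
	if txID < 0 || txID >= len(s.txLog) {
		return "", false
	}
	doc, ok := s.docs[s.txLog[txID].DocHash]
	return doc, ok
}

func main() {
	s := NewStore()
	secret := s.Put("sensitive record")
	public := s.Put("public fact")
	s.Evict(secret)
	_, found := s.Get(secret)
	doc, _ := s.Get(public)
	fmt.Println(found, doc, len(s.txLog)) // prints false public fact 2
}
```

Because content is addressed by hash, evicting one transaction's content also removes any identical content referenced by other transactions, which fits erasure-style deletion.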


What is the best general-purpose open source knowledge graph, if any? The actual graph, not the tech.
