
Like I said, if it were my choice, I'd use MonetDB - that screams. But it has operational deficits, like experimental replication. I'd also look further into MemSQL than what's visible on their website.

The reason I don't load it into memory and operate on it directly is because it's analytics as a service; I don't particularly want to write a SQL parser and execution engine.

We have looked at things like Zeppelin for more interactive data manipulation, using Spark to keep stuff in memory. But building a UI around that is an open-ended rabbit hole.




I want to see how this compares to something like MemSQL, which we've had amazing success with.

MemSQL CEO here. There are a few:

- MemSQL is transactional and writes transactions to disk

- MemSQL has an excellent implementation of SQL with mature query optimization and query execution, and it gets better every release. This is from 6.5: https://www.memsql.com/blog/6.5-performance/

- MemSQL has in-memory and on-disk data storage, so you can use MemSQL to store petabytes

- MemSQL has columnstores and vectorized query processing: https://news.ycombinator.com/item?id=16617098

- MemSQL supports geospatial data, full-text search, and JSON

- MemSQL allows you to stream data from Kafka in one command: https://docs.memsql.com/sql-reference/v6.5/create-pipeline/


MemSQL could be a good fit if you need a SQL-based data warehousing tool. They have pretty good integration with Kafka and BI tools.

I see this as fairly similar to MemSQL, but less mature.

That is exactly the position I'm in right now. Was going to use GAE but now I'm rethinking that decision. A micro instance of their relational database service would be perfect for my use, but I guess the RAM would be too small.

I'm one of the founders at MemSQL, which does what you describe.

It's not a real database and doesn't promise ACID though. I think it's fine with the understanding that it's an incremental eventually-consistent materialized view engine that works seamlessly with Kafka, especially if you're already a heavy user.

Otherwise, loading data into a real database/data warehouse and using ETL with triggers/views/aggregations is better if you need advanced querying rather than simple stream-to-table transforms.


It looks like a fully managed database service alternative. MySQL, PostgreSQL, Redis, MongoDB... all the popular db engines.

Yeah, I was thinking of using MySQL plus a NoSQL store. RethinkDB was my main choice, but I'd rather have someone take care of the backend for me, like a DBaaS, mostly to alleviate the pain of doing a project solo. Compose seems too expensive for prototyping ideas, though.

Excellent! Hmmm... version 1.0.0-beta. Am I the only one a bit nervous about basing a start-up on something this fresh? I'd certainly like to contribute to open source, but in a post-launch, here's-a-toolkit/library sort of way, not by debugging a low-level database library weeks before launch, or worse, post-launch. I've used memcached (not db) and love it.

This new cousin would be ideal for me. Is anyone else considering it? I need a flexible object database. I've looked at CouchDB or something based off of qdbm (http://qdbm.sourceforge.net). Currently working on other parts of the system (and delaying the persistence part).


Another instance of the Wikipedia page for a product [1] being more useful than the main site to describe it:

* MemSQL is a distributed, in-memory, SQL database management system.

* It is a relational database management system (RDBMS).

* It compiles Structured Query Language (SQL) into machine code, via a process termed code generation.

* On April 23, 2013, MemSQL launched its first generally available version of the database to the public.

* MemSQL is wire-compatible with MySQL (a minimal connection sketch follows below).

* MemSQL can store database tables either as rowstores or columnstores (The OLAP vs OLTP part I guess).

* A MemSQL database is a distributed database implemented with aggregators and leaf nodes.

* MemSQL durability is slightly different for its in-memory rowstore and its on-disk columnstore.

* A MemSQL cluster can be configured in "High Availability" mode.

* MemSQL gives users the ability to install Apache Spark as part of the MemSQL cluster, and use Spark as an ETL tool.

The main value proposition seems to be the distributed nature, which probably makes it easier to set up out of the box than, say, trying to set up a cluster of MySQL or PostgreSQL databases, which are not "natively distributed". It's also probably most useful when the data is "big enough" relative to the resources available on any single server, or when reliability is very important.

1: https://en.wikipedia.org/wiki/MemSQL
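
Since it's wire-compatible with MySQL, any ordinary MySQL client or driver should just work against it. A minimal sketch using plain JDBC with the MySQL Connector/J driver (the host, port, credentials and table here are made-up placeholders, nothing MemSQL-specific):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MemSqlWireCompatSketch {
        public static void main(String[] args) throws Exception {
            // MemSQL speaks the MySQL wire protocol, so the ordinary
            // Connector/J JDBC URL works; host/port/credentials are placeholders.
            String url = "jdbc:mysql://127.0.0.1:3306/demo";
            try (Connection conn = DriverManager.getConnection(url, "root", "");
                 Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS events (id BIGINT PRIMARY KEY, payload TEXT)");
                stmt.execute("INSERT INTO events VALUES (1, 'hello')");
                try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM events")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id") + " -> " + rs.getString("payload"));
                    }
                }
            }
        }
    }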


Hey.

Thanks, but that's not quite what I had in mind. For one, supporting only a single database (SQLite or anything else) is usually not enough. I would hesitate to introduce a system that only supports MySQL when everything else I run uses PostgreSQL, for example. And on top of that, this schema is limited. My dspam setup learns, and can do that for each and every user (though system-wide training seems to be the norm, as far as I can tell). This is really just a storage engine as far as I can tell, and not really comparable.

That said: I guess I would give rspam a try if I saw a lot of positive reviews/reports. It's just that it certainly doesn't do the same thing as dspam. It's quite a different animal.


I mean, it has pretty full-featured SQL support, so you can probably reproduce your current scenario with it?

Hi there, we are mostly targeting analytic use-cases with this. SQL already has a lot of analytics functionality built into the language, so we thought it might be interesting to enable it.

* We find that in the context of analytics, transactions aren't as important (they are still relevant, but people are OK if results are a bit off).

* For the MongoDB sharding model, CitusDB already keeps similar metadata and moves around shards accordingly. By syncing MongoDB's config server metadata to the CitusDB master, we can ensure that shards are moved around properly. We can also run queries across shards using this metadata.

And thanks a bunch for the encouragement. As you noted, we think of this as addressing some interesting use-cases, and are looking to get the community's feedback.


It looks interesting enough, and the sort-of SQL language looks OK, but I have to ask: why would anyone use a young project (with only one developer?) rather than something really well supported like MongoDB or Cassandra? That said, interesting looking work.

Sounds good, thanks!

Such an in-memory database seems quite important for productivity, so I will definitely consider this.

Read access could be implemented in a first step, while the semantics would still have to be discussed. I think write access would actually be quite important as well so that you can completely abstract away the sending, receiving and processing of events.

Latency compensation or optimistic UI would be nice to have (for making applications more snappy), but I think this is only the third most important piece.

Maybe one can use "SQLiteOpenHelper" with "name" set to "null" for in-memory storage (a minimal sketch follows the links below). Or perhaps some things can be adopted from or based on Minimongo, Realm or MapDB:

http://www.quora.com/How-is-Meteors-client-side-database-imp...

https://realm.io/

https://github.com/jankotek/mapdb
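
For what it's worth, the SQLiteOpenHelper trick does work: on Android, passing null as the database name gives you a purely in-memory database. A minimal sketch (the helper and table names are invented for illustration):

    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteOpenHelper;

    // Passing null as the name creates an in-memory database that lives only
    // as long as the process; nothing is ever written to disk.
    public class InMemoryCacheHelper extends SQLiteOpenHelper {
        public InMemoryCacheHelper(Context context) {
            super(context, /* name = */ null, /* factory = */ null, /* version = */ 1);
        }

        @Override
        public void onCreate(SQLiteDatabase db) {
            db.execSQL("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)");
        }

        @Override
        public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
            // The data never survives a restart anyway, so a simple rebuild is enough.
            db.execSQL("DROP TABLE IF EXISTS cache");
            onCreate(db);
        }
    }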


The persistence layer is also a thing, and the boring choice is something like Postgres or MySQL.

With those priorities in mind I'd probably start with RocksDB and build a SQL engine on top. I'm really impressed with RocksDB's performance, space requirements and feature set.
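
To make that concrete, here's roughly what sitting on RocksDB looks like from Java via the RocksJava bindings; a real SQL engine would add key encoding, column families and an iterator-based executor on top, and the path and keys below are just placeholders:

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksKvSketch {
        static { RocksDB.loadLibrary(); } // load the native library once per process

        public static void main(String[] args) throws RocksDBException {
            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/rocks-sketch")) {
                // A SQL layer would encode (table, primary key) into these byte keys
                // and encode rows into the values; here we just store one pair.
                db.put("user:42".getBytes(), "alice".getBytes());
                byte[] value = db.get("user:42".getBytes());
                System.out.println(new String(value));
            }
        }
    }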

Big Data developers can use Spark, Cascading or Pig to do pretty advanced data warehousing, analytics and ETL tasks. Web app developers accessing databases can use an ORM, e.g. Hibernate, to abstract away the SQL layer, or use JSON to query MongoDB, CouchDB etc.

You could write every application under the sun without knowing a single bit of SQL.
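
As a rough illustration of the ORM point above, here's a minimal sketch using JPA annotations the way Hibernate consumes them; the entity and persistence-unit names are invented, and Hibernate generates the actual INSERT/SELECT statements, so no SQL appears anywhere in the code:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Persistence;

    @Entity
    class Customer {
        @Id @GeneratedValue Long id;
        String name;
    }

    public class OrmSketch {
        public static void main(String[] args) {
            // "demo-unit" is a hypothetical persistence unit from persistence.xml.
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("demo-unit");
            EntityManager em = emf.createEntityManager();

            em.getTransaction().begin();
            Customer c = new Customer();
            c.name = "Ada";
            em.persist(c);                      // Hibernate issues the INSERT
            em.getTransaction().commit();

            Customer loaded = em.find(Customer.class, c.id); // SELECT by primary key
            System.out.println(loaded.name);

            em.close();
            emf.close();
        }
    }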

