I am working on this for a different definition of the term "dataset". I started learning deep learning, which led me to start building datasets.
Wanting to store versions of the datasets efficiently, I started building a version control system for them. It tracks objects and annotations and can roll back to any point in time. It helps answer questions like what has changed since the last release and which user made which changes.
Still working on the core library but I'm excited for it.
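To make the idea concrete, here is a minimal sketch of one way such a system could work. It is not the actual library: the names `Commit`, `DatasetHistory`, `checkout`, `diff`, and `blame` are all hypothetical, and it assumes an append-only log of per-object annotation changes that gets replayed to rebuild any version.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One recorded change set (hypothetical structure)."""
    author: str
    # object id -> new annotation; None means the object was removed
    changes: Dict[str, Optional[Any]]


@dataclass
class DatasetHistory:
    """Append-only log of commits; a version is an index into the log."""
    commits: List[Commit] = field(default_factory=list)

    def commit(self, author: str, changes: Dict[str, Optional[Any]]) -> int:
        self.commits.append(Commit(author, dict(changes)))
        return len(self.commits) - 1  # version number of the new commit

    def checkout(self, version: int) -> Dict[str, Any]:
        """Rebuild the dataset as of `version` by replaying the log."""
        state: Dict[str, Any] = {}
        for c in self.commits[: version + 1]:
            for key, value in c.changes.items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def diff(self, old: int, new: int) -> Dict[str, Tuple[Any, Any]]:
        """Objects whose annotation differs: id -> (old value, new value)."""
        a, b = self.checkout(old), self.checkout(new)
        return {
            k: (a.get(k), b.get(k))
            for k in a.keys() | b.keys()
            if a.get(k) != b.get(k)
        }

    def blame(self, old: int, new: int) -> Dict[str, List[str]]:
        """Which author touched which objects after `old`, up to `new`."""
        who: Dict[str, List[str]] = {}
        for c in self.commits[old + 1 : new + 1]:
            who.setdefault(c.author, []).extend(c.changes)
        return who


history = DatasetHistory()
v0 = history.commit("alice", {"img1.jpg": "cat", "img2.jpg": "dog"})
v1 = history.commit("bob", {"img2.jpg": "wolf", "img3.jpg": "bird"})
v2 = history.commit("alice", {"img1.jpg": None})

print(history.checkout(v1))   # roll back to the state after bob's commit
print(history.diff(v0, v2))   # what changed since the "release" at v0
print(history.blame(v0, v2))  # who made which changes since v0
```

Replaying the whole log on every checkout is the simplest thing that works; a real system storing large datasets would presumably add snapshots or content-addressed storage so that rollbacks stay cheap.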
Love it! It'd be nice to have a couple more example datasets, and I'd promote them to the top level (rather than in 'more'). I think the first thing many users will want to do is just try the tool with some pre-provided datasets, and when it works nicely (which it does!), then maybe import their own data.
Nice! Glad it resonated. Never quite sure how a project like this will land.
Thanks for sharing those - will check them out. Interested to see what happens as the size of the dataset grows.
I have not looked deeply, but Typesense[1] seems like another interesting project. Similar to ES or Algolia, easy to self-host, & with a seemingly efficient memory & disk footprint.