# bigsets

**Repository Path**: mirrors_basho/bigsets

## Basic Information

- **Project Name**: bigsets
- **License**: Apache-2.0
- **Default Branch**: master

## Statistics

- **Created**: 2020-08-08
- **Last Updated**: 2026-01-17

## README

bigset: A Riak Core Application
======================================

# What?

A prototype / PoC delta ORSWOT (observed-remove set without tombstones) database built on riak-core.

# Why?

Inserting an element into a set in Riak's data types implementation involves reading the whole set off disk, deserialising it, appending the new element, reserialising, and writing, then replicating to N-1 replicas. If you are unlucky, any or all of the replicas updated that set concurrently, so at each replica you must read the whole set, merge the incoming and local sets, serialise, and write to disk. That dance means sets are not very scalable on Riak.

Here is a graph of inserting just 10k elements into Riak:

![10k inserts](doc/dt-add.png)

Here is one of basho_bench trying to load 100k elements into a single set:

![100k inserts](doc/100kelements-dt-killed.png)

I had to kill the run before my machine melted.

## Really WHY?

Seriously? OK: because adding an element to a set should not take time proportional to the size of the set. We're looking for something better than `O(n)`. If we can add elements efficiently, and look them up efficiently, maybe we can have sets that are pretty large. Like 100s of 1000s, or even millions of elements.

# How?

See the doc (doc/design.md) for details. The overview is: don't store everything in one key. Instead, keep a logical clock per set, read it on each write to generate a dot for the new element, and then append the new element and its dot to the set as a key of its own. Take advantage of LevelDB for this.
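To make the overview concrete, here is a minimal sketch in Python (not the project's own code) of what "don't store everything in one key" could look like. The `ToyStore` class, the key layout, and the `add_element` function are all hypothetical, invented for illustration; the real layout is described in doc/design.md. The point is only that an add touches the small clock key and one new element key, never the existing elements.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a key/value store; counts I/O."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key, default=None):
        self.reads += 1
        return self.data.get(key, default)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


def add_element(store, set_name, actor, element):
    """Add one element without reading any existing element of the set."""
    clock_key = (set_name, "clock")  # assumed layout: one clock key per set
    # Read only the logical clock: a map of actor -> highest counter issued.
    clock = dict(store.get(clock_key, {}))
    clock[actor] = clock.get(actor, 0) + 1
    dot = (actor, clock[actor])  # a dot uniquely names this add event
    store.put(clock_key, clock)
    # Append the element and its dot as a brand new key (assumed layout).
    store.put((set_name, "element", element, dot), b"")
    return dot


store = ToyStore()
for i in range(1000):
    last_dot = add_element(store, "followers", "vnode1", f"user{i}")

print(last_dot, store.reads, store.writes)
```

In this sketch every add costs one read and two writes whether the set holds ten elements or ten thousand, which is the property the section above is asking for. The clock still grows with the number of actors, but not with the number of elements.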

As a taster, here are some current results:

![10k inserts](doc/bs-add.png)

And here is that 100k insert run in basho_bench:

![100k inserts](doc/bs-100k.png)

I think that shows enough promise to work on.

# When?

I'm working on it. Current sticking points:

* reads are really slow as they are a fold of N keys
* batch inserts are slow as they generate N writes

Pretty sure you can check JIRA for the plan and progress.
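
The two sticking points follow directly from keeping one key per element. The sketch below is a toy model in Python, not the project's code: the key layout `(set_name, element, actor, counter)` and the functions `batch_keys` and `read_set` are hypothetical, and removes and clock handling are ignored entirely. It only shows why a batch of N elements becomes N writes and why a read has to visit every key of the set.

```python
from bisect import bisect_left


def batch_keys(set_name, actor, first_counter, elements):
    """Build one key per element: a batch of N elements means N writes."""
    return [
        (set_name, element, actor, first_counter + i)
        for i, element in enumerate(elements)
    ]


def read_set(sorted_keys, set_name):
    """Fold over every key belonging to set_name in a sorted key space."""
    # Seek to the first key of the set, as an ordered store would.
    start = bisect_left(sorted_keys, (set_name,))
    members = set()
    visited = 0
    for key in sorted_keys[start:]:
        if key[0] != set_name:
            break  # walked past the last key of this set
        visited += 1
        members.add(key[1])
    return members, visited


# Two sets share the key space; "b" has one element added by two actors.
keys = sorted(
    batch_keys("a", "vnode1", 1, ["x", "y"])
    + batch_keys("b", "vnode1", 1, ["p", "q", "r"])
    + batch_keys("b", "vnode2", 1, ["p"])
)
members, visited = read_set(keys, "b")
print(sorted(members), visited)
```

Reading set `b` here visits four keys to return three members, so read cost tracks the number of stored keys rather than the number of distinct elements.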