Switched from a schema-less NoSQL store to Datomic, leveraging its rich, flexible information model to simplify complex development problems and its built-in notion of time for auditing and data provenance.
Datomic…can be treated as a distributed, indexed virtual memory system supporting cluster-wide transactional updates and explicit consistency.
Vital Reactor is a small company engaged in a joint venture with a major US-based hospital to develop a series of digital health interventions that improve collaboration among doctors and patients. This is a second-generation SaaS system that builds on our joint learning from research prototypes to prepare the platform and hospital workflow practices for larger population trials and multi-center production. These applications are delivered to end users via both web and mobile rich-client interfaces, and the backend needs to aggregate data from third-party devices and services, traditional Health IT systems, and user self-report. We must be fully compliant with all regulatory guidelines, including HIPAA.
We originally designed our platform to run natively on top of a cloud computing service, and we needed a database solution architected for the same environment to handle a set of complex data management challenges. We wanted to provide a simple yet powerful programming model for the developer while optimizing for a low-complexity cloud deployment strategy. We previously used one of the popular NoSQL databases, but found that schema-less databases and document-oriented data models artificially constrained how our system was being constructed. While appealing at first, explicitly sharded, denormalized, non-transactional systems force the programmer to reimplement some of the key capabilities of a database in order to scale horizontally.
Datomic, by contrast, can be treated as a distributed, indexed virtual memory system supporting cluster-wide transactional updates and explicit consistency. We use Datomic to store all of our domain data and to index into our storage layer. Nodes in the cluster communicate through a messaging system that passes references to Datomic entities rather than copies of the data. Each message carries a transaction timestamp, so a remote node can synchronize to at least that Datomic state before processing the message, which keeps the cluster consistent.
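The synchronize-before-processing idea can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the platform's messaging code and not the Datomic API: `Message`, `NodeView`, `basis_t`, and the in-memory `log` are all hypothetical names, and the log is assumed to be ordered by transaction point.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    entity_id: int  # a reference into the database, not a copy of the data
    basis_t: int    # transaction point the sender had seen when it sent this


@dataclass
class NodeView:
    """A node's local, possibly stale view of the database (hypothetical)."""
    log: list                     # shared append-only list of (t, entity_id, value), ordered by t
    seen_t: int = 0               # highest transaction point applied locally
    state: dict = field(default_factory=dict)

    def sync(self, t: int) -> None:
        """Apply logged transactions until the local view has reached at least t."""
        for tx_t, entity_id, value in self.log:
            if self.seen_t < tx_t <= t:
                self.state[entity_id] = value
                self.seen_t = tx_t
        if self.seen_t < t:
            raise RuntimeError(f"transaction {t} is not available yet")

    def handle(self, msg: Message):
        # Never act on a view older than the one the sender was looking at.
        if self.seen_t < msg.basis_t:
            self.sync(msg.basis_t)
        return self.state[msg.entity_id]


# Two transactions touch entity 42; a fresh node catches up before answering.
LOG = [(1, 42, "draft"), (2, 42, "signed")]
node = NodeView(LOG)
result = node.handle(Message(entity_id=42, basis_t=2))
```

Because the message only names an entity and a point in time, the receiver reads the value from its own synchronized view instead of trusting a payload that might already be out of date.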
Datomic integrates well with the rest of our data ecosystem. We complement our Datomic setup with a storage layer based on DataStax Enterprise which includes Cassandra, Solr and Hadoop. We use this stack to handle large blob storage, large-scale full text indexing, streaming writes (logs and time-series data), batch analytics and backups, and in the near future, storing Datomic indexes.
One feature in particular highlights the flexibility of Datomic. Any system that operates under regulatory scrutiny needs fine-grained authorization mechanisms. We were able to integrate a rich, natural permission management model that augments our user group structure without resorting to a complex set of ACL tables, which is the industry-standard way to solve this problem. A set of query rules characterizes a transitive path through a set of entities that are related via authorization and membership attributes. For example, an administrator of a clinic has write access to all the resources of that clinic's accounts by transitivity, whereas a physician only has read access. The physician acquires write permissions to their own patients via a separate group object representing their active patients. A user can share parts of their record with other users by creating a lightweight group and adding "contains" relationships between that group and some specific resources. Resources can also be organized hierarchically, so all instances of a type can be easily shared. Finally, when new data is created, transaction functions ensure that important invariants hold at the time the object is asserted into the database, for example that the creator is able to write the object they created.
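To make the transitive-path idea concrete, here is a minimal Python sketch. It is not the Datalog rule set used in the platform; the attribute names ("member-of", "contains", "read", "write"), the entity names, and the helper functions are all hypothetical. Facts are plain (entity, attribute, value) triples, and in this simplified model a write grant does not imply a read grant.

```python
from collections import defaultdict


def index(facts, attribute):
    """Build an adjacency map entity -> {values} for one attribute."""
    edges = defaultdict(set)
    for e, a, v in facts:
        if a == attribute:
            edges[e].add(v)
    return edges


def reachable(edges, start):
    """Every node reachable from start by following edges transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def can(facts, user, level, resource):
    """True if a membership path leads to a grant that covers the resource."""
    principals = {user} | reachable(index(facts, "member-of"), user)
    contains = index(facts, "contains")
    for e, a, v in facts:
        if e in principals and a == level:
            if v == resource or resource in reachable(contains, v):
                return True
    return False


def create_resource(facts, creator, resource):
    """Stand-in for a transaction function: the resource's grant is asserted
    together with it, so the creator can always write what they created."""
    own = f"{creator}/own"  # hypothetical per-user group
    return facts + [(creator, "member-of", own), (own, "write", resource)]


FACTS = [
    ("alice", "member-of", "clinic-admins"),
    ("clinic-admins", "write", "clinic"),
    ("dr-bob", "member-of", "clinic-staff"),
    ("clinic-staff", "read", "clinic"),
    ("dr-bob", "member-of", "bob-active-patients"),
    ("bob-active-patients", "write", "record-1"),
    ("clinic", "contains", "record-1"),
    ("clinic", "contains", "record-2"),
]
```

With these facts, `can(FACTS, "alice", "write", "record-2")` holds by transitivity through the clinic, while `dr-bob` can read both records but write only `record-1`, the one reachable through his active-patients group. No ACL table is involved; access falls out of the same membership and containment relationships the data model already has.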
Datomic’s native support for retaining historical state has additional benefits for our application, including debugging (what exact system state caused the observed behavior), low-overhead auditing, data provenance, and edit histories. This matters most in systems where regulatory scrutiny places a high value on data provenance. We have found the architecture to scale well in our cloud-native deployments; the latency we have observed with Datomic has been superb and has held up as load increased in testing. We have observed HA fail-over under test conditions and in practice during upgrades. In addition, support from the Datomic team has been extremely responsive, and we have found feature updates to be seamless and stable.
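The auditing and provenance benefits follow from keeping every assertion and retraction instead of overwriting values. The Python sketch below illustrates that idea only; it is not Datomic's storage format or API, and `Datom`, `as_of`, `history`, and the sample data are hypothetical names chosen for the example.

```python
from typing import NamedTuple


class Datom(NamedTuple):
    e: str       # entity
    a: str       # attribute
    v: object    # value
    tx: int      # transaction point at which the fact was recorded
    added: bool  # True for an assertion, False for a retraction


def as_of(log, t):
    """Reconstruct the state as it stood after transaction t."""
    state = {}
    for d in sorted(log, key=lambda d: d.tx):  # stable sort keeps order within a tx
        if d.tx > t:
            break
        if d.added:
            state[(d.e, d.a)] = d.v
        elif state.get((d.e, d.a)) == d.v:
            del state[(d.e, d.a)]
    return state


def history(log, e, a):
    """Full edit history of one attribute: every assertion and retraction."""
    return [(d.tx, d.v, d.added)
            for d in sorted(log, key=lambda d: d.tx)
            if d.e == e and d.a == a]


# A dose is recorded in transaction 1 and corrected in transaction 2.
LOG = [
    Datom("patient-1", "dose", "5mg", 1, True),
    Datom("patient-1", "dose", "5mg", 2, False),
    Datom("patient-1", "dose", "10mg", 2, True),
]
before = as_of(LOG, 1)
after = as_of(LOG, 2)
```

Asking what the system believed at transaction 1 is the debugging question quoted above, and `history` is the edit trail an auditor would ask for. Neither requires separate audit tables, because nothing was ever overwritten.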