This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP.
This blog post provides an overview of the OpDB data integrity capabilities that help you achieve ACID transactions and data consistency. OpDB guarantees certain properties to ensure atomicity, durability, consistency, and visibility. We’ll see in this blog post how some of these capabilities can help you to achieve your data integrity goals.
Referential integrity is supported through the implementation of ‘constraints’, which also enforce business rules for attributes in the table.
Constraints are configurable, and you can use them across different tables. Keep in mind that a constraint’s behavior depends on the specific configuration you give it, so choose the configuration that produces the behavior you need.
You can use constraints to enforce certain business rules. By checking all the “puts” in a table, you can enforce data policies. For example, you can set a policy where a certain column family-column qualifier pair always has a value between a range, say 1-20. This way, a “put” is rejected when the value is not in the range and data integrity is maintained.
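The range policy described above can be sketched in plain Java. This is a minimal model rather than HBase code: `RangeConstraintSketch`, `PolicyViolation`, and the `"cf:score"` column name are hypothetical, and a put is reduced to a map from `family:qualifier` to a numeric value. In HBase itself, this kind of logic would live in a class implementing the `Constraint` interface from `org.apache.hadoop.hbase.constraint`; check the developer API docs for the exact signatures in your version.

```java
import java.util.Map;

/** Hypothetical stand-in for a constraint violation; not an HBase class. */
final class PolicyViolation extends Exception {
    PolicyViolation(String message) {
        super(message);
    }
}

/** Illustrative model of a range constraint that is applied to every put. */
final class RangeConstraintSketch {
    private final String column; // "family:qualifier" the policy guards
    private final long min;
    private final long max;

    RangeConstraintSketch(String column, long min, long max) {
        this.column = column;
        this.min = min;
        this.max = max;
    }

    /** Throws if the put carries an out-of-range value for the guarded column. */
    void check(Map<String, Long> put) throws PolicyViolation {
        Long value = put.get(column);
        if (value == null) {
            return; // the put does not touch the guarded column
        }
        if (value < min || value > max) {
            throw new PolicyViolation(
                column + "=" + value + " is outside [" + min + ", " + max + "]");
        }
    }

    /** Convenience wrapper: true when the put would be accepted. */
    boolean accepts(Map<String, Long> put) {
        try {
            check(put);
            return true;
        } catch (PolicyViolation e) {
            return false;
        }
    }
}
```

With `new RangeConstraintSketch("cf:score", 1, 20)`, a put that sets `cf:score` to 10 is accepted, one that sets it to 21 is rejected, and a put that only touches other columns passes through unchecked.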
For more information about constraints, see constraints in the Apache HBase developer API docs.
Similar to referential integrity, non-relational integrity is accomplished through the implementation of constraints and is used to enforce business rules for attributes in the table (for example, making sure values are in the range 1-10) across any content type that the user chooses to implement in their schema.
Entity and domain integrity
You can use multiple tools that are provided with OpDB, including HBCK2 and the IndexScrutinyTool. The HBCK2 tool helps you find and resolve integrity issues. You can use the IndexScrutinyTool to identify invalid rows in a source table, typically the data or index table. The IndexScrutinyTool writes the invalid rows that it finds to a file or an output table.
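To make the idea of "invalid rows" concrete, here is a small conceptual sketch of an index scrutiny pass. It is not the IndexScrutinyTool's implementation: `IndexScrutinySketch` is a hypothetical name, and for simplicity both tables are modeled as maps keyed by the same row key, with a single value per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Conceptual model of a scrutiny pass; both tables are simplified to maps. */
final class IndexScrutinySketch {
    /**
     * Returns the keys of source rows that have no matching row in the
     * target table, or whose value differs from the target's value.
     */
    static List<String> invalidRows(Map<String, String> source,
                                    Map<String, String> target) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, String> row : source.entrySet()) {
            String expected = row.getValue();
            String actual = target.get(row.getKey()); // null when the row is missing
            if (!Objects.equals(expected, actual)) {
                invalid.add(row.getKey());
            }
        }
        return invalid;
    }
}
```

The returned list plays the role of the file or output table mentioned above: it is the set of rows that someone would need to inspect or repair.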
For more information about using the HBCK2 tool, see Using the HBCK2 tool to remediate HBase clusters.
OpDB provides full ACID compliance for single-row transactions, together with the flexibility of late-bound, schema-on-read capabilities from the NoSQL world. OpDB guarantees the following properties:
- Atomicity: All the changes in a transaction will be successfully applied or none will be applied in the case of a failure.
- Durability: Data written during a successful transaction is persisted to the storage and is not lost when there is a system failure.
- Consistency: Actions cause the table to transition from one valid state to another. This means, for example, a table won’t be lost during a transaction.
- Visibility: Any subsequent read after an update is committed will see that update.
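The atomicity and visibility properties listed above can be illustrated with a small in-memory model of a single row. This is only a sketch under stated assumptions: `AtomicRowSketch` is a hypothetical class, a `null` value is used as an artificial trigger for a failed write, and durability is not modeled at all.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative in-memory model of one row; not how HBase stores data. */
final class AtomicRowSketch {
    private Map<String, String> cells = new HashMap<>();

    /**
     * Applies every change or none. A null value stands in for a write
     * that fails partway through (a hypothetical failure trigger).
     */
    synchronized boolean mutate(Map<String, String> changes) {
        Map<String, String> staged = new HashMap<>(cells); // work on a private copy
        for (Map.Entry<String, String> change : changes.entrySet()) {
            if (change.getValue() == null) {
                return false; // abort: the live row was never touched
            }
            staged.put(change.getKey(), change.getValue());
        }
        cells = staged; // publish all changes in a single step
        return true;
    }

    /** Reads taken after a successful mutate see all of its changes. */
    synchronized Map<String, String> read() {
        return new HashMap<>(cells);
    }
}
```

Because changes are staged on a copy and swapped in only at the end, a failed mutation leaves the row exactly as it was, and a reader never observes a half-applied update.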
Both strong and timeline consistency are supported. A client can indicate the level of consistency it requires for a given read operation. The default consistency level is STRONG, meaning that the read request is only sent to the RegionServer servicing the region.
This is the same behavior as when read replicas are not used. The other possibility, TIMELINE, sends the request to all RegionServers with replicas, including the primary. The client accepts the first response, which indicates whether it came from the primary or a secondary RegionServer. If it came from a secondary, the client can choose to verify the read later or not to treat it as definitive.
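The difference between the two read modes can be sketched with a small simulation. Everything here is a model, not the HBase client: `ReadConsistency`, `ReplicaResponse`, and `TimelineReadSketch` are hypothetical names, replica 0 stands for the primary, and "first response" is simulated by picking the lowest latency. In the real Java client, the corresponding calls are `Get.setConsistency(Consistency.TIMELINE)` on the request and `Result.isStale()` on the response.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical consistency levels mirroring the two modes described above. */
enum ReadConsistency { STRONG, TIMELINE }

/** One simulated replica response; replicaId 0 stands for the primary. */
final class ReplicaResponse {
    final int replicaId;
    final long latencyMillis;
    final String value;

    ReplicaResponse(int replicaId, long latencyMillis, String value) {
        this.replicaId = replicaId;
        this.latencyMillis = latencyMillis;
        this.value = value;
    }

    /** A response from a secondary may lag behind the primary. */
    boolean isStale() {
        return replicaId != 0;
    }
}

final class TimelineReadSketch {
    static ReplicaResponse read(ReadConsistency level, List<ReplicaResponse> replicas) {
        if (level == ReadConsistency.STRONG) {
            // STRONG: only the primary is allowed to answer.
            return replicas.stream()
                .filter(r -> r.replicaId == 0)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no primary replica"));
        }
        // TIMELINE: every replica is asked; the fastest answer wins.
        return replicas.stream()
            .min(Comparator.comparingLong((ReplicaResponse r) -> r.latencyMillis))
            .orElseThrow(() -> new IllegalStateException("no replicas"));
    }
}
```

A caller that receives a response with `isStale()` returning true can decide, as described above, to re-read with STRONG consistency later or to treat the value as provisional.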
In this blog post, we looked at how you can make use of the data integrity capabilities in OpDB. In the next article, we’ll cover the application support aspects of OpDB.