Databricks Delta tables: do we need partitions for concurrent writes/updates?

I am starting to use Databricks on AWS. I have a Delta table that contains KPIs, each identified by a KPI ID (1000, 1001, 1002, etc.). We want concurrent processes to update those KPIs at the same time, for example one process updating the data for KPI 1000 while another updates the data for KPI 1002. On our old platform (Teradata) we had to partition the target table by KPI ID so that the process updating KPI 1000 locked only the rows for that KPI, leaving the rows for other KPIs free to be updated at any time by the other processes.

The question is: do we need to partition Delta tables to achieve the same outcome? I was reading a Databricks article that mentioned optimistic concurrency control, and it gave me the impression that partitions may not be required and that Delta tables allow concurrent writes to a table whether or not it is partitioned. Is my interpretation correct? In case it helps, I am not using Unity Catalog at this time. Thanks.

asked Jan 29 at 12:13

chulo



In all my reading about Delta tables, partitioning is only mentioned in the context of read performance. Changes to a Delta table are recorded in the accompanying "transaction log" folder, and checkpoint files are also generated there at unspecified times. Data is only ever written into new files; an existing file is never updated in place. So there is no lock contention on existing data the way there is in a standard RDBMS. Honestly, it seems like a complete step backwards to me. This has a lot more detail: databricks.com/blog/2019/08/21/…
– Nick.Mc
Commented Jan 29 at 12:41
@nick.mc Thank you for the extra details. Now I am curious to know why you think it is a step backwards.
– chulo
Commented Jan 29 at 20:28
Delta has many advantages over traditional relational tables, particularly for analytics, but it is also missing a lot of mature features and is busy playing catch-up. For example, primary keys in Delta tables are "informational": they aren't enforced.
– Nick.Mc
Commented Feb 2 at 0:18
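
To make the "new files plus a transaction log" idea from the comments a bit more concrete, here is a minimal toy model of optimistic concurrency in plain Python. It is only a sketch under stated assumptions, not how Delta Lake is implemented: `ToyLog`, `ConflictError`, `run_two_writers`, and the file names are all hypothetical, and the conflict rule (a commit fails if a file it read was rewritten by someone else since its snapshot) is a simplification. It suggests why the physical layout could still matter: two writers only conflict when they touch the same data files.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a concurrent commit rewrote a file this transaction read."""


@dataclass
class ToyLog:
    # One entry per commit: the set of file names that commit rewrote.
    commits: list = field(default_factory=list)
    # Live data files: file name -> set of KPI IDs stored in that file.
    files: dict = field(default_factory=dict)

    def begin(self, kpi_id):
        """Snapshot the current version and the files holding rows for kpi_id."""
        read = {name for name, kpis in self.files.items() if kpi_id in kpis}
        return {"version": len(self.commits), "read": read, "kpi": kpi_id}

    def commit(self, txn):
        """Check commits made since the snapshot, then publish new files."""
        for removed in self.commits[txn["version"]:]:
            if removed & txn["read"]:
                raise ConflictError(f"KPI {txn['kpi']}: files changed, retry")
        # Copy-on-write: old files are dropped and replaced, never edited.
        for name in txn["read"]:
            kpis = self.files.pop(name)
            self.files[f"{name}-v{len(self.commits) + 1}"] = kpis
        self.commits.append(set(txn["read"]))
        return len(self.commits)


def run_two_writers(files):
    """Two writers start from the same snapshot and commit one after another."""
    log = ToyLog(files=dict(files))
    t1 = log.begin(1000)
    t2 = log.begin(1002)
    log.commit(t1)
    try:
        log.commit(t2)
        return "both committed"
    except ConflictError:
        return "second writer must retry"


# Hypothetical layouts: one file per KPI versus both KPIs in one file.
partitioned = {"kpi=1000/part-0": {1000}, "kpi=1002/part-0": {1002}}
mixed = {"part-0": {1000, 1002}}

print(run_two_writers(partitioned))  # both committed
print(run_two_writers(mixed))        # second writer must retry
```

In this toy model nothing is ever locked; the second writer simply finds out at commit time whether the files it read are still current. How a real Delta table detects conflicts (and at what granularity) depends on the runtime and table settings, so the isolation and write-conflict documentation is the place to confirm the actual behavior.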

Tagged: databricks, delta-lake, insert-update, hive-partitions
