0

I am trying to sink data to a delta file. I want to use the insert and update methods, keyed on the unique id column (sink settings):

[image: sink settings]

Whenever the update method is allowed, an AlterRow operation appears. I have the following settings for the AlterRow:

[image: AlterRow settings]

I want a row to be updated whenever the id value is the same (set to true()). I want a row to be inserted whenever the id value is not yet in the delta file (set to false()).

Is this correct?

2
  • what is your source and sink?
    – Pratik Lad
    Commented Oct 6, 2023 at 10:26
  • Source is a csv in a data lake. Sink is a delta file in a data lake
    – Herwini
    Commented Oct 6, 2023 at 10:28

2 Answers

1

If you want to insert when the id doesn't exist and update when it does, you can just select upsert as your update method.

An upsert is a combination of an insert and an update.

The way you've currently got it configured, every row is marked for update (the update condition is set to true()) and no row is ever marked for insert (the insert condition is set to false()).
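To illustrate why that configuration never adds new rows, here is a minimal sketch in plain Python (not data flow code). It models every incoming row being marked for update only. The function name `apply_update_only` and the sample rows are hypothetical, and the sketch assumes that an update for an id missing from the target simply has no effect.

```python
def apply_update_only(target, incoming, key="id"):
    """Model rows that are all marked 'update': a row whose key is not
    already in the target has nothing to update, so it is skipped."""
    # Copy the target so the caller's data is left untouched.
    result = {k: dict(v) for k, v in target.items()}
    skipped = []
    for row in incoming:
        if row[key] in result:
            result[row[key]].update(row)  # existing id: overwrite values
        else:
            skipped.append(row[key])      # new id: never inserted
    return result, skipped


target = {1: {"id": 1, "name": "old"}}
incoming = [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}]

updated, skipped = apply_update_only(target, incoming)
print(updated)  # {1: {'id': 1, 'name': 'new'}}
print(skipped)  # [3]
```

In this model the row with id 3 never reaches the target, which matches the behaviour described above.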

4
  • So then I just have to allow upsert and set it to true()?
    – Herwini
    Commented Oct 6, 2023 at 11:31
  • @Herwini Yes. I've never upserted into a delta file in a data lake before, but I use that for other sink types
    – B.Griffin
    Commented Oct 6, 2023 at 11:33
  • And then it automatically uses the id column for the upsert? Thanks for your answers
    – Herwini
    Commented Oct 6, 2023 at 11:58
  • @Herwini So long as you have selected id as the key column (which in your first image you have), any update method will run against that column
    – B.Griffin
    Commented Oct 6, 2023 at 12:00
1

I want a row to be updated whenever the id value is the same (set to true()). I want a row to be inserted whenever the id value is not yet in the delta file (set to false()).

To achieve this, you can use the upsert condition directly in the alter row transformation, as below:

[image: alter row settings with an upsert condition]

Then set the update method to upsert in the sink and select the key column on which rows are matched against the destination:

[image: sink settings with upsert and the key column]
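As a rough illustration of what an upsert on a key column means, here is a plain-Python sketch (not data flow code). The `upsert` helper and the sample rows are made up for illustration. Incoming rows are matched to the destination by `id`: a match is updated, anything else is inserted.

```python
def upsert(target, incoming, key="id"):
    """Merge incoming rows into target, matching on the key column."""
    # Copy the target so the caller's data is left untouched.
    result = {k: dict(v) for k, v in target.items()}
    for row in incoming:
        if row[key] in result:
            result[row[key]].update(row)  # key exists: update the row
        else:
            result[row[key]] = dict(row)  # key is new: insert the row
    return result


destination = {1: {"id": 1, "name": "old"}}
source_rows = [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}]

merged = upsert(destination, source_rows)
print(merged)
# {1: {'id': 1, 'name': 'new'}, 3: {'id': 3, 'name': 'added'}}
```

Choosing a different key column would change which rows count as matches, which is why the key column selected in the sink matters.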
