-1

I have a raw event table that I'm working with. It has two columns date and metadata. Metadata has raw json dump of all the event attributes. But I'd like to make it clear that each time different attributes are sent. And then I need to ingest this data into looker.

Raw table:

timestamp metadata
2024-04-1 {"type":"created","title":"test1","due":"2024-04-02","id":12345}
2024-04-1 {"type":"confirmed","id":12345}
2024-04-1 {"type":"completed","id":12345, "completedby":"johndoe"}

Now I need to normalize it

OPTION A :

timestamp type title due_date id completed_by
2024-04-1 created test1 2024-04-02 12345
2024-04-1 confirmed 12345
2024-04-1 completed johndoe

OPTION B:

timestamp type title due_date id completed_by
2024-04-1 created test1 2024-04-02 12345 johndoe
2024-04-1 confirmed test1 2024-04-02 12345 johndoe
2024-04-1 completed test1 2024-04-02 12345 johndoe

How should I design the table, should I fill all the rows with their respective information (Option B) or should I leave them as nulls (Option A)?

3
  • Option B allows you to get all the information for a type. Commented May 28 at 13:47
  • I'm not sure I understand the raw table: are the created, confirmed and completed types different events? as in: john doe created something in one moment in time, then confirmed, and finally completed? do you want to treat these as different events?
    – Aleix CC
    Commented May 29 at 7:44
  • @AleixCC, yup the idea here is that Johndoe conducted different actions in that day (he first created a task, confirmed it, and then completed it) - so I thought to treat them as different events to show the types of actions made.
    – noor h
    Commented May 29 at 10:08

1 Answer 1

1

Option B is what I would aim for, specially because it holds relevant information for all rows, which would make it easier for reporting purposes.

Also, given that you want to treat the different actions (created, confirmed, completed) as "sub-events", one thing you could do on top of option B is to also add a surrogate key based on the id, the type and the timestamp. See an example below using the dbt_utils package:

{{ dbt_utils.generate_surrogate_key(['id', 'type', 'timestamp']) }} as sub_event

This way, you can have a primary key in your model to define the granularity, as well as e.g. easily count "sub-events" for each user.

Not the answer you're looking for? Browse other questions tagged or ask your own question.