What's the best way to design an event table where different event attributes are provided each time the event is updated?

Question

I have a raw event table that I'm working with. It has two columns: timestamp and metadata. The metadata column holds a raw JSON dump of all the event attributes, but a different set of attributes is sent with each event. I then need to ingest this data into Looker.

Raw table:

timestamp	metadata
2024-04-1	{"type":"created","title":"test1","due":"2024-04-02","id":12345}
2024-04-1	{"type":"confirmed","id":12345}
2024-04-1	{"type":"completed","id":12345, "completedby":"johndoe"}

Now I need to normalize it.

OPTION A:

timestamp	type	title	due_date	id	completed_by
2024-04-1	created	test1	2024-04-02	12345
2024-04-1	confirmed			12345
2024-04-1	completed			12345	johndoe

OPTION B:

timestamp	type	title	due_date	id	completed_by
2024-04-1	created	test1	2024-04-02	12345	johndoe
2024-04-1	confirmed	test1	2024-04-02	12345	johndoe
2024-04-1	completed	test1	2024-04-02	12345	johndoe

How should I design the table? Should I fill every row with the full set of attributes for that id (Option B), or leave the attributes an event did not send as nulls (Option A)?

I'm not sure I understand the raw table: are the created, confirmed and completed types different events? as in: john doe created something in one moment in time, then confirmed, and finally completed? do you want to treat these as different events? — Aleix CC, Commented May 29 at 7:44
@AleixCC, yup the idea here is that Johndoe conducted different actions in that day (he first created a task, confirmed it, and then completed it) - so I thought to treat them as different events to show the types of actions made. — noor h, Commented May 29 at 10:08

Aleix CC · Accepted Answer · 2024-05-29 11:10:44Z

Option B is what I would aim for, especially because every row then holds all the relevant information, which makes reporting easier.
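As a rough illustration of the difference between the two options, here is a minimal sketch in plain Python (the question does not say which warehouse or SQL dialect is in use, so this is not dbt code). It parses the raw rows from the question into Option A, then copies every attribute known for an id onto all rows of that id to get Option B. The function names and the `COLUMN_FOR_KEY` mapping are made up for this example; in a real project the same logic would live in a SQL model.

```python
import json

# Raw rows as (timestamp, metadata) pairs, copied from the question.
raw_rows = [
    ("2024-04-1", '{"type":"created","title":"test1","due":"2024-04-02","id":12345}'),
    ("2024-04-1", '{"type":"confirmed","id":12345}'),
    ("2024-04-1", '{"type":"completed","id":12345, "completedby":"johndoe"}'),
]

# Hypothetical mapping from raw JSON keys to the column names in the question.
COLUMN_FOR_KEY = {"title": "title", "due": "due_date", "completedby": "completed_by"}


def flatten(rows):
    """Option A: one row per event; attributes the event did not send stay None."""
    flat = []
    for timestamp, metadata in rows:
        event = json.loads(metadata)
        row = {"timestamp": timestamp, "type": event["type"], "id": event["id"]}
        for key, column in COLUMN_FOR_KEY.items():
            row[column] = event.get(key)
        flat.append(row)
    return flat


def fill_per_id(flat):
    """Option B: copy every attribute known for an id onto all rows of that id."""
    known = {}  # id -> {column: last non-null value seen for that id}
    for row in flat:
        attrs = known.setdefault(row["id"], {})
        for column in COLUMN_FOR_KEY.values():
            if row[column] is not None:
                attrs[column] = row[column]
    # Known values overwrite the None placeholders; unknown columns stay None.
    return [{**row, **known[row["id"]]} for row in flat]


option_a = flatten(raw_rows)
option_b = fill_per_id(option_a)
for row in option_b:
    print(row)
```

Note that this fills values in both directions, as Option B in the question does: the "created" row also gets completed_by, even though that was only known later. If a report needs to know what was known at the time of each event, Option A (or a forward-only fill) keeps that information.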

Also, given that you want to treat the different actions (created, confirmed, completed) as "sub-events", one thing you could do on top of option B is to also add a surrogate key based on the id, the type and the timestamp. See an example below using the dbt_utils package:

{{ dbt_utils.generate_surrogate_key(['id', 'type', 'timestamp']) }} as sub_event

This way, your model has a primary key that defines its granularity, and you can, for example, easily count "sub-events" for each user.
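To show the idea behind such a key outside of dbt, here is a small Python sketch that hashes the same three fields into one value per row. It is only an analogy: the `surrogate_key` helper, the `|` separator and the `<null>` placeholder are choices made for this example and are not claimed to match what `dbt_utils.generate_surrogate_key` produces.

```python
import hashlib


def surrogate_key(row, fields):
    """Hash the chosen fields into one hex string that identifies the row."""
    # Separator and null placeholder are illustrative choices for this sketch.
    parts = ["<null>" if row.get(f) is None else str(row[f]) for f in fields]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# The three events from the question share id and timestamp; only type differs.
events = [
    {"id": 12345, "type": "created", "timestamp": "2024-04-1"},
    {"id": 12345, "type": "confirmed", "timestamp": "2024-04-1"},
    {"id": 12345, "type": "completed", "timestamp": "2024-04-1"},
]
for event in events:
    event["sub_event"] = surrogate_key(event, ["id", "type", "timestamp"])

keys = [event["sub_event"] for event in events]
# Each row gets its own key because the type differs between rows.
print(len(set(keys)) == len(keys))
```

Because the sample rows share the same id and timestamp, the type is what keeps the keys distinct here. If the same type could occur twice for one id at the same timestamp, the key would no longer be unique and a finer-grained timestamp or another field would be needed.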
