Graph4KG: A PaddlePaddle-based toolkit for Knowledge Representation Learning

Overview

Graph4KG is a flexible framework for learning embeddings of entities and relations in knowledge graphs (KGs), and it supports training on massive KGs. Its main features are as follows:

[Architecture diagram of Graph4KG]

  • Batch Pre-loading. The loading of the next step's batch data is overlapped with the GPU computation of the current step.
  • Storage and Computation Separation. Entity embeddings are stored on disk and loaded in mmap mode, while computations are conducted on GPUs (see the sketch after this list).
  • Asynchronous Gradient Update. Gradient updates are also overlapped with computation, with a delay of at most four steps. Since KGs are usually sparse, this asynchrony does not hurt performance.
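
Below is a minimal sketch of the storage/computation separation, assuming a hypothetical embedding file name and sizes; the real pipeline in Graph4KG additionally pre-loads batches and applies gradients asynchronously.

# Sketch only: entity embeddings kept on disk via numpy memmap, with the rows
# needed by the current batch copied to GPU. File name and sizes are assumptions.
import numpy as np
import paddle

NUM_ENTS, DIM = 1000000, 200            # assumed sizes
emb_file = "entity_embedding.npy"       # hypothetical file name

# Create the on-disk table once, then reopen it in mmap mode for training.
np.lib.format.open_memmap(emb_file, mode="w+", dtype="float32",
                          shape=(NUM_ENTS, DIM))
ent_table = np.load(emb_file, mmap_mode="r+")

def gather_to_gpu(ent_ids):
    # Copy only the embeddings used by this batch from disk to GPU memory.
    rows = np.asarray(ent_table[ent_ids])
    return paddle.to_tensor(rows, place=paddle.CUDAPlace(0))

batch_ids = np.random.randint(0, NUM_ENTS, size=1024)
batch_emb = gather_to_gpu(batch_ids)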

In addition, Graph4KG provides the 1st place solution of KDD Cup 2021.

Requirements

  • paddlepaddle-gpu>=2.3rc
  • pgl
  • ogb==1.3.1 (optional for wikikg2 and WikiKG90M)

For paddle, the latest develop version is recommended.

Models

  • TransE
  • DistMult
  • ComplEx
  • RotatE
  • OTE

You can implement your own score function in models/score_func.py. Besides these shallow methods, CNN- and GNN-based methods are coming soon.
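
As an illustration, here is a minimal TransE-style score function written with plain Paddle tensors; the class and method names are illustrative and may not match the interface expected by models/score_func.py.

# Sketch of a TransE-style score function; not Graph4KG's actual implementation.
import paddle

class TransEScore:
    def __init__(self, gamma=12.0):
        self.gamma = gamma  # margin

    def __call__(self, head, rel, tail):
        # head, rel, tail: [batch_size, embed_dim] tensors
        # score = gamma - ||h + r - t||_1, larger means more plausible
        return self.gamma - paddle.sum(paddle.abs(head + rel - tail), axis=-1)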

Negative Sampling: Negative samples are constructed by randomly replacing head or tail entities with other entities. Three uniform sampling strategies are implemented (a sketch follows the list).

  • full: Randomly sample entities from all entities in the KG.
  • batch: Randomly sample entities from the entities that appear in the same batch.
  • chunk: Randomly sample entities from all entities in the KG, but the triplets in a batch are divided into K chunks and all triplets in a chunk share the same set of negative samples.
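
A minimal numpy sketch of these three strategies; the function name, chunk layout, and triplet format (an (N, 3) array of head/relation/tail ids) are illustrative rather than Graph4KG's API.

# Sketch of uniform negative sampling in "full", "batch" and "chunk" modes.
import numpy as np

def sample_negatives(triplets, num_ents, num_negs, mode="full", num_chunks=8):
    batch_size = len(triplets)
    if mode == "full":
        # draw corrupting entities from the whole entity set, per triplet
        return np.random.randint(0, num_ents, size=(batch_size, num_negs))
    if mode == "batch":
        # draw only from entities that appear in the current batch
        batch_ents = np.unique(triplets[:, [0, 2]])
        return np.random.choice(batch_ents, size=(batch_size, num_negs))
    if mode == "chunk":
        # split the batch into chunks; all triplets in a chunk share one negative set
        negs = np.random.randint(0, num_ents, size=(num_chunks, num_negs))
        chunk_ids = np.arange(batch_size) * num_chunks // batch_size
        return negs[chunk_ids]
    raise ValueError("unknown sampling mode: %s" % mode)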

Dimension: embed_dim in config.py denotes the dimension of real-valued embeddings. Graph4KG sets the entity embedding dimension to embed_dim * 2 for complex-valued methods such as RotatE and ComplEx, and to embed_dim * 4 for quaternion methods such as QuatE.
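
The rule above amounts to the following mapping (a small illustrative helper, not part of the codebase):

def entity_dim(embed_dim, model_name):
    # complex-valued embeddings store a real and an imaginary part
    if model_name in ("RotatE", "ComplEx"):
        return embed_dim * 2
    # quaternion embeddings store four components
    if model_name == "QuatE":
        return embed_dim * 4
    return embed_dim

assert entity_dim(200, "RotatE") == 400
assert entity_dim(200, "QuatE") == 800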

Datasets

  • FB15k
  • FB15k-237
  • WN18
  • WN18RR
  • ogbl-wikikg2
  • WikiKG90M

Furthermore, any other dataset whose triplets are formatted as follows (one triplet per line) is also supported. You can add such a new dataset in dataset/reader.py.

HEAD_ENTITY\tRELATION\tTAIL_ENTITY\n
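
A minimal sketch of reading triplets in this format; the actual loader in dataset/reader.py is more involved, so this is only an illustration.

# Sketch: read one HEAD\tRELATION\tTAIL triplet per line from a text file.
def load_triplets(path):
    triplets = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            head, rel, tail = line.rstrip("\n").split("\t")
            triplets.append((head, rel, tail))
    return triplets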

Examples

Scripts for different training settings are provided, including

  • single-GPU
  • mix-CPU-GPU + async-update
# download datasets
sh examples/download.sh

# FB15k
sh examples/fb15k.sh

# FB15k-237
sh examples/fb15k237.sh

# WN18
sh examples/wn18.sh

# WN18RR
sh examples/wn18rr.sh

# WikiKG90M
sh examples/wikikg90m.sh

Results

MRR of the single-GPU version

Model      FB15k    FB15k-237    WN18     WN18RR
TransE     0.655    0.316        0.571    0.189
DistMult   0.746    0.322        0.823    0.441
ComplEx    0.808    0.324        0.922    0.464
RotatE     0.736    0.225        0.947    0.469
OTE        0.617    0.299        0.812    0.466

MRR of the mix-CPU-GPU version

Model      FB15k    FB15k-237    WN18     WN18RR
TransE     0.648    0.315        0.568    0.187
DistMult   0.744    0.305        0.822    0.441
ComplEx    0.789    0.312        0.925    0.464
RotatE     0.589    0.286        0.943    0.463
OTE        0.512    0.297        0.656    0.302

MRR on WikiKG90M

Model    MRR
TransE   0.85
RotatE   0.88
OTE      0.89