Graph4KG: A PaddlePaddle-based toolkit for Knowledge Representation Learning

Overview

Graph4KG is a flexible framework for learning embeddings of entities and relations in knowledge graphs (KGs), and it supports training on massive KGs. Its main features are as follows:

[Architecture diagram of Graph4KG]

  • Batch Pre-loading. The loading of the next step's batch data is overlapped with the GPU computation of the current step.
  • Storage and Computation Separation. Entity embeddings are stored on disk and loaded in mmap mode, while computations are conducted on GPUs (see the sketch after this list).
  • Asynchronous Gradient Update. Gradient updates are also overlapped with computation, with a delay of at most four steps. Since KGs are usually sparse, this asynchrony does not hurt performance.
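
Below is a minimal sketch of the storage/computation separation, assuming a hypothetical embedding file name and sizes; the real pipeline in Graph4KG additionally pre-loads batches and applies gradients asynchronously.

# Sketch only: entity embeddings kept on disk via numpy memmap, with the rows
# needed by the current batch copied to GPU. File name and sizes are assumptions.
import numpy as np
import paddle

NUM_ENTS, DIM = 1000000, 200            # assumed sizes
emb_file = "entity_embedding.npy"       # hypothetical file name

# Create the on-disk table once, then reopen it in mmap mode for training.
np.lib.format.open_memmap(emb_file, mode="w+", dtype="float32",
                          shape=(NUM_ENTS, DIM))
ent_table = np.load(emb_file, mmap_mode="r+")

def gather_to_gpu(ent_ids):
    # Copy only the embeddings used by this batch from disk to GPU memory.
    rows = np.asarray(ent_table[ent_ids])
    return paddle.to_tensor(rows, place=paddle.CUDAPlace(0))

batch_ids = np.random.randint(0, NUM_ENTS, size=1024)
batch_emb = gather_to_gpu(batch_ids)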

In addition, Graph4KG provides the 1st place solution of KDD Cup 2021.

Requirements

  • paddlepaddle-gpu>=2.3rc
  • pgl
  • ogb==1.3.1 (optional for wikikg2 and WikiKG90M)

For paddle, the latest develop version is recommended.

Models

  • TransE
  • DistMult
  • ComplEx
  • RotatE
  • OTE

You can implement your own score function in models/score_func.py. Besides these shallow methods, CNN- and GNN-based methods are coming soon.
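
As an illustration, here is a minimal TransE-style score function written with plain Paddle tensors; the class and method names are illustrative and may not match the interface expected by models/score_func.py.

# Sketch of a TransE-style score function; not Graph4KG's actual implementation.
import paddle

class TransEScore:
    def __init__(self, gamma=12.0):
        self.gamma = gamma  # margin

    def __call__(self, head, rel, tail):
        # head, rel, tail: [batch_size, embed_dim] tensors
        # score = gamma - ||h + r - t||_1, larger means more plausible
        return self.gamma - paddle.sum(paddle.abs(head + rel - tail), axis=-1)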

Negative Sampling: Negative samples are constructed by randomly replacing head or tail entities with other entities. Three uniform sampling strategies are implemented (a sketch follows the list).

  • full: Randomly sample entities from all entities in the KG.
  • batch: Randomly sample entities from the entities that appear in the same batch.
  • chunk: Randomly sample entities from all entities in the KG, but the triplets in a batch are divided into K chunks and all triplets in a chunk share the same set of negative samples.
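
A minimal numpy sketch of these three strategies; the function name, chunk layout, and triplet format (an (N, 3) array of head/relation/tail ids) are illustrative rather than Graph4KG's API.

# Sketch of uniform negative sampling in "full", "batch" and "chunk" modes.
import numpy as np

def sample_negatives(triplets, num_ents, num_negs, mode="full", num_chunks=8):
    batch_size = len(triplets)
    if mode == "full":
        # draw corrupting entities from the whole entity set, per triplet
        return np.random.randint(0, num_ents, size=(batch_size, num_negs))
    if mode == "batch":
        # draw only from entities that appear in the current batch
        batch_ents = np.unique(triplets[:, [0, 2]])
        return np.random.choice(batch_ents, size=(batch_size, num_negs))
    if mode == "chunk":
        # split the batch into chunks; all triplets in a chunk share one negative set
        negs = np.random.randint(0, num_ents, size=(num_chunks, num_negs))
        chunk_ids = np.arange(batch_size) * num_chunks // batch_size
        return negs[chunk_ids]
    raise ValueError("unknown sampling mode: %s" % mode)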

Dimension: embed_dim in config.py denotes the dimension of real-valued embeddings. Graph4KG sets the entity embedding dimension to embed_dim * 2 for complex-valued methods such as RotatE and ComplEx, and to embed_dim * 4 for quaternion methods such as QuatE.
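
The rule above amounts to the following mapping (a small illustrative helper, not part of the codebase):

def entity_dim(embed_dim, model_name):
    # complex-valued embeddings store a real and an imaginary part
    if model_name in ("RotatE", "ComplEx"):
        return embed_dim * 2
    # quaternion embeddings store four components
    if model_name == "QuatE":
        return embed_dim * 4
    return embed_dim

assert entity_dim(200, "RotatE") == 400
assert entity_dim(200, "QuatE") == 800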

Datasets

  • FB15k
  • FB15k-237
  • WN18
  • WN18RR
  • ogbl-wikikg2
  • WikiKG90M

Furthermore, any other dataset whose triplets are formatted as follows (one triplet per line) is also supported. You can add such a new dataset in dataset/reader.py.

HEAD_ENTITY\tRELATION\tTAIL_ENTITY\n
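
A minimal sketch of reading triplets in this format; the actual loader in dataset/reader.py is more involved, so this is only an illustration.

# Sketch: read one HEAD\tRELATION\tTAIL triplet per line from a text file.
def load_triplets(path):
    triplets = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            head, rel, tail = line.rstrip("\n").split("\t")
            triplets.append((head, rel, tail))
    return triplets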

Examples

Scripts for different training settings are provided, including

  • single-GPU
  • mix-CPU-GPU + async-update
# download datasets
sh examples/download.sh

# FB15k
sh examples/fb15k.sh

# FB15k-237
sh examples/fb15k237.sh

# WN18
sh examples/wn18.sh

# WN18RR
sh examples/wn18rr.sh

# WikiKG90M
sh examples/wikikg90m.sh

Results

MRR of the single-GPU version

Model      FB15k    FB15k-237    WN18     WN18RR
TransE     0.655    0.316        0.571    0.189
DistMult   0.746    0.322        0.823    0.441
ComplEx    0.808    0.324        0.922    0.464
RotatE     0.736    0.225        0.947    0.469
OTE        0.617    0.299        0.812    0.466

MRR of the mix-CPU-GPU version

Model      FB15k    FB15k-237    WN18     WN18RR
TransE     0.648    0.315        0.568    0.187
DistMult   0.744    0.305        0.822    0.441
ComplEx    0.789    0.312        0.925    0.464
RotatE     0.589    0.286        0.943    0.463
OTE        0.512    0.297        0.656    0.302

MRR on WikiKG90M

Model    MRR
TransE   0.85
RotatE   0.88
OTE      0.89