NeurIPS is a major venue covering a wide range of ML & AI topics. Of course, there is something interesting for Graph ML aficionados and knowledge graph connoisseurs. Tune in to find out!
This year, NeurIPS had 1900 accepted papers, and 100+ of them are on graphs. Add to that several prominent workshops like KR2ML, DiffGeo4ML, and LMCA; be sure to check their proceedings, as such workshop papers are likely to appear at future venues like ICLR, ICML, or ACL. Furthermore, the GML Newsletter #4 by Sergey Ivanov presents an overview of Graph ML papers from NeurIPS, including theory, oversmoothing, scalability, and many more, so check it out as well.
In this post, I'd like to put an emphasis on a particular type of graphs, knowledge graphs (KGs), and explore with you 10 papers that might be quite influential in 2021. NeurIPS papers are often a bit more high-level and theoretical than the NLP applications at *CL conferences, so I'd summarize them as:
Behind the curtain of transductive link prediction we step into the valley of logical reasoning tasks for KGs to excel at
Get some coffee, tea, or even some Glühwein; today in our agenda:
- Query Embedding: Beyond Query2Box
- KG Embeddings: NAS, Boxes vs Hyperbolic, Meta-Learning
- SPARQL and Compositional Generalization
- Benchmarking: OGB, GraphGYM, KeOps
- Wrapping up
If this in-depth educational content is useful for you, you can subscribe to our AI research mailing list to be alerted when we release new material.
Query Embedding: Beyond Query2Box
Query embedding (QE) is about answering queries against a KG directly in the embedding space, without any SPARQL or graph database engines. Given that most KGs are sparse and incomplete, query embedding algorithms are able to infer missing links (with a certain probability). This is one of the hottest topics in Graph ML so far!
In the ICLR 2020 post, we covered Query2Box, a strong QE baseline capable of answering logical queries with conjunction (∧), disjunction (∨), and existential quantifiers (∃) by modeling entities as d-dimensional boxes.
Ren and Leskovec (the authors of the original Query2Box) finally add a negation operator (¬) in the BetaE framework. Neither points nor boxes have a usable denotation of negation, so BetaE models entities and queries as Beta distributions. Projection and intersection are modeled nicely with Beta distributions as well (negation is a distribution with reciprocal alpha and beta parameters). In addition to DNF, we can use De Morgan's law for replacing disjunctions with negations and conjunctions. Check out the nice illustration of the approach below.
BetaE slightly outperforms Query2Box on existing query patterns while deriving and experimenting on new patterns with negations that cannot be answered by any existing QE approach yet. Two more differences to Q2B: BetaE better captures query uncertainty (the correlation between the differential entropy of the Beta embedding and the cardinality of the answer set is up to 77% better), and it can estimate whether a given query has zero answers.
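To make the Beta-distribution machinery concrete, here is a minimal sketch of the two operators mentioned above: negation as reciprocal parameters and intersection as a weighted combination of parameters. The dimensionality, weights, and KL-based scoring are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of BetaE-style operators on Beta-distribution embeddings.
# Each embedding is a vector of (alpha, beta) parameters; the dimensions,
# weights, and KL-based scoring below are illustrative assumptions.
import torch
from torch.distributions import Beta, kl_divergence

d = 4                                   # number of Beta distributions per embedding
alpha = torch.rand(d) + 0.05            # keep parameters strictly positive
beta = torch.rand(d) + 0.05

# Negation: take the reciprocal of both parameters, as described above
neg_alpha, neg_beta = 1.0 / alpha, 1.0 / beta

# Intersection of two query embeddings: weighted combination of their parameters
def intersect(a1, b1, a2, b2, w1=0.5, w2=0.5):
    return w1 * a1 + w2 * a2, w1 * b1 + w2 * b2

# Scoring an entity against a query, e.g. via KL divergence between the Betas
query = Beta(alpha, beta)
entity = Beta(torch.rand(d) + 0.05, torch.rand(d) + 0.05)
score = -kl_divergence(entity, query).sum()
print(float(score))
```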

On the other hand, Sun et al. identified that Q2B and other systems are not logically faithful, that is, not all logically entailed query answers can be retrieved by the QE system. To bridge this gap, the authors introduce EmQL (Embedding Query Language). EmQL still embeds entities into a d-dimensional space and supports ∧, ∨, and ∃, but takes a different approach to modelling sets. Instead of boxes or Beta distributions, the authors encode each set X with a pair (a_x, b_x), where a_x is a weighted centroid of the set elements and b_x is a count-min sketch (CM sketch). Each sketch consists of D hash functions of width W (thus a D×W matrix; the authors pick 20×2000).
How does it work?
* Using centroids, a top-k MIPS a_x.T.mm(E) retrieves k candidate entities that may belong to X;
* Importantly, for CM sketches we have a differentiable retrieval operator CM(i, b_x) that returns the weight of entity i in the set X;
* We can then combine MIPS with CM-based filtering;
* The authors then define ∧, ∨, and ∃ as operators over centroids and CM sketches (a minimal sketch of this set representation follows below).
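Here is a minimal, self-contained sketch of that set representation: a weighted centroid plus a count-min sketch, with top-k MIPS candidates re-weighted by the sketch. The random hash functions, shapes, and the simple min-lookup are illustrative assumptions rather than EmQL's exact implementation.

```python
# A sketch of EmQL-style set retrieval: a set is a (centroid, count-min sketch)
# pair; candidates come from top-k MIPS against the entity embedding matrix and
# are then re-weighted by the sketch. Shapes and hash choices are assumptions.
import torch

n_entities, dim = 1000, 64
D, W = 20, 2000                        # sketch: D hash functions, width W
E = torch.randn(n_entities, dim)       # entity embedding matrix

# random hash functions: entity id -> one bucket per row of the sketch
hashes = torch.randint(0, W, (D, n_entities))

def encode_set(member_ids, weights):
    a_x = (weights[:, None] * E[member_ids]).sum(0)   # weighted centroid
    b_x = torch.zeros(D, W)                           # count-min sketch
    for i, w in zip(member_ids, weights):
        b_x[torch.arange(D), hashes[:, i]] += w
    return a_x, b_x

def cm_weight(i, b_x):
    # look up entity i's buckets in every row and take the minimum
    return b_x[torch.arange(D), hashes[:, i]].min()

def retrieve(a_x, b_x, k=10):
    scores = E @ a_x                                  # top-k MIPS candidates
    cand = scores.topk(k).indices
    weighted = [(int(i), float(cm_weight(i, b_x))) for i in cand]
    return [(i, w) for i, w in weighted if w > 0]     # CM-based filtering

a_x, b_x = encode_set(torch.tensor([1, 2, 3]), torch.tensor([1.0, 1.0, 1.0]))
print(retrieve(a_x, b_x))
```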
In experiments, the authors probe EmQL on generalization (answering queries, the standard QE task) and entailment (when the full KG is given and no link prediction is required). On average, EmQL outperforms Q2B by 10–15 H@3 points on FB15k-237 and NELL on the generalization task, and completely dominates on the entailment task (94.2 vs 36.2).
Furthermore, EmQL was tested on multi-hop QA benchmarks like MetaQA and WebQSP, where it outperforms even the recent EmbedKGQA from ACL 2020.
Note that EmQL does not support the negation (¬) allowed by BetaE. Yet?

KG Embeddings: NAS, Boxes vs Hyperbolic, Meta-Learning
Something really interesting is happening this year at NeurIPS, going beyond "yet-another-KG-embedding-algorithm". You've probably heard about Neural Architecture Search (NAS) and its successes in computer vision: for instance, recent architectures like EfficientNet are not designed by humans. Instead, a NAS system generates a neural net from a bunch of smaller building blocks, optimizing certain metrics. Can we have a NAS system generate efficient architectures for KG-related tasks?

Zhang et al. say yes! They propose Interstellar, an RNN-based NAS approach for relational paths. Interstellar first requires sampling paths from a KG (biased random walks in this case), and then those paths are fed into an RNN. The whole RNN net (cell and weights) is a subject for NAS. The process is split into two parts: macro-level (e.g., scoring functions) and micro-level (activations and weights), which are governed by a controller. As Hits@10 and related metrics are non-differentiable, the authors resort to policy gradients to optimize the controller.
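Since the search reward (e.g., validation Hits@10) cannot be backpropagated through, such a controller is typically trained with a REINFORCE-style update. Below is a minimal sketch of that idea only; the search space, reward function, and controller parameterization are made up for illustration and are not Interstellar's actual components.

```python
# A minimal REINFORCE-style sketch of training a controller whose reward
# (e.g. validation Hits@10) is non-differentiable. The search space, reward,
# and controller parameterization are illustrative assumptions.
import torch
import torch.nn as nn

n_choices_per_slot = [3, 4, 2]   # e.g. scoring fn, activation, skip-connection
logits = nn.ParameterList([nn.Parameter(torch.zeros(n)) for n in n_choices_per_slot])
opt = torch.optim.Adam(logits.parameters(), lr=0.05)

def evaluate_architecture(decisions):
    # stand-in for "train the sampled RNN and measure Hits@10 on validation"
    return float(sum(decisions)) / 10.0

baseline = 0.0
for step in range(100):
    log_prob, decisions = 0.0, []
    for p in logits:
        dist = torch.distributions.Categorical(logits=p)
        choice = dist.sample()                       # sample one design decision
        log_prob = log_prob + dist.log_prob(choice)
        decisions.append(int(choice))
    reward = evaluate_architecture(decisions)
    baseline = 0.9 * baseline + 0.1 * reward         # moving-average baseline
    loss = -(reward - baseline) * log_prob           # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```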
In experiments, Interstellar is probed on link prediction and entity matching tasks, showing competitive results. Each task requires certain seed architectures (like in the picture below), and finding a good net might take a while (about 30 hours of search and 70 hours of fine-tuning for FB15k-237), but look, it's the first step showing that NAS is generally applicable to KG-related tasks and can create new RNN architectures!
Besides, let's see how fast Nolan's latest movie, Tenet, will get some traction in the model-naming world.

Geometric embedding models enjoy ever-increasing attention from the community! Last year, in the NeurIPS'19 post, we noticed a surge in approaches that use hyperbolic geometry for graph representation learning. This year, we have a new strong geometric competitor: hyper-rectangles, aka boxes!
Whereas Query2Box used boxes for query embedding, Abboud et al. develop the idea further and design BoxE, a provably fully expressive KG embedding model where entities are points in a vector space and relations are boxes. Each relation is modeled with as many boxes as the relation's arity, e.g., for a binary predicate capitalOf(Berlin, Germany) there will be two boxes for the head and tail entities, and for n-ary predicates, there will be n boxes. Each entity, in addition to its base position, has an additional parameter, a translational bump, which aims at bringing entities occurring in the same relation closer together (check out the illustrated example below).
The authors do invest in theory and prove several important properties of BoxE: it can model many inference patterns except composition, it allows for rule injection (and hence, for injecting ontological axioms), and it is fully expressive. However, it is fully expressive only when the embedding dimension is |E|·|R| for binary relations and |E|^(n-1)·|R| for n-ary predicates, which is, hmm, a bit too much (interestingly, the authors of Query2Box also showed that you need about |E| embedding dimensions for modeling an arbitrary FOL query).
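To illustrate the box-membership idea for a binary fact, here is a minimal sketch where each entity has a base point and a translational bump, and each relation owns one box per argument position. The distance function and dimensions are simplified stand-ins for the paper's actual scoring function.

```python
# A minimal sketch of the BoxE idea for a binary fact r(h, t): each entity has a
# base point and a translational bump, each relation has one box per argument
# position, and a fact is plausible when the bumped points land inside the boxes.
# The distance function is a simplified stand-in for the paper's piecewise one.
import torch

dim = 8
base = {"Berlin": torch.randn(dim), "Germany": torch.randn(dim)}
bump = {"Berlin": torch.randn(dim), "Germany": torch.randn(dim)}

# relation "capitalOf" gets two boxes (head slot, tail slot), each (lower, upper)
def make_box():
    c, w = torch.randn(dim), torch.rand(dim) + 0.1
    return c - w, c + w

capital_of = [make_box(), make_box()]

def dist_to_box(point, box):
    lo, hi = box
    # zero inside the box, grows with the distance outside it
    return (torch.relu(lo - point) + torch.relu(point - hi)).norm(p=1)

def score(rel_boxes, head, tail):
    h_final = base[head] + bump[tail]   # head is bumped by the tail entity
    t_final = base[tail] + bump[head]   # tail is bumped by the head entity
    return -(dist_to_box(h_final, rel_boxes[0]) + dist_to_box(t_final, rel_boxes[1]))

print(float(score(capital_of, "Berlin", "Germany")))
```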
BoxE was evaluated on triple-based benchmarks like FB15k-237 as well as on n-ary graphs like JF17K. Although the embedding dimension varies in the range 200–1000 (not the 15000×237 the theory requires for FB15k-237, for example), BoxE is still quite competitive and performs on par with the current SOTA on graphs without many compositional patterns. The authors also compiled a nice experiment on injecting logical rules over the NELL-sports dataset and showed impressive gains of >25 MRR points.
As 2020 is the year of boxes, do not miss the work of Dasgupta et al., also published at NeurIPS, who study boxes more deeply on the subject of local identifiability and come up with the idea of using Gumbel distributions to model box parameters.

We also remember E2R from NeurIPS 2019, a KG embedding model based on quantum logic with interesting properties (either very high or very low performance). At that time, E2R only worked in the transductive setup (which means the whole graph is seen during training). This year, Srivastava et al. further extend the model and come up with IQE (Inductive Quantum Embedding). Essentially, IQE now accepts node features, so that an entity embedding has to correlate with its feature vector. Furthermore, IQE is now optimized with a novel Alternating Minimization scheme, which the authors find to be approximately 9 times faster than vanilla E2R. The authors also provide a solid theoretical justification of the model's properties and of when one should expect the underlying problem to be NP-hard.
Conceptually, the model supports binary predicates, but the authors concentrate on the fine-grained entity typing task (FIGER, OntoNotes, TypeNet) using a BiLSTM as the context encoder. Note that IQE needs only about 6 epochs to converge (on FIGER; in comparison, E2R required 1000 iterations)! Qualitatively, IQE outperforms the original transductive model by 25–30 accuracy and F1 points.

Continuing on inductive tasks, Baek et al. study two particular link prediction setups: 1) given a training (seen) graph, a new unseen node arrives, and you need to predict its connections to seen nodes (unseen -> seen); 2) several unseen nodes arrive, and you need to predict links among the unseen nodes themselves (unseen -> unseen). Sounds pretty complex, right? Usually, in transductive tasks, a model learns entity and relation embeddings of all seen nodes, and inference is performed on the set of seen nodes. Here, we have unseen nodes and, often, no node features.

The authors resort to meta-learning and propose Graph Extrapolation Networks (GEN), designed to extrapolate the knowledge from seen entities to unseen ones. Furthermore, the authors define the task in the few-shot setting, that is, unseen new nodes might have only 3–5 (K) links to existing nodes or to other unseen nodes.
The meta-learning task for GEN relies mostly on relations: given a support set of K triples for an unseen node e_i, apply neighborhood aggregation through a learnable relation-specific weight W_r. In fact, any relation-aware GNN architecture might be plugged in here. In other words, we meta-learn an embedding of an unseen entity from its neighbors' representations. To cater for the uncertainty of the few-shot scenario, the authors stochastically embed unseen entities as a distribution whose parameters are learned with 2 GEN layers through MC sampling (somewhat resembling GraphVAEs).
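A minimal sketch of the core aggregation step: the embedding of an unseen node is built from its K support neighbors, each transformed by a relation-specific weight. The mean aggregator, embedding table, and dimensions are illustrative assumptions, not the exact GEN layer.

```python
# A minimal sketch of relation-specific neighborhood aggregation for an unseen
# entity: its embedding is built from the embeddings of its K support neighbors,
# transformed by a per-relation weight matrix. Dimensions and the mean aggregator
# are illustrative assumptions.
import torch
import torch.nn as nn

dim, n_relations = 32, 5
entity_emb = nn.Embedding(100, dim)                 # embeddings of seen entities
W_r = nn.Parameter(torch.randn(n_relations, dim, dim) * 0.1)

def embed_unseen(support):                          # support: list of (relation, seen_entity)
    msgs = []
    for r, e in support:
        h = entity_emb(torch.tensor(e))
        msgs.append(torch.relu(W_r[r] @ h))         # relation-specific transform
    return torch.stack(msgs).mean(0)                # aggregate the K messages

# a 3-shot support set for a new, unseen node
unseen_vec = embed_unseen([(0, 11), (2, 42), (4, 7)])
print(unseen_vec.shape)                             # torch.Size([32])
```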
GEN has been evaluated on 1- and 3-shot LP tasks on FB15k-237 and NELL-995, yielding significant improvements when considering unseen-to-unseen links. In addition, GEN has been applied to relation prediction on the DeepDDI and BioSNAP-sub datasets with impressive gains over baselines, e.g., 0.708 vs 0.397 AUPRC on DeepDDI.
Overall, NeurIPS'20 opened up several prospects in the KG embedding area: look, Neural Architecture Search works, Meta-Learning works, quantum and box models are getting more expressive! Thanks to that, we can now solve much more complex tasks than vanilla transductive link prediction.
SPARQL and Compositional Generalization
In question answering over KGs (KGQA), semantic parsing transforms a question into a structured query (say, in SPARQL), which is then executed against a database. One of the problems there is compositional generalization, that is, can we build complex query patterns after observing only simple atoms? In the ICLR'20 post, we reviewed a new large-scale dataset, Compositional Freebase Questions (CFQ) (let's forgive them for Freebase), that was designed to measure the compositional generalization capabilities of NL-to-SPARQL approaches. Notably, baselines like LSTMs and Transformers perform quite poorly: <20% accuracy on average.
Guo et al. present a thorough study of the potential caveats; one of the biggest issues is sequential decoding, or any kind of ordering bias when generating queries or logical forms, including tree decoding. Instead, they propose to leverage partially ordered sets (posets) and, consequently, Hierarchical Poset Decoding (HPD). Posets allow us to enforce permutation invariance in the decoding process (for instance, predicting the two branches of a logical AND operator independently) so that the model can concentrate on generalization. Posets can be represented as DAGs, and the components of such a DAG can be predicted by a simple RNN (which the authors resort to).
However, direct prediction of posets does not bring benefits (it works even worse than LSTMs and Transformers). The essential part is hierarchical decoding (check the illustration below), which consists of 4 steps. First, we predict a poset sketch (a de-lexicalized DAG). Second, we independently predict the primitives of our query (a sort of entity and relation recognition). Third, we fill the primitives into the poset sketch in all possible permutations, and fourth, we predict which particular paths actually belong to the correct target poset.
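As a toy illustration of why a poset sketch is order-invariant: if the branches of a conjunction are stored as an unordered set, two different decoding orders yield the same sketch. The predicate names below are invented, and this is only a conceptual sketch of the representation, not the HPD decoder.

```python
# A toy illustration of the poset idea: the branches of a conjunction in a query
# are unordered, so a sketch that stores children as a set is invariant to the
# order in which a decoder emits them. Predicate names are made up.

def poset_sketch(branches):
    # a de-lexicalized sketch: a root AND node with an unordered set of branch patterns
    return ("AND", frozenset(branches))

# two decoding orders of the same query ?x: directed(?x, M0) AND starred_in(?x, M1)
sketch_a = poset_sketch(["directed(?x, ENTITY)", "starred_in(?x, ENTITY)"])
sketch_b = poset_sketch(["starred_in(?x, ENTITY)", "directed(?x, ENTITY)"])

assert sketch_a == sketch_b   # the order of branches does not matter
print(sketch_a)
```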
Experimentally, HPD performs surprisingly well: on average, 70% accuracy on the 3 MCD splits compared to 20% by the Universal Transformer and roughly 40% by the mighty T5-11B. Ablations show that seq2seq and seq2tree sketch predictions only worsen the performance, and that the hierarchical component is crucial (otherwise, minus 50% accuracy). Hopefully, this work will inspire more research on compositional generalization and complex KGQA!

Benchmarking: OGB, GraphGYM, KeOps
Tired of seeing Cora/Citeseer/Pubmed in every other GNN paper? You should be: they are small, expose certain biases, and models' performance has pretty much saturated. Time for a big change!

Open Graph Benchmark (OGB) (paper by Hu et al.) is a great new effort by the Graph ML community to create a set of complex and diverse tasks on different forms of graphs (leaderboards included). OGB offers node classification, graph classification, and link prediction tasks on graphs of various sizes (as of now, the biggest graph contains ~100M nodes and ~1.6B edges) and domains (KGs are here, too: Wikidata-based and BioKG link prediction datasets).
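For a quick taste, the ogb Python package ships the datasets together with official evaluators. A minimal sketch of loading the BioKG link prediction dataset, assuming the package is installed and following its documented API:

```python
# A minimal sketch of pulling an OGB link prediction dataset (the BioKG one)
# together with its official evaluator. Assumes the ogb package is installed;
# dataset name and split layout follow the OGB documentation.
from ogb.linkproppred import LinkPropPredDataset, Evaluator

dataset = LinkPropPredDataset(name="ogbl-biokg")
split_edge = dataset.get_edge_split()          # train / valid / test triples
graph = dataset[0]                             # the underlying graph object

evaluator = Evaluator(name="ogbl-biokg")
print(evaluator.expected_input_format)         # tells you which scores to pass in
```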
OGB leaderboards have already generated several Twitter storms: for instance, suddenly, a simple label propagation algorithm with 10K-100K parameters outperforms big and slow GNNs with 1M+ parameters by a large margin on transductive node classification tasks. Clearly, there is still an unexplored space of capabilities and limitations of GNNs. Could Cora/Citeseer/Pubmed demonstrate it? Probably not.
Okay, we now have a big variety of tasks! On the other hand, we have dozens of GNN architectures and hundreds of hyperparameters to tune. Is there a sweet spot, a good starting point to dig into a certain task? The space is so large! You, Ying, and Leskovec tackle exactly this problem of exploring the design spaces of GNNs and introduce GraphGYM, a comprehensive suite for creating and evaluating GNNs (and flexing your GNN muscles). The authors define GNN design and task spaces, each consisting of fine-grained details, e.g., 12 design dimensions: batch norm, dropout rates, aggregation functions, activation functions, node feature pre-/post-processing layers, number of message passing layers, skip connections, batch size, learning rate, optimizers, and training epochs. Couple that with dozens of tasks, and the Cartesian product of possible combinations surpasses 10M options!
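To get a feel for how quickly such a design space explodes, here is a back-of-the-envelope sketch; the dimensions and option counts are invented for illustration and do not reproduce GraphGYM's exact design space.

```python
# A back-of-the-envelope illustration of how a GNN design space grows
# combinatorially. The dimensions and option counts are invented.
from itertools import product
from math import prod

design_space = {
    "batch_norm":       [True, False],
    "dropout":          [0.0, 0.3, 0.6],
    "aggregation":      ["mean", "max", "sum"],
    "activation":       ["relu", "prelu", "swish"],
    "mp_layers":        [2, 4, 6, 8],
    "skip_connections": ["none", "skipsum", "skipcat"],
    "learning_rate":    [0.1, 0.01, 0.001],
    "optimizer":        ["sgd", "adam"],
}

print(prod(len(v) for v in design_space.values()), "architecture configurations")

# the grid can be enumerated (or sampled) lazily:
first = next(iter(product(*design_space.values())))
print(dict(zip(design_space, first)))
```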
In the rich experimental agenda, the authors find the best-working combinations that you can adopt as good starting points, and they produce very insightful charts. The repo is openly available, so you can start experimenting pretty much right away!
By the way, if you're looking for something similar in the KG embeddings domain, our team has recently completed a huge survey of models and hyperparameters concentrating on the link prediction task.

Finally, I would like to outline the work by Feydy et al. on KeOps, a blazing fast kernel operations library with NumPy, PyTorch, R, and Matlab bindings. In addition to widely used dense and sparse matrices, the authors support symbolic matrices (where the ij-th element is computed via a certain formula F, often a matrix reduction formula). Symbolic matrices are computed on the fly and optimized for CUDA computations. The authors do invest in benchmarking: on a rather standard server workstation with an 8-core Xeon, 128 GB RAM, and an RTX 2080 Ti, KeOps is 5x-20x faster than a PyTorch implementation of the same tasks (and at larger sizes PyTorch crashes with OOM while KeOps works fine).
- You can also perform a kNN search and be competitive with FAISS (see the sketch below)!
- Some implementations in PyTorch-Geometric already work well with KeOps
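For illustration, a minimal PyKeOps sketch of a symbolic distance matrix and a kNN reduction based on the standard LazyTensor API; the tensor sizes are arbitrary.

```python
# A minimal PyKeOps sketch of a symbolic distance matrix and a kNN query:
# the (N, M) matrix D_ij is never materialized, the reduction is fused on GPU/CPU.
import torch
from pykeops.torch import LazyTensor

N, M, D, K = 10_000, 20_000, 64, 10
x = torch.randn(N, D)
y = torch.randn(M, D)

x_i = LazyTensor(x[:, None, :])        # (N, 1, D) symbolic variable
y_j = LazyTensor(y[None, :, :])        # (1, M, D) symbolic variable

D_ij = ((x_i - y_j) ** 2).sum(-1)      # symbolic (N, M) squared distances
knn_idx = D_ij.argKmin(K, dim=1)       # indices of the K nearest neighbors in y
print(knn_idx.shape)                   # torch.Size([10000, 10])
```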
Personally, I've been using PyKeOps since summer and find it extremely helpful when working with large-scale KGs. Besides, I compiled the library on a PowerPC + CUDA cluster, please feel my pain.

Wrapping Up
NeurIPS concludes the line-up of top AI conferences this year, but ICLR 2021 scores are already out there. If you want to stay updated on Graph ML topics, you can subscribe to the regular newsletter by Sergey Ivanov or join the GraphML Telegram channel!
Merry Christmas, happy New Year, and stay safe!

This article was originally published on Medium and re-published to TOPBOTS with permission from the author.