Is there a way to efficiently encode the inductive biases of images into models while retaining all the flexibility of transformers? “Yes,” say researchers from Germany’s Heidelberg University. In a new paper, the team proposes a novel approach that combines the effectiveness of the inductive bias in convolutional neural networks (CNNs) with the expressivity of transformers to model and synthesize high-resolution images.
Here is a quick read: Heidelberg University Researchers Combine CNNs and Transformers to Synthesize High-Resolution Images
The paper Taming Transformers for High-Resolution Image Synthesis is on arXiv, and the project code is available on GitHub.
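For readers curious how such a two-stage pipeline might look in practice, here is a minimal sketch (not the authors' code) of the general idea: a convolutional model compresses an image into a grid of discrete codebook entries, and a transformer then models how those entries compose into a full image. All module names, layer sizes, and hyperparameters below are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """CNN encoder: uses locality/translation bias to downsample the image."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 4, stride=2, padding=1),
        )

    def forward(self, x):          # (B, 3, H, W) -> (B, dim, H/4, W/4)
        return self.net(x)


class VectorQuantizer(nn.Module):
    """Maps each spatial feature to the index of its nearest codebook entry."""
    def __init__(self, num_codes=512, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                  # z: (B, dim, h, w)
        b, d, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, d)        # (B*h*w, dim)
        dists = torch.cdist(flat, self.codebook.weight)    # distance to codes
        idx = dists.argmin(dim=1)                          # nearest code id
        return idx.view(b, h * w)                          # discrete sequence


class CodeTransformer(nn.Module):
    """Autoregressive transformer over the discrete code sequence."""
    def __init__(self, num_codes=512, dim=256, seq_len=256):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pos = nn.Embedding(seq_len, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx):                                # idx: (B, T)
        t = idx.shape[1]
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        # Causal mask so each position only attends to earlier codes.
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=idx.device), diagonal=1
        )
        return self.head(self.blocks(x, mask=mask))        # next-code logits


# Toy forward pass: image -> discrete codes -> transformer predicts codes.
img = torch.randn(2, 3, 64, 64)
codes = VectorQuantizer()(ConvEncoder()(img))              # (2, 256) code ids
logits = CodeTransformer()(codes)                          # (2, 256, 512)
```

The division of labor is the point: the convolutional stage handles local, low-level structure cheaply, so the transformer only has to model the global composition of a short sequence of codes rather than every pixel.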
submitted by /u/Yuqing7