SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models

1Imperial College London, UK
2FAU Erlangen-Nürnberg, Germany

TL;DR: We introduce SpinMeRound:
    🔥 An identity-consistent multi-view diffusion model.
    🔥 It generates consistent 360° head avatars given an input facial image.
    🔥 It concurrently generates the corresponding shape normals for all generated views.

Teaser figure

SpinMeRound is a multi-view diffusion-based approach for generating human portraits from novel viewpoints. Given a number of input views, our method produces high-fidelity images, ensuring accurate 3D consistency across perspectives.

Abstract

Despite recent progress in diffusion models, generating realistic head portraits from novel viewpoints remains a significant challenge. Most current approaches are constrained to limited angular ranges, predominantly focusing on frontal or near-frontal views. Moreover, although recently emerging large-scale diffusion models have proven robust in handling 3D scenes, they underperform on facial data, due to its complex structure and the pitfalls of the uncanny valley. In this paper, we propose SpinMeRound, a diffusion-based approach designed to generate consistent and accurate head portraits from novel viewpoints. By leveraging a number of input views alongside an identity embedding, our method effectively synthesizes diverse viewpoints of a subject whilst robustly maintaining its unique identity features. Through experimentation, we showcase our model's generation capabilities in 360° head synthesis, while outperforming current state-of-the-art multi-view diffusion models.

Method Overview

Method figure

Starting from the given input conditioning views, the identity embedding W is extracted via a face recognition network (ArcFace). Both the conditioning and target views are encoded and combined with corresponding ray coordinate maps that represent the camera poses. After sampling, our method synthesizes photorealistic images from novel angles, along with their associated shape normals N.
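To make the camera-pose conditioning concrete, below is a minimal sketch of how a per-pixel ray coordinate map could be computed from a camera's intrinsics and pose. The function name, the use of Plücker ray coordinates, and the tensor layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): build a (6, H, W) ray
# coordinate map from camera intrinsics K and a camera-to-world pose, using
# Plücker coordinates (direction, origin x direction) per pixel.
import torch

def ray_coordinate_map(K, cam2world, height, width):
    """Return a (6, H, W) Plücker ray map for one camera.

    K         : (3, 3) camera intrinsics
    cam2world : (4, 4) camera-to-world extrinsics
    """
    # Pixel grid at pixel centres
    v, u = torch.meshgrid(
        torch.arange(height, dtype=torch.float32) + 0.5,
        torch.arange(width, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    ones = torch.ones_like(u)
    pix = torch.stack([u, v, ones], dim=0).reshape(3, -1)        # (3, H*W)

    # Unproject pixels to camera-space directions, rotate to world space
    dirs_cam = torch.linalg.inv(K) @ pix                          # (3, H*W)
    dirs_world = cam2world[:3, :3] @ dirs_cam
    dirs_world = dirs_world / dirs_world.norm(dim=0, keepdim=True)

    # Plücker coordinates: (d, o x d) per pixel
    origin = cam2world[:3, 3:4].expand_as(dirs_world)             # (3, H*W)
    moment = torch.cross(origin, dirs_world, dim=0)
    plucker = torch.cat([dirs_world, moment], dim=0)              # (6, H*W)
    return plucker.reshape(6, height, width)
```

In a setup like this, one such map would be computed for every conditioning and target view and concatenated channel-wise with the corresponding encoded image before being fed to the diffusion backbone.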

Qualitative results

3D consistency via 3D Gaussian Splatting (3DGS)

BibTeX


@misc{galanakis2025spinmeroundconsistentmultiviewidentity,
  title={SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models},
  author={Stathis Galanakis and Alexandros Lattas and Stylianos Moschoglou and Bernhard Kainz and Stefanos Zafeiriou},
  year={2025},
  eprint={2504.10716},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.10716},
}