Implicit neural representations have emerged as a compelling alternative to conventional discrete signal representations. By parameterizing a signal as a continuous function with a neural network, this paradigm decouples the representation from grid resolution and memory from spatial extent. However, traditional network architectures struggle to capture fine details in signals and fail to accurately represent spatial and temporal derivatives. This limitation is significant: many physical signals are implicitly defined as solutions to partial differential equations, which are stated in terms of derivatives.
To address these challenges, Sinusoidal Representation Networks (SIRENs) use periodic activation functions within implicit neural representations, enabling networks to model complex natural signals and, crucially, their derivatives. This article explores the capabilities of SIRENs, demonstrating their performance in representing images, wavefields, video, sound, and the derivatives of these signals. It then shows how SIRENs can be applied to solve boundary value problems such as the Eikonal equation, Poisson's equation, and the Helmholtz and wave equations, and finally touches on combining SIRENs with hypernetworks to learn priors over the space of SIREN functions.
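To make this concrete, the core building block is a linear layer followed by a sine nonlinearity, sin(ω₀·(Wx + b)), together with a carefully chosen weight initialization. The sketch below is a minimal PyTorch rendering of that scheme; ω₀ = 30 and the uniform initialization bounds follow the paper's proposal, while the class names and layer sizes are illustrative.

```python
import math
import torch
from torch import nn


class SineLayer(nn.Module):
    """Linear layer followed by sin(omega_0 * x), with SIREN initialization."""

    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # First layer: U(-1/n, 1/n), spanning one period over [-1, 1].
                bound = 1.0 / in_features
            else:
                # Hidden layers: U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0), which
                # keeps pre-activation statistics stable through depth.
                bound = math.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))


class Siren(nn.Module):
    """MLP of SineLayers with a linear output head."""

    def __init__(self, in_features, hidden_features, hidden_layers, out_features,
                 first_omega_0=30.0):
        super().__init__()
        layers = [SineLayer(in_features, hidden_features,
                            omega_0=first_omega_0, is_first=True)]
        for _ in range(hidden_layers):
            layers.append(SineLayer(hidden_features, hidden_features))
        layers.append(nn.Linear(hidden_features, out_features))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)
```

The initialization is the key ingredient: it keeps activations well distributed through depth, which is why naive sine MLPs without it have historically been difficult to train.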
SIRENs Outperform Baseline Architectures
To rigorously evaluate the effectiveness of SIRENs, we compared their performance against established network architectures. These baselines included Multi-Layer Perceptrons (MLPs) of comparable size using common nonlinearities such as Tanh, ReLU, and Softplus. We also compared against ReLU networks augmented with positional encoding (ReLU P.E.), a recently proposed technique.
The results clearly favor SIREN: it significantly outperformed all baseline architectures across tasks and converged considerably faster. Notably, SIREN was the only architecture that accurately represented the gradients of the signal, the ability that unlocks solving boundary value problems, a domain where the other architectures fall short.
Representing Images with Unprecedented Detail
SIRENs can parameterize an image by mapping 2D pixel coordinates to color values, trained directly on ground-truth pixel values. The results are striking: SIRENs achieve a peak signal-to-noise ratio (PSNR) 10 dB higher than the baseline architectures and converge to a high-quality representation in far fewer iterations. Crucially, among the MLPs tested, SIREN is the only one that accurately represents not just the image itself but also its first- and second-order derivatives, opening new avenues for image processing and analysis.
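As a rough illustration of this setup, the following sketch fits a single grayscale image with the Siren module defined above; the scikit-image test image and all hyperparameters are illustrative rather than the paper's exact settings.

```python
import torch
from skimage import data

img = torch.from_numpy(data.camera() / 255.0).float()  # (512, 512) in [0, 1]
h, w = img.shape

# Pixel-center coordinates normalized to [-1, 1], one row per pixel.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
targets = img.reshape(-1, 1)

model = Siren(in_features=2, hidden_features=256, hidden_layers=3, out_features=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(2000):
    opt.zero_grad()
    loss = ((model(coords) - targets) ** 2).mean()  # direct pixel supervision
    loss.backward()
    opt.step()
```

Because the fitted function is continuous in its inputs, its derivatives can afterwards be queried anywhere by differentiating the network with autograd.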
High-Fidelity Audio Representation
The ability of SIRENs extends beyond visual data to the realm of audio signals. By configuring a SIREN with a single time-coordinate input and a scalar output, it can effectively parameterize audio signals. In audio reproduction experiments, SIREN emerged as the only network architecture capable of faithfully reproducing both music and human voice signals. This capability highlights SIREN’s potential for advanced audio processing and generation tasks.
[Audio samples: ground truth; ReLU MLP reconstruction (fails); ReLU with positional encoding reconstruction (fails); SIREN reconstruction (succeeds).]
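A minimal sketch of the audio setup, reusing the Siren module from above: the input is a single normalized time coordinate and the output a scalar amplitude. Audio calls for a much higher first-layer frequency scale than the default ω₀ = 30; the value of 3000 below and the file path are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torchaudio

# "speech.wav" is a placeholder path, not a file from the paper.
wav, sr = torchaudio.load("speech.wav")                 # (channels, samples)
amp = wav.mean(dim=0, keepdim=True).t()                 # mono amplitude, (samples, 1)
t = torch.linspace(-1, 1, amp.shape[0]).unsqueeze(-1)   # time coords in [-1, 1]

model = Siren(in_features=1, hidden_features=256, hidden_layers=3,
              out_features=1, first_omega_0=3000.0)     # high first-layer frequency
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(5000):
    opt.zero_grad()
    loss = ((model(t) - amp) ** 2).mean()               # waveform regression
    loss.backward()
    opt.step()
```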
Video as a Continuous Function
Extending the concept further, SIRENs can represent video by incorporating pixel coordinates along with a time coordinate as input. When directly supervised with ground-truth pixel values, SIRENs demonstrate superior video parameterization compared to ReLU MLPs. This capability opens doors for efficient and high-quality video compression, generation, and analysis.
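A sketch of the video setup, again assuming the Siren module from above: inputs are (t, y, x) coordinates and outputs are RGB values. Since a whole video rarely fits in a single batch, each step supervises a random subset of space-time pixels; the clip shape and batch size below are illustrative.

```python
import torch

# Stand-in clip; replace with real frames shaped (frames, height, width, 3) in [0, 1].
video = torch.rand(30, 128, 128, 3)
nf, h, w, _ = video.shape

model = Siren(in_features=3, hidden_features=512, hidden_layers=3, out_features=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(10000):
    # Sample random (frame, row, col) indices and map them to [-1, 1] coordinates.
    idx = torch.stack([torch.randint(0, n, (8192,)) for n in (nf, h, w)], dim=-1)
    scale = torch.tensor([nf - 1, h - 1, w - 1], dtype=torch.float)
    coords = 2.0 * idx.float() / scale - 1.0
    targets = video[idx[:, 0], idx[:, 1], idx[:, 2]]

    opt.zero_grad()
    loss = ((model(coords) - targets) ** 2).mean()
    loss.backward()
    opt.step()
```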
Solving the Poisson Equation with Derivative Supervision
A significant advantage of SIRENs is their accurate representation of derivatives, which allows them to be supervised entirely in the derivative domain and thus to solve partial differential equations (PDEs). By supervising only the derivatives of a SIREN, its gradient or its Laplacian, it is possible to solve the Poisson equation and recover the underlying image. SIREN again stands out as the only architecture that accurately and efficiently fits the signal under image, gradient, and Laplacian supervision, demonstrating its suitability for solving PDEs.
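The sketch below illustrates this gradient-domain supervision, assuming the Siren module from above: autograd differentiates the network with respect to its input coordinates, and the loss compares that gradient to target image gradients instead of pixel values. The sample locations and targets are stand-ins.

```python
import torch

def model_gradient(model, coords):
    """Gradient of a scalar-output model w.r.t. its input coordinates."""
    coords = coords.clone().requires_grad_(True)
    out = model(coords)
    return torch.autograd.grad(out, coords, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

model = Siren(in_features=2, hidden_features=256, hidden_layers=3, out_features=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

coords = torch.rand(4096, 2) * 2 - 1     # stand-in sample locations
target_grads = torch.randn(4096, 2)      # stand-in ground-truth image gradients

for step in range(2000):
    opt.zero_grad()
    # The loss compares derivatives, never pixel values; create_graph=True above
    # lets backward() differentiate through the gradient computation itself.
    loss = ((model_gradient(model, coords) - target_grads) ** 2).mean()
    loss.backward()
    opt.step()
```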
Representing Shapes by Solving the Eikonal Equation
SIRENs excel at boundary value problems, as exemplified by the Eikonal equation, a first-order boundary value problem whose solution is a Signed Distance Function (SDF). Solving it lets SIRENs recover SDFs from point clouds and surface normals; a room-scale scene with fine details can be reconstructed in under an hour of training. This is achieved with a single 5-layer neural network, without 2D or 3D convolutions, and with significantly fewer parameters than methods combining voxel grids with neural implicit representations. The key to SIREN's success in this challenging task is its well-behaved gradients, which the gradient-domain supervision of the Eikonal equation requires; architectures lacking this gradient fidelity perform considerably worse.
[Reconstruction results: room scene (SIREN vs. ReLU) and statue (SIREN vs. ReLU with positional encoding vs. ReLU).]
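A simplified version of this SDF fitting is sketched below, in the spirit of the paper's loss: the SDF should vanish on surface points, its gradient should align with the supplied normals, and the gradient norm should equal 1 everywhere (the Eikonal constraint), with an extra penalty keeping off-surface values away from zero. It assumes the Siren and model_gradient sketches above; the point cloud, the (omitted) term weightings, and the penalty constant are illustrative.

```python
import torch
import torch.nn.functional as F

# Stand-in oriented point cloud; real inputs come from a scan.
surf_pts = torch.rand(4096, 3) * 2 - 1
surf_normals = F.normalize(torch.randn(4096, 3), dim=-1)
free_pts = torch.rand(4096, 3) * 2 - 1            # samples away from the surface

model = Siren(in_features=3, hidden_features=256, hidden_layers=4, out_features=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(2000):
    opt.zero_grad()
    sdf_surf = model(surf_pts)
    grad_surf = model_gradient(model, surf_pts)
    grad_free = model_gradient(model, free_pts)

    loss = (
        sdf_surf.abs().mean()                                    # SDF = 0 on surface
        + (1 - F.cosine_similarity(grad_surf, surf_normals, dim=-1)).mean()
        + (grad_surf.norm(dim=-1) - 1).abs().mean()              # eikonal: |grad| = 1
        + (grad_free.norm(dim=-1) - 1).abs().mean()
        + torch.exp(-100.0 * model(free_pts).abs()).mean()       # keep SDF != 0 off surface
    )
    loss.backward()
    opt.step()
```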
Solving the Helmholtz Equation
The capability of SIRENs extends to solving the inhomogeneous Helmholtz equation, another challenging PDE. In this domain, ReLU- and Tanh-based architectures completely fail to converge to a solution, further highlighting the unique advantages of SIRENs in solving complex scientific computing problems.
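The general recipe here is PDE-residual training: sample collocation points, differentiate the network output with respect to its inputs via autograd, and minimize the squared residual of the equation, in this case ∇²u + k²u = f. The sketch below is a real-valued toy version (the paper fits complex-valued wavefields); the wavenumber, source term, and sampling are illustrative.

```python
import torch

def laplacian(model, coords):
    """Sum of second derivatives of the scalar output w.r.t. each input dim."""
    coords = coords.clone().requires_grad_(True)
    out = model(coords)
    grad = torch.autograd.grad(out, coords, torch.ones_like(out),
                               create_graph=True)[0]
    lap = 0.0
    for i in range(coords.shape[-1]):
        lap = lap + torch.autograd.grad(
            grad[..., i], coords, torch.ones_like(grad[..., i]), create_graph=True
        )[0][..., i : i + 1]
    return lap

k = 20.0                                                  # wavenumber (illustrative)
source = lambda x: torch.exp(-(x ** 2).sum(-1, keepdim=True) / 1e-2)  # toy source

model = Siren(in_features=2, hidden_features=256, hidden_layers=3, out_features=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(2000):
    x = torch.rand(4096, 2) * 2 - 1                       # collocation points
    opt.zero_grad()
    residual = laplacian(model, x) + (k ** 2) * model(x) - source(x)
    loss = (residual ** 2).mean()
    loss.backward()
    opt.step()
```

The same pattern extends to the time-domain wave equation of the next section by adding a time input and penalizing the residual of u_tt − c²∇²u.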
Solving the Wave Equation in the Time Domain
In the time domain, SIRENs demonstrate their ability to solve the wave equation, a fundamental equation in physics and engineering. In contrast, a Tanh-based architecture fails to discover the correct solution, underscoring the superior capabilities of SIRENs in capturing the dynamics of wave phenomena.
Related Projects Expanding on Implicit Neural Representations
The development of SIRENs is part of a broader effort to explore and advance the field of implicit neural representations. Related projects delve into various aspects and applications of this technology:
MetaSDF: Meta-learning Signed Distance Functions
This project explores the connection between generalization in implicit neural representations and meta-learning. It proposes leveraging gradient-based meta-learning to learn priors over deep signed distance functions, resulting in significantly faster SDF reconstruction without compromising performance.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Scene Representation Networks (SRNs) focus on creating continuous, 3D-structure-aware neural scene representations. SRNs encode both geometry and appearance and are trained solely on 2D images through a neural renderer, enabling 3D reconstruction from a single posed 2D image.
Inferring Semantic Information with 3D Neural Scene Representations
This research demonstrates the utility of features learned by neural implicit scene representations for downstream tasks such as semantic segmentation. It introduces a model capable of performing continuous 3D semantic segmentation on object classes, trained with only single 2D semantic label maps.
Access the Research Paper
For a deeper dive into the technical details and further insights, refer to the original research paper: Sitzmann et al., "Implicit Neural Representations with Periodic Activation Functions," NeurIPS 2020 (arXiv:2006.09661).
Citation
To cite this work, please use the following BibTeX entry:
@inproceedings{sitzmann2019siren,
author = {Sitzmann, Vincent and Martel, Julien N.P. and Bergman, Alexander W. and Lindell, David B. and Wetzstein, Gordon},
title = {Implicit Neural Representations with Periodic Activation Functions},
booktitle = {Proc. NeurIPS},
year = {2020}
}
In conclusion, Sinusoidal Representation Networks (SIRENs) represent a significant advancement in implicit neural representations. Their unique ability to accurately model complex signals and their derivatives, coupled with their superior performance across a range of tasks from image and audio representation to solving challenging partial differential equations, positions them as a powerful tool for various applications in deep learning and scientific computing. The ongoing research and related projects further highlight the exciting potential of implicit neural representations and SIRENs in shaping the future of signal processing and beyond.