**Our project** aims to develop mathematical theory, improved algorithms, and open source software for 3-D molecular structure determination using cryo-EM.

Our methods use a combination of tools from different areas of mathematics, statistics, and computer science, such as computerized tomography, optimization (convex and non-convex), random matrix theory, signal and image processing, linear and nonlinear dimension reduction, randomized algorithms in numerical linear algebra, and representation theory.

**Software:** The ASPIRE software was originally developed in Matlab. Most of the Matlab code has been ported to Python. Moving forward, the official version is the Python one.

**Acknowledgement:** This project would not have been possible without the support of Award Numbers R01GM090200 and R01GM136780 from the National Institute of General Medical Sciences (NIGMS), the Simons Foundation, and the Moore Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS or the NIH.

**What are cryo-EM and SPR?** "Three-dimensional electron microscopy" is the name commonly given to methods in which the 3-D structures of macromolecular complexes are obtained from sets of 2-D projection images taken in an electron microscope. The most widespread and general of these methods is single-particle reconstruction (SPR). In SPR, the 3-D structure is determined from images of randomly oriented and positioned, ideally identical macromolecular "particles". The SPR method has been applied to images of negatively stained specimens, and to images obtained from frozen-hydrated, unstained specimens. In the latter technique, called cryo-EM, the sample of macromolecules is rapidly frozen in a thin (~100 nm) layer of vitreous ice, and maintained at liquid nitrogen temperature throughout the imaging process. SPR from cryo-EM images is an entirely general imaging method that does not require crystallization, and can capture molecules in their native states. Recent technological advancements have revolutionized the cryo-EM field, enabling near-atomic resolution reconstructions. Furthermore, cryo-EM has the potential to analyze compositionally and conformationally heterogeneous mixtures and, consequently, can be used to determine the structures of complexes in different functional states. Cryo-EM was selected by the journal Nature Methods as Method of the Year 2015 for its newfound ability to solve protein structures at near-atomic resolution. The Nobel Prize in Chemistry 2017 was awarded to Jacques Dubochet, Joachim Frank and Richard Henderson "for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution".

**What is the mathematical problem of SPR using cryo-EM?** The cryo-EM reconstruction problem is to find the three-dimensional structure of a molecule given samples of its two-dimensional projection images at unknown random directions. The intensity of pixels in a given projection image corresponds to line integrals of the electric potential created by the molecule along the path of the imaging electrons. The highly intense electron beam destroys the molecule, and it is therefore impractical to take projection images of the same molecule at different known directions, as is done in classical computerized tomography. In other words, a single molecule can be imaged only once. All molecules are assumed to have exactly the same structure; they differ only in their spatial orientation and position. Thus, every image is a projection of the same molecule, but at an unknown random orientation. The cryo-EM problem is an inverse problem stated as follows: find the 3-D electric potential given a collection of noisy 2-D projection images whose orientations (and positions) are unknown.
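As a rough illustration of this forward model, the sketch below simulates a few noisy projection images of a toy "molecule". It is not ASPIRE code: the point-atom model, the helper names `random_rotation` and `project`, the image size, and the noise level are all assumptions made for the example, and real cryo-EM images additionally involve in-plane shifts and microscope effects that are left out here.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))   # fix the sign ambiguity of the QR factorization
    if np.linalg.det(q) < 0:      # make it a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def project(atoms, rotation, size, rng, noise_std=1.0):
    """Rotate a point-atom model, integrate along z, and add Gaussian noise."""
    rotated = atoms @ rotation.T  # each row becomes R @ atom
    # For point atoms, the line integral along the beam (z) axis amounts to
    # discarding z and counting atoms per pixel in the x-y plane.
    edges = np.linspace(-1.0, 1.0, size + 1)
    clean, _, _ = np.histogram2d(rotated[:, 0], rotated[:, 1], bins=(edges, edges))
    noisy = clean + noise_std * rng.standard_normal(clean.shape)
    return clean, noisy

rng = np.random.default_rng(0)
atoms = rng.uniform(-0.5, 0.5, size=(200, 3))  # toy molecule: 200 point atoms
rotations = [random_rotation(rng) for _ in range(4)]
images = [project(atoms, R, size=32, rng=rng) for R in rotations]
```

The reconstruction problem runs in the opposite direction: only the noisy images are observed, and both `atoms` (the structure) and `rotations` (the orientations) must be recovered.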

**There are many software packages for 3D Electron Microscopy. What makes this one different?** Indeed, there are many excellent software packages for structure determination by cryo-EM. At the heart of these packages are their iterative refinement procedures for high-resolution reconstruction. Substantial effort and investment over the years have led to significant improvements in iterative refinement methods, which have become quite mature. For that reason, from the very beginning of the ASPIRE project we have focused on computational methods other than 3-D iterative refinement. ASPIRE provides unique algorithmic solutions to other important computational challenges of the cryo-EM data processing pipeline, including 3-D ab-initio modelling, 2-D classification and averaging, 3-D heterogeneity analysis, and particle picking.

**What are the highlight features of this software toolbox?** We believe that experienced users of current 2D/3D image analysis toolboxes will appreciate the following main features of our toolbox:

1. 3-D structural variability analysis: Fast and accurate estimation of the covariance matrix of the 3D molecular structures from their noisy 2D tomographic images, together with tools for clustering and non-linear dimensionality reduction for continuous variability. Our method is based on a new technique for PCA from noisy linearly reduced measurements, and on diffusion maps for continuous variability. Currently available as a standalone GitHub module.

2. Particle Picking: APPLE-Picker is a template-free and training-free, fast, and accurate computational framework for automatic particle picking, available as a standalone package in both Python and Matlab.

3. Steerable PCA and image restoration: Fast and accurate Principal Component Analysis for computing the eigen-images and eigenvalues of a set of 2D raw images and their in-plane rotations. Those who use MSA (Multivariate Statistical Analysis) for compression and de-noising of their raw 2D images should definitely try our steerable PCA algorithm as a viable alternative.

4. 2-D classification and averaging: An algorithmic pipeline that finds, for every raw 2D image in the data set, its nearest neighbors in terms of similar viewing directions. Our class averaging procedure is fast (nearly linear running time in the number of images) and succeeds at remarkably low levels of signal-to-noise ratio. The algorithm uses steerable PCA, Wiener filtering, a rotationally invariant representation of 2D images using the bispectrum, fast randomized SVD, fast randomized nearest-neighbor search, fast 2D alignment of images, and dimension reduction using vector diffusion maps. Try our class averaging procedure as an alternative to the MSA classification algorithm or the reference-free alignment procedure.

5. Angular Reconstitution from Common Lines: The Fourier Projection Slice Theorem implies that the orientations of three projection images can be determined from their common lines, which is the foundation of the angular reconstitution method (van Heel 1987; Vainshtein and Goncharov 1986). Our toolbox provides several common-line based algorithms that utilize the information in the common lines between all pairs of images simultaneously, leading to an assignment of orientations that is as consistent as possible with the common lines. Our approaches use convex optimization and semidefinite relaxation, spectral methods, and Bayesian approaches, and work for both uniform and non-uniform viewing angle distributions.
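To make the notion of a common line in item 5 concrete, here is a minimal sketch of the underlying geometry when the orientations are known. It is an illustration only, not one of the toolbox's algorithms (which solve the harder inverse task of recovering orientations from detected common lines). The function names and the convention that the columns of each rotation matrix hold the image x-axis, the image y-axis, and the viewing direction are assumptions made for the example.

```python
import numpy as np

def rotation_x(angle):
    """Rotation by `angle` radians about the x-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def common_line(R_i, R_j):
    """Return the common-line direction in the local 2D frame of each image.

    By the Fourier projection slice theorem, the Fourier transform of image k
    is the central slice of the 3D Fourier transform orthogonal to the viewing
    direction R_k[:, 2]. Two distinct central planes meet in a line.
    """
    q = np.cross(R_i[:, 2], R_j[:, 2])  # 3D direction shared by both planes
    norm = np.linalg.norm(q)
    if norm < 1e-12:
        raise ValueError("viewing directions are parallel; no unique common line")
    q = q / norm
    # Express q in each image's own coordinates; the third component vanishes
    # because q is orthogonal to both viewing directions.
    c_i = (R_i.T @ q)[:2]
    c_j = (R_j.T @ q)[:2]
    return c_i, c_j

R_i = np.eye(3)
R_j = rotation_x(np.pi / 2)  # second view tilted 90 degrees about the x-axis
c_i, c_j = common_line(R_i, R_j)
angle_i = np.arctan2(c_i[1], c_i[0])  # in-plane angle of the common line in image i
angle_j = np.arctan2(c_j[1], c_j[0])  # in-plane angle of the common line in image j
```

In this example the two views differ by a tilt about the x-axis, so the common line lies along the x-axis of both images. Common-line methods work in the reverse direction: the in-plane angles are estimated from the noisy images for many pairs, and rotations consistent with them are then sought.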

**Future Extensions:** We plan to add more functionality to our toolbox in the future. Here is a short list of additional capabilities that we are planning to include:

1. Continuous heterogeneity analysis

2. Kam's method for ab-initio modelling

3. 3-D Reconstruction without particle picking (as proposed here)