Our project aims to develop mathematical theory and improved algorithms for 3D molecular structure determination using cryo-EM. Our methods use a combination of tools from tomography, convex optimization and semidefinite programming, random matrix theory, signal and image processing, statistics, machine learning, nonlinear dimensionality reduction, randomized algorithms in numerical linear algebra, and representation theory.
Acknowledgement: This project would not have been possible without the support of Award Number R01GM090200 from the National Institute of General Medical Sciences (NIGMS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS or the NIH.
What is cryo-EM and SPR? "Three dimensional electron microscopy" is the name commonly given to methods in which the 3D structures of macromolecular complexes are obtained from sets of 2D projection images taken in an electron microscope. The most widespread and general of these methods is single-particle reconstruction (SPR). In SPR, the 3D structure is determined from images of randomly oriented and positioned, identical macromolecular "particles'', typically complexes 500 kDa or larger in size. The SPR method has been applied to images of negatively stained specimens, and to images obtained from frozen-hydrated, unstained specimens. In the latter technique, called cryo-EM, the sample of macromolecules is rapidly frozen in a thin (~100 nm) layer of vitreous ice, and maintained at liquid nitrogen temperature throughout the imaging process. SPR from cryo-EM images is of particular interest because it promises to be an entirely general technique. It does not require crystallization or other special preparation of the complexes to be imaged, and is beginning to reach sufficient resolution (~0.4 nm) to allow the polypeptide chain to be traced and residues identified in protein molecules. Even at the resolutions of 0.9-0.6 nm, many important features of protein molecules can be determined.
What is the mathematical problem of SPR using cryo-EM? The cryo-EM reconstruction problem is to find the three-dimensional structure of a molecule given samples of its two-dimensional projection images at unknown random directions. The intensity of pixels in a given projection image corresponds to line integrals of the electric potential induced by the molecule along the path of the imaging electrons. The highly intense electron beam destroys the molecule and it is therefore impractical to take projection images of the same molecule at known different directions as in the case of classical computerized tomography. In other words, a single molecule can be imaged only once. All molecules are assumed to have the exact same structure; they differ only by their spatial orientation and position. Thus, every image is a projection of the same molecule but an unknown random orientation. The cryo-EM problem is an inverse problem stated as follows: find the 3D electric potential given a collection of 2D noisy projection images whose orientations (and positions) are unknown.
There are many software packages for 3D Electron Microscopy. What makes this one different? Indeed, there are many excellent software packages for molecular structure determination from cryo-EM images. Most existing algorithms for SPR use an iterative refinement procedure. As the starting point for the refinement process, however, some sort of ab initio estimate of the 3D structure must be made. Our package provides tools for estimating an initial model directly from the cryo-EM images, in a completely data-driven manner that does make any assumption about the molecular structure or the underlying distribution of the viewing directions.
What are the highlight features of this software toolbox? We believe that experienced clients of current 2D/3D image analysis toolboxes would appreciate the following main features of our toolbox:
- Steerable PCA: Fast and accurate Principal Component Analysis for computing the eigen-images and eigenvalues of a set of 2D raw images and their in-plane rotations. Those who use MSA (Multivariate Statistical Analysis) for compression and de-noising of their raw 2D images should definitely try our Fourier-Bessel Steerable PCA algorithm as a viable alternative.
- Class Averaging: A new algorithmic pipeline that finds for every 2D raw image in the data set its nearest neighbors in terms of similar viewing directions. Our class averaging procedure is fast (nearly linear running time in the number of images) and succeeds at remarkably low levels of signal-to-noise ratio. The algorithm uses steerable PCA, Wiener filtering, rotational invariant representation of 2D images using bispectrum, fast randomized SVD, fast randomized nearest neighbors search, fast 2D alignment of images, and dimension reduction using vector diffusion maps. Try our class averaging procedure as an alternative to the MSA classification algorithm or the reference-free alignment procedure.
- Angular Reconstitution from Common Lines: The Fourier Projection Slice Theorem implies that the orientations of three projection images can be determined from their common lines, which is the foundation of the angular reconstitution method (van Heel 1987; Vainshtein and Goncharov 1986). Our toolbox provide several common-line based algorithms that utilize the information in the common lines between all pairs of images simultaneously, leading to an assignment of orientations that is as consistent as possible with the common lines. Our approaches use convex optimization and semidefinite relaxation, spectral methods and Bayesian approaches, and work for both uniform and non-uniform viewing angle distributions.
Future Extensions: We plan to add more functionality to our toolbox in the future. Here is a short list of additional capabilities that we are planning to include:
- Classification algorithm for heterogeneous datasets.
- Motion correction algorithms to deal with beam induced motion.
- Further improvements to our current ab-initio modeling algorithms.