Examples for using gblearn

This example walks through the construction of the Local Environment Representation (LER) matrix. The LER represents each grain boundary as a feature vector whose components are the fractions of its atoms assigned to each unique Local Atomic Environment (LAE) found across the entire grain boundary collection.

We start by creating a GrainBoundaryCollection, which holds a representation of each GrainBoundary and manages the calculation and storage of the various representations that can be derived using SOAP. For this code, we assume that all the Olmsted [1]_ dump files from LAMMPS are in /dbs/olmsted, and we tell the framework to store all representations in the /gbs/olmsted folder. Notice that we give a regular expression that matches the file names of the LAMMPS dump files; it uses a named capture group to extract the integer publication id from each file name. Each grain boundary is referred to by that id for the rest of the analysis.

from gblearn.gb import GrainBoundaryCollection as GBC
olmsted = GBC("olmsted", "/dbs/olmsted", "/gbs/olmsted",
              rcut=3.25, lmax=12, nmax=12, sigma=0.5)
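
As a small illustration of how a named capture group pulls an id out of a file name, the sketch below uses Python's standard re module. The pattern, the group name gbid, and the file names are hypothetical; they are not taken from the Olmsted data set or from gblearn itself.

```python
import re

# Hypothetical naming scheme "gb_<integer id>.dump"; the named group
# "gbid" captures the integer id as a string.
pattern = re.compile(r"gb_(?P<gbid>\d+)\.dump")

def extract_id(filename):
    """Return the captured id as a string, or None if the name does not match."""
    match = pattern.fullmatch(filename)
    return match.group("gbid") if match else None

ids = [extract_id(name) for name in ["gb_1.dump", "gb_388.dump", "notes.txt"]]
print(ids)  # ['1', '388', None]
```

The captured ids are strings, so a lookup keyed on them would use "1" rather than the integer 1.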

You will also notice that we specify the SOAP parameters (rcut, lmax, nmax and sigma) as part of the GrainBoundaryCollection constructor. Now that we have the collection, we can calculate the SOAP matrices for each grain boundary.

with olmsted.P["1"] as P:
    print(P)

Because both grain boundary databases and SOAP matrices can get quite large, gblearn implements memory-sensitive storage for the SOAP matrices. It uses context managers so that a SOAP matrix is read from disk on entry and cleared from memory once it falls out of context. The with construct shown here loads the matrix for grain boundary "1" from disk, prints it, and then removes it from memory.
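
The load-on-entry, release-on-exit pattern can be sketched with the standard library alone. This is not gblearn's implementation: the helper loaded_matrix, the pickle file format, and the tiny nested-list matrix are all stand-ins chosen for illustration.

```python
import os
import pickle
import tempfile
from contextlib import contextmanager

@contextmanager
def loaded_matrix(path):
    """Load a pickled matrix from disk and drop our reference on exit."""
    with open(path, "rb") as handle:
        matrix = pickle.load(handle)
    try:
        yield matrix
    finally:
        # Drop this function's reference. The memory is only reclaimed once
        # the caller's own references (e.g. the "as" name) are gone too.
        del matrix

# Write a small stand-in "SOAP matrix" (2 atoms x 3 components) to disk.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "P_1.pkl")
with open(path, "wb") as handle:
    pickle.dump([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], handle)

with loaded_matrix(path) as P:
    n_atoms = len(P)
    print(P)
```

In plain Python the name bound by "as" survives the with block, so a real implementation has to manage its own internal references to actually release the data.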


If you are only after the Local Environment Representation, you won’t have to worry about accessing memory-safe SOAP matrices.

Once the SOAP matrices have been calculated, we can grab the Averaged SOAP Representation (ASR) via a property:
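
To illustrate what an averaged representation contains, the sketch below assumes that the ASR is the mean of the per-atom SOAP vectors (the rows of the SOAP matrix). It is a stand-alone example with made-up numbers and a hypothetical helper name, and it is not the gblearn property itself.

```python
def averaged_representation(soap_matrix):
    """Average the per-atom vectors (rows) into a single vector."""
    n_atoms = len(soap_matrix)
    n_components = len(soap_matrix[0])
    return [sum(row[j] for row in soap_matrix) / n_atoms
            for j in range(n_components)]

# Two atoms, three components each (invented values).
soap_matrix = [[1.0, 0.0, 2.0],
               [3.0, 4.0, 2.0]]
asr = averaged_representation(soap_matrix)
print(asr)  # [2.0, 2.0, 2.0]
```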


Whenever any of the representations is accessed for the first time, it is calculated in the background and then cached to disk automatically. To avoid caching, specify None for the path to the storage folder in the GrainBoundaryCollection constructor. Subsequent requests for the same representation are served from the memory or disk cache instead of being recomputed.
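
The compute-once, then-serve-from-cache behavior can be sketched as follows. The function cached, its arguments, and the pickle file layout are assumptions made for this example and do not describe how gblearn stores its representations.

```python
import os
import pickle
import tempfile

def cached(name, compute, store=None):
    """Return compute() and, when store is a folder, cache the result there."""
    if store is None:
        return compute()                 # caching disabled
    path = os.path.join(store, name + ".pkl")
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)   # served from the disk cache
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result

calls = []

def expensive():
    calls.append(1)                      # count how often the computation runs
    return [0.25, 0.75]

store = tempfile.mkdtemp()
first = cached("demo", expensive, store)
second = cached("demo", expensive, store)
print(first == second, len(calls))  # True 1
```

Passing store=None mirrors the idea of disabling the cache: every call then recomputes the result.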

Constructing the LER requires a similarity parameter eps that is the cutoff for deciding when two atomic environments are similar. It is related to the S() similarity metric between SOAP vectors.

eps = 0.0025
LER = olmsted.LER(eps)

When you run this code, you will see several progress bars as it works through the grain boundary collection in the background:

  1. All grain boundaries are iterated to determine a set of unique environments for the entire collection.
  2. The collection is iterated over again so that every atom in each grain boundary can be classified with the unique LAE that it is most similar to.
  3. The fraction of each type of unique LAE is computed for each grain boundary to form the LER vectors.
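
The three steps above can be sketched in plain Python. Several things here are assumptions rather than gblearn's actual behavior: the dissimilarity function (a normalized dot-product distance standing in for S()), the rule that an environment is new when it differs from every known one by more than eps, the first-seen ordering of unique environments, and the toy two-component vectors.

```python
import math

def dissimilarity(a, b):
    """Assumed metric: sqrt(2 - 2 * cosine similarity); 0 for parallel vectors.

    Both vectors are assumed to be nonzero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot / (norm_a * norm_b)))

def build_ler(collection, eps):
    """collection maps a boundary id to its list of per-atom vectors."""
    # Step 1: collect the unique environments across the whole collection.
    unique = []
    for vectors in collection.values():
        for v in vectors:
            if all(dissimilarity(v, u) > eps for u in unique):
                unique.append(v)
    ler = {}
    for gbid, vectors in collection.items():
        # Step 2: assign each atom to its most similar unique environment.
        counts = [0] * len(unique)
        for v in vectors:
            best = min(range(len(unique)),
                       key=lambda k: dissimilarity(v, unique[k]))
            counts[best] += 1
        # Step 3: turn the counts into fractions for this boundary.
        ler[gbid] = [c / len(vectors) for c in counts]
    return unique, ler

collection = {
    "1": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "2": [[0.0, 1.0], [0.0, 2.0]],
}
unique, ler = build_ler(collection, eps=0.0025)
print(len(unique))  # 2
print(ler["2"])     # [0.0, 1.0]
```

In this toy data, boundary "1" has two atoms in the first environment and one in the second, so its LER vector is roughly [0.667, 0.333]. Every vector has the same length because the set of unique environments is shared by the whole collection.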

Summary: LER Construction

In summary, you can generate the LER for a new collection of grain boundaries using:

from gblearn.gb import GrainBoundaryCollection as GBC
olmsted = GBC("olmsted", "/dbs/olmsted", "/gbs/olmsted",
              rcut=3.25, lmax=12, nmax=12, sigma=0.5)