Examples for using `gblearn`
============================

We begin with the construction of the Local Environment Representation (LER)
matrix. It represents each grain boundary as a feature vector whose components
are the relative fractions of each *unique* Local Atomic Environment (LAE) in
the entire grain boundary system.

The first step is to create a :class:`~gblearn.gb.GrainBoundaryCollection`. It
holds a representation of each :class:`~gblearn.gb.GrainBoundary` in the
collection and manages the calculation and storage of the various
representations that can be derived using :mod:`~gblearn.soap`.

For this code, we assume that all the Olmsted [1]_ dump files from LAMMPS are
in `/dbs/olmsted`. We tell the framework to store all representations in the
`/gbs/olmsted` folder. Notice that we give a regular expression that matches
the file names of the LAMMPS dump files. A named capture group extracts the
integer publication id from each file name. Each grain boundary is referred to
by that id for the rest of the analysis.

.. code-block:: python

    from gblearn.gb import GrainBoundaryCollection as GBC

    # Arguments: collection name, folder with the LAMMPS dump files, folder
    # for cached representations, file-name regex, then the SOAP parameters.
    olmsted = GBC("olmsted", "/dbs/olmsted", "/gbs/olmsted",
                  r"ni.p(?P<gbid>\d+).out",
                  rcut=3.25, lmax=12, nmax=12, sigma=0.5)

Notice also that the SOAP parameters are specified as part of this
constructor. Now that we have the collection, we can calculate the SOAP
matrices for each grain boundary.

.. code-block:: python

    # Compute the SOAP matrix of every grain boundary in the collection.
    olmsted.soap()

    # Load the SOAP matrix of grain boundary "1" only for this block.
    with olmsted.P["1"] as P:
        print(P)

Because grain boundary databases can get quite large, and SOAP matrices can
*also* get quite large, `gblearn` implements memory-sensitive storage for the
SOAP matrices. It does this using context managers, so that a SOAP matrix is
read from disk and then cleared from memory once it falls out of context. The
`with` construct shown here loads the file from disk, prints it, and then
removes it from memory.

.. note:: If you are only after the Local Environment Representation, you
   won't have to worry about accessing memory-safe SOAP matrices.

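
The load-on-entry, release-on-exit pattern described above can be illustrated
with a minimal sketch. It is not the `gblearn` implementation: the helper name
`loaded_matrix`, the `.npy` file format, and the temporary file are assumptions
made only for this example.

.. code-block:: python

    import os
    import tempfile
    from contextlib import contextmanager

    import numpy as np


    @contextmanager
    def loaded_matrix(path):
        """Yield the array stored at `path` (hypothetical helper)."""
        matrix = np.load(path)  # read from disk only when the block is entered
        try:
            yield matrix
        finally:
            # Drop this function's reference on exit. The memory is reclaimed
            # once no other reference to the array remains.
            del matrix


    # Stand-in for a SOAP matrix, written to a temporary file for the demo.
    tmpdir = tempfile.mkdtemp()
    demo_path = os.path.join(tmpdir, "P_1.npy")
    np.save(demo_path, np.arange(6, dtype=float).reshape(2, 3))

    with loaded_matrix(demo_path) as P:
        demo_shape = P.shape
        demo_total = float(P.sum())

Keeping the disk read inside the context manager means that looping over many
grain boundaries holds at most one matrix in memory at a time, provided the
loop body does not keep its own reference.
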

Once the SOAP matrices have been calculated, we can grab the Averaged SOAP
Representation (ASR) via a property:

.. code-block:: python

    olmsted.ASR

Whenever any of the representations is accessed, it is calculated in the
background and then cached to disk automatically. To avoid caching, specify
`None` for the path to the storage folder in the
:class:`~gblearn.gb.GrainBoundaryCollection` constructor. Subsequent requests
for the same representation are served from the memory or disk cache.

Constructing the LER requires a similarity parameter `eps`, which is the
cutoff for deciding when two atomic environments are similar. It is related to
the :func:`~gblearn.soap.S` similarity metric between SOAP vectors.

.. code-block:: python

    eps = 0.0025
    LER = olmsted.LER(eps)

When you run this code, you will see several progress bars as the code
iterates over the grain boundary collection in the background:

1. All grain boundaries are iterated to determine a set of unique environments
   for the entire collection.
2. The collection is iterated over *again* so that every atom in each grain
   boundary can be classified with the unique LAE that it is *most* similar
   to.
3. The fraction of each type of unique LAE is computed for each grain boundary
   to form the LER vectors.

Summary: LER Construction
-------------------------

In summary, you can generate the LER for a new collection of grain boundaries
using:

.. code-block:: python

    from gblearn.gb import GrainBoundaryCollection as GBC

    olmsted = GBC("olmsted", "/dbs/olmsted", "/gbs/olmsted",
                  r"ni.p(?P<gbid>\d+).out",
                  rcut=3.25, lmax=12, nmax=12, sigma=0.5)
    olmsted.soap()
    olmsted.LER(0.0025)
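
To make the three passes behind the LER concrete, here is a small NumPy sketch
on toy data. It is only an illustration under stated assumptions: the function
names `dissimilarity` and `ler_sketch` are hypothetical, the metric is assumed
to be a Euclidean-style distance between SOAP vectors (which may differ from
:func:`~gblearn.soap.S`), and the greedy search for unique environments is a
simplification rather than the method `gblearn` uses.

.. code-block:: python

    import numpy as np


    def dissimilarity(a, b):
        # Assumed metric: sqrt(a.a + b.b - 2 a.b), clipped at zero to guard
        # against tiny negative values from floating-point round-off.
        value = np.dot(a, a) + np.dot(b, b) - 2.0 * np.dot(a, b)
        return float(np.sqrt(max(value, 0.0)))


    def ler_sketch(soap_matrices, eps):
        """`soap_matrices` maps a GB id to an (atoms x features) array."""
        # Pass 1: keep an environment only if it is at least `eps` away from
        # every unique environment found so far.
        unique = []
        for P in soap_matrices.values():
            for vec in P:
                if all(dissimilarity(vec, u) >= eps for u in unique):
                    unique.append(vec)

        # Passes 2 and 3: assign each atom to its closest unique environment,
        # then turn the per-GB counts into fractions.
        ler = {}
        for gbid, P in soap_matrices.items():
            counts = np.zeros(len(unique))
            for vec in P:
                distances = [dissimilarity(vec, u) for u in unique]
                counts[int(np.argmin(distances))] += 1
            ler[gbid] = counts / len(P)
        return unique, ler


    # Toy stand-ins for SOAP matrices: two GBs with 2-component vectors.
    toy = {
        "1": np.array([[1.0, 0.0], [1.0, 0.001], [0.0, 1.0]]),
        "2": np.array([[0.0, 1.0], [0.0, 1.0]]),
    }
    unique_envs, ler_vectors = ler_sketch(toy, eps=0.0025)

Every LER vector has one component per unique environment found across the
whole collection, so vectors from different grain boundaries are directly
comparable, and each vector sums to one.
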