gblearn.decomposition¶
Methods for decomposing the SOAP vectors into radial and angular components for analysis.
Functions
fcut (r, rcut, trans_width) |
Applies the cutoff function to the coefficients so that the function will go smoothly to zero at the cutoff. |
pissnnl (pissnnl, nspecies, vcutoff[, nmax, lmax]) |
Returns the geometric information for any entries of a pissnnl vector higher than the specified cutoff. |
Classes
ADFCollection ([dfs, counts]) |
Represents a collection of Angular Distrubition Functions. |
DF (dP, catom, x, decomposer[, radial, calculate]) |
Represents the Radial or Angluar Distribution Function of a SOAP vector. |
DFCollection ([dfs, counts]) |
Represents a collection of Distrubition Functions. |
RDFCollection ([dfs, counts]) |
Represents a radial distribution function collection. |
SOAPDecomposer ([nspecies, lmax, nmax, rcut, …]) |
Implements decompsition of vectors in the SOAP space allowing for differing lmax and nmax values in a single session. |
SOAPVector (P, decomposer) |
Wrapper for a SOAP vector that facilitates decomposition, plotting and analysis in the SOAP space. |
SOAPVectorCollection ([Pij, decomposer, …]) |
Represents a collection of SOAP vectors that are logically combined (for example the local environments at a grain boundary are all part of the grain boundary). |
API Documentation¶
Methods for decomposing the SOAP vectors into radial and angular components for analysis.
-
class
gblearn.decomposition.
ADFCollection
(dfs=None, counts=None)[source] Represents a collection of Angular Distrubition Functions. See also the comments for
DFCollection
.-
static
from_soap
(collection, resolution=75, catom=False)[source] Creates an ADF collection from the specified SOAPVector collection using the \(l>0\) components.
Parameters:
-
static
-
class
gblearn.decomposition.
DF
(dP, catom, x, decomposer, radial=True, calculate=True)[source] Represents the Radial or Angluar Distribution Function of a SOAP vector.
Parameters: - dP (list) – of tuples; result of calling
gblearn.decomposition.pissnnl()
on the SOAP vector with \(l=0\) or \(l>0\) components set to zero. - catom (bool) – when True, this DF is for the central atom only (i.e., \(l=0\) components only).
- x (numpy.ndarray) – domain in radial/angular space on which to sample the function.
- decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
- calculate (bool) – when True, the distribution function will be calculated upon initialization.
-
df
¶ numpy.ndarray – distribution function values corresponding to
x
.
-
dtype
= None str – since this could be a radial or and angular DF, the type of the distribution function.
-
static
from_dict
(data, decomposer=None, x=None)[source] Restores a DF from a serialized dict (i.e., one returned by
serialize()
).Parameters: - data (dict) – result of calling
serialize()
. - decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
- x (numpy.ndarray) – if this DF is constructed from a parent collection, the common domain sample vector to use.
- data (dict) – result of calling
-
static
from_file
(filename, decomposer=None, x=None)[source] Restores a DF from file.
Parameters: - filename (str) – path to the file that was created by
DF.save()
. - decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
- x (numpy.ndarray) – common domain values shared by a collection; used if the file doesn’t explicitly specify the domain values.
- filename (str) – path to the file that was created by
-
norm
Calculates the L2 norm of the distribution function values as if they were a vector in some space.
-
plot
(ax=None, savefile=None, shells=None, opacity=1.0, color=’b’, title=None, xlabel=None, ylabel=None)[source] Plots the DF.
Parameters: - ax (matplotlib.axes.Axes) – axes to plot on. If not specified, a new figure and new axes will be created. If axes are specifed, no x or y labels or plot titles will be added.
- savefile (str) – path to a file if the figure should be saved.
- shells (list) – of float values for neighbor shells to be added as vertical lines to the plot.
- opacity (float) – transparency to use for the DF plot.
- color – any color option received by
matplotlib.pyplot.plot()
. - title (str) – override the default title of the plot.
- xlabel (str) – override the default (generic, not very helpful) label for the x-axis.
- ylabel (str) – override the default label for the y-axis.
-
same
(other, epsilon_=None)[source] Tests whether the specifed DF is similar to this one.
Parameters: - other (DF) – DF to compare similarity to.
- epsilon (float) – override the global (default) value
epsilon
for determining if the DFs are similar.
Returns: True when both of the DFs are identical within tolerance.
Return type:
-
serialize
(withdecomp=True, commonx=None, withdP=True)[source] Returns a serializable dictionary that represents this DF.
Parameters: - withdecomp (bool) – when True, include the parameters of the SOAP decomposer in the dict.
- commonx (numpy.ndarray) – xf this DF is part of a collection, the domain values will be the same for every DF; in that case we don’t need to include the domain in the serialization. This is the parent collections common radial vector. If the DFs is the same, it won’t be serialized. If unspecified, the domain values are serialized.
- withdP (bool) – when True, the large, decomposed SOAP vector values are serialized. Necessary to completely reconstruct the representation, but not necessary for most analysis.
- dP (list) – of tuples; result of calling
-
class
gblearn.decomposition.
DFCollection
(dfs=None, counts=None)[source] Represents a collection of Distrubition Functions. A DF can only be added once to a collection (object instance comparison). Two identical DFs (in the physical sense) can belong to the same collection.
Parameters: - dfs (list) – of
DF
to intialize the collection with. - counts (list) – of int specifying how many DFs of each type are in the collection. For example, self.counts[3] says that we have three DFs in the collection that are identical to the DF in self.dfs[3], though we only keep one of them.
-
dfs
¶ list – of
DF
to intialize the collection with.
-
counts
¶ list – of int specifying how many DFs of each type are in the collection. For example, self.counts[3] says that we have three DFs in the collection that are identical to the DF in self.dfs[3], though we only keep one of them.
-
label
¶ str – user-defined label for the collection.
dict – user-defined tags to identify this collection in a larger analysis.
-
add
(df)[source] Add a new DF to the collection.
Parameters: df (DF) – instance to add; if it is already in the collection, nothing will happen.
-
average
Returns the averaged radial distribution function for the collection.
-
static
dfs_from_soap
(collection, ctype, resolution=75, catom=False)[source] Creates a DF collection from the specified SOAPVector collection using the \(l>0\) components.
Parameters: - collection (SOAPVectorCollection) – collection of SOAP vectors to initialize from.
- ctype (str) – one of [‘RDF’, ‘ADF’], specifies the kind of collection to initialize.
- resolution (int) – number of points to sample in the radial domain.
- catom (bool) – when True, the DFs are constructed using the density of the central atom.
-
static
from_dict
(data)[source] Restores a DFCollection from a serialized dict (i.e., one returned by
serialize()
).Parameters: data (dict) – result of calling serialize()
.
-
static
from_file
(filename)[source] Restores a DFCollection from file.
Parameters: filename (str) – path to the file that was created by DF.save()
.
-
histogram
(epsilon_=None, savefile=None, **kwargs)[source] Determines how many of each kind of DF are in this collection (using the default comparison). Plots the histogram of unique values. This is equivalent to calling
DFCollection.unique.histogram()
.Parameters: Returns: - of the unique values; same result as
DFCollection.unique()
.
Return type: DFCollection
-
plot
(ax=None, savefile=None, shells=None, color=’b’, title=None, xlabel=None, ylabel=None, withavg=False)[source] Plots the collection of DFs on the same axes. Opacity of the line is chosen based on how many DFs in the collection are identical to that one (within tolerance). This is accomplished by calling
DFCollection.histogram()
.Parameters: - ax (matplotlib.axes.Axes) – axes to plot on. If not specified, a new figure and new axes will be created.
- savefile (str) – path to a file if the figure should be saved.
- shells (list) – of float values for neighbor shells to be added as vertical lines to the plot.
- color – any color option received by
matplotlib.pyplot.plot()
. - title (str) – override the default title of the plot.
- xlabel (str) – override the default (generic, not very helpful) label for the x-axis.
- ylabel (str) – override the default label for the y-axis.
- withavg (bool) – when True, the average DF for the whole collection is also plotted.
-
project
(other, epsilon_=None)[source] Projects the RDFs in self into the unique RDFs of other.
Returns: ( numpy.ndarray
projection, list exceptions); projection is the number of each kind of RDF in other with sum(result) == len(self); exceptions are indices in self for which there was no similarDF
in other.Return type: tuple
-
refine
(other, epsilon_=None)[source] Refines this DFCollection to include all DFs in other that are not already in self. Also reduces existing DFs in self to be unique.
Parameters: other (DFCollection) – another collection to augment this one with.
-
remove
(df)[source] Remove an DF from the collection.
Parameters: rdf (DF) – instance to remove; if it isn’t already in the collection, nothing will happen.
-
save
(target, withdP=False)[source] Saves the DF to disk.
Parameters:
- dfs (list) – of
-
class
gblearn.decomposition.
RDFCollection
(dfs=None, counts=None)[source] Represents a radial distribution function collection. See also the comments for
DFCollection
.-
static
from_soap
(collection, resolution=75, catom=False)[source] Creates an RDF collection from the specified SOAPVector collection using the \(l>0\) components.
Parameters:
-
static
-
class
gblearn.decomposition.
SOAPDecomposer
(nspecies=1, lmax=12, nmax=12, rcut=6.0, sigma=0.5, trans_width=0.5)[source] Implements decompsition of vectors in the SOAP space allowing for differing lmax and nmax values in a single session. Uses caching to optimize the decomposition.
Parameters: - nspecies (int) – number of species for the SOAP descriptor.
- nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
- lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
- rcut (float) – local environment finite cutoff parameter.
- sigma (float) – width parameter for the Gaussians on each atom.
- trans_width (float) – distance over which the coefficients in the radial functions are smoothly transitioned to zero.
-
alpha
¶ float – constant affecting the width of the Gaussians. Inversely proportional to
sigma
.
-
rb
¶ numpy.ndarray – positions of radial basis functions in the radial space.
-
partitions
¶ dict – keys are tuple of (l values, inverse); values are the corresponding indices that form the partition.
-
fcrs
¶ dict – keys are float values for radius r; values are
fcut()
evaluated at r.
-
rbexp
¶ list – of float values; the n-dependent exponential damping factor for radial basis function values.
-
rbsph
¶ dict – keys are tuple (n,l,r) where n and l are the integer radial and spherical basis function indices and r is the float value at which the spherical bessel function was evaluated.
-
cRs
¶ dict – keys are float values of radius r; values are the corresponding coefficient matrices returned by
_c_numeric()
.
-
aRs
¶ dict – keys are tuple (n,l,r) where n and l are the integer radial and spherical basis function indices and r is the float value that the radial functions were evaluated at; values are the corresponding coefficients of the radial basis functions.
-
transformbasis
¶ numpy.ndarray – transformation matrix needed to account for radial basis function positions when calculating the radial function coefficients using linear algebra.
-
ADF
(dP, ax)[source] Calculates the angular distribution function for the given SOAP vector decomposition.
Parameters: - dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a P vector using
gblearn.decomposition.pissnnl()
. - ax (numpy.ndarray) – linear space of polar angle values to evaluate at.
- dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a P vector using
-
RDF
(dP, rx, fast=True)[source] Computes the summed radial distribution function for the specified decomposed P vector.
Parameters: - dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a P vector using
gblearn.decomposition.pissnnl()
. - rx (numpy.ndarray) – linear space of values to evaluate at.
- dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a P vector using
-
apnl
(dp, rx, cnum=False, fast=True)[source] Returns the analytically/numerically derived coefficient for the specified, decomposed pissnnl.
Parameters: - dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a pissnnl vector using
gblearn.decomposition.pissnnl()
. - rx (numpy.ndarray) – linear space of values to evaluate at.
- cnum (bool) – when True, the function is evaluated numerically
using linear algebra instead of the built-in numpy
functions. See also
cnl()
. - fast (bool) – when True, caching is used to speed up evaluation of the basis functions between multiple pissnnl vectors.
- dp (tuple) – (int index in P, float cnlm,
(int ni, int si), (int nj, int sj), int l). These tuples
are calculated in bulk for a pissnnl vector using
-
cnl
(n, l, r, cnum=False, fast=True)[source] Returns the coefficient of the radial basis function \(g_n(r)\).
Parameters: - n (int) – index of the radial basis function.
- l (int) – index of the spherical harmonic associated with the radial function.
- r (float) – value at which to evaluate the basis function \(g_n(r)\).
- cnum (bool) – when True, the function is evaluated numerically using linear algebra instead of the built-in numpy functions; can be faster, but the results can have numerical noise issues.
- fast (bool) – when True, caching is used to speed up evaluation of the basis functions between multiple pissnnl vectors.
-
decompose
(P, vcutoff=1e-05)[source] Decomposes the specified SOAP vector into its radial and angular contributions using
gblearn.decomposition.pissnnl()
.Parameters: - P (numpy.ndarray) – vector representing the projection of a local atomic environment into the SOAP space.
- vcutoff (float) – values higher than this will be included in the RDF.
-
get_params
()[source] Returns a dictionary of the constructor parameters needed to re-initialize this decomposer instance.
-
partition
(lfilter, inverse=False)[source] Returns a list of indices for a P vector that correspond to the specified l values.
Parameters: - lfilter (list) – of l values to include in the resulting indices.
- inverse (bool) – when True, then any l values not in lfilter are included instead of the inverse.
Returns: of int indices that have the specified l values.
Return type: (numpy.ndarray)
-
class
gblearn.decomposition.
SOAPVector
(P, decomposer)[source] Wrapper for a SOAP vector that facilitates decomposition, plotting and analysis in the SOAP space.
Parameters: - P (numpy.ndarray) – vector representing the projection of a local atomic environment into the SOAP space.
- decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
-
dnP
¶ list – of tuples; result of calling
gblearn.decomposition.pissnnl()
on the SOAP vector with l=0 components set to zero.
-
dcP
¶ list – of tuples; result of calling
gblearn.decomposition.pissnnl()
on the SOAP vector with l>0 components set to zero.
-
nP
¶ numpy.ndarray – a copy of P with l=0 components set to zero.
-
cP
¶ numpy.ndarray – a copy of P with l>0 components set to zero.
-
cRDF
¶ DF – radial distribution function for the central atom.
-
nRDF
¶ DF – radial distribution function for all the neighbors of the central atom.
-
cADF
¶ DF – angular distribution function for the central atom.
-
nADF
¶ DF – angular distribution function for all the neighbors of the central atom.
-
ADF
(ax, catom=False)[source] Returns the angular distribution function for the SOAP vector.
Parameters: - ax (numpy.ndarray) – domain in the angular space to evaluate the ADF at.
- catom (bool) – when True, the ADF of the central atom is returned; otherwise, the central atom is ignored and the neighbor ADF is returned.
-
RDF
(rx, catom=False)[source] Returns the total radial distribution function for the SOAP vector.
Parameters: - rx (numpy.ndarray) – domain in the radial space to evaluate the RDF at.
- catom (bool) – when True, the RDF of the central atom is returned; otherwise, the central atom is ignored and the neighbor RDF is returned.
-
ax
= None numpy.ndarray – cached value of the angular values that we evaluated the ADF at; useful if the ADF is requested repeatedly for the same theta values.
-
cADF
= None numpy.ndarray – cached values of the ADF for the cached value in
ax
with \(l=0\).
-
cRDF
= None numpy.ndarray – cached values of the RDF for the cached value in
rx
with \(l=0\).
-
equal
(other)[source] Performs a more thorough equality check by also comparing the SOAP decomposers and RDFs.
-
static
from_dict
(data, decomposer_=None, rx=None, ax=None)[source] Restores a SOAP vector from a serialized dict (i.e., one returned by
serialize()
).Parameters: - data (dict) – result of calling
serialize()
. - decomposer (SOAPDecomposer) – if this serialization was part of a higher level collection, then the decomposer would not have been included at the item level. For reconstruction, it is passed in by the collection.
- rx (numpy.ndarray) – as for decomposer, but with the values in the domain.
- ax (numpy.ndarray) – as for decomposer, but with the angular values in the domain.
- data (dict) – result of calling
-
static
from_element
(element, index=0, **kwargs)[source] Returns the SOAP vector for a pure element.
Parameters: - element (str) – element name. Must be one of [“Ni”, “Cr”, “Mg”].
- index (int) – for multi-atom bases, which atom to represent in the elemental crystal.
- nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
- lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
- rcut (float) – local environment finite cutoff parameter.
- sigma (float) – width parameter for the Gaussians on each atom.
- trans_width (float) – distance over which the coefficients in the radial functions are smoothly transitioned to zero.
-
static
from_file
(filename, decomposer=None, rx=None, ax=None)[source] Restores a SOAP vector from file.
Parameters: - filename (str) – path to the file that was created by
SOAPVector.save()
. - decomposer (SOAPDecomposer) – if this serialization was part of a higher level collection, then the decomposer would not have been included at the item level. For reconstruction, it is passed in by the collection.
- rx (numpy.ndarray) – as for decomposer, but with the values in the domain.
- ax (numpy.ndarray) – as for decomposer, but with the angular values in the domain.
- filename (str) – path to the file that was created by
-
nADF
= None numpy.ndarray – cached values of the ADF for the cached value in
ax
with \(l>0\).
-
nRDF
= None numpy.ndarray – cached values of the RDF for the cached value in
rx
with \(l>0\).
-
rx
= None numpy.ndarray – cached value of the radial values that we evaluated the RDF at; useful if the RDF is requested repeatedly for the same r values.
-
save
(target)[source] Saves the SOAP vector to disk so that RDF and decomposition operations don’t have to be re-executed later.
Parameters: target (str) – path to save the vector to.
-
serialize
(withdecomp=True, commonrx=None, withdP=True, commonax=None)[source] Returns a serializable dictionary that represents this SOAP Vector.
Parameters: - withdecomp (bool) – when True, include the parameters of the SOAP decomposer in the dict.
- commonrx (numpy.ndarray) – xf this DF is part of a collection, the radial values will be the same for every DF; in that case we don’t need to include the domain in the serialization. This is the parent collections common radial vector. If the DFs is the same, it won’t be serialized. If unspecified, the domain values are serialized.
- withdP (bool) – when True, the large, decomposed SOAP vector values are serialized. Necessary to completely reconstruct the representation, but not necessary for most analysis.
- commonax (numpy.ndarray) – same as for commonrx but for the angular domain.
-
class
gblearn.decomposition.
SOAPVectorCollection
(Pij=None, decomposer=None, calculate=True, **kwargs)[source] Represents a collection of SOAP vectors that are logically combined (for example the local environments at a grain boundary are all part of the grain boundary). The collection allows aggregate quantities to be calculated more easily.
Parameters: - Pij (numpy.ndarray) – a matrix of SOAP vectors, where each row is a vector.
- decomposer (SOAPDecomposer) – instance used to decompose the SOAP vectors; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
- calculate (bool) – when True, the SOAPVectors are calculated based on P; otherwise, they are left as an empty list.
- kwargs (dict) – arguments that can be passed to
SOAPDecomposer
constuctor.
Examples
Create a collection of SOAP vectors and then collapse them into a set of unique ones (an instance of
RDFCollection
).>>> from gblearn.decomposition import SOAPVectorCollection as SVC >>> from numpy import load >>> P = load("pissnnl.npy") >>> svc = SVC(P, lmax=18, nmax=18) >>> urdfs = svc.unique(rdf=True)
-
ADFs
(resolution=100, catom=False)[source] Returns the set of unique angular distribution functions for this collection.
Parameters:
-
RDFs
(resolution=75, catom=False)[source] Returns the set of unique radial distribution functions for this collection.
Parameters:
-
equal
(other)[source] Performs a more rigorous equality test for two collections by also comparing the radial and angular distribution functions.
-
static
from_dict
(data)[source] Restores a SOAPVectorCollection from a serialized dict (i.e., one returned by
serialize()
).Parameters: data (dict) – result of calling serialize()
.
-
static
from_file
(filename)[source] Restores a SOAPVectorCollection from file.
Parameters: filename (str) – path to the file that was created by SOAPVectorCollection.save()
.
-
save
(target, withdP=False)[source] Saves the DF to disk.
Parameters:
-
gblearn.decomposition.
epsilon
= 5e-05 float – finite precision comparison value for RDF norms.
-
gblearn.decomposition.
fcut
(r, rcut, trans_width)[source] Applies the cutoff function to the coefficients so that the function will go smoothly to zero at the cutoff.
Parameters: Returns: - value in [0., 1.] to multiply the coefficient by so
that all values fall within the specified cutoff.
Return type:
-
gblearn.decomposition.
pissnnl
(pissnnl, nspecies, vcutoff, nmax=12, lmax=12)[source] Returns the geometric information for any entries of a pissnnl vector higher than the specified cutoff.
Parameters: - pissnnl (numpy.ndarray) – vector to examine.
- nspecies (int) – number of species for the SOAP descriptor.
- vcutoff (float) – values higher than this will be returned with geometry information.
- nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
- lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
Returns: - of tuples with (index, value, (n_i, s_i), (n_j, s_j), l), where n_i
is the n value of the radial basis function for species i, and s_i is the value of the species index.
Return type: list