gblearn.decomposition

Methods for decomposing the SOAP vectors into radial and angular components for analysis.

Functions

fcut(r, rcut, trans_width) Applies the cutoff function to the coefficients so that the function will go smoothly to zero at the cutoff.
pissnnl(pissnnl, nspecies, vcutoff[, nmax, lmax]) Returns the geometric information for any entries of a pissnnl vector higher than the specified cutoff.

Classes

ADFCollection([dfs, counts]) Represents a collection of Angular Distrubition Functions.
DF(dP, catom, x, decomposer[, radial, calculate]) Represents the Radial or Angluar Distribution Function of a SOAP vector.
DFCollection([dfs, counts]) Represents a collection of Distrubition Functions.
RDFCollection([dfs, counts]) Represents a radial distribution function collection.
SOAPDecomposer([nspecies, lmax, nmax, rcut, …]) Implements decompsition of vectors in the SOAP space allowing for differing lmax and nmax values in a single session.
SOAPVector(P, decomposer) Wrapper for a SOAP vector that facilitates decomposition, plotting and analysis in the SOAP space.
SOAPVectorCollection([Pij, decomposer, …]) Represents a collection of SOAP vectors that are logically combined (for example the local environments at a grain boundary are all part of the grain boundary).

API Documentation

Methods for decomposing the SOAP vectors into radial and angular components for analysis.

class gblearn.decomposition.ADFCollection(dfs=None, counts=None)[source]

Represents a collection of Angular Distrubition Functions. See also the comments for DFCollection.

static from_soap(collection, resolution=75, catom=False)[source]

Creates an ADF collection from the specified SOAPVector collection using the \(l>0\) components.

Parameters:
  • collection (SOAPVectorCollection) – collection of SOAP vectors to initialize from.
  • resolution (int) – number of points to sample in the angular domain.
  • catom (bool) – when True, the DFs are constructed using the density of the central atom.
class gblearn.decomposition.DF(dP, catom, x, decomposer, radial=True, calculate=True)[source]

Represents the Radial or Angluar Distribution Function of a SOAP vector.

Parameters:
  • dP (list) – of tuples; result of calling gblearn.decomposition.pissnnl() on the SOAP vector with \(l=0\) or \(l>0\) components set to zero.
  • catom (bool) – when True, this DF is for the central atom only (i.e., \(l=0\) components only).
  • x (numpy.ndarray) – domain in radial/angular space on which to sample the function.
  • decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
  • calculate (bool) – when True, the distribution function will be calculated upon initialization.
df

numpy.ndarray – distribution function values corresponding to x.

dtype = None

str – since this could be a radial or and angular DF, the type of the distribution function.

static from_dict(data, decomposer=None, x=None)[source]

Restores a DF from a serialized dict (i.e., one returned by serialize()).

Parameters:
  • data (dict) – result of calling serialize().
  • decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
  • x (numpy.ndarray) – if this DF is constructed from a parent collection, the common domain sample vector to use.
static from_file(filename, decomposer=None, x=None)[source]

Restores a DF from file.

Parameters:
  • filename (str) – path to the file that was created by DF.save().
  • decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
  • x (numpy.ndarray) – common domain values shared by a collection; used if the file doesn’t explicitly specify the domain values.
norm

Calculates the L2 norm of the distribution function values as if they were a vector in some space.

plot(ax=None, savefile=None, shells=None, opacity=1.0, color=’b’, title=None, xlabel=None, ylabel=None)[source]

Plots the DF.

Parameters:
  • ax (matplotlib.axes.Axes) – axes to plot on. If not specified, a new figure and new axes will be created. If axes are specifed, no x or y labels or plot titles will be added.
  • savefile (str) – path to a file if the figure should be saved.
  • shells (list) – of float values for neighbor shells to be added as vertical lines to the plot.
  • opacity (float) – transparency to use for the DF plot.
  • color – any color option received by matplotlib.pyplot.plot().
  • title (str) – override the default title of the plot.
  • xlabel (str) – override the default (generic, not very helpful) label for the x-axis.
  • ylabel (str) – override the default label for the y-axis.
same(other, epsilon_=None)[source]

Tests whether the specifed DF is similar to this one.

Parameters:
  • other (DF) – DF to compare similarity to.
  • epsilon (float) – override the global (default) value epsilon for determining if the DFs are similar.
Returns:

True when both of the DFs are identical within tolerance.

Return type:

bool

save(target)[source]

Saves the DF to disk.

Parameters:target (str) – path to save the vector to.
serialize(withdecomp=True, commonx=None, withdP=True)[source]

Returns a serializable dictionary that represents this DF.

Parameters:
  • withdecomp (bool) – when True, include the parameters of the SOAP decomposer in the dict.
  • commonx (numpy.ndarray) – xf this DF is part of a collection, the domain values will be the same for every DF; in that case we don’t need to include the domain in the serialization. This is the parent collections common radial vector. If the DFs is the same, it won’t be serialized. If unspecified, the domain values are serialized.
  • withdP (bool) – when True, the large, decomposed SOAP vector values are serialized. Necessary to completely reconstruct the representation, but not necessary for most analysis.
class gblearn.decomposition.DFCollection(dfs=None, counts=None)[source]

Represents a collection of Distrubition Functions. A DF can only be added once to a collection (object instance comparison). Two identical DFs (in the physical sense) can belong to the same collection.

Parameters:
  • dfs (list) – of DF to intialize the collection with.
  • counts (list) – of int specifying how many DFs of each type are in the collection. For example, self.counts[3] says that we have three DFs in the collection that are identical to the DF in self.dfs[3], though we only keep one of them.
dfs

list – of DF to intialize the collection with.

counts

list – of int specifying how many DFs of each type are in the collection. For example, self.counts[3] says that we have three DFs in the collection that are identical to the DF in self.dfs[3], though we only keep one of them.

label

str – user-defined label for the collection.

tags

dict – user-defined tags to identify this collection in a larger analysis.

add(df)[source]

Add a new DF to the collection.

Parameters:df (DF) – instance to add; if it is already in the collection, nothing will happen.
average

Returns the averaged radial distribution function for the collection.

static dfs_from_soap(collection, ctype, resolution=75, catom=False)[source]

Creates a DF collection from the specified SOAPVector collection using the \(l>0\) components.

Parameters:
  • collection (SOAPVectorCollection) – collection of SOAP vectors to initialize from.
  • ctype (str) – one of [‘RDF’, ‘ADF’], specifies the kind of collection to initialize.
  • resolution (int) – number of points to sample in the radial domain.
  • catom (bool) – when True, the DFs are constructed using the density of the central atom.
static from_dict(data)[source]

Restores a DFCollection from a serialized dict (i.e., one returned by serialize()).

Parameters:data (dict) – result of calling serialize().
static from_file(filename)[source]

Restores a DFCollection from file.

Parameters:filename (str) – path to the file that was created by DF.save().
histogram(epsilon_=None, savefile=None, **kwargs)[source]

Determines how many of each kind of DF are in this collection (using the default comparison). Plots the histogram of unique values. This is equivalent to calling DFCollection.unique.histogram().

Parameters:
  • epsilon (float) – override the global (default) value epsilon for determining if the DFs are similar.
  • kwargs (dict) – arguments will be passed directly to matplotlib.pyplot.bar() for the plotting.
Returns:

of the unique values; same result as

DFCollection.unique().

Return type:

DFCollection

plot(ax=None, savefile=None, shells=None, color=’b’, title=None, xlabel=None, ylabel=None, withavg=False)[source]

Plots the collection of DFs on the same axes. Opacity of the line is chosen based on how many DFs in the collection are identical to that one (within tolerance). This is accomplished by calling DFCollection.histogram().

Parameters:
  • ax (matplotlib.axes.Axes) – axes to plot on. If not specified, a new figure and new axes will be created.
  • savefile (str) – path to a file if the figure should be saved.
  • shells (list) – of float values for neighbor shells to be added as vertical lines to the plot.
  • color – any color option received by matplotlib.pyplot.plot().
  • title (str) – override the default title of the plot.
  • xlabel (str) – override the default (generic, not very helpful) label for the x-axis.
  • ylabel (str) – override the default label for the y-axis.
  • withavg (bool) – when True, the average DF for the whole collection is also plotted.
project(other, epsilon_=None)[source]

Projects the RDFs in self into the unique RDFs of other.

Returns:(numpy.ndarray projection, list exceptions); projection is the number of each kind of RDF in other with sum(result) == len(self); exceptions are indices in self for which there was no similar DF in other.
Return type:tuple
refine(other, epsilon_=None)[source]

Refines this DFCollection to include all DFs in other that are not already in self. Also reduces existing DFs in self to be unique.

Parameters:other (DFCollection) – another collection to augment this one with.
remove(df)[source]

Remove an DF from the collection.

Parameters:rdf (DF) – instance to remove; if it isn’t already in the collection, nothing will happen.
save(target, withdP=False)[source]

Saves the DF to disk.

Parameters:
  • target (str) – path to save the vector to.
  • withdP (bool) – when True, the large, SOAP decomposition from which the DF was constructed is also included. Wasteful, but necessary to completely represent the DF.
serialize(withdP=False)[source]

Returns a serializable dictionary that represents this DFCollection.

Parameters:withdP (bool) – when True, the large, SOAP decomposition from which the DF was constructed is also included. Wasteful, but necessary to completely represent the DF.
unique(epsilon_=None)[source]

Returns only the unique DFs in this collection.

Parameters:epsilon (float) – override the global (default) value epsilon for determining if the DFs are similar.
Returns:of unique DFs within the specified tolerance.
Return type:DFCollection
class gblearn.decomposition.RDFCollection(dfs=None, counts=None)[source]

Represents a radial distribution function collection. See also the comments for DFCollection.

static from_soap(collection, resolution=75, catom=False)[source]

Creates an RDF collection from the specified SOAPVector collection using the \(l>0\) components.

Parameters:
  • collection (SOAPVectorCollection) – collection of SOAP vectors to initialize from.
  • resolution (int) – number of points to sample in the radial domain.
  • catom (bool) – when True, the DFs are constructed using the density of the central atom.
class gblearn.decomposition.SOAPDecomposer(nspecies=1, lmax=12, nmax=12, rcut=6.0, sigma=0.5, trans_width=0.5)[source]

Implements decompsition of vectors in the SOAP space allowing for differing lmax and nmax values in a single session. Uses caching to optimize the decomposition.

Parameters:
  • nspecies (int) – number of species for the SOAP descriptor.
  • nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
  • lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
  • rcut (float) – local environment finite cutoff parameter.
  • sigma (float) – width parameter for the Gaussians on each atom.
  • trans_width (float) – distance over which the coefficients in the radial functions are smoothly transitioned to zero.
alpha

float – constant affecting the width of the Gaussians. Inversely proportional to sigma.

rb

numpy.ndarray – positions of radial basis functions in the radial space.

partitions

dict – keys are tuple of (l values, inverse); values are the corresponding indices that form the partition.

fcrs

dict – keys are float values for radius r; values are fcut() evaluated at r.

rbexp

list – of float values; the n-dependent exponential damping factor for radial basis function values.

rbsph

dict – keys are tuple (n,l,r) where n and l are the integer radial and spherical basis function indices and r is the float value at which the spherical bessel function was evaluated.

cRs

dict – keys are float values of radius r; values are the corresponding coefficient matrices returned by _c_numeric().

aRs

dict – keys are tuple (n,l,r) where n and l are the integer radial and spherical basis function indices and r is the float value that the radial functions were evaluated at; values are the corresponding coefficients of the radial basis functions.

transformbasis

numpy.ndarray – transformation matrix needed to account for radial basis function positions when calculating the radial function coefficients using linear algebra.

ADF(dP, ax)[source]

Calculates the angular distribution function for the given SOAP vector decomposition.

Parameters:
  • dp (tuple) – (int index in P, float cnlm, (int ni, int si), (int nj, int sj), int l). These tuples are calculated in bulk for a P vector using gblearn.decomposition.pissnnl().
  • ax (numpy.ndarray) – linear space of polar angle values to evaluate at.
RDF(dP, rx, fast=True)[source]

Computes the summed radial distribution function for the specified decomposed P vector.

Parameters:
  • dp (tuple) – (int index in P, float cnlm, (int ni, int si), (int nj, int sj), int l). These tuples are calculated in bulk for a P vector using gblearn.decomposition.pissnnl().
  • rx (numpy.ndarray) – linear space of values to evaluate at.
apnl(dp, rx, cnum=False, fast=True)[source]

Returns the analytically/numerically derived coefficient for the specified, decomposed pissnnl.

Parameters:
  • dp (tuple) – (int index in P, float cnlm, (int ni, int si), (int nj, int sj), int l). These tuples are calculated in bulk for a pissnnl vector using gblearn.decomposition.pissnnl().
  • rx (numpy.ndarray) – linear space of values to evaluate at.
  • cnum (bool) – when True, the function is evaluated numerically using linear algebra instead of the built-in numpy functions. See also cnl().
  • fast (bool) – when True, caching is used to speed up evaluation of the basis functions between multiple pissnnl vectors.
cnl(n, l, r, cnum=False, fast=True)[source]

Returns the coefficient of the radial basis function \(g_n(r)\).

Parameters:
  • n (int) – index of the radial basis function.
  • l (int) – index of the spherical harmonic associated with the radial function.
  • r (float) – value at which to evaluate the basis function \(g_n(r)\).
  • cnum (bool) – when True, the function is evaluated numerically using linear algebra instead of the built-in numpy functions; can be faster, but the results can have numerical noise issues.
  • fast (bool) – when True, caching is used to speed up evaluation of the basis functions between multiple pissnnl vectors.
decompose(P, vcutoff=1e-05)[source]

Decomposes the specified SOAP vector into its radial and angular contributions using gblearn.decomposition.pissnnl().

Parameters:
  • P (numpy.ndarray) – vector representing the projection of a local atomic environment into the SOAP space.
  • vcutoff (float) – values higher than this will be included in the RDF.
get_params()[source]

Returns a dictionary of the constructor parameters needed to re-initialize this decomposer instance.

partition(lfilter, inverse=False)[source]

Returns a list of indices for a P vector that correspond to the specified l values.

Parameters:
  • lfilter (list) – of l values to include in the resulting indices.
  • inverse (bool) – when True, then any l values not in lfilter are included instead of the inverse.
Returns:

of int indices that have the specified l values.

Return type:

(numpy.ndarray)

class gblearn.decomposition.SOAPVector(P, decomposer)[source]

Wrapper for a SOAP vector that facilitates decomposition, plotting and analysis in the SOAP space.

Parameters:
  • P (numpy.ndarray) – vector representing the projection of a local atomic environment into the SOAP space.
  • decomposer (SOAPDecomposer) – instance used to decompose the SOAP vector; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
dnP

list – of tuples; result of calling gblearn.decomposition.pissnnl() on the SOAP vector with l=0 components set to zero.

dcP

list – of tuples; result of calling gblearn.decomposition.pissnnl() on the SOAP vector with l>0 components set to zero.

nP

numpy.ndarray – a copy of P with l=0 components set to zero.

cP

numpy.ndarray – a copy of P with l>0 components set to zero.

rx

numpy.ndarray – sample points in the radial domain at which cRDF and nRDF were sampled.

cRDF

DF – radial distribution function for the central atom.

nRDF

DF – radial distribution function for all the neighbors of the central atom.

rx

numpy.ndarray – sample points in the angular domain at which cADF and nADF were sampled.

cADF

DF – angular distribution function for the central atom.

nADF

DF – angular distribution function for all the neighbors of the central atom.

ADF(ax, catom=False)[source]

Returns the angular distribution function for the SOAP vector.

Parameters:
  • ax (numpy.ndarray) – domain in the angular space to evaluate the ADF at.
  • catom (bool) – when True, the ADF of the central atom is returned; otherwise, the central atom is ignored and the neighbor ADF is returned.
RDF(rx, catom=False)[source]

Returns the total radial distribution function for the SOAP vector.

Parameters:
  • rx (numpy.ndarray) – domain in the radial space to evaluate the RDF at.
  • catom (bool) – when True, the RDF of the central atom is returned; otherwise, the central atom is ignored and the neighbor RDF is returned.
ax = None

numpy.ndarray – cached value of the angular values that we evaluated the ADF at; useful if the ADF is requested repeatedly for the same theta values.

cADF = None

numpy.ndarray – cached values of the ADF for the cached value in ax with \(l=0\).

cRDF = None

numpy.ndarray – cached values of the RDF for the cached value in rx with \(l=0\).

equal(other)[source]

Performs a more thorough equality check by also comparing the SOAP decomposers and RDFs.

static from_dict(data, decomposer_=None, rx=None, ax=None)[source]

Restores a SOAP vector from a serialized dict (i.e., one returned by serialize()).

Parameters:
  • data (dict) – result of calling serialize().
  • decomposer (SOAPDecomposer) – if this serialization was part of a higher level collection, then the decomposer would not have been included at the item level. For reconstruction, it is passed in by the collection.
  • rx (numpy.ndarray) – as for decomposer, but with the values in the domain.
  • ax (numpy.ndarray) – as for decomposer, but with the angular values in the domain.
static from_element(element, index=0, **kwargs)[source]

Returns the SOAP vector for a pure element.

Parameters:
  • element (str) – element name. Must be one of [“Ni”, “Cr”, “Mg”].
  • index (int) – for multi-atom bases, which atom to represent in the elemental crystal.
  • nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
  • lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
  • rcut (float) – local environment finite cutoff parameter.
  • sigma (float) – width parameter for the Gaussians on each atom.
  • trans_width (float) – distance over which the coefficients in the radial functions are smoothly transitioned to zero.
static from_file(filename, decomposer=None, rx=None, ax=None)[source]

Restores a SOAP vector from file.

Parameters:
  • filename (str) – path to the file that was created by SOAPVector.save().
  • decomposer (SOAPDecomposer) – if this serialization was part of a higher level collection, then the decomposer would not have been included at the item level. For reconstruction, it is passed in by the collection.
  • rx (numpy.ndarray) – as for decomposer, but with the values in the domain.
  • ax (numpy.ndarray) – as for decomposer, but with the angular values in the domain.
nADF = None

numpy.ndarray – cached values of the ADF for the cached value in ax with \(l>0\).

nRDF = None

numpy.ndarray – cached values of the RDF for the cached value in rx with \(l>0\).

rx = None

numpy.ndarray – cached value of the radial values that we evaluated the RDF at; useful if the RDF is requested repeatedly for the same r values.

save(target)[source]

Saves the SOAP vector to disk so that RDF and decomposition operations don’t have to be re-executed later.

Parameters:target (str) – path to save the vector to.
serialize(withdecomp=True, commonrx=None, withdP=True, commonax=None)[source]

Returns a serializable dictionary that represents this SOAP Vector.

Parameters:
  • withdecomp (bool) – when True, include the parameters of the SOAP decomposer in the dict.
  • commonrx (numpy.ndarray) – xf this DF is part of a collection, the radial values will be the same for every DF; in that case we don’t need to include the domain in the serialization. This is the parent collections common radial vector. If the DFs is the same, it won’t be serialized. If unspecified, the domain values are serialized.
  • withdP (bool) – when True, the large, decomposed SOAP vector values are serialized. Necessary to completely reconstruct the representation, but not necessary for most analysis.
  • commonax (numpy.ndarray) – same as for commonrx but for the angular domain.
class gblearn.decomposition.SOAPVectorCollection(Pij=None, decomposer=None, calculate=True, **kwargs)[source]

Represents a collection of SOAP vectors that are logically combined (for example the local environments at a grain boundary are all part of the grain boundary). The collection allows aggregate quantities to be calculated more easily.

Parameters:
  • Pij (numpy.ndarray) – a matrix of SOAP vectors, where each row is a vector.
  • decomposer (SOAPDecomposer) – instance used to decompose the SOAP vectors; has configuration information for SOAP parameters and caches for rapid evaluation of the basis functions.
  • calculate (bool) – when True, the SOAPVectors are calculated based on P; otherwise, they are left as an empty list.
  • kwargs (dict) – arguments that can be passed to SOAPDecomposer constuctor.

Examples

Create a collection of SOAP vectors and then collapse them into a set of unique ones (an instance of RDFCollection).

>>> from gblearn.decomposition import SOAPVectorCollection as SVC
>>> from numpy import load
>>> P = load("pissnnl.npy")
>>> svc = SVC(P, lmax=18, nmax=18)
>>> urdfs = svc.unique(rdf=True)
ADFs(resolution=100, catom=False)[source]

Returns the set of unique angular distribution functions for this collection.

Parameters:
  • resolution (int) – number of points to sample in the angular domain.
  • catom (bool) – when True, the DFs are constructed using the density of the central atom.
RDFs(resolution=75, catom=False)[source]

Returns the set of unique radial distribution functions for this collection.

Parameters:
  • resolution (int) – number of points to sample in the radial domain.
  • catom (bool) – when True, the DFs are constructed using the density of the central atom.
equal(other)[source]

Performs a more rigorous equality test for two collections by also comparing the radial and angular distribution functions.

static from_dict(data)[source]

Restores a SOAPVectorCollection from a serialized dict (i.e., one returned by serialize()).

Parameters:data (dict) – result of calling serialize().
static from_file(filename)[source]

Restores a SOAPVectorCollection from file.

Parameters:filename (str) – path to the file that was created by SOAPVectorCollection.save().
save(target, withdP=False)[source]

Saves the DF to disk.

Parameters:
  • target (str) – path to save the vector to.
  • withdP (bool) – when True, the large, SOAP decomposition from which the DF was constructed is also included. Wasteful, but necessary to completely represent the DF.
serialize(withdP=False)[source]

Returns a serializable dictionary that represents this SOAPVectorCollection.

Parameters:withdP (bool) – when True, the large, SOAP decomposition from which the DFs are constructed is also included. Wasteful, but necessary to completely represent the DFs.
gblearn.decomposition.epsilon = 5e-05

float – finite precision comparison value for RDF norms.

gblearn.decomposition.fcut(r, rcut, trans_width)[source]

Applies the cutoff function to the coefficients so that the function will go smoothly to zero at the cutoff.

Parameters:
  • r (float) – radial distance parameter.
  • rcut (float) – values greater than this are identically zero.
  • trans_width (float) – the transition to zero at the cutoff happens smoothly across this width.
Returns:

value in [0., 1.] to multiply the coefficient by so

that all values fall within the specified cutoff.

Return type:

float

gblearn.decomposition.pissnnl(pissnnl, nspecies, vcutoff, nmax=12, lmax=12)[source]

Returns the geometric information for any entries of a pissnnl vector higher than the specified cutoff.

Parameters:
  • pissnnl (numpy.ndarray) – vector to examine.
  • nspecies (int) – number of species for the SOAP descriptor.
  • vcutoff (float) – values higher than this will be returned with geometry information.
  • nmax (int) – bandwidth limits for the SOAP descriptor radial basis functions.
  • lmax (int) – bandwidth limits for the SOAP descriptor spherical harmonics.
Returns:

of tuples with (index, value, (n_i, s_i), (n_j, s_j), l), where n_i

is the n value of the radial basis function for species i, and s_i is the value of the species index.

Return type:

list