Molecules#
The molecule objects used in QCArchive come from the
QCElemental package. This class is also accessible
from qcportal.molecules.Molecule
, but is otherwise the same as the
QCElemental Molecule.
Creating Molecules#
A common way to programmatically create a molecule is with the Molecule constructor. With the constructor, you typically specify the symbols and geometry (coordinates) in bohr:
>>> from qcportal.molecules import Molecule
>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
>>> print(water)
Molecule(name='H2O', formula='H2O', hash='3a04ba3')
The geometry can be specified as a nested list as above, or simply as a flattened list. The below is equivalent to the above:
>>> from qcportal.molecules import Molecule
>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> print(water)
Molecule(name='H2O', formula='H2O', hash='3a04ba3')
Another option is to read molecules from strings that are in some common format (like XYZ, although it will work with more):
>>> mol_xyz = """
... 3
...
... H 0.000000 1.000000 0.000000
... H 1.000000 0.000000 0.000000
... O 0.000000 0.000000 0.000000
... """
>>> water = Molecule.from_data(mol_xyz)
>>> print(water2)
Molecule(name='H2O', formula='H2O', hash='246998b')
Note that in the case of XYZ files, the units are angstroms.
Similarly, you can construct the molecule from a file. If the above XYZ data was in a file,
>>> water = Molecule.from_file('water.xyz')
>>> print(water2)
Molecule(name='H2O', formula='H2O', hash='246998b')
Adding molecules to the server#
Hint
For the most part, this is not needed. You can pass a Molecule object to functions like
add_singlepoints()
and they will be automatically added
to the database as needed. However, in some cases, this may be useful.
Molecules can be added to the server database with ~qcportal.client.PortalClient.add_molecules. This returns some metadata about the insertion, and the molecule IDs.
>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> water2 = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.5, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> meta, ids = client.add_molecules([water, water2])
>>> print(meta)
InsertMetadata(error_description=None, errors=[], inserted_idx=[0, 1], existing_idx=[])
>>> print(ids)
[585, 586]
These IDs can be passed into functions like add_singlepoints()
instead of
full molecule objects.
Retrieving and Querying Molecules#
The client has two methods for retrieving molecules: get_molecules()
and
query_molecules()
. The get_molecules()
method is used to get molecules by ID, and returns molecules in the same order as the given ids.
>>> mols = client.get_molecules([5, 15, 10])
>>> print(mols[0].id, mols[1].id, mols[2].id)
5 15 10
You can also specify a single ID and get a single molecule back
>>> mol = client.get_molecules(5)
>>> print(mol.id)
5
You can also query the molecules in the database with query_molecules()
.
This function returns an iterator, which you can then use to
iterate over the results. The iterator automatically handles returning batches or pages of query results from the
server.
>>> query_it = client.query_molecules(molecular_formula='N2')
>>> for mol in query_it:
... print(mol.id, mol.identifiers.molecular_formula)
371 N2
372 N2
Caution
Unlike the get_molecules
function, the molecules from query_molecules
are not in any defined order,
and the order may be different even with repeated calls with the same arguments
Managing Molecules#
Molecules can be deleted from the server with delete_molecules()
>>> meta = client.delete_molecules([585])
>>> print(meta)
DeleteMetadata(error_description=None, errors=[], deleted_idx=[0], n_children_deleted=0)
The server also allows for some limited modification of molecules. This is limited to the name, comment, and
identifiers of the molecule. By default, new identifiers will be merged with the existing identifiers
unless overwrite_identifiers=True
, in which case all identifiers will be replaced (that is, identifiers
that are not specified in the call to modify_molecules
will be removed).
>>> n2 = Molecule(symbols=['N', 'N'], geometry=[0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
... name='nitrogen', comment='initial geometry of nitrogen', identifiers={'smiles': 'N#N'})
>>> _, ids = client.add_molecules([n2])
>>> print(ids)
[601]
>>> meta = client.modify_molecule(601, name='dinitrogen', comment='dinitrogen molecule',
... identifiers={'pubchem_cid': '947'})
>>> print(meta)
UpdateMetadata(error_description=None, errors=[], updated_idx=[0], n_children_updated=0)
>>> mol = client.get_molecules(601)
>>> print(mol.name)
dinitrogen
>>> print(mol.comment)
dinitrogen molecule
>>> print(mol.identifiers.smiles)
N#N
>>> print(mol.identifiers.pubchem_cid)
947