Molecules#

The molecule objects used in QCArchive come from the QCElemental package. This class is also accessible from qcportal.molecules.Molecule, but is otherwise the same as the QCElemental Molecule.

Creating Molecules#

A common way to programmatically create a molecule is with the Molecule constructor. With the constructor, you typically specify the symbols and geometry (coordinates) in bohr:

>>> from qcportal.molecules import Molecule
>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
>>> print(water)
Molecule(name='H2O', formula='H2O', hash='3a04ba3')

The geometry can be specified as a nested list as above, or simply as a flattened list. The below is equivalent to the above:

>>> from qcportal.molecules import Molecule
>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> print(water)
Molecule(name='H2O', formula='H2O', hash='3a04ba3')

Another option is to read molecules from strings that are in some common format (like XYZ, although it will work with more):

>>> mol_xyz = """
...   3
...
...   H   0.000000 1.000000 0.000000
...   H   1.000000 0.000000 0.000000
...   O   0.000000 0.000000 0.000000
... """
>>> water = Molecule.from_data(mol_xyz)
>>> print(water2)
Molecule(name='H2O', formula='H2O', hash='246998b')

Note that in the case of XYZ files, the units are angstroms.

Similarly, you can construct the molecule from a file. If the above XYZ data was in a file,

>>> water = Molecule.from_file('water.xyz')
>>> print(water2)
Molecule(name='H2O', formula='H2O', hash='246998b')

Adding molecules to the server#

Hint

For the most part, this is not needed. You can pass a Molecule object to functions like add_singlepoints() and they will be automatically added to the database as needed. However, in some cases, this may be useful.

Molecules can be added to the server database with ~qcportal.client.PortalClient.add_molecules. This returns some metadata about the insertion, and the molecule IDs.

>>> water = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> water2 = Molecule(symbols=['H', 'H', 'O'], geometry=[0.0, 2.5, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0])
>>> meta, ids = client.add_molecules([water, water2])
>>> print(meta)
InsertMetadata(error_description=None, errors=[], inserted_idx=[0, 1], existing_idx=[])

>>> print(ids)
[585, 586]

These IDs can be passed into functions like add_singlepoints() instead of full molecule objects.

Retrieving and Querying Molecules#

The client has two methods for retrieving molecules: get_molecules() and query_molecules(). The get_molecules() method is used to get molecules by ID, and returns molecules in the same order as the given ids.

>>> mols = client.get_molecules([5, 15, 10])
>>> print(mols[0].id, mols[1].id, mols[2].id)
5 15 10

You can also specify a single ID and get a single molecule back

>>> mol = client.get_molecules(5)
>>> print(mol.id)
5

You can also query the molecules in the database with query_molecules(). This function returns an iterator, which you can then use to iterate over the results. The iterator automatically handles returning batches or pages of query results from the server.

>>> query_it = client.query_molecules(molecular_formula='N2')
>>> for mol in query_it:
...    print(mol.id, mol.identifiers.molecular_formula)
371 N2
372 N2

Caution

Unlike the get_molecules function, the molecules from query_molecules are not in any defined order, and the order may be different even with repeated calls with the same arguments

Managing Molecules#

Molecules can be deleted from the server with delete_molecules()

>>> meta = client.delete_molecules([585])
>>> print(meta)
DeleteMetadata(error_description=None, errors=[], deleted_idx=[0], n_children_deleted=0)

The server also allows for some limited modification of molecules. This is limited to the name, comment, and identifiers of the molecule. By default, new identifiers will be merged with the existing identifiers unless overwrite_identifiers=True, in which case all identifiers will be replaced (that is, identifiers that are not specified in the call to modify_molecules will be removed).

>>> n2 = Molecule(symbols=['N', 'N'], geometry=[0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
...               name='nitrogen', comment='initial geometry of nitrogen', identifiers={'smiles': 'N#N'})
>>> _, ids = client.add_molecules([n2])
>>> print(ids)
[601]

>>> meta = client.modify_molecule(601, name='dinitrogen', comment='dinitrogen molecule',
...                               identifiers={'pubchem_cid': '947'})
>>> print(meta)
UpdateMetadata(error_description=None, errors=[], updated_idx=[0], n_children_updated=0)

>>> mol = client.get_molecules(601)
>>> print(mol.name)
dinitrogen

>>> print(mol.comment)
dinitrogen molecule

>>> print(mol.identifiers.smiles)
N#N

>>> print(mol.identifiers.pubchem_cid)
947