QCArchive in 15 minutes#

This tutorial will give you an overview of possible actions in QCArchive. Using QCArchive, you can:

  1. Submit a single computation or a set of computations to a server, following a variety of workflows.

  2. Retrieve the results of previous computations.

  3. Query the database for particular computations.

  4. Create datasets holding related quantum chemistry computations.

  5. Retrieve results from datasets.

This notebook will briefly walk you through each capability. For more details, we recommend you follow our other starter tutorials.

Connecting to a server#

To work with QCArchive, you will need to connect to a QCArchive server. For the QuickStart tutorials, we will connect to the QCArchive Demo Server. To interact with a server, you will create a QCPortal client using PortalClient. The argument to PortalClient is the server address. To work with the QCArchive demo server, enter https://qcademo.molssi.org.

import qcportal as ptl

client = ptl.PortalClient("https://qcademo.molssi.org")
print(client)
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.57.post120+g43f985ed, server version: 0.57
PortalClient(server_name='MolSSI QCFractal Demo Server', address='https://qcademo.molssi.org/', username='None')

We now have a QCPortal client that we can use to read from and query the QCArchive demo server.

Retrieving Data and Querying the Database#

How do I retrieve computation results by ID?#

We can retrieve computations by ID using the client.get_records function. Each computation in the database is given an integer ID number. The cell below shows retrieval of the calculation result with ID 1. We see that this computation was a single point computation.

first_record = client.get_records(1)
print(first_record)
<SinglepointRecord id=1 status=RecordStatusEnum.complete>

Typically, the properties from a calculation can be viewed using the .properties attribute of a record. The calculated properties are stored in a dictionary. In the cell below, we print the SCF total energy from our calculation.

print(first_record.properties["scf_total_energy"])
-1.1117197122934774

We can print information about the computation, such as the molecule name and selected properties.

print(f"Molecule: {first_record.molecule.name}, Energy: {first_record.properties['scf_total_energy']}")
Molecule: H2, Energy: -1.1117197122934774

If you pass in several IDs, you will receive a list of results that can be iterated through.

multiple_records = client.get_records([1, 2, 3]) 

for record in multiple_records:
    print(f"Molecule: {record.molecule.name}, Energy: {record.properties['scf_total_energy']}")
Molecule: H2, Energy: -1.1117197122934774
Molecule: HO, Energy: -74.36430040801095
Molecule: H2O, Energy: -74.82586558254185
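When many IDs are involved, it can be convenient to collect the values into a dictionary. The sketch below uses only the calls shown above (client.get_records, record.molecule.name, and record.properties). The helper name energies_by_molecule is hypothetical, and the code assumes that a record without results either has properties set to None or lacks the requested key.

```python
def energies_by_molecule(client, record_ids, key="scf_total_energy"):
    """Map molecule name -> property value for the given record IDs.

    Hypothetical helper: `client` is expected to behave like the
    PortalClient used in this tutorial.
    """
    energies = {}
    for record in client.get_records(list(record_ids)):
        # Assumption: records that did not finish may carry no properties.
        properties = record.properties or {}
        if key in properties:
            energies[record.molecule.name] = properties[key]
    return energies
```

With the client created earlier, energies_by_molecule(client, [1, 2, 3]) would return a dictionary keyed by molecule name. Note that two records for the same molecule name would overwrite each other in this simple version.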

How do I search for particular types of computations?#

You can search the database for particular computations using query_records. For example, to see all results from a particular time period, we can use query_records with arguments created_before and created_after.

You can query the database on many fields, and the available fields differ by the type of computation you’d like to retrieve. The query method returns a Python iterator.

records = client.query_records(created_after="2024/01/01")

# Print the first record.
print(next(records))
<SinglepointRecord id=78 status=RecordStatusEnum.error>
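Because the query result is an iterator, you can take only the first few items rather than consuming the whole result. The sketch below shows this with itertools.islice from the standard library. The generator fake_query is a stand-in used so the example runs without a server; it is not part of QCPortal, and in practice you would pass the iterator returned by client.query_records.

```python
from itertools import islice


def take(iterator, n):
    """Return at most the first n items of any iterator as a list."""
    return list(islice(iterator, n))


def fake_query():
    """Stand-in for client.query_records(...): yields placeholder IDs."""
    for record_id in range(78, 1000):
        yield record_id


first_five = take(fake_query(), 5)
print(first_five)  # [78, 79, 80, 81, 82]
```

Items that are not taken are never requested from the iterator, which is useful when a query matches many records.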

How do I retrieve results from a dataset?#

QCArchive also supports storing data in datasets. A dataset is a set of related computations. Datasets can be created when computations are submitted or after computations have completed, and a computation can belong to multiple datasets. Datasets are the primary use case for QCArchive and are usually created with large-scale workflows. A dataset contains only one type of calculation.

We can list all of the datasets on the server we’ve connected to using list_datasets. Below, we print the names of the datasets on the QCArchive demo server.

datasets = client.list_datasets()

for dataset in datasets:
    print(f"Name: {dataset['dataset_name']}, Type: {dataset['dataset_type']}")
Name: Element Benchmark, Type: singlepoint
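On a server with many datasets, grouping the listing by type can make it easier to scan. This is a minimal sketch that assumes only what the loop above shows: each item is a dictionary with dataset_name and dataset_type keys. The function name group_by_type is hypothetical, and the sample listing mirrors the single dataset printed above.

```python
from collections import defaultdict


def group_by_type(listing):
    """Group dataset names by dataset type."""
    grouped = defaultdict(list)
    for entry in listing:
        grouped[entry["dataset_type"]].append(entry["dataset_name"])
    return dict(grouped)


# Sample input shaped like the output of client.list_datasets() above.
listing = [{"dataset_name": "Element Benchmark", "dataset_type": "singlepoint"}]
print(group_by_type(listing))  # {'singlepoint': ['Element Benchmark']}
```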

We can retrieve records from a particular dataset using the get_dataset method and passing in the dataset name and type. The following cell retrieves the “Element Benchmark” dataset.

ds = client.get_dataset(dataset_type="singlepoint", dataset_name="Element Benchmark")
print(ds.description)
Single point calculations of water at various levels of theory.

Datasets have many properties that are beyond the scope of this overview. A dataset is made up of many records of the same type of computation, which can differ in molecule identity or other specification parameters. You can iterate over records, see specifications, and compile values from records.

The cell below shows using the get_properties_df method to create a pandas dataframe containing the SCF total energy and SCF iterations from each record.

# Use get_properties_df to make a dataframe.
df = ds.get_properties_df(["scf_total_energy", "scf_iterations"])

# view the first 10 rows.
df.head(10)
specification        hf/sto-3g  mp2/aug-cc-pvtz      hf/sto-3g mp2/aug-cc-pvtz
              scf_total_energy scf_total_energy scf_iterations  scf_iterations
entry
b_atom              -24.149117              NaN            2.0             NaN
be_atom             -14.352011       -14.572879            2.0             9.0
c_atom              -37.089740       -37.603047            2.0            13.0
f_atom              -97.986588              NaN            2.0             NaN
h_atom               -0.466582              NaN            2.0             NaN
he_atom              -2.807913        -2.861206            2.0             7.0
li_atom              -7.315604              NaN            4.0             NaN
n_atom              -53.554678              NaN            2.0             NaN
ne_atom            -126.604573      -128.533266            2.0            12.0
o_atom              -73.661918       -74.685504            2.0            12.0

This dataframe is a multi-index dataframe with the top level index being the “specification” of our calculation. For example, we can pull out just our results for hf/sto-3g.

df["hf/sto-3g"]
         scf_total_energy  scf_iterations
entry
b_atom         -24.149117             2.0
be_atom        -14.352011             2.0
c_atom         -37.089740             2.0
f_atom         -97.986588             2.0
h_atom          -0.466582             2.0
he_atom         -2.807913             2.0
li_atom         -7.315604             4.0
n_atom         -53.554678             2.0
ne_atom       -126.604573             2.0
o_atom         -73.661918             2.0

df["hf/sto-3g"]["scf_total_energy"] will give us the SCF total energy for all of the records in the dataset with the hf/sto-3g specification.

df["hf/sto-3g"]["scf_total_energy"]
entry
b_atom     -24.149117
be_atom    -14.352011
c_atom     -37.089740
f_atom     -97.986588
h_atom      -0.466582
he_atom     -2.807913
li_atom     -7.315604
n_atom     -53.554678
ne_atom   -126.604573
o_atom     -73.661918
Name: scf_total_energy, dtype: float64
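Selecting by specification is one direction; you may also want a single property across all specifications. The sketch below rebuilds a small dataframe by hand, using three rows copied from the output above, and uses the standard pandas DataFrame.xs method to slice on the second column level. The hand-built frame is only an illustration; the column layout of a real get_properties_df result is assumed to match the two-level layout described above.

```python
import pandas as pd

# Two-level columns: specification on top, property name below.
columns = pd.MultiIndex.from_product(
    [["hf/sto-3g", "mp2/aug-cc-pvtz"], ["scf_total_energy", "scf_iterations"]]
)
nan = float("nan")
df = pd.DataFrame(
    [
        [-24.149117, 2.0, nan, nan],         # b_atom: no mp2 results
        [-14.352011, 2.0, -14.572879, 9.0],  # be_atom
        [-2.807913, 2.0, -2.861206, 7.0],    # he_atom
    ],
    index=pd.Index(["b_atom", "be_atom", "he_atom"], name="entry"),
    columns=columns,
)

# One property for every specification: slice on the second column level.
energies = df.xs("scf_total_energy", axis=1, level=1)

# Keep only entries that have a value for every specification.
complete = energies.dropna()
print(complete)
```

Here energies has one column per specification, and complete drops b_atom because it has no value for mp2/aug-cc-pvtz.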

Submitting Computations#

Beyond retrieving results and querying the database, QCArchive provides a robust system for submitting computations. You may submit single computations, multiple computations, or computations to create a dataset.

Our QCArchive demo server is publicly readable. This means you do not need a username or password to access the data. However, to submit computations, a username and password are required.

Protecting usernames and passwords

When connecting to QCArchive using a username and password, be careful to never commit this information to publicly accessible repositories. You can store credentials in environment variables, as shown in the cell below, or you can read user information from a file.

In the cell below, we read environment variables set in the local environment for our username and password. We retrieve these using os.environ.get.

import os

import qcportal as ptl
from qcportal.molecules import Molecule

client = ptl.PortalClient("https://qcademo.molssi.org", 
                          username=os.environ.get("QCArchiveUsername"), 
                          password=os.environ.get("QCArchivePWD"))
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.57.post120+g43f985ed, server version: 0.57

We now have a QCPortal client that can be used to submit computations.
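Note that os.environ.get returns None when a variable is not set, so a missing variable is passed on silently. A small helper can fail early with a clear message instead. This is a sketch only: the function name load_credentials is hypothetical, and the variable names are the ones used in the cell above.

```python
import os


def load_credentials(env=None, user_var="QCArchiveUsername", password_var="QCArchivePWD"):
    """Return (username, password) from environment variables.

    Raises RuntimeError naming any variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in (user_var, password_var) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[user_var], env[password_var]
```

The returned pair can then be passed as the username and password arguments of PortalClient, as in the cell above. The optional env argument accepts any dictionary, which makes the helper easy to try out without touching the real environment.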

How do I submit a computation?#

QCArchive currently supports seven different computation types including single point, geometry optimization, reactions, and torsion drives.

For this overview, we will show submitting a single point computation for water using two different methods. This notebook shows inputting an XYZ string for our molecule, but there are a number of ways to enter molecule information. Our molecule geometry in this example is an optimized structure of water.

water_xyz = """3
                          H                     0.026223561887     1.224983815810     0.000000000000
                          H                     0.971741135004     0.039335313725     0.000000000000
                          O                     0.002035305512     0.235680871424     0.000000000000"""

water = Molecule.from_data(water_xyz)
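Before submitting a geometry, it can be worth a quick sanity check that the coordinates describe the molecule you intend. The sketch below parses the same XYZ text with plain Python and computes the two O-H distances and the H-O-H angle. It is independent of QCPortal, the helper names are hypothetical, and it assumes the coordinates are in Angstrom, as is conventional for XYZ files.

```python
import math

water_xyz = """3
H 0.026223561887 1.224983815810 0.000000000000
H 0.971741135004 0.039335313725 0.000000000000
O 0.002035305512 0.235680871424 0.000000000000"""


def parse_xyz(text):
    """Return a list of (symbol, (x, y, z)) from XYZ-style text."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    natoms = int(lines[0][0])
    # Take the last natoms lines so an optional comment line is skipped.
    return [(sym, tuple(float(v) for v in xyz)) for sym, *xyz in lines[-natoms:]]


def angle_degrees(center, a, b):
    """Angle a-center-b in degrees."""
    va = [p - c for p, c in zip(a, center)]
    vb = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    return math.degrees(math.acos(dot / (math.hypot(*va) * math.hypot(*vb))))


(_, h1), (_, h2), (_, o) = parse_xyz(water_xyz)
r1 = math.dist(o, h1)
r2 = math.dist(o, h2)
hoh = angle_degrees(o, h1, h2)
print(f"O-H distances: {r1:.3f}, {r2:.3f}  H-O-H angle: {hoh:.1f}")
```

For this geometry, both distances come out just under 1 Angstrom and the angle is close to 100 degrees, which is plausible for a water molecule.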

If NGLView is installed in your environment, the molecule objects in QCArchive can be visualized using NGLView by putting the variable representing the molecule as the last thing in a notebook cell.

water

To submit our single point computation, we will use the add_singlepoints method. We will submit two single point computations for the same molecule using different methods.

For add_singlepoints, you specify the program you want to run (Psi4 in our case), the driver, the method, and the basis set. The driver determines what is in the return_result for the record. For this demonstration, we are submitting two single point calculations for water that differ only in the method (b3lyp vs mp2).

b3lyp_meta, b3lyp_record_ids = client.add_singlepoints([water], 
                                                       program='psi4', 
                                                       driver='energy', 
                                                       method='b3lyp', 
                                                       basis='def2-tzvp')

mp2_meta, mp2_record_ids = client.add_singlepoints([water], 
                                                   program='psi4', 
                                                   driver='energy', 
                                                   method='mp2', 
                                                   basis='def2-tzvp')
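The two calls above differ only in the method, so the same submission can be written as a loop. The sketch below wraps the add_singlepoints call exactly as it is used above; the function name submit_methods is hypothetical, and it assumes one molecule per call so that the first returned ID is the one of interest.

```python
def submit_methods(client, molecule, methods, basis="def2-tzvp"):
    """Submit one single point energy computation per method.

    Returns a dict mapping method name -> record ID.
    """
    record_ids = {}
    for method in methods:
        # Same arguments as the calls above; only the method changes.
        meta, ids = client.add_singlepoints(
            [molecule],
            program="psi4",
            driver="energy",
            method=method,
            basis=basis,
        )
        record_ids[method] = ids[0]
    return record_ids
```

With the client and molecule from this section, submit_methods(client, water, ["b3lyp", "mp2"]) would return a dictionary with one record ID per method.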

Once submitted, we can retrieve the results using the get_records method shown earlier in the tutorial.

b3lyp_record = client.get_records(b3lyp_record_ids[0])
mp2_record = client.get_records(mp2_record_ids[0])

print(f"B3LYP Status:\t{b3lyp_record.status}")
print(f"MP2 Status:\t{mp2_record.status}")
B3LYP Status:	RecordStatusEnum.complete
MP2 Status:	RecordStatusEnum.complete

When the computations are complete, we can read the energies from the return_result attribute of each record. Because we set driver='energy', return_result holds the energy.

print(f"B3LYP result: {b3lyp_record.return_result}")
print(f"MP2 result: {mp2_record.return_result}")
B3LYP result: -75.76802068303165
MP2 result: -75.63546371075435
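A freshly submitted computation may not be finished when you first retrieve it, in which case its status will be something other than complete and return_result will not yet hold a value. One option is to poll until the records finish. The sketch below uses only client.get_records and record.status; the function name wait_for_records is hypothetical. It treats complete and error as finished, based on the two status values printed in this tutorial, and relies on the timeout for any other state.

```python
import time


def wait_for_records(client, record_ids, poll_seconds=5.0, timeout=600.0):
    """Poll until every record is complete or errored, then return the records.

    Raises TimeoutError if the records are not finished within `timeout` seconds.
    """
    finished = {"complete", "error"}
    deadline = time.monotonic() + timeout
    while True:
        records = client.get_records(list(record_ids))
        # Enum members expose .name; fall back to str() for plain strings.
        statuses = [getattr(r.status, "name", str(r.status)) for r in records]
        if all(status in finished for status in statuses):
            return records
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Records not finished; last statuses: {statuses}")
        time.sleep(poll_seconds)
```

A call such as wait_for_records(client, b3lyp_record_ids + mp2_record_ids) would block until both computations finish. Check the status of each returned record before reading results, since an errored record also counts as finished here.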

How do I create datasets?#

Instead of submitting these computations separately, we could have grouped them together in a dataset. This would allow us to more easily retrieve the results together.

To create a dataset, you use the add_dataset method, passing the dataset type, a name, and an optional description.


ds = client.add_dataset("singlepoint",
                        name="Water calculations",
                        description="Single point calculations of water at various levels of theory.")

Creation of datasets is beyond the scope of this overview tutorial. For more information on dataset construction, see the Dataset Quickstart.