QCArchive in 15 minutes

This tutorial will give you an overview of possible actions in QCArchive. Using QCArchive, you can:

  1. Submit a single computation or a set of computations to a server, following a variety of workflows.

  2. Retrieve the results of previous computations.

  3. Query the database for particular computations.

  4. Create datasets holding related quantum chemistry computations.

  5. Retrieve results from datasets.

This notebook will briefly walk you through each capability. For more details, we recommend you follow our other starter tutorials.

Connecting to a server

To work with QCArchive, you will need to connect to a QCArchive server. For the QuickStart tutorials, we will connect to the QCArchive Demo Server. To interact with a server, you will create a QCPortal client using PortalClient. The argument to PortalClient is the server address. To work with the QCArchive demo server, enter https://qcademo.molssi.org.

import qcportal as ptl

client = ptl.PortalClient("https://qcademo.molssi.org")
print(client)
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.54.1.post34+ga87767a0, server version: 0.54.1
PortalClient(server_name='MolSSI QCFractal Demo Server', address='https://qcademo.molssi.org/', username='None')

We now have a QCPortal client that we can use to read from and query the QCArchive demo server.

Retrieving Data and Querying the Database

How do I retrieve computation results by ID?

We can retrieve computations by ID using the client.get_records function. Each computation in the database is given an integer ID number. The cell below shows retrieval of the calculation result with ID 1. We see that this computation was a single point computation.

first_record = client.get_records(1)
print(first_record)
<SinglepointRecord id=1 status=RecordStatusEnum.complete>

Typically, properties from a calculation can be viewed using the .properties attribute of a result. The calculated properties are stored in a dictionary. In the cell below, we print the SCF total energy from our calculation.

print(first_record.properties["scf_total_energy"])
-74.82586558254185

We can print information about the computation like the molecule name and properties.

print(f"Molecule: {first_record.molecule.name}, Energy: {first_record.properties['scf_total_energy']}")
Molecule: H2O, Energy: -74.82586558254185

If you pass in several IDs, you will receive a list of results that can be iterated through.

multiple_records = client.get_records([1, 2, 3]) 

for record in multiple_records:
    print(f"Molecule: {record.molecule.name}, Energy: {record.properties['scf_total_energy']}")
Molecule: H2O, Energy: -74.82586558254185
Molecule: H2, Energy: -1.0660263371078118
Molecule: H2, Energy: -1.1117197122934774
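
When several records are retrieved at once, it can be convenient to key the values you need by record ID for later lookup. The following is a minimal sketch that runs without a server: the records are SimpleNamespace stand-ins built from the values printed above, and it assumes the returned list follows the order of the requested IDs.

```python
from types import SimpleNamespace

# Stand-in records built from the output printed above. With a live server,
# these would come from client.get_records([1, 2, 3]).
ids = [1, 2, 3]
records = [
    SimpleNamespace(molecule=SimpleNamespace(name="H2O"),
                    properties={"scf_total_energy": -74.82586558254185}),
    SimpleNamespace(molecule=SimpleNamespace(name="H2"),
                    properties={"scf_total_energy": -1.0660263371078118}),
    SimpleNamespace(molecule=SimpleNamespace(name="H2"),
                    properties={"scf_total_energy": -1.1117197122934774}),
]

# Assumption: records are returned in the same order as the requested IDs,
# so zip pairs each ID with its record.
energies = {
    record_id: record.properties["scf_total_energy"]
    for record_id, record in zip(ids, records)
}

# Look up a single energy by record ID.
print(energies[2])
```

Keying by ID keeps the two H2 records distinct even though they share a molecule name.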

How do I search for particular types of computations?

You can search the database for particular computations using query_records. For example, to see all results from a particular time period, we can use query_records with arguments created_before and created_after.

You can query the database on many fields, and the available fields differ by the type of computation you’d like to retrieve. The query method returns a Python iterator. For more detailed information, see the quickstart tutorial on querying.

records = client.query_records(created_after="2023/01/10", created_before="2023/01/14")

# Print the first record, or None if no records fall in this date range.
# Calling next(records) without a default raises StopIteration on an empty result.
print(next(records, None))
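
Because the query returns a Python iterator, the standard iterator tools apply: next raises StopIteration on an exhausted or empty iterator unless a default is supplied, and itertools.islice takes only the first few items. The sketch below illustrates both behaviors with a hypothetical generator, fake_query, standing in for a server query so that it runs offline.

```python
from itertools import islice


def fake_query(n):
    """Hypothetical stand-in for a server query: lazily yields n record labels."""
    for i in range(1, n + 1):
        yield f"record-{i}"


# An empty result set: next() with a default returns None instead of raising.
empty = fake_query(0)
first = next(empty, None)
print(first)

# A non-empty result set: islice takes at most three items and leaves
# the rest of the iterator unconsumed.
results = fake_query(10)
first_three = list(islice(results, 3))
print(first_three)

# The iterator remembers its position, so the next item is the fourth one.
fourth = next(results)
print(fourth)
```

Taking a small slice first is a cheap way to inspect a query before iterating over everything it matches.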

How do I retrieve results from a dataset?

QCArchive also supports storing data in datasets. A dataset is a set of related computations. Datasets can be created when computations are submitted or after computations have completed, and a computation can belong to multiple datasets. Datasets are the primary use case for QCArchive and are usually created with large-scale workflows. Each dataset contains only one type of calculation.

We can list all of the datasets on the server we’ve connected to using list_datasets. Below, we print the names and types of the datasets on the QCArchive demo server.

datasets = client.list_datasets()

for dataset in datasets:
    print(f"Name: {dataset['dataset_name']}, Type: {dataset['dataset_type']}")

We can retrieve records from a particular dataset using the get_dataset method and passing in the dataset name and type. The following cell retrieves the “Element Benchmark” dataset.

ds = client.get_dataset(dataset_type="singlepoint", dataset_name="Element Benchmark")
print(ds.description)

Datasets have many properties that are beyond the scope of this overview. A dataset is made up of many records of the same type of computation, which can differ in molecule identity or other specification parameters. You can iterate over records, view specifications, and compile values from records.

The cell below shows using the get_properties_df method to create a pandas dataframe containing the SCF total energy and SCF iterations from each record.

# use get_properties_df to make a dataframe
df = ds.get_properties_df(["scf_total_energy", "scf_iterations"])

# view the first 10 rows.
df.head(10)

This dataframe is a multi-index dataframe with the top level index being the “specification” of our calculation. For example, we can pull out just our results for hf/sto-3g.

df["hf/sto-3g"]

df["hf/sto-3g"]["scf_total_energy"] will give us the SCF total energy for all of the records in the dataset with the hf/sto-3g specification.

df["hf/sto-3g"]["scf_total_energy"]

Submitting Computations

Beyond retrieving results and querying the database, QCArchive provides a robust system for submitting computations. You may submit single computations, multiple computations, or computations to create a dataset.

Our QCArchive demo server is publicly readable. This means you do not need a username or password to access the data. However, to submit computations, a username and password are required.

Protecting usernames and passwords

When connecting to QCArchive using a username and password, be careful to never commit this information to publicly accessible repositories. You can store credentials in environment variables, as shown in the cell below, or you can read user information from a file.

In the cell below, we read environment variables set in the local environment for our username and password. We retrieve these using os.environ.get.

import os

import qcportal as ptl
from qcportal.molecules import Molecule

client = ptl.PortalClient("https://qcademo.molssi.org", 
                          username=os.environ.get("QCArchiveUsername"), 
                          password=os.environ.get("QCArchivePWD"))

We now have a QCPortal client that can be used to submit computations.
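
Note that os.environ.get returns None when a variable is unset, so a missing credential may go unnoticed until later. A minimal sketch of a stricter lookup follows; require_env is a hypothetical helper (not part of QCPortal), and the placeholder values exist only so the sketch runs on its own.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# For demonstration only: set placeholder values if none exist so the sketch
# runs on its own. In real use, set these in your shell, not in the notebook.
os.environ.setdefault("QCArchiveUsername", "example_user")
os.environ.setdefault("QCArchivePWD", "example_password")

username = require_env("QCArchiveUsername")
password = require_env("QCArchivePWD")

# Never print the password itself; confirm only that it was found.
print(f"Username: {username}, password set: {bool(password)}")
```

The returned values could then be passed as the username and password arguments of PortalClient, as in the cell above.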

How do I submit a computation?

QCArchive currently supports seven different computation types including single point, geometry optimization, reactions, and torsion drives.

For this overview, we will show submitting a single point computation for water using two different methods. This notebook shows inputting an XYZ string for our molecule, but there are a number of ways to enter molecule information. Our molecule geometry in this example is an optimized structure of water.

water_xyz = """
3

H                     0.026223561887     1.224983815810     0.000000000000
H                     0.971741135004     0.039335313725     0.000000000000
O                     0.002035305512     0.235680871424     0.000000000000

"""

water = Molecule.from_data(water_xyz)
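
If your coordinates live in Python data structures rather than a file, an XYZ string like the one above can be assembled programmatically. The following is a minimal sketch; build_xyz is a hypothetical helper, not a QCPortal function, and it writes the same layout used above: atom count, a blank comment line, then one line per atom.

```python
def build_xyz(atoms):
    """Format (symbol, x, y, z) tuples as XYZ text: count, comment line, atoms."""
    lines = [str(len(atoms)), ""]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.12f} {y:.12f} {z:.12f}")
    return "\n".join(lines) + "\n"


# Coordinates copied from the water geometry used in this tutorial.
water_atoms = [
    ("H", 0.026223561887, 1.224983815810, 0.0),
    ("H", 0.971741135004, 0.039335313725, 0.0),
    ("O", 0.002035305512, 0.235680871424, 0.0),
]

xyz_text = build_xyz(water_atoms)
print(xyz_text)
```

The resulting string should be usable with Molecule.from_data in the same way as water_xyz, although that step is not exercised here.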

If NGLView is installed in your environment, the molecule objects in QCArchive can be visualized using NGLView by putting the variable representing the molecule as the last thing in a notebook cell.

water

To submit our single point computation, we will use the add_singlepoints method. We will submit two single point computations for the same molecule using different methods.

For add_singlepoints, you specify the program you want to run (Psi4 in our case), the driver, the method, and the basis set. The driver determines what is stored in the record's return_result. Here, the two submissions differ only in the method (b3lyp vs. mp2).

b3lyp_meta, b3lyp_record_ids = client.add_singlepoints([water], 
                                                       program='psi4', 
                                                       driver='energy', 
                                                       method='b3lyp', 
                                                       basis='def2-tzvp')

mp2_meta, mp2_record_ids = client.add_singlepoints([water], 
                                                   program='psi4', 
                                                   driver='energy', 
                                                   method='mp2', 
                                                   basis='def2-tzvp')

Once submitted, we can retrieve the results using the get_records method shown earlier in the tutorial.

b3lyp_record = client.get_records(b3lyp_record_ids[0])
mp2_record = client.get_records(mp2_record_ids[0])

print(f"B3LYP Status:\t{b3lyp_record.status}")
print(f"MP2 Status:\t{mp2_record.status}")

When the computations are complete, we can retrieve the energies from return_result. Because we used the energy driver, return_result holds the computed energy.

print(f"B3LYP result: {b3lyp_record.return_result}")
print(f"MP2 result: {mp2_record.return_result}")
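
A submitted computation may not have finished when you fetch its record, so it can help to check the status before reading the result. The sketch below uses SimpleNamespace stand-ins rather than real records so that it runs offline. The status name complete appears in the output earlier in this tutorial; the name waiting, the placeholder result value, and the comparison of status against a plain string are assumptions made for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for fetched records. The attribute names mirror those
# used in this tutorial; the values are placeholders, not computed results.
records = {
    "B3LYP": SimpleNamespace(status="complete", return_result=-1.0),
    "MP2": SimpleNamespace(status="waiting", return_result=None),
}


def describe(label, record):
    """Report the result only when the record is marked complete."""
    if record.status == "complete":
        return f"{label} result: {record.return_result}"
    return f"{label} is not finished yet (status: {record.status})"


for label, record in records.items():
    print(describe(label, record))
```

With a live server, you would fetch the record again with get_records to see an updated status.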

How do I create datasets?

Instead of submitting these computations separately, we could have grouped them together in a dataset. This would allow us to more easily retrieve the results together.

To create a dataset, you use the add_dataset method.


ds = client.add_dataset("singlepoint",
                        name="Water calculations",
                        description="Single point calculations of water at various levels of theory.")

Creation of datasets is beyond the scope of this overview tutorial. For more information on dataset construction, see the Dataset Quickstart.