QCArchive in 15 minutes#
This tutorial will give you an overview of possible actions in QCArchive. Using QCArchive, you can:
Submit a single computation or a set of computations to a server, following a variety of workflows.
Retrieve the results of previous computations.
Query the database for particular computations.
Create datasets holding related quantum chemistry computations.
Retrieve results from datasets.
This notebook will briefly walk you through each capability. For more details, we recommend you follow our other starter tutorials.
Connecting to a server#
To work with QCArchive, you will need to connect to a QCArchive server.
For the QuickStart tutorials, we will connect to the QCArchive Demo Server.
To interact with a server, you will create a QCPortal client using PortalClient.
The argument to PortalClient is the server address.
To work with the QCArchive demo server, enter https://qcademo.molssi.org.
import qcportal as ptl
client = ptl.PortalClient("https://qcademo.molssi.org")
print(client)
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.57.post14+g011be51d, server version: 0.57
PortalClient(server_name='MolSSI QCFractal Demo Server', address='https://qcademo.molssi.org/', username='None')
We now have a QCPortal client that we can use to read from and query the QCArchive demo server.
Retrieving Data and Querying the Database#
How do I retrieve computation results by ID?#
We can retrieve computations by ID using the client.get_records function.
Each computation in the database is given an integer ID number.
The cell below shows retrieval of the calculation result with ID 1.
We see that this computation was a single point computation.
first_record = client.get_records(1)
print(first_record)
<SinglepointRecord id=1 status=RecordStatusEnum.complete>
Typically, properties from a calculation can be viewed using the .properties attribute of a record.
The calculated properties are stored in a dictionary.
In the cell below, we print the SCF total energy from our calculation.
print(first_record.properties["scf_total_energy"])
-1.1117197122934774
We can print information about the computation like the molecule name and properties.
print(f"Molecule: {first_record.molecule.name}, Energy: {first_record.properties['scf_total_energy']}")
Molecule: H2, Energy: -1.1117197122934774
If you pass in several IDs, you will receive a list of results that can be iterated through.
multiple_records = client.get_records([1, 2, 3])
for record in multiple_records:
    print(f"Molecule: {record.molecule.name}, Energy: {record.properties['scf_total_energy']}")
Molecule: H2, Energy: -1.1117197122934774
Molecule: HO, Energy: -74.36430040801095
Molecule: H2O, Energy: -74.82586558254185
How do I search for particular types of computations?#
You can search the database for particular computations using query_records.
For example, to see all results from a particular time period, we can use query_records with the arguments created_before and created_after.
There are many fields you can query on, and the available fields can differ by the type of computation you’d like to retrieve. The query method returns a Python iterator.
records = client.query_records(created_after="2024/01/01")
# Print the first record.
print(next(records))
<SinglepointRecord id=78 status=RecordStatusEnum.error>
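Because the query returns an iterator, results are consumed one at a time and the iterator keeps its position between calls. The following is a minimal sketch of that behavior. It uses a hypothetical stand-in generator, fake_query, in place of client.query_records so that it runs without a server; the dictionary fields are placeholders, not the real record type.

```python
from itertools import islice


def fake_query(n_records):
    """Hypothetical stand-in for client.query_records: yields records lazily."""
    for record_id in range(1, n_records + 1):
        yield {"id": record_id, "status": "complete"}


records = fake_query(1000)

# Take only the first three results; the remaining ones are never generated.
first_three = list(islice(records, 3))
print([r["id"] for r in first_three])

# The iterator remembers its position, so next() continues with the fourth record.
fourth = next(records)
print(fourth["id"])
```

Using islice avoids building a full list when only a handful of records are needed, which matters when a query matches many computations.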
How do I retrieve results from a dataset?#
QCArchive also supports storing data in datasets. A dataset is a set of related computations. Datasets can be created when computations are submitted, or after computations have completed, and computations can belong to multiple datasets. Datasets are the primary use case for QCArchive, and are usually created with large-scale workflows. A dataset contains only one type of calculation.
We can list all of the datasets on the server we’ve connected to using list_datasets.
Below, we print the names and types of the datasets on the QCArchive demo server.
datasets = client.list_datasets()
for dataset in datasets:
    print(f"Name: {dataset['dataset_name']}, Type: {dataset['dataset_type']}")
Name: Element Benchmark, Type: singlepoint
We can retrieve records from a particular dataset using the get_dataset method and passing in the dataset name and type. The following cell retrieves the “Element Benchmark” dataset.
ds = client.get_dataset(dataset_type="singlepoint", dataset_name="Element Benchmark")
print(ds.description)
Single point calculations of water at various levels of theory.
Datasets have many properties that are beyond the scope of this overview. Datasets are made up of many records of the same type of computation that can differ in molecule identity or other specification parameters. You can iterate over records, see specifications, and compile values from records.
The cell below uses the get_properties_df method to create a pandas dataframe containing the SCF total energy and SCF iterations from each record.
# use get_properties_df to make a dataframe
df = ds.get_properties_df(["scf_total_energy", "scf_iterations"])
# view the first 10 rows.
df.head(10)
specification | hf/sto-3g | mp2/aug-cc-pvtz | hf/sto-3g | mp2/aug-cc-pvtz |
---|---|---|---|---|
| scf_total_energy | scf_total_energy | scf_iterations | scf_iterations |
entry | | | | |
b_atom | -24.149117 | NaN | 2.0 | NaN |
be_atom | -14.352011 | -14.572879 | 2.0 | 9.0 |
c_atom | -37.089740 | -37.603047 | 2.0 | 13.0 |
f_atom | -97.986588 | NaN | 2.0 | NaN |
h_atom | -0.466582 | NaN | 2.0 | NaN |
he_atom | -2.807913 | -2.861206 | 2.0 | 7.0 |
li_atom | -7.315604 | NaN | 4.0 | NaN |
n_atom | -53.554678 | NaN | 2.0 | NaN |
ne_atom | -126.604573 | -128.533266 | 2.0 | 12.0 |
o_atom | -73.661918 | -74.685504 | 2.0 | 12.0 |
This dataframe is a multi-index dataframe with the top level index being the “specification” of our calculation.
For example, we can pull out just our results for hf/sto-3g.
df["hf/sto-3g"]
entry | scf_total_energy | scf_iterations |
---|---|---|
b_atom | -24.149117 | 2.0 |
be_atom | -14.352011 | 2.0 |
c_atom | -37.089740 | 2.0 |
f_atom | -97.986588 | 2.0 |
h_atom | -0.466582 | 2.0 |
he_atom | -2.807913 | 2.0 |
li_atom | -7.315604 | 4.0 |
n_atom | -53.554678 | 2.0 |
ne_atom | -126.604573 | 2.0 |
o_atom | -73.661918 | 2.0 |
df["hf/sto-3g"]["scf_total_energy"] will give us the SCF total energy for all of the records in the dataset with the hf/sto-3g specification.
df["hf/sto-3g"]["scf_total_energy"]
entry
b_atom -24.149117
be_atom -14.352011
c_atom -37.089740
f_atom -97.986588
h_atom -0.466582
he_atom -2.807913
li_atom -7.315604
n_atom -53.554678
ne_atom -126.604573
o_atom -73.661918
Name: scf_total_energy, dtype: float64
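The same kind of selection can be made in the other direction: one property across every specification. The sketch below rebuilds a small dataframe by hand from the first three rows of the table above, assuming the columns carry two levels (specification, then property), as the output suggests. The name toy_df is illustrative and the dataframe is not fetched from the server.

```python
import pandas as pd

# Two-level column index: (specification, property).
columns = pd.MultiIndex.from_tuples(
    [
        ("hf/sto-3g", "scf_total_energy"),
        ("hf/sto-3g", "scf_iterations"),
        ("mp2/aug-cc-pvtz", "scf_total_energy"),
        ("mp2/aug-cc-pvtz", "scf_iterations"),
    ]
)

# Values copied from the b_atom, be_atom, and c_atom rows shown earlier.
data = [
    [-24.149117, 2.0, float("nan"), float("nan")],
    [-14.352011, 2.0, -14.572879, 9.0],
    [-37.089740, 2.0, -37.603047, 13.0],
]
index = pd.Index(["b_atom", "be_atom", "c_atom"], name="entry")
toy_df = pd.DataFrame(data, index=index, columns=columns)

# Select one property across all specifications (second column level).
energies = toy_df.xs("scf_total_energy", axis=1, level=1)

# Keep only the entries that have a value for every specification.
complete = energies.dropna()
print(complete)
```

Dropping rows with NaN is one way to restrict a comparison to entries that were computed at every level of theory.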
Submitting Computations#
Beyond retrieving results and querying the database, QCArchive provides a robust system for submitting computations. You may submit single computations, multiple computations, or computations to create a dataset.
Our QCArchive demo server is publicly readable. This means you do not need a username or password to access the data. However, to submit computations, a username and password are required.
Protecting usernames and passwords
When connecting to QCArchive using a username and password, be careful to never commit this information to publicly accessible repositories. You can store credentials in environment variables, as shown in the cell below, or you can read user information from a file.
In the cell below, we read environment variables set in the local environment for our username and password.
We retrieve these using os.environ.get.
import os
import qcportal as ptl
from qcportal.molecules import Molecule
client = ptl.PortalClient("https://qcademo.molssi.org",
username=os.environ.get("QCArchiveUsername"),
password=os.environ.get("QCArchivePWD"))
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.57.post14+g011be51d, server version: 0.57
We now have a QCPortal client that can be used to submit computations.
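Note that os.environ.get returns None when a variable is not set, so a missing credential only shows up later as a failed login. The following is a minimal sketch of a check that fails early instead. The helper name load_credentials is hypothetical, and the demonstration passes a plain dictionary in place of os.environ so that no real credentials are involved.

```python
import os

# Environment variable names used in this tutorial.
REQUIRED_VARS = ("QCArchiveUsername", "QCArchivePWD")


def load_credentials(env=os.environ):
    """Return (username, password), or raise if either variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["QCArchiveUsername"], env["QCArchivePWD"]


# Demonstration with a dictionary standing in for os.environ.
demo_env = {"QCArchiveUsername": "demo_user", "QCArchivePWD": "not-a-real-password"}
username, password = load_credentials(demo_env)
print(username)
```

The returned pair could then be passed as the username and password arguments when creating the client.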
How do I submit a computation?#
QCArchive currently supports seven different computation types including single point, geometry optimization, reactions, and torsion drives.
For this overview, we will show submitting a single point computation for water using two different methods. This notebook shows inputting an XYZ string for our molecule, but there are a number of ways to enter molecule information. Our molecule geometry in this example is an optimized structure of water.
water_xyz = """3
H 0.026223561887 1.224983815810 0.000000000000
H 0.971741135004 0.039335313725 0.000000000000
O 0.002035305512 0.235680871424 0.000000000000"""
water = Molecule.from_data(water_xyz)
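As a quick sanity check on the geometry, the O-H bond lengths and the H-O-H angle can be computed directly from the coordinates. This sketch uses only the standard library and assumes the coordinates are in angstroms, the usual XYZ convention; it does not use the Molecule object.

```python
import math

# Coordinates copied from the XYZ string above (assumed to be in angstroms).
atoms = {
    "H1": (0.026223561887, 1.224983815810, 0.0),
    "H2": (0.971741135004, 0.039335313725, 0.0),
    "O": (0.002035305512, 0.235680871424, 0.0),
}


def vector(start, end):
    """Vector pointing from atom `start` to atom `end`."""
    return tuple(e - s for s, e in zip(atoms[start], atoms[end]))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


oh1 = vector("O", "H1")
oh2 = vector("O", "H2")
r1, r2 = norm(oh1), norm(oh2)

# Angle between the two O-H bonds, from the dot product.
cos_angle = sum(a * b for a, b in zip(oh1, oh2)) / (r1 * r2)
angle = math.degrees(math.acos(cos_angle))

print(f"O-H bond lengths: {r1:.3f}, {r2:.3f}")
print(f"H-O-H angle: {angle:.1f} degrees")
```

Two nearly equal bond lengths and a bent angle are what one expects for a reasonable water structure.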
If NGLView is installed in your environment, the molecule objects in QCArchive can be visualized using NGLView by putting the variable representing the molecule as the last thing in a notebook cell.
water
To submit our single point computation, we will use the add_singlepoints method.
We will submit two single point computations for the same molecule using different methods.
For add_singlepoints, you specify the program you want to run (Psi4 in our case), the driver, the method, and the basis set.
The driver determines what is in the return_result for the record.
For this demonstration, we are submitting two single point calculations for water that differ only in method (b3lyp vs mp2).
b3lyp_meta, b3lyp_record_ids = client.add_singlepoints([water],
program='psi4',
driver='energy',
method='b3lyp',
basis='def2-tzvp')
mp2_meta, mp2_record_ids = client.add_singlepoints([water],
program='psi4',
driver='energy',
method='mp2',
basis='def2-tzvp')
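When more than two methods are involved, the repeated calls can be folded into a loop. The sketch below is one way to do that under stated assumptions: submit_methods is a hypothetical helper, and StubClient is a stand-in that only mimics the call pattern used above (keyword arguments in, a metadata and record ID pair out) so that the example runs without a server. The metadata dictionary and ID numbering in the stub are placeholders.

```python
class StubClient:
    """Hypothetical stand-in for PortalClient; records calls and contacts no server."""

    def __init__(self):
        self.calls = []

    def add_singlepoints(self, molecules, program, driver, method, basis):
        self.calls.append((program, driver, method, basis))
        # Mimic the (metadata, record_ids) pair unpacked in the cell above.
        first_id = len(self.calls) * 100
        record_ids = [first_id + i for i in range(len(molecules))]
        return {"n_molecules": len(molecules)}, record_ids


def submit_methods(client, molecules, methods, basis, program="psi4", driver="energy"):
    """Submit one batch per method and map each method name to its record IDs."""
    ids_by_method = {}
    for method in methods:
        _meta, ids = client.add_singlepoints(
            molecules, program=program, driver=driver, method=method, basis=basis
        )
        ids_by_method[method] = ids
    return ids_by_method


stub = StubClient()
ids_by_method = submit_methods(stub, ["water"], ["b3lyp", "mp2"], basis="def2-tzvp")
print(ids_by_method)
```

Keeping the record IDs in a dictionary keyed by method makes it easier to look up each result later.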
Once submitted, we can retrieve the results using the get_records method shown earlier in the tutorial.
b3lyp_record = client.get_records(b3lyp_record_ids[0])
mp2_record = client.get_records(mp2_record_ids[0])
print(f"B3LYP Status:\t{b3lyp_record.status}")
print(f"MP2 Status:\t{mp2_record.status}")
B3LYP Status: RecordStatusEnum.complete
MP2 Status: RecordStatusEnum.complete
When the computations are complete, we can read the energies from the return_result attribute of each record.
print(f"B3LYP result: {b3lyp_record.return_result}")
print(f"MP2 result: {mp2_record.return_result}")
B3LYP result: -75.76802068303165
MP2 result: -75.63546371075435
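The records above had already finished when they were fetched. A freshly submitted computation may not be complete yet, in which case the record has to be fetched again later. The following is a generic polling sketch, not part of QCPortal: wait_until is a hypothetical helper, and the status strings in the demonstration are illustrative placeholders. With a real client, the is_done callable would presumably re-fetch the record with get_records and inspect its status; that wiring is an assumption and is not shown here.

```python
import time


def wait_until(is_done, timeout=60.0, interval=2.0):
    """Call is_done() repeatedly until it returns True or `timeout` seconds pass.

    Returns True if is_done() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a status check that reports completion on the third call.
statuses = iter(["waiting", "running", "complete"])
finished = wait_until(lambda: next(statuses) == "complete", timeout=5.0, interval=0.0)
print(finished)
```

A timeout keeps a notebook cell from blocking indefinitely if a computation errors out or no worker picks it up.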
How do I create datasets?#
Instead of submitting these computations separately, we could have grouped them together in a dataset. This would allow us to more easily retrieve the results together.
To create a dataset, you use the add_dataset method.
ds = client.add_dataset("singlepoint",
name="Water calculations",
description="Single point calculations of water at various levels of theory.")
Creation of datasets is beyond the scope of this overview tutorial. For more information on dataset construction, see the Dataset Quickstart.