QCArchive in 15 minutes#
This tutorial will give you an overview of possible actions in QCArchive. Using QCArchive, you can:
Submit a single computation or a set of computations to a server, following a variety of workflows.
Retrieve the results of previous computations.
Query the database for particular computations.
Create datasets holding related quantum chemistry computations.
Retrieve results from datasets.
This notebook will briefly walk you through each capability. For more details, we recommend you follow our other starter tutorials.
Connecting to a server#
To work with QCArchive, you will need to connect to a QCArchive server.
For the QuickStart tutorials, we will connect to the QCArchive Demo Server.
To interact with a server, you will create a QCPortal client using PortalClient.
The argument to PortalClient is the server address.
To work with the QCArchive demo server, enter https://qcademo.molssi.org.
import qcportal as ptl
client = ptl.PortalClient("https://qcademo.molssi.org")
print(client)
We now have a QCPortal client that we can use to read from and query the QCArchive demo server.
Retrieving Data and Querying the Database#
How do I retrieve computation results by ID?#
We can retrieve computations by ID using the client.get_records function.
Each computation in the database is given an integer ID number.
The cell below shows retrieval of the calculation result with ID 1.
We see that this computation was a single point computation.
first_record = client.get_records(1)
print(first_record)
Typically, properties from a calculation can be viewed using the .properties attribute for a result.
The calculated properties are in a dictionary.
In the cell below, we print the SCF total energy from our calculation.
print(first_record.properties["scf_total_energy"])
We can print information about the computation like the molecule name and properties.
print(f"Molecule: {first_record.molecule.name}, Energy: {first_record.properties['scf_total_energy']}")
If you pass in several IDs, you will receive a list of results that can be iterated through.
multiple_records = client.get_records([1, 2, 3])
for record in multiple_records:
    print(f"Molecule: {record.molecule.name}, Energy: {record.properties['scf_total_energy']}")
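When looping over several records, some may not have finished, in which case the requested property may be unavailable. The following is a minimal sketch of a defensive lookup, assuming that `properties` can be `None` or can lack the key. The helper name `get_property` and the `SimpleNamespace` objects are hypothetical stand-ins so the example runs without a server, and the energy value is a placeholder, not a computed result.

```python
from types import SimpleNamespace


def get_property(record, key, default=None):
    """Return record.properties[key], or `default` if it is unavailable."""
    # Treat a missing or None `properties` attribute as an empty dictionary.
    properties = getattr(record, "properties", None) or {}
    return properties.get(key, default)


# Hypothetical stand-ins for records returned by client.get_records([...]).
# The energy below is a placeholder value, not a real result.
finished = SimpleNamespace(properties={"scf_total_energy": -1.0})
unfinished = SimpleNamespace(properties=None)

energies = [get_property(r, "scf_total_energy") for r in (finished, unfinished)]
print(energies)
```

With real records, the same helper could be called inside the loop in place of the direct dictionary lookup.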
How do I search for particular types of computations?#
You can search the database for particular computations using query_records.
For example, to see all results from a particular time period, we can use query_records with arguments created_before and created_after.
There are many fields you can query the database on and this can differ by the type of computation you’d like to retrieve. The query method returns a Python iterator.
records = client.query_records(created_after="2024/01/01")
# Print the first record.
print(next(records))
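Because the query returns an iterator, you can take a fixed number of results without consuming the rest. The sketch below uses `itertools.islice` for this; `fake_query` is a hypothetical generator standing in for `client.query_records(...)` so the example runs offline, and `first_n` is an illustrative helper, not part of QCPortal.

```python
from itertools import islice


def first_n(iterator, n):
    """Return up to n items from an iterator, leaving the remainder unread."""
    return list(islice(iterator, n))


def fake_query():
    # Hypothetical stand-in for client.query_records(...): yields items lazily.
    for record_id in range(1, 1001):
        yield {"id": record_id}


records = fake_query()
batch = first_n(records, 5)
print([r["id"] for r in batch])

# The iterator resumes where islice stopped.
print(next(records)["id"])
```

Taking items lazily like this avoids pulling every matching record when only a preview is needed.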
How do I retrieve results from a dataset?#
QCArchive also supports storing data in datasets. A dataset is a set of related computations. Datasets can be created when computations are submitted, or after computations have completed, and computations can belong to multiple datasets. Datasets are the primary use case for QCArchive and are usually created with large-scale workflows. Each dataset contains only one type of calculation.
We can list all of the datasets on the server we’ve connected to using list_datasets.
Below, we print the names of the data sets on the QCArchive demo server.
datasets = client.list_datasets()
for dataset in datasets:
    print(f"Name: {dataset['dataset_name']}, Type: {dataset['dataset_type']}")
We can retrieve records from a particular dataset using the get_dataset method and passing in the dataset name and type. The following cell retrieves the “Element Benchmark” dataset.
ds = client.get_dataset(dataset_type="singlepoint", dataset_name="Element Benchmark")
print(ds.description)
Datasets have many properties that are beyond the scope of this overview. Datasets are made up of many records of the same type of computation that can differ in molecule identity or other specification parameters. You can iterate over records, see specifications, and compile values from records.
The cell below shows using the get_properties_df method to create a pandas dataframe containing the SCF total energy and SCF iterations from each record.
# Use get_properties_df to make a dataframe
df = ds.get_properties_df(["scf_total_energy", "scf_iterations"])
# view the first 10 rows.
df.head(10)
This dataframe is a multi-index dataframe with the top level index being the “specification” of our calculation.
For example, we can pull out just our results for hf/sto-3g.
df["hf/sto-3g"]
df["hf/sto-3g"]["scf_total_energy"] will give us the SCF total energy for all of the records in the dataset with the hf/sto-3g specification.
df["hf/sto-3g"]["scf_total_energy"]
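To see how this two-level column indexing behaves without a server connection, here is a small sketch that builds a toy dataframe with the same (specification, property) column layout. It assumes pandas is installed. The second specification name, the entry names, and all numeric values are made-up placeholders, not results from the dataset.

```python
import pandas as pd

# Two-level column index: (specification, property).
# "hf/def2-tzvp" is a hypothetical second specification for illustration.
columns = pd.MultiIndex.from_product(
    [["hf/sto-3g", "hf/def2-tzvp"], ["scf_total_energy", "scf_iterations"]]
)

# Placeholder values only; these are not real computed results.
toy_df = pd.DataFrame(
    [[-1.0, 5, -1.5, 6], [-2.0, 7, -2.5, 8]],
    index=["entry_a", "entry_b"],
    columns=columns,
)

# Selecting the top level returns a dataframe with one column per property.
spec_df = toy_df["hf/sto-3g"]

# Chained selection returns a Series indexed by entry.
energies = toy_df["hf/sto-3g"]["scf_total_energy"]

# A tuple key selects the same column in a single step.
same = toy_df[("hf/sto-3g", "scf_total_energy")]
print(energies)
```

The tuple form avoids creating the intermediate dataframe, which can matter for large datasets.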
Submitting Computations#
Beyond retrieving results and querying the database, QCArchive provides a robust system for submitting computations. You may submit single computations, multiple computations, or computations to create a dataset.
Our QCArchive demo server is publicly readable. This means you do not need a username or password to access the data. However, to submit computations, a username and password are required.
Protecting usernames and passwords
When connecting to QCArchive using a username and password, be careful to never commit this information to publicly accessible repositories. You can store credentials in environment variables, as shown in the cell below, or you can read user information from a file.
In the cell below, we read environment variables set in the local environment for our username and password.
We retrieve these using os.environ.get.
import os
import qcportal as ptl
from qcportal.molecules import Molecule
client = ptl.PortalClient("https://qcademo.molssi.org",
                          username=os.environ.get("QCArchiveUsername"),
                          password=os.environ.get("QCArchivePWD"))
We now have a QCPortal client that can be used to submit computations.
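Note that `os.environ.get` returns `None` when a variable is unset, so a missing credential may only surface later as an authentication problem. The sketch below shows one way to fail early with a clear message. The helper `read_credentials` and the variable names `QCA_DEMO_USER` and `QCA_DEMO_PWD` are hypothetical, and the values are set in-process purely so the example runs; real credentials should be set outside your code.

```python
import os


def read_credentials(user_var, password_var):
    """Read a username and password from the environment, raising if unset."""
    username = os.environ.get(user_var)
    password = os.environ.get(password_var)
    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in ((user_var, username), (password_var, password))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return username, password


# Demonstration only: placeholder values under hypothetical variable names.
os.environ["QCA_DEMO_USER"] = "example_user"
os.environ["QCA_DEMO_PWD"] = "example_password"

username, password = read_credentials("QCA_DEMO_USER", "QCA_DEMO_PWD")
print(username)
```

The returned pair could then be passed as the `username` and `password` arguments when creating the client.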
How do I submit a computation?#
QCArchive currently supports seven different computation types including single point, geometry optimization, reactions, and torsion drives.
For this overview, we will show submitting a single point computation for water using two different methods. This notebook shows inputting an XYZ string for our molecule, but there are a number of ways to enter molecule information. Our molecule geometry in this example is an optimized structure of water.
water_xyz = """3
H 0.026223561887 1.224983815810 0.000000000000
H 0.971741135004 0.039335313725 0.000000000000
O 0.002035305512 0.235680871424 0.000000000000"""
water = Molecule.from_data(water_xyz)
If NGLView is installed in your environment, the molecule objects in QCArchive can be visualized using NGLView by putting the variable representing the molecule as the last thing in a notebook cell.
water
To submit our single point computation, we will use the add_singlepoints method.
We will submit two single point computations for the same molecule using different methods.
For add_singlepoints, you specify the program you want to run (Psi4 in our case), the driver, the method, and the basis set.
The driver determines what is in the return_result for the record.
For this demonstration, we are submitting two single point calculations for water with a differing method (b3lyp vs mp2).
b3lyp_meta, b3lyp_record_ids = client.add_singlepoints([water],
                                                       program='psi4',
                                                       driver='energy',
                                                       method='b3lyp',
                                                       basis='def2-tzvp')
mp2_meta, mp2_record_ids = client.add_singlepoints([water],
                                                   program='psi4',
                                                   driver='energy',
                                                   method='mp2',
                                                   basis='def2-tzvp')
Once submitted, we can retrieve the results using the get_records method shown earlier in the tutorial.
b3lyp_record = client.get_records(b3lyp_record_ids[0])
mp2_record = client.get_records(mp2_record_ids[0])
print(f"B3LYP Status:\t{b3lyp_record.status}")
print(f"MP2 Status:\t{mp2_record.status}")
When the computations are complete, we can retrieve the energies from the return_result attribute.
print(f"B3LYP result: {b3lyp_record.return_result}")
print(f"MP2 result: {mp2_record.return_result}")
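Submitted computations are not finished immediately, so the status may need to be checked more than once before a result is available. The following is a minimal polling sketch. The helper `wait_for_status` is hypothetical and not part of QCPortal, and it assumes the status compares equal to strings such as "complete" or "error". A fake status source is used here so the example runs without a server.

```python
import time


def wait_for_status(fetch_status, done=("complete", "error"), interval=1.0, max_checks=10):
    """Call fetch_status() until it returns a value in `done` or max_checks is reached."""
    status = None
    for _ in range(max_checks):
        status = fetch_status()
        if status in done:
            break
        # Pause between checks to avoid sending requests too frequently.
        time.sleep(interval)
    return status


# Hypothetical stand-in that reports "waiting", then "running", then "complete".
_fake_statuses = iter(["waiting", "running", "complete"])
final = wait_for_status(lambda: next(_fake_statuses), interval=0.0)
print(final)
```

With a real client, `fetch_status` could be a function that calls `client.get_records` with the record ID again and returns the `status` of the refreshed record, under the assumption stated above.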
How do I create datasets?#
Instead of submitting these computations separately, we could have grouped them together in a dataset. This would allow us to more easily retrieve the results together.
To create a dataset, you use the add_dataset method.
ds = client.add_dataset("singlepoint",
                        name="Water calculations",
                        description="Single point calculations of water at various levels of theory.")
Creation of datasets is beyond the scope of this overview tutorial. For more information on dataset construction, see the Dataset Quickstart.