Demo Tutorial Guide
This guide walks through a short demo project to familiarize users with the nuts and bolts of a typical ChemHTPS project work-flow.
$ chemhtpsshell --setup_project --project_name trial
The above command will start a new project called “trial” at the current location.
$ ls trial
archive db jobpool job_templates lost+found queue_list.dat screeninglib trial.config
Among the contents of a project folder:
- “archive/” is where successful jobs get tarred and archived
- “db/” should be used for storing the generated data in a database
- “jobpool/” keeps all the jobs generated that are yet to be run under “short/” and “long/” sub-folders
- “job_templates/” keeps the project-specific SLURM scripts and program-template files. These are essentially copies of the files in the “chemhtps/chemhtps/metadata/job_templates/” folder. Users can also create their desired SLURM scripts in the original metadata location
- “lost+found/” stores all the unsuccessful jobs
- “queue_list.dat” contains the various cluster/partition limits that can be defined by the user before a job-run
- “screeninglib/” contains all the files and folders for library-generation purposes
- “trial.config” is the config-file for this project
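The layout above can also be checked programmatically. The following is a minimal sketch and not part of ChemHTPS: the helper name `missing_entries` is hypothetical, and the expected entries are simply taken from the `ls trial` output shown above.

```python
from pathlib import Path

# Entries listed by `ls trial` in this guide. The config-file is named
# after the project ("trial.config" for a project called "trial").
EXPECTED_DIRS = ["archive", "db", "jobpool", "job_templates",
                 "lost+found", "screeninglib"]
EXPECTED_FILES = ["queue_list.dat"]


def missing_entries(project_dir, project_name):
    """Return the expected folders/files that are absent from project_dir."""
    root = Path(project_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    expected_files = EXPECTED_FILES + [f"{project_name}.config"]
    missing += [f for f in expected_files if not (root / f).is_file()]
    return missing
```

For a project laid out as shown above, `missing_entries("trial", "trial")` would be expected to return an empty list.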
Although ChemHTPS can be used purely for work-flow purposes, in our example we will show its use while interfacing with ChemLG for library generation.
Once the project has been set up as shown above, the config-file “trial.config” will look like:
project_name = trial
user_name = yudhajit
log_file = trial.log
error_file = trial.err
generatelib
generation_file = none
building_block = none
cores = 0
clusters = none
generatejobs
library_in = none
job_type = none
program = none
template = none
feedjobs
feed_local = TRUE
job_sched = SLURM
postprocess
populatedb
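To make the structure of the config-file explicit: it consists of `key = value` lines grouped under bare module names (generatelib, generatejobs, feedjobs, postprocess, populatedb). The sketch below reads that structure into a dictionary. It is an illustration only and not the parser ChemHTPS uses internally; the function name `parse_config` and the `"general"` label for the lines before the first module name are assumptions made for this example.

```python
# Module names that appear as bare lines in the config-file shown above.
SECTIONS = ("generatelib", "generatejobs", "feedjobs",
            "postprocess", "populatedb")


def parse_config(text):
    """Group `key = value` lines under the most recent module name."""
    config = {"general": {}}  # "general" is a label chosen for this sketch
    current = "general"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in SECTIONS:
            current = line
            config[current] = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            config[current][key.strip()] = value.strip()
    return config


sample = """\
project_name = trial
user_name = yudhajit
generatelib
generation_file = none
cores = 0
feedjobs
feed_local = TRUE
job_sched = SLURM
postprocess
populatedb
"""

config = parse_config(sample)
print(config["feedjobs"]["job_sched"])  # SLURM
```

All values are kept as strings here; interpreting `none`, `TRUE`, or numeric values is left to the caller.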
Once the user has downloaded and installed ChemLG (for instructions, see https://hachmannlab.github.io/chemlg/), the “config.dat” file under the “chemlg/chemlg/templates/” folder should be copied into the “trial/screeninglib/” folder.
Without going into details that would require an in-depth discussion of the mechanics of ChemLG, this example uses a library generated from a file that already contains a list of SMILES codes for molecules (here “trial-smiles.dat”, placed in “screeninglib/”):
C(=O)(N)N
C(=S)(N)N
CNC(=O)NC
CN(C)C(=O)N
CC(=O)N
C(=O)(C(F)(F)F)N
C(C(CO)O)O
C(CO)O
C(CCO)CO
We will use ChemLG just to convert the SMILES in the above file into .xyz format. The ChemLG config-file for this will look like:
Do not change the order of the generation rules below.
1. Include building blocks == None
2. Min and max no. of bonds == None
3. Min and max no. of atoms == None
4. Min and max mol.weight == None
5. Min and max no. of rings == None
6. Min and max no. of aromatic rings == None
7. Min and max no. of non-aromatic rings == None
8. Min and max no. of single bonds == None
9. Min and max no. of double bonds == None
10. Min and max no. of triple bonds == None
11. Max no. of specific atoms == None
12. Lipinski rule == False
13. Fingerprint matching == None
14. Substructure == None
15. Substructure exclusion == None
16. Include_BB == False
Combination type for molecules :: Link
Number of generations :: 0
Molecule format in output file :: xyz
Maximum files per folder :: 1000
Library name :: trial_lib_
The ChemHTPS config-file will then look something like:
project_name = trial
user_name = yudhajit
log_file = trial.log
error_file = trial.err
generatelib
generation_file = ./screeninglib/config.dat
building_block = ./screeninglib/trial-smiles.dat
cores = 1
clusters = run locally
generatejobs
library_in = none
job_type = none
program = none
template = none
feedjobs
feed_local = TRUE
job_sched = SLURM
postprocess
populatedb
Note that the number of generations in the ChemLG config-file is kept at 0, since we are not generating any new molecules and are only converting the existing ones between file formats. The library generator is then called (note: this MUST be done from within the project folder):
$ chemhtpsshell --generatelib
After the run, the “screeninglib/” folder will look like:
$ ls screeninglib/
building_blocks.dat config.dat final_library.csv logfile.txt trial_lib_xyz trial-smiles.dat
$ ls screeninglib/trial_lib_xyz/1_1000
1.xyz 2.xyz 3.xyz 4.xyz 5.xyz 6.xyz 7.xyz 8.xyz 9.xyz
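A quick sanity check after this step is to compare the number of SMILES entries in the input file with the number of .xyz files that were written. The sketch below assumes one SMILES string per line in the input file and counts .xyz files in all sub-folders of the library; the function names are hypothetical and the check is not part of ChemHTPS or ChemLG.

```python
from pathlib import Path


def count_smiles(smiles_file):
    """Count non-empty lines (assumed: one SMILES string per line)."""
    lines = Path(smiles_file).read_text().splitlines()
    return sum(1 for line in lines if line.strip())


def count_xyz(library_dir):
    """Count .xyz files in all sub-folders of the generated library."""
    return sum(1 for _ in Path(library_dir).rglob("*.xyz"))
```

Run from within the project folder, `count_smiles("screeninglib/trial-smiles.dat")` and `count_xyz("screeninglib/trial_lib_xyz")` should both give 9 for the example shown here. If the two numbers differ, “screeninglib/logfile.txt” is a reasonable first place to look.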
Moving on to the job-generation step, the options for generatejobs need to be modified. For our case the config-file looks like this:
project_name = trial_htps
user_name = yudhajit
log_file = trial_htps.log
error_file = trial_htps.err
generatelib
generation_file = ./screeninglib/config.dat
building_block = ./screeninglib/trial-smiles.dat
cores = 1
clusters = run locally
generatejobs
library_in = trial_lib_xyz
job_type = long
program = ORCA
template = ./job_templates/ORCA/test_orca.inp
feedjobs
feed_local = TRUE
job_sched = SLURM
postprocess
populatedb
We will be generating input files using the template file shown above. For more information on template files and how they work, please refer to the ChemHTPS Template Guide.
Once this step is completed, the job-folders look like the following:
$ ls jobpool/long/
jg1_ORCA_trial_lib_xyz_1_1000
$ ls jobpool/long/jg1_ORCA_trial_lib_xyz_1_1000/
1 2 3 4 5 6 7 8 9
$ ls jobpool/long/jg1_ORCA_trial_lib_xyz_1_1000/1/
1.inp 1.xyz
Note that in the above case, the jobs are all grouped in a folder with the prefix “jg1”. This signifies that this group of jobs was created first; later groups are numbered in the order of their creation. This distinction allows the job-feeder module to proceed serially and prioritize the jobs that were created earlier. Furthermore, every cycle of job-generation is recorded in the “job_gen.log” log-file, where one can check what the config-options were for every job-generator module call. Each individual job-folder within the group contains the corresponding .xyz and .inp input files, as seen above with 1.xyz and 1.inp.
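The "earlier groups first" idea can be illustrated with a short sketch that orders job-group folders by the number in their “jg” prefix. This is not how the ChemHTPS job-feeder is implemented; the function names are hypothetical, and the folder names “jg2_…” and “jg10_…” are invented here only to show why the number should be compared numerically rather than alphabetically.

```python
import re

# Folder names follow the pattern seen above: jg<N>_<program>_<library>_<range>
GROUP_PATTERN = re.compile(r"^jg(\d+)_")


def group_number(folder_name):
    """Extract the creation index N from a 'jgN_...' folder name."""
    match = GROUP_PATTERN.match(folder_name)
    if match is None:
        raise ValueError(f"not a job-group folder: {folder_name}")
    return int(match.group(1))


def oldest_first(folder_names):
    """Sort job-group folders so that the earliest-created group comes first."""
    return sorted(folder_names, key=group_number)


groups = [
    "jg10_ORCA_trial_lib_xyz_1_1000",  # hypothetical later group
    "jg2_ORCA_trial_lib_xyz_1_1000",   # hypothetical later group
    "jg1_ORCA_trial_lib_xyz_1_1000",   # the group from this guide
]
print([group_number(g) for g in oldest_first(groups)])  # [1, 2, 10]
```

A plain alphabetical sort would place “jg10” before “jg2”, which is why the number is parsed as an integer.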
After making appropriate changes to feed_local and job_sched options in the config-file under the feedjobs section, we can finally execute all the jobs under the “jobpool/” folder by:
$ chemhtpsshell --feedjobs
These jobs will be submitted according to the limits set in the “queue_list.dat” file:
#cluster   partition        limit  type
ub-hpc     general-compute  0      long
ub-hpc     debug            0      short
ub-hpc     gpu              0      long
ub-hpc     largemem         0      long
chemistry  beta             10     long
In our case, we set the limit of jobs to be run in the “beta” cluster/partition combination to 10, since we have 9 jobs to run. The definitions of the cluster/partition combinations can be changed in this file, and the corresponding job-scheduler (SLURM) scripts should be available in the “job_templates/” folder. One can try it out and check the logs and the error messages in “feedjobs.out” and “job_checker.err”, respectively.
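As a rough illustration of how the limits in “queue_list.dat” add up, the sketch below parses the four whitespace-separated columns and sums the limits for a given job type. It assumes that the “limit” column is the maximum number of jobs for that cluster/partition combination and that the “type” column corresponds to the “short”/“long” job types; the function names are hypothetical and this is not the code ChemHTPS itself uses.

```python
def parse_queue_list(text):
    """Parse `cluster partition limit type` rows, skipping the `#` header."""
    queues = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cluster, partition, limit, job_type = line.split()
        queues.append({"cluster": cluster, "partition": partition,
                       "limit": int(limit), "type": job_type})
    return queues


def total_capacity(queues, job_type):
    """Sum the job limits over all queues that accept the given job type."""
    return sum(q["limit"] for q in queues if q["type"] == job_type)


sample = """\
#cluster partition limit type
ub-hpc general-compute 0 long
ub-hpc debug 0 short
ub-hpc gpu 0 long
ub-hpc largemem 0 long
chemistry beta 10 long
"""

queues = parse_queue_list(sample)
print(total_capacity(queues, "long"))   # 10
print(total_capacity(queues, "short"))  # 0
```

With the limits shown above, the 9 “long” jobs of this example fit within the total capacity of 10.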
Finally, we would like to make two important points, one of them being a reminder. All the module-calls apart from --setup_project must be made from within the project directory. Additionally, all these non-setup module-calls can be combined into a one-line command after making the desired changes for all the modules in the config-file. This command simply looks like:
$ chemhtpsshell --generatelib --generatejobs --feedjobs
Note that even when the module-calls are given in a different order, such as:
$ chemhtpsshell --generatelib --feedjobs --generatejobs
the execution will still proceed in the library -> job-generation -> job-feeder logic-flow.
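One common way to obtain this order-independent behaviour is to treat each module flag as a simple on/off switch and then walk through a fixed pipeline order. The sketch below shows that idea with `argparse`. It only illustrates the principle and should not be read as the actual `chemhtpsshell` implementation; the function name `modules_to_run` is hypothetical.

```python
import argparse

# Fixed execution order described in this guide.
PIPELINE = ["generatelib", "generatejobs", "feedjobs"]


def modules_to_run(argv):
    """Return the requested modules in pipeline order, whatever the flag order."""
    parser = argparse.ArgumentParser()
    for name in PIPELINE:
        parser.add_argument(f"--{name}", action="store_true")
    args = parser.parse_args(argv)
    return [name for name in PIPELINE if getattr(args, name)]


print(modules_to_run(["--generatelib", "--feedjobs", "--generatejobs"]))
# ['generatelib', 'generatejobs', 'feedjobs']
```

Because the flags are only switches, the position of a flag on the command line carries no information, and the pipeline list alone decides the execution order.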