Sartaaj Khan

Advancing data-driven materials for climate and energy.

I am a PhD candidate in the Department of Chemical Engineering and Applied Chemistry at the University of Toronto. My work focuses on building geometry-aware representations of metal-organic frameworks (MOFs) for machine learning and deep learning. If you ever want to talk about AI, chemistry, materials science, computer science or life in general, feel free to contact me!

🧪 Research Interests

2025 • Connecting metal-organic framework synthesis to applications using multimodal machine learning

Screening for MOF applications in sustainable energy

We aim to use data that are readily available after MOF synthesis, such as powder X-ray diffraction (PXRD) patterns and precursors, to directly recommend MOFs for applications such as carbon capture, gas storage and semiconductors.

Multimodal ML • Water stability • Carbon capture • XRD

2025 • Capturing non-local features of crystals from their bond networks

Geometry-aware representations for porous crystals

A major challenge for existing methods is capturing the long-range interactions and geometry of porous crystals. New representations and algorithms that capture these features can improve the prediction of geometry-dependent MOF properties such as mechanical properties, gas separation performance and charge transport.

Descriptors • GNN • Topology

2025 • MOF-ChemUnity: Unifying MOF data using large language models

MOF Database Development

Large language models (LLMs) now make it possible to extract data from the literature and construct MOF databases that link experimental data with existing computational data. However, these databases must be properly curated before they can serve as benchmarks for building robust ML models.

Large language model (LLM) • Benchmarks • Curation

🚀 Publications

Please refer to my Google Scholar profile for a full list of my publications.

Khan, S. T., & Moosavi, S. M. (2025). Connecting metal-organic framework synthesis to applications using multimodal machine learning. *Nature Communications, 16*, 5642. https://doi.org/10.1038/s41467-025-60796-0

Ai, Q., Khan, S. T., Barthel, S., & Moosavi, S. M. (2025, March). Capturing global features of crystals from their bond networks. *AI for Accelerated Materials Design (ICLR 2025)*. https://openreview.net/forum?id=wLSmBbYDY5

Co-first author (equal contribution).

Pruyn, T. M., Aswad, A., Khan, S. T., Black, R., & Moosavi, S. M. (2025). MOF-ChemUnity: Unifying metal-organic framework data using large language models. *Preprint*, under review at the *Journal of the American Chemical Society*.

Zimmermann, Y., Bazgir, A., …, Khan, S. T., …, & Blaiszik, B. (2025). 32 examples of LLM applications in materials science and chemistry: Towards automation, assistants, agents, and accelerated scientific discovery. *Machine Learning: Science and Technology*.

Kochi, M. R., Rezaei, H., Khan, S. T., Mamillapalli, B. T., Ebrahimiazar, M., Ye, H., … & Moosavi, S. M. (2025). Thermodynamics-informed machine learning for predicting temperature-dependent chemical properties. *Preprint*.

Zimmermann, Y., Bazgir, A., …, Khan, S. T., … & Blaiszik, B. (2024). Reflections from the 2024 large language model (LLM) hackathon for applications in materials science and chemistry. *arXiv* preprint (arXiv:2411.15221). https://arxiv.org/abs/2411.15221

🏆 Awards

Selected scholarships, awards and honours.

Queen Elizabeth II Graduate Scholarship in Science & Technology

The Queen Elizabeth II Graduate Scholarship in Science and Technology (QEII-GSST) program is designed to encourage excellence in graduate studies in science and technology. The program is supported by funds from the Ministry of Colleges and Universities and by funds raised by the University of Toronto from the private sector.

Amount: $15,000 • Period: Sept 2025 - Sept 2026

University of Toronto Fellowship

Received upon admission to the Master of Applied Science (MASc) program in Chemical Engineering and Applied Chemistry at the University of Toronto, in recognition of academic and professional achievements.

Amount: $13,200 • Period: Sept 2023 - Sept 2025

LLM Hackathon in Chemistry and Materials Science

Awarded 3rd place ($250 prize funded by Radical AI and Anthropic) in the LLM Hackathon for Applications in Materials and Chemistry for developing PoreVoyant, a chemistry-informed AI agent that draws on the MOF literature to generate new linkers that decrease the band gap of metal-organic frameworks. This work was featured in a Medium article.

Amount: $250 • Period: May 2024

Bayesian Optimization in Chemistry and Materials Science Hackathon

Awarded 2nd place ($500 CAD prize funded by the Acceleration Consortium) in the Bayesian Optimization for Chemistry and Materials Hackathon, hosted by the Acceleration Consortium at the University of Toronto. The project applied Bayesian optimization to accelerate the discovery of fluids with the highest heat transfer coefficients.

Amount: $500 • Period: March 2024

NSERC Undergraduate Student Research Award (USRA)

Awarded $6,000 CAD for completing 16 weeks of research during the summer 2020 term and for high academic achievement.

Amount: $6,000 • Period: May 2020 - Sept 2020

🤖 Software and Datasets

Open-source tools and datasets supporting the community.

MOF recommendation system

XRayPro

A recommendation system for MOFs that uses only powder X-ray diffraction (PXRD) patterns and precursors.

Physics-informed ML

ThermoML

A thermodynamics-informed model for predicting the thermophysical properties of fluids.

MOF database

MOF-ChemUnity

A knowledge graph unifying computational and experimental data for MOFs.

Solution chemistry

pySolution

A solution chemistry toolkit that computes solution characteristics for modeling purposes.

🎓 Teaching

Courses and instructional roles.

DEL-Bootcamp for Drug Discovery

Period: Feb 2024 - Present • Role: ML Tutor

CHE260: Thermodynamics and Heat Transfer

Period: Sept 2025 - Present • Role: Teaching Assistant (TA)

CHE333: Chemical Reaction Engineering

Period: Jan 2026 - Present • Role: Teaching Assistant (TA)