I am a fourth-year CS PhD student in the MURGe Lab (part of the larger UNC NLP Lab) at the University of North Carolina at Chapel Hill, advised by Prof. Mohit Bansal. My PhD is supported by a Google PhD Fellowship and a Rebecca and Munroe Cobey Fellowship.
My broad research interests are in interpretable machine learning and natural language processing. I am generally interested in multi-step reasoning problems (over text and semi-structured data), often referred to as System 2 reasoning. In the past, I have developed interpretable models that can generate Natural Language Proofs for formal reasoning (EMNLP 2020, NAACL 2021), Explanation Graphs for structured commonsense reasoning (EMNLP 2021, ACL 2022), Summarization Programs for abstractive summarization (ICLR 2023), and Multi-step Reasoning Paths for text generation from semi-structured data (arXiv 2022). I also wrote a paper connecting explainability to data hardness (EMNLP 2022). During my PhD, I have spent two wonderful summers interning at FAIR Labs (Meta AI Research) and Salesforce AI Research.
Before starting my PhD, I was a Research Engineer at IBM Research - India, building industry-scale Intelligent Tutoring Systems (IJCAI 2019, CIKM 2018, AIED 2018). Before that, I was an M.Tech. student in the CS department at IIT Delhi, where I worked with Prof. Mausam and developed the state-of-the-art Open Information Extraction system (Open IE 5.0) (ACL 2017, COLING 2018).
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation
Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru, and Asli Celikyilmaz
Pre-print on arXiv, 2022
[Long] [paper] [code]
Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha, Shiyue Zhang, Peter Hase, and Mohit Bansal
International Conference on Learning Representations (ICLR) 2023
[Long] [paper] [code]
Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations
Swarnadeep Saha, Peter Hase, Nazneen Rajani, and Mohit Bansal
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
[Short] [Oral] [paper] [data]
Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning [Acceptance Rate: 21%]
Swarnadeep Saha, Prateek Yadav, and Mohit Bansal
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
[Long] [Poster] [paper] [code]
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [Acceptance Rate: 23%]
Swarnadeep Saha, Prateek Yadav, Lisa Bauer, and Mohit Bansal
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
[Long] [Oral] [paper] [data/code] [website]
multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning [Acceptance Rate: 26%]
Swarnadeep Saha, Prateek Yadav, and Mohit Bansal
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021
[Long] [Oral] [paper] [code]
PRover: Proof Generation for Interpretable Reasoning over Rules [Acceptance Rate: 22%]
Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, and Mohit Bansal
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
[Long] [Oral] [paper] [code]
ConjNLI: Natural Language Inference over Conjunctive Sentences [Acceptance Rate: 22%]
Swarnadeep Saha, Yixin Nie, and Mohit Bansal
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
[Long] [Poster] [paper] [data/code]
Pre-Training BERT on Domain Resources for Short Answer Grading [Acceptance Rate: 23%]
Chul Sung, Tejas Dhamecha, Swarnadeep Saha, Tengfei Ma, Vinay Reddy, and Rishi Arora
Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP), 2019
[Short] [Poster] [paper]
Aligning Learning Objectives to Learning Resources: A Lexico-Semantic Spatial Approach [Acceptance Rate: 17%]
Swarnadeep Saha, Malolan Chetlur, Tejas Indulal Dhamecha, W M Gayathri K Wijayarathna, Red Mendoza, Paul Gagnon, Nabil Zary, and Shantanu Godbole
28th International Joint Conference on Artificial Intelligence (IJCAI), 2019
[Long] [Oral+Poster] [paper]
Creating Scoring Rubric from Representative Student Answers for Improved Short Answer Grading [Acceptance Rate: 17%]
Smit Marvaniya, Swarnadeep Saha, Tejas I. Dhamecha, Peter Foltz, Renuka Sindhgatta, and Bikram Sengupta
27th ACM International Conference on Information and Knowledge Management (CIKM), 2018
[Long] [Oral] [paper]
Open Information Extraction from Conjunctive Sentences [Acceptance Rate: 37%]
Swarnadeep Saha and Mausam
27th International Conference on Computational Linguistics (COLING), 2018
[Long] [Oral] [paper] [code] [slides]
Sentence Level or Token Level Features for Automatic Short Answer Grading?: Use Both [Acceptance Rate: 25%]
Swarnadeep Saha, Tejas I. Dhamecha, Smit Marvaniya, Renuka Sindhgatta, and Bikram Sengupta
19th International Conference on Artificial Intelligence in Education (AIED), 2018
[Long] [Oral] [paper] [slides]
Balancing Human Efforts and Performance of Student Response Analyzer in Dialog-based Tutors [Acceptance Rate: 25%]
Tejas I. Dhamecha, Smit Marvaniya, Swarnadeep Saha, Renuka Sindhgatta, and Bikram Sengupta
19th International Conference on Artificial Intelligence in Education (AIED), 2018
[Long] [Oral] [paper] [slides]
Bootstrapping for Numerical Open IE [Acceptance Rate: 18%]
Swarnadeep Saha, Harinder Pal, and Mausam
55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017
[Short] [Poster] [paper] [code] [poster]
COL 774: Machine Learning - Spring 2016
Instructor: Prof. Parag Singla
COL 333/COL 671: Artificial Intelligence - Fall 2016
Instructor: Prof. Mausam