I am a final-year Ph.D. candidate in Natural Language Processing (NLP) at Sofia University “St. Kliment Ohridski”. My research interests include question answering, conversational agents, pattern-based few-shot learning, distant supervision, and cross-lingual and cross-domain transfer. I was a Research Scientist in Machine Learning at Checkstep Research, where I worked on NLP for content moderation using few-shot learning and semi-supervised and multi-source fine-tuning of pre-trained language models.
I have a strong software engineering background along with extensive experience in developing production systems. I previously held a position as a Machine Learning Engineer at ReceiptBank (now Dext), where I built NLP models for automatic field information extraction from unstructured payment documents. I was also a Machine Learning Engineer at an ad-tech company, Adcash OÜ, where I developed distributed and scalable models for ad recommendations. Prior to that, I created and maintained desktop and web applications as a full-stack Software Engineer and Team Lead at Acstre.
Ph.D. Candidate in Natural Language Processing, 2017-2022 (Expected)
Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
MSc in Information Retrieval and Knowledge Discovery, 2014-2016
Sofia University "St. Kliment Ohridski"
BEng in Computer Systems and Technologies, 2010-2014
Conducted research in natural language processing for content moderation and stance detection.
Worked on automatic field information extraction from unstructured payment documents.
Developed a distributed and scalable product serving ad recommendations.
Designed, developed and maintained desktop and web applications. Led a small team of developers.
A new model based on modified self-adaptive training is proposed to learn from a large-scale collection of 330,000 tweets, each paired with a responding fact-checking article, and is shown to improve over the state of the art by two points absolute.
This survey examines the relationship between stance detection and mis- and disinformation detection from a holistic viewpoint and reviews and analyzes existing work in this area.
This work presents a novel interpretable framework for cross-lingual content flagging, which significantly outperforms prior work in terms of both predictive performance and average inference time, and which can easily adapt to new instances without being retrained from scratch.
This paper presents the most comprehensive study of cross-lingual stance detection to date and proposes sentiment-based generation of stance data for pre-training, which shows a sizeable improvement of more than 6% F1 absolute in low-shot settings compared to several strong baselines.
An in-depth analysis of 16 stance detection datasets is performed, and an end-to-end unsupervised framework for out-of-domain prediction of unseen, user-defined labels is proposed, which combines domain adaptation techniques such as mixture of experts and domain-adversarial training with label embeddings.
This work proposes EXAMS – a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations and performs various experiments with existing top-performing multilingual pre-trained models to show that EXAMS offers multiple challenges that require multilingual knowledge and reasoning in multiple domains.