About
My primary…
Experience & Education
Publications
-
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach
Preprint (arXiv)
We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples learning the transformation from the source language to the target language into (a) learning rotations for language-specific embeddings to align them to a common space, and (b) learning a similarity metric in the common space to model similarities between the embeddings. We model the bilingual mapping problem as an optimization problem on smooth Riemannian manifolds. We show that our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We also generalize our framework to represent multiple languages in a common latent space. In particular, the latent space representations for several languages are learned jointly, given bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in the zero-shot word translation setting.
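The rotation-learning step in (a) can be illustrated with the classical orthogonal Procrustes solution on toy data. This is only a minimal sketch of the rotation sub-problem; the paper's full method additionally learns a similarity metric and optimizes jointly over Riemannian manifolds, which this sketch omits.

```python
import numpy as np

# Toy illustration of the rotation-alignment step: given dictionary-paired
# embeddings X (source) and Y (target), the orthogonal Procrustes solution
# W = U V^T, where U S V^T = SVD(X^T Y), is the best rotation mapping X to Y.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 4))                        # "target" embeddings
R_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # hidden orthogonal map
X = Y @ R_true.T                                    # source = rotated target

U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                          # learned rotation
assert np.allclose(X @ W, Y, atol=1e-6)             # alignment recovered
```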
-
Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER
Annual Meeting of the Association for Computational Linguistics (ACL)
Multilingual learning for Neural Named Entity Recognition (NNER) involves jointly training a neural network for multiple languages. Typically, the goal is improving the NER performance of one of the languages (the primary language) using the other assisting languages. We show that the divergence in the tag distributions of the common named entities between the primary and assisting language can reduce the effectiveness of multilingual learning. To alleviate this problem, we propose a metric based on symmetric KL divergence to filter out the highly divergent training instances in the assisting language. We empirically show that our data selection strategy improves NER performance on many languages, including those with very limited training data.
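The filtering metric can be sketched as follows. The tag inventory, example distributions, and the idea of a filtering threshold here are hypothetical illustrations, not the paper's experimental setup.

```python
import math

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two discrete
    distributions (lists of probabilities), with floor smoothing."""
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    kl = lambda a, b: sum(x * math.log(x / y) for x, y in zip(a, b))
    return kl(p, q) + kl(q, p)

# Hypothetical tag distributions (P(PER), P(LOC), P(ORG)) of one shared
# named entity in the primary vs. an assisting language: a large symmetric
# KL flags the assisting-language instances as candidates for filtering.
primary      = [0.70, 0.20, 0.10]
assist_close = [0.65, 0.25, 0.10]   # similar usage: keep
assist_far   = [0.10, 0.10, 0.80]   # divergent usage: filter out
assert sym_kl(primary, assist_close) < sym_kl(primary, assist_far)
```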
-
Leveraging Orthographic Similarity for Multilingual Neural Transliteration
Transactions of the Association for Computational Linguistics (TACL)
We address the task of joint training of transliteration models for multiple language pairs (multilingual transliteration). This is an instance of multitask learning, where individual tasks (language pairs) benefit from sharing knowledge with related tasks. We focus on transliteration involving related tasks, i.e., languages sharing writing systems and phonetic properties (orthographically similar languages). We propose a modified neural encoder-decoder model that maximizes parameter sharing across language pairs in order to effectively leverage orthographic similarity. We show that multilingual transliteration significantly outperforms bilingual transliteration in different scenarios (average increase of 58% across a variety of languages we experimented with). We also show that multilingual transliteration models can generalize well to languages/language pairs not encountered during training and hence perform well on the zero-shot transliteration task. We show that further improvements can be achieved by using phonetic feature input.
-
The IIT Bombay English-Hindi Parallel Corpus
Language Resources and Evaluation Conference
We present the IIT Bombay English-Hindi Parallel Corpus. The corpus is a compilation of parallel corpora previously available in the public domain as well as new parallel corpora we collected. The corpus contains 1.49 million parallel segments, of which 694k segments were not previously available in the public domain. The corpus has been pre-processed for machine translation, and we report baseline phrase-based SMT and NMT translation results on this corpus. This corpus has been used in two editions of shared tasks at the Workshop on Asian Language Translation (2016 and 2017). The corpus is freely available for non-commercial research. To the best of our knowledge, this is the largest publicly available English-Hindi parallel corpus.
-
Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT
International Joint Conference on Natural Language Processing
We investigate pivot-based translation between related languages in a low-resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging as no direct source-target training corpus is used. We also show that combining multiple related language pivot models can rival a direct translation model. Thus, the use of subwords as translation units coupled with multiple related pivot languages can compensate for the lack of a direct parallel corpus.
-
Learning variable length units for SMT between related languages via Byte Pair Encoding
Workshop on Subword and Character level models in NLP (SCLeM 2017, co-located with EMNLP 2017)
We explore the use of segments learnt using Byte Pair Encoding (referred to as BPE units) as basic units for statistical machine translation between related languages and compare it with orthographic syllables, which are currently the best performing basic units for this translation task. BPE identifies the most frequent character sequences as basic units, while orthographic syllables are linguistically motivated pseudo-syllables. We show that BPE units modestly outperform orthographic syllables as units of translation, showing up to 11% increase in BLEU score. While orthographic syllables can be used only for languages whose writing systems use vowel representations, BPE is writing system independent and we show that BPE outperforms other units for non-vowel writing systems too. Our results are supported by extensive experimentation spanning multiple language families and writing systems.
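As background, the core BPE learning loop (greedily merging the most frequent adjacent symbol pair) can be sketched as below. This is a simplified version for illustration: real implementations add end-of-word markers and frequency-weighted corpora, and the paper applies such learned units as translation units in SMT.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge operations: repeatedly merge the most frequent
    adjacent symbol pair across the vocabulary of words."""
    vocab = Counter(tuple(w) for w in words)   # words as symbol tuples
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():    # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# "lo" and then "low" emerge as frequent multi-character units.
merges = bpe_merges(["lower", "lowest", "low", "slow"], 3)
```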
-
Orthographic Syllable as basic unit for SMT between Related Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP)
We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
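For intuition, a rough Latin-script analogue of orthographic syllabification (maximal consonant*-vowel+ sequences) can be written as a short regex. This is only an illustrative approximation: the actual unit is defined over abugida and alphabetic scripts with their own vowel inventories, and the fixed ASCII vowel set here is an assumption.

```python
import re

def orthographic_syllables(word, vowels="aeiou"):
    """Split a Latin-script word into pseudo orthographic syllables:
    maximal consonant*-vowel+ runs, keeping any trailing consonants
    as a final segment."""
    pattern = re.compile(f"[^{vowels}]*[{vowels}]+|[^{vowels}]+$")
    return pattern.findall(word)

assert orthographic_syllables("translation") == ["tra", "nsla", "tio", "n"]
```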
-
Substring-based unsupervised transliteration with phonetic and contextual knowledge
SIGNLL Conference on Computational Natural Language Learning (CoNLL)
We propose an unsupervised approach for substring-based transliteration which incorporates two new sources of knowledge in the learning process: (i) context, by learning substring mappings as opposed to single character mappings, and (ii) phonetic features which capture cross-lingual character similarity via prior distributions.
Our approach is a two-stage iterative, bootstrapping solution, which vastly outperforms Ravi & Knight's (2009) state-of-the-art unsupervised transliteration method and outperforms a rule-based baseline by up to 50% for top-1 accuracy on multiple language pairs. We show that substring-based models are superior to character-based models, and observe that their top-10 accuracy is comparable to the top-1 accuracy of supervised systems.
Our method only requires a phonemic representation of the words. This is possible for many language-script combinations which have a high grapheme-to-phoneme correspondence, e.g. scripts of Indian languages derived from the Brahmi script. Hence, Indian languages were the focus of our experiments. For other languages, a grapheme-to-phoneme converter would be required.
-
A System for Compound Noun Multiword Expression Extraction for Hindi
6th International Conference on Natural Language Processing (ICON 2008)
Identifying compound noun multiword expressions is important for applications like machine translation and information retrieval. We describe a system for extracting Hindi compound noun multiword expressions (MWE) from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-occurrence measures to exploit the statistical idiosyncrasy of MWEs. We make use of various lexical cues from the corpus to enhance our methods. We also address the extraction of reduplicative expressions using lexical, semantic and phonetic knowledge. We have also built an evaluation resource of compound noun MWEs for Hindi. Our methods give a recall of 80% and precision of 23% at rank 1000.
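One of the simplest such statistical co-occurrence measures, pointwise mutual information (PMI), can be sketched as below. The toy corpus is hypothetical, and the described system combines several measures with lexical cues rather than relying on PMI alone.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Score adjacent word pairs (candidate compound-noun MWEs) by
    pointwise mutual information: log p(x, y) / (p(x) * p(y))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    return {
        (x, y): math.log((c / nb) / ((unigrams[x] / n) * (unigrams[y] / n)))
        for (x, y), c in bigrams.items()
    }

# Hypothetical toy corpus: the recurring pair "railway station" scores
# higher than an incidental pair like "the railway".
toks = "railway station is near the railway station of the town".split()
scores = pmi_scores(toks)
assert scores[("railway", "station")] > scores[("the", "railway")]
```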
-
Projects
-
Brahmi-Net
Brahmi-Net is an online system for transliteration and script conversion for all major Indian language pairs (306 pairs). The system covers 13 Indo-Aryan languages, 4 Dravidian languages and English.
Languages supported include:
- Indo-Aryan languages: Hindi, Urdu, Bengali, Gujarati, Punjabi, Marathi, Konkani, Assamese, Odia, Sindhi, Sinhala, Nepali, Sanskrit
- Dravidian languages: Tamil, Telugu, Malayalam, Kannada
- English
-
Śata-Anuvādak: Indic Translator
- Present
Śata-Anuvādak (100 Translators) is a broad-coverage Statistical Machine Translation system for Indian languages. It is a Phrase-Based MT system with pre-processing and post-processing extensions. The pre-processing includes source-side reordering for English to Indian language translation. The post-processing includes transliteration between Indian languages for OOV words. It currently supports translation between 11 Indian languages:
- Indo-Aryan languages: Hindi, Urdu, Bengali, Gujarati, Punjabi, Marathi, Konkani
- Dravidian languages: Tamil, Telugu, Malayalam
- English
-
Indic NLP Library
- Present
The goal of this project is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages. Indian languages share a lot of similarity in terms of script, phonology, language syntax, etc., and this library is an attempt to provide a general solution to very commonly required toolsets for Indian language text.
The library provides the following functionalities:
- Text Normalization
- Indic Script Conversion
- Romanization of Indic Scripts (ITRANS) and vice-versa
- Indian Language Transliteration
- Tokenization
- Word Segmentation
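Script conversion between Indian languages exploits the parallel layout of the Brahmi-derived Unicode blocks: the major scripts occupy aligned 128-codepoint ranges, so a character can largely be mapped by shifting its codepoint offset. The sketch below illustrates only this offset rule; the actual library additionally handles script-specific gaps and exceptions that this simplification ignores.

```python
# Start codepoints of some Brahmi-derived Unicode blocks (parallel layout):
# Devanagari U+0900, Bengali U+0980, Tamil U+0B80, Malayalam U+0D00.
BLOCK_START = {"hi": 0x0900, "bn": 0x0980, "ta": 0x0B80, "ml": 0x0D00}

def convert_script(text, src, tgt):
    """Naive offset-based script conversion between Brahmi-derived scripts."""
    delta = BLOCK_START[tgt] - BLOCK_START[src]
    start = BLOCK_START[src]
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + 0x80:    # inside the source script block
            out.append(chr(cp + delta))
        else:                             # spaces, punctuation, digits, ...
            out.append(ch)
    return "".join(out)

# Devanagari KA "क" (U+0915) maps to Malayalam KA "ക" (U+0D15).
assert convert_script("\u0915", "hi", "ml") == "\u0d15"
```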
Honors & Awards
-
Outstanding Paper at SCLeM workshop 2017
Workshop on Subword and Character level models in NLP 2017 (co-located with EMNLP)
Our paper titled:
Learning variable length units for SMT between related languages via Byte Pair Encoding
co-authored with Prof. Pushpak Bhattacharyya, was awarded an Outstanding Paper award at the 1st Workshop on Subword and Character level models in NLP (SCLeM 2017), co-located with EMNLP 2017. The workshop was held on 7th September 2017.
Here is the paper:
https://arxiv.org/abs/1610.06510
-
Best Thesis Talk at Research and Innovation Symposium in Computing
Department of Computer Science and Engineering, IIT Bombay
This talk was given at the Department of Computer Science and Engineering, IIT Bombay's annual research symposium. The abstract of the talk is given below:
Related languages are those that exhibit lexical and structural similarities on account of sharing a common ancestry or being in contact for a long period of time. Machine Translation between related languages is a major requirement since there is substantial government, commercial and cultural communication among people speaking related languages. However, most of these languages have few parallel corpora resources, an important requirement for building good quality statistical machine translation (SMT) systems.
A key property of related languages is lexical similarity, which means the languages share many words with similar form (spelling/pronunciation) and meaning. These words could be cognates, lateral borrowings or loan words from other languages. Modelling lexical similarity among related languages is the key to building good-quality SMT systems with limited parallel corpora. We propose the use of two subword units of translation for modelling lexical similarity: (i) orthographic syllables, motivated by the design of Indic scripts, and (ii) byte pair encoded units, inspired by compression theory. We show that the proposed units significantly outperform other units of representation (word, morpheme and character) over multiple language pairs, spanning different language families, with varying degrees of lexical similarity, and are robust to domain changes too.
-
Invited Talk at Inter-Research-Institute Student Seminar in Computer Science (ACM India Annual Meet)
ACM India
Title of talk: Orthographic Syllable as basic unit for SMT between Related Languages
Abstract:
We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
-
Invited Tutorial on Statistical Machine Translation between related languages
North American Chapter of the Association for Computational Linguistics - Human Language Technologies: System Demonstrations
With Pushpak Bhattacharyya and Mitesh Khapra
Abstract:
Language-independent Statistical Machine Translation (SMT) has proven to be very challenging. The diversity of languages makes high accuracy difficult and requires substantial parallel corpus as well as linguistic resources (parsers, morph analyzers, etc.). An interesting observation is that a large chunk of machine translation (MT) requirements involve related languages. They are either: (i) between related languages, or (ii) between a lingua franca (like English) and a set of related languages. For instance, India, the European Union and South-East Asia have such translation requirements due to government, business and socio-cultural communication needs.
Related languages share a lot of linguistic features and the divergences among them are at a lower level of the NLP pipeline. The objective of the tutorial is to discuss how the relatedness among languages can be leveraged to bridge this language divergence, thereby achieving some/all of these goals: (i) improving translation quality, (ii) achieving better generalization, (iii) sharing linguistic resources, and (iv) reducing resource requirements.
We will look at the existing research in SMT from the perspective of related languages, with the goal to build a toolbox of methods that are useful for translation between related languages. This tutorial would be relevant to Machine Translation researchers and developers, especially those interested in translation between low-resource languages which have resource-rich related languages. It will also be relevant for researchers interested in multilingual computation.
Languages
-
English
Full professional proficiency
-
Hindi
Native or bilingual proficiency
-
Marathi
Native or bilingual proficiency
-
Malayalam
Native or bilingual proficiency