Activity
-
I’m excited to share that Duolingo has acquired Hobbes, a world-class animation and motion design studio based in Detroit. This acquisition will do…
Liked by Sam Davidson
-
What is the biggest barrier to building open-source AI coding agents? It’s *data*: there are no high-quality, large, and open agentic coding datasets…
Liked by Sam Davidson
-
Why #AI folks need a broad based Intro to #AI 👉 As I go around giving talks/tutorials on the planning and…
Liked by Sam Davidson
Publications
-
Developing a New Classifier for Automated Identification of Incivility in Social Media
Proceedings of the 4th Workshop on Online Abuse and Harms (November, 2020 - co-located with EMNLP, 2020)
Incivility is not only prevalent on online social media platforms, but also has concrete effects on individual users, online groups, and the platforms themselves. Given the prevalence and effects of online incivility, and the challenges involved in human-based incivility detection, it is urgent to develop validated and versatile automatic approaches to identifying uncivil posts and comments. This project advances both a neural, BERT-based classifier and a logistic regression classifier to identify uncivil comments. The classifier is trained on a dataset of Reddit posts, which are annotated for incivility, and further expanded using a combination of labeled data from Reddit and Twitter. Our best performing model achieves an F1 of 0.802 on our Reddit test set. The final model is not only applicable across social media platforms and their distinct data structures, but also computationally versatile and, as such, ready to be used on vast volumes of online data. All trained models and annotated data are made available to the research community.
-
Developing NLP Tools with a New Corpus of Learner Spanish
Proceedings of the 12th Language Resources and Evaluation Conference (LREC), 2020
The development of effective NLP tools for the L2 classroom depends largely on the availability of large annotated corpora of language learner text. While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common. Those Spanish corpora that are available do not contain the annotations needed to facilitate the development of tools beneficial to language learners, such as grammatical error correction. As a result, the field has seen little research in NLP tools designed to benefit Spanish language learners and teachers. We introduce COWS-L2H, a freely available corpus of Spanish learner data which includes error annotations and parallel corrected text to help researchers better understand L2 development, to examine teaching practices empirically, and to develop NLP tools to better serve the Spanish teaching community. We demonstrate the utility of this corpus by developing a neural-network based grammatical error correction system for Spanish learner writing.
-
Dependency Parsing for Spoken Dialog Systems
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Dependency parsing of conversational input can play an important role in language understanding for dialog systems by identifying the relationships between entities extracted from user utterances. Additionally, effective dependency parsing can elucidate differences in language structure and usage for discourse analysis of human-human versus human-machine dialogs. However, models trained on datasets based on news articles and web data do not perform well on spoken human-machine dialog, and currently available annotation schemes do not adapt well to dialog data. Therefore, we propose the Spoken Conversation Universal Dependencies (SCUD) annotation scheme that extends the Universal Dependencies (UD) (Nivre et al., 2016) guidelines to spoken human-machine dialogs. We also provide ConvBank, a conversation dataset between humans and an open-domain conversational dialog system with SCUD annotation. Finally, to demonstrate the utility of the dataset, we train a dependency parser on the ConvBank dataset. We demonstrate that by pre-training a dependency parser on a set of larger public datasets and fine-tuning on ConvBank data, we achieved the best result, 85.05% unlabeled and 77.82% labeled attachment accuracy.
-
Gunrock: A Social Bot for Complex and Engaging Long Conversations
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing
Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis. Overall, we found that users produce longer sentences to Gunrock, which are directly related to users' engagement (e.g., ratings, number of turns). Additionally, users' backstory queries about Gunrock are positively correlated to user satisfaction. Finally, we found dialog flows that interleave facts and personal opinions and stories lead to better user satisfaction.
Languages
-
English
Native or bilingual proficiency
-
French
Professional working proficiency
More activity by Sam
-
It was great to share thoughts on pros and cons of big and smaller models
Liked by Sam Davidson
-
Did Open Science just beat OpenAI? 🤯 Kyutai just announced Moshi, a real-time native multimodal foundation model that can listen and speak, similar…
Liked by Sam Davidson
-
Exciting life update! 🎓 Last week I defended my dissertation and completed my PhD from UC Riverside! It was truly a life changing journey and I want…
Liked by Sam Davidson
-
Announcement: Robert Brennan, Xingyao Wang, and I have formed a company! Our name is All Hands AI 🙌 Site: https://www.all-hands.dev/ Our mission…
Liked by Sam Davidson
-
A few weeks ago, I got to travel to Ann Arbor, MI to the Annual Human Sentence Processing conference to share my recent work on how native Arabic…
Liked by Sam Davidson
-
I'm excited to announce that I have started a new position as an Applied Scientist II on the Amazon Web Services Next Generation Developer Experience…
Posted by Sam Davidson
-
It was great to be at NAACL 2024 in Mexico City. I really enjoyed the conference and big congrats to my group and our MSR collaborators for winning…
Liked by Sam Davidson
-
Soon after OpenAI released GPT-4o on Monday, May 13, some Chinese speakers started to notice that something seemed off about this newest version of…
Liked by Sam Davidson
-
Attending #NAACL this week to present our Amazon Science paper "FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs". Work done in…
Liked by Sam Davidson
-
AI is not some sort of natural phenomenon that will just emerge and become dangerous. *WE* design it and *WE* build it. I can imagine thousands of…
Liked by Sam Davidson
-
Hiring Applied Scientists for Amazon Q science team. If you are interested in building conversational AI assistants for enterprises (with expertise…
Liked by Sam Davidson
-
My new paper «Your Transformer is Secretly Linear» has been accepted at ACL! 🎉 We have discovered that most layers of language models are 99%…
Liked by Sam Davidson
-
Turns out that everything they told you about scaling laws was specific to web data, because that’s what people used in their experiments, and doing…
Liked by Sam Davidson
-
Thanks Saab, great working with everyone last summer. Looking forward to sharing our work with the wider NLP community!
Liked by Sam Davidson