-
Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming
Authors:
Zhenghao Zhou,
Robert Frank,
R. Thomas McCoy
Abstract:
Large language models (LLMs) have shown the emergent capability of in-context learning (ICL). One line of research has explained ICL as functionally performing gradient descent. In this paper, we introduce a new way of diagnosing whether ICL is functionally equivalent to gradient-based learning. Our approach is based on the inverse frequency effect (IFE) -- a phenomenon in which an error-driven le…
▽ More
Large language models (LLMs) have shown the emergent capability of in-context learning (ICL). One line of research has explained ICL as functionally performing gradient descent. In this paper, we introduce a new way of diagnosing whether ICL is functionally equivalent to gradient-based learning. Our approach is based on the inverse frequency effect (IFE) -- a phenomenon in which an error-driven learner is expected to show larger updates when trained on infrequent examples than frequent ones. The IFE has previously been studied in psycholinguistics because humans show this effect in the context of structural priming (the tendency for people to produce sentence structures they have encountered recently); the IFE has been used as evidence that human structural priming must involve error-driven learning mechanisms. In our experiments, we simulated structural priming within ICL and found that LLMs display the IFE, with the effect being stronger in larger models. We conclude that ICL is indeed a type of gradient-based learning, supporting the hypothesis that a gradient component is implicitly computed in the forward pass during ICL. Our results suggest that both humans and LLMs make use of gradient-based, error-driven processing mechanisms.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
RoboCar: A Rapidly Deployable Open-Source Platform for Autonomous Driving Research
Authors:
Mehdi Testouri,
Gamal Elghazaly,
Raphael Frank
Abstract:
This paper introduces RoboCar, an open-source research platform for autonomous driving developed at the University of Luxembourg. RoboCar provides a modular, cost-effective framework for the development of experimental Autonomous Driving Systems (ADS), utilizing the 2018 KIA Soul EV. The platform integrates a robust hardware and software architecture that aligns with the vehicle's existing systems…
▽ More
This paper introduces RoboCar, an open-source research platform for autonomous driving developed at the University of Luxembourg. RoboCar provides a modular, cost-effective framework for the development of experimental Autonomous Driving Systems (ADS), utilizing the 2018 KIA Soul EV. The platform integrates a robust hardware and software architecture that aligns with the vehicle's existing systems, minimizing the need for extensive modifications. It supports various autonomous driving functions and has undergone real-world testing on public roads in Luxembourg City. This paper outlines the platform's architecture, integration challenges, and initial test results, offering insights into its application in advancing autonomous driving research. RoboCar is available to anyone at https://github.com/sntubix/robocar and is released under an open-source MIT license.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
A national longitudinal dataset of skills taught in U.S. higher education curricula
Authors:
Alireza Javadian Sabet,
Sarah H. Bana,
Renzhe Yu,
Morgan R. Frank
Abstract:
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a…
▽ More
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a similar granularity. Here, we fill this gap by presenting a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct this dataset, we apply natural language processing to extract from course descriptions detailed workplace activities (DWAs) used by the DOL to describe occupations. We then aggregate these DWAs to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of college-educated workers and their role in the economy. To showcase the utility of this dataset, we use it to 1) compare the similarity of skills taught and skills in the workforce according to the US Bureau of Labor Statistics, 2) estimate gender differences in acquired skills based on enrollment data, 3) depict temporal trends in the skills taught in social science curricula, and 4) connect college majors' skill distinctiveness to salary differences of graduates. Overall, this dataset can enable new research on the source of skills in the context of workforce development and provide actionable insights for shaping the future of higher education to meet evolving labor demands especially in the face of new technologies.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
LIEDER: Linguistically-Informed Evaluation for Discourse Entity Recognition
Authors:
Xiaomeng Zhu,
Robert Frank
Abstract:
Discourse Entity (DE) recognition is the task of identifying novel and known entities introduced within a text. While previous work has found that large language models have basic, if imperfect, DE recognition abilities (Schuster and Linzen, 2022), it remains largely unassessed which of the fundamental semantic properties that govern the introduction and subsequent reference to DEs they have knowl…
▽ More
Discourse Entity (DE) recognition is the task of identifying novel and known entities introduced within a text. While previous work has found that large language models have basic, if imperfect, DE recognition abilities (Schuster and Linzen, 2022), it remains largely unassessed which of the fundamental semantic properties that govern the introduction and subsequent reference to DEs they have knowledge of. We propose the Linguistically-Informed Evaluation for Discourse Entity Recognition (LIEDER) dataset that allows for a detailed examination of language models' knowledge of four crucial semantic properties: existence, uniqueness, plurality, and novelty. We find evidence that state-of-the-art large language models exhibit sensitivity to all of these properties except novelty, which demonstrates that they have yet to reach human-level language understanding abilities.
△ Less
Submitted 10 March, 2024;
originally announced March 2024.
-
Picsou: Enabling Efficient Cross-Consensus Communication
Authors:
Reginald Frank,
Micah Murray,
Suyash Gupta,
Ethan Xu,
Natacha Crooks,
Manos Kapritsos
Abstract:
Replicated state machines (RSMs) cannot effectively communicate today as there is no formal framework or efficient protocol to do so. To address this issue, we introduce a new primitive, the Cross-Cluster Consistent Broadcast (C3B) and present PICSOU, a practical C3B implementation. PICSOU draws inspiration from networking and TCP to allow two RSMs to communicate with constant metadata overhead in…
▽ More
Replicated state machines (RSMs) cannot effectively communicate today as there is no formal framework or efficient protocol to do so. To address this issue, we introduce a new primitive, the Cross-Cluster Consistent Broadcast (C3B) and present PICSOU, a practical C3B implementation. PICSOU draws inspiration from networking and TCP to allow two RSMs to communicate with constant metadata overhead in the failure-free case and minimal number of message resends in the case of failures. PICSOU is flexible and allows both crash fault-tolerant and byzantine fault-tolerant protocols to communicate. At the heart of PICSOU's good performance and generality lies a novel technique we call QUACKs (quorum acknowledgements) that allow nodes in each RSM to precisely determine when messages have definitely been received, or definitely been lost. Our results are promising: we obtain up to 24x better performance than existing all-to-all solutions.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
Authors:
Michael Wilson,
Jackson Petty,
Robert Frank
Abstract:
Language models are typically evaluated on their success at predicting the distribution of specific words in specific contexts. Yet linguistic knowledge also encodes relationships between contexts, allowing inferences between word distributions. We investigate the degree to which pre-trained Transformer-based large language models (LLMs) represent such relationships, focusing on the domain of argu…
▽ More
Language models are typically evaluated on their success at predicting the distribution of specific words in specific contexts. Yet linguistic knowledge also encodes relationships between contexts, allowing inferences between word distributions. We investigate the degree to which pre-trained Transformer-based large language models (LLMs) represent such relationships, focusing on the domain of argument structure. We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts that were seen during pre-training (e.g., the active object and passive subject of the verb spray), succeeding by making use of the semantically-organized structure of the embedding space for word embeddings. However, LLMs fail at generalizations between related contexts that have not been observed during pre-training, but which instantiate more abstract, but well-attested structural generalizations (e.g., between the active object and passive subject of an arbitrary verb). Instead, in this case, LLMs show a bias to generalize based on linear order. This finding points to a limitation with current models and points to a reason for which their training is data-intensive.s reported here are available at https://github.com/clay-lab/structural-alternations.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Brief for the Canada House of Commons Study on the Implications of Artificial Intelligence Technologies for the Canadian Labor Force: Generative Artificial Intelligence Shatters Models of AI and Labor
Authors:
Morgan R. Frank
Abstract:
Exciting advances in generative artificial intelligence (AI) have sparked concern for jobs, education, productivity, and the future of work. As with past technologies, generative AI may not lead to mass unemployment. But, unlike past technologies, generative AI is creative, cognitive, and potentially ubiquitous which makes the usual assumptions of automation predictions ill-suited for today. Exist…
▽ More
Exciting advances in generative artificial intelligence (AI) have sparked concern for jobs, education, productivity, and the future of work. As with past technologies, generative AI may not lead to mass unemployment. But, unlike past technologies, generative AI is creative, cognitive, and potentially ubiquitous which makes the usual assumptions of automation predictions ill-suited for today. Existing projections suggest that generative AI will impact workers in occupations that were previously considered immune to automation. As AI's full set of capabilities and applications emerge, policy makers should promote workers' career adaptability. This goal requires improved data on job separations and unemployment by locality and job titles in order to identify early-indicators for the workers facing labor disruption. Further, prudent policy should incentivize education programs to accommodate learning with AI as a tool while preparing students for the demands of the future of work.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving
Authors:
Faisal Hawlader,
François Robinet,
Raphaël Frank
Abstract:
Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time per…
▽ More
Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time perception in autonomous vehicles. Larger object detection models tend to produce the best results, but are also slower at runtime. Since the most accurate detectors cannot run in real-time locally, we investigate the possibility of offloading computation to edge and cloud platforms, which are less resource-constrained. We create a synthetic dataset to train object detection models and evaluate different offloading strategies. Using real hardware and network simulations, we compare different trade-offs between prediction quality and end-to-end delay. Since sending raw frames over the network implies additional transmission delays, we also explore the use of JPEG and H.265 compression at varying qualities and measure their impact on prediction metrics. We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
Towards a Safe Real-Time Motion Planning Framework for Autonomous Driving Systems: An MPPI Approach
Authors:
Mehdi Testouri,
Gamal Elghazaly,
Raphael Frank
Abstract:
Planning safe trajectories in Autonomous Driving Systems (ADS) is a complex problem to solve in real-time. The main challenge to solve this problem arises from the various conditions and constraints imposed by road geometry, semantics and traffic rules, as well as the presence of dynamic agents. Recently, Model Predictive Path Integral (MPPI) has shown to be an effective framework for optimal moti…
▽ More
Planning safe trajectories in Autonomous Driving Systems (ADS) is a complex problem to solve in real-time. The main challenge to solve this problem arises from the various conditions and constraints imposed by road geometry, semantics and traffic rules, as well as the presence of dynamic agents. Recently, Model Predictive Path Integral (MPPI) has shown to be an effective framework for optimal motion planning and control in robot navigation in unstructured and highly uncertain environments. In this paper, we formulate the motion planning problem in ADS as a nonlinear stochastic dynamic optimization problem that can be solved using an MPPI strategy. The main technical contribution of this work is a method to handle obstacles within the MPPI formulation safely. In this method, obstacles are approximated by circles that can be easily integrated into the MPPI cost formulation while considering safety margins. The proposed MPPI framework has been efficiently implemented in our autonomous vehicle and experimentally validated using three different primitive scenarios. Experimental results show that generated trajectories are safe, feasible and perfectly achieve the planning objective. The video results as well as the open-source implementation are available at: https://gitlab.uni.lu/360lab-public/mppi
△ Less
Submitted 6 May, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
The Resume Paradox: Greater Language Differences, Smaller Pay Gaps
Authors:
Joshua R. Minot,
Marc Maier,
Bradford Demarest,
Nicholas Cheney,
Christopher M. Danforth,
Peter Sheridan Dodds,
Morgan R. Frank
Abstract:
Over the past decade, the gender pay gap has remained steady with women earning 84 cents for every dollar earned by men on average. Many studies explain this gap through demand-side bias in the labor market represented through employers' job postings. However, few studies analyze potential bias from the worker supply-side. Here, we analyze the language in millions of US workers' resumes to investi…
▽ More
Over the past decade, the gender pay gap has remained steady with women earning 84 cents for every dollar earned by men on average. Many studies explain this gap through demand-side bias in the labor market represented through employers' job postings. However, few studies analyze potential bias from the worker supply-side. Here, we analyze the language in millions of US workers' resumes to investigate how differences in workers' self-representation by gender compare to differences in earnings. Across US occupations, language differences between male and female resumes correspond to 11% of the variation in gender pay gap. This suggests that females' resumes that are semantically similar to males' resumes may have greater wage parity. However, surprisingly, occupations with greater language differences between male and female resumes have lower gender pay gaps. A doubling of the language difference between female and male resumes results in an annual wage increase of $2,797 for the average female worker. This result holds with controls for gender-biases of resume text and we find that per-word bias poorly describes the variance in wage gap. The results demonstrate that textual data and self-representation are valuable factors for improving worker representations and understanding employment inequities.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
Art and the science of generative AI: A deeper dive
Authors:
Ziv Epstein,
Aaron Hertzmann,
Laura Herman,
Robert Mahari,
Morgan R. Frank,
Matthew Groh,
Hope Schroeder,
Amy Smith,
Memo Akten,
Jessica Fjeld,
Hany Farid,
Neil Leach,
Alex Pentland,
Olga Russakovsky
Abstract:
A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of…
▽ More
A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of society. Understanding the impact of generative AI - and making policy decisions around it - requires new interdisciplinary scientific inquiry into culture, economics, law, algorithms, and the interaction of technology and creativity. We argue that generative AI is not the harbinger of art's demise, but rather is a new medium with its own distinct affordances. In this vein, we consider the impacts of this new medium on creators across four themes: aesthetics and culture, legal questions of ownership and credit, the future of creative work, and impacts on the contemporary media ecosystem. Across these themes, we highlight key research questions and directions to inform policy and beneficial uses of the technology.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
False perspectives on human language: why statistics needs linguistics
Authors:
Matteo Greco,
Andrea Cometa,
Fiorenzo Artoni,
Robert Frank,
Andrea Moro
Abstract:
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language processing, vs. those who believe that discrete hierarchical structures implementing linguistic information such as syntactic ones are a better tool. In this paper, w…
▽ More
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language processing, vs. those who believe that discrete hierarchical structures implementing linguistic information such as syntactic ones are a better tool. In this paper, we show that this dichotomy is a false one. Relying on the fact that statistical measures can be defined on the basis of either structural or non-structural models, we provide empirical evidence that only models of surprisal that reflect syntactic structure are able to account for language regularities.
△ Less
Submitted 17 February, 2023;
originally announced February 2023.
-
How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech
Authors:
Aditya Yedetore,
Tal Linzen,
Robert Frank,
R. Thomas McCoy
Abstract:
When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers - two types of neural networks without a hierar…
▽ More
When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers - two types of neural networks without a hierarchical bias - on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We then evaluate what these models have learned about English yes/no questions, a phenomenon for which hierarchical structure is crucial. We find that, though they perform well at capturing the surface statistics of child-directed speech (as measured by perplexity), both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
△ Less
Submitted 6 June, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
FastCycle: A Message Sharing Framework for Modular Automated Driving Systems
Authors:
Mehdi Testouri,
Gamal Elghazaly,
Raphael Frank
Abstract:
Automated Driving Systems (ADS) have rapidly evolved in recent years and their architecture becomes sophisticated. Ensuring robustness, reliability and safety of performance is particularly important. The main challenge in building an ADS is the ability to meet certain stringent performance requirements in terms of both making safe operational decisions and finishing processing in real-time. Middl…
▽ More
Automated Driving Systems (ADS) have rapidly evolved in recent years and their architecture becomes sophisticated. Ensuring robustness, reliability and safety of performance is particularly important. The main challenge in building an ADS is the ability to meet certain stringent performance requirements in terms of both making safe operational decisions and finishing processing in real-time. Middlewares play a crucial role to handle these requirements in ADS. The way middlewares share data between the different system components has a direct impact on the overall performance, particularly the latency overhead. To this end, this paper presents FastCycle as a lightweight multi-threaded zero-copy messaging broker to meet the requirements of a high fidelity ADS in terms of modularity, real-time performance and security. We discuss the architecture and the main features of the proposed framework. Evaluation of the proposed framework based on standard metrics in comparison with popular middlewares used in robotics and automated driving shows the improved performance of our framework. The implementation of FastCycle and the associated comparisons with other frameworks are open sourced.
△ Less
Submitted 28 November, 2022;
originally announced November 2022.
-
Connected Vehicle Platforms for Dynamic Insurance
Authors:
Christian Colot,
Francois Robinet,
Geoffrey Nichils,
Raphael Frank
Abstract:
Following a regulatory change in Europe which mandates that car manufacturers include an eCall system in new vehicles, many car manufacturers are adding additional services on top, so that more and more cars become connected vehicles and act like IoT sensors. In the following study, we analyse the maturity level of this new technology to build insurance products that would take vehicle usage into…
▽ More
Following a regulatory change in Europe which mandates that car manufacturers include an eCall system in new vehicles, many car manufacturers are adding additional services on top, so that more and more cars become connected vehicles and act like IoT sensors. In the following study, we analyse the maturity level of this new technology to build insurance products that would take vehicle usage into account. For this, the connectivity of recent cars a-priori eligible has been first tested. Then, an ad-hoc platform has been designed to collect driving data. In particular, 4 cars have been connected to this platform for periods of over one month. Our results highlight that, while this technological innovation appears very promising in the future, the pricing, the lack of uniformity of data collected and the enrollment process are currently three pain points that should be addressed to offer large-scale opportunities. In the meantime, this technology might still be used for high value use cases such as the insurance of luxurious cars.
△ Less
Submitted 1 August, 2022;
originally announced August 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Authors:
Yiding Hao,
Dana Angluin,
Robert Frank
Abstract:
This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the c…
▽ More
This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.
△ Less
Submitted 13 April, 2022;
originally announced April 2022.
-
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
Authors:
Aaron Mueller,
Robert Frank,
Tal Linzen,
Luheng Wang,
Sebastian Schuster
Abstract:
Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations - for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models h…
▽ More
Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations - for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models have only observed models that were not pre-trained on natural language data before being trained to perform syntactic transformations, in spite of the fact that pre-training has been found to induce hierarchical linguistic generalizations in language models; in other words, the syntactic capabilities of seq2seq models may have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART, as well as their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result presents evidence for the learnability of hierarchical syntactic information from non-annotated natural language text while also demonstrating that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
Do Language Models Learn Position-Role Mappings?
Authors:
Jackson Petty,
Michael Wilson,
Robert Frank
Abstract:
How is knowledge of position-role mappings in natural language learned? We explore this question in a computational setting, testing whether a variety of well-performing pertained language models (BERT, RoBERTa, and DistilBERT) exhibit knowledge of these mappings, and whether this knowledge persists across alternations in syntactic, structural, and lexical alternations. In Experiment 1, we show th…
▽ More
How is knowledge of position-role mappings in natural language learned? We explore this question in a computational setting, testing whether a variety of well-performing pertained language models (BERT, RoBERTa, and DistilBERT) exhibit knowledge of these mappings, and whether this knowledge persists across alternations in syntactic, structural, and lexical alternations. In Experiment 1, we show that these neural models do indeed recognize distinctions between theme and recipient roles in ditransitive constructions, and that these distinct patterns are shared across construction type. We strengthen this finding in Experiment 2 by showing that fine-tuning these language models on novel theme- and recipient-like tokens in one paradigm allows the models to make correct predictions about their placement in other paradigms, suggesting that the knowledge of these mappings is shared rather than independently learned. We do, however, observe some limitations of this generalization when tasks involve constructions with novel ditransitive verbs, hinting at a degree of lexical specificity which underlies model performance.
△ Less
Submitted 7 February, 2022;
originally announced February 2022.
-
Exposure of occupations to technologies of the fourth industrial revolution
Authors:
Benjamin Meindl,
Morgan R. Frank,
Joana Mendonça
Abstract:
The fourth industrial revolution (4IR) is likely to have a substantial impact on the economy. Companies need to build up capabilities to implement new technologies, and automation may make some occupations obsolete. However, where, when, and how the change will happen remain to be determined. Robust empirical indicators of technological progress linked to occupations can help to illuminate this ch…
▽ More
The fourth industrial revolution (4IR) is likely to have a substantial impact on the economy. Companies need to build up capabilities to implement new technologies, and automation may make some occupations obsolete. However, where, when, and how the change will happen remain to be determined. Robust empirical indicators of technological progress linked to occupations can help to illuminate this change. With this aim, we provide such an indicator based on patent data. Using natural language processing, we calculate patent exposure scores for more than 900 occupations, which represent the technological progress related to them. To provide a lens on the impact of the 4IR, we differentiate between traditional and 4IR patent exposure. Our method differs from previous approaches in that it both accounts for the diversity of task-level patent exposures within an occupation and reflects work activities more accurately. We find that exposure to 4IR patents differs from traditional patent exposure. Manual tasks, and accordingly occupations such as construction and production, are exposed mainly to traditional (non-4IR) patents but have low exposure to 4IR patents. The analysis suggests that 4IR technologies may have a negative impact on job growth; this impact appears 10 to 20 years after patent filing. Further, we compared the 4IR exposure to other automation and AI exposure scores. Whereas many measures refer to theoretical automation potential, our patent-based indicator reflects actual technology diffusion. Our work not only allows analyses of the impact of 4IR technologies as a whole, but also provides exposure scores for more than 300 technology fields, such as AI and smart office technologies. Finally, the work provides a general mapping of patents to tasks and occupations, which enables future researchers to construct individual exposure measures.
△ Less
Submitted 25 October, 2021;
originally announced October 2021.
-
Transformers Generalize Linearly
Authors:
Jackson Petty,
Robert Frank
Abstract:
Natural language exhibits patterns of hierarchically governed dependencies, in which relations between words are sensitive to syntactic structure rather than linear ordering. While re-current network models often fail to generalize in a hierarchically sensitive way (McCoy et al.,2020) when trained on ambiguous data, the improvement in performance of newer Trans-former language models (Vaswani et a…
▽ More
Natural language exhibits patterns of hierarchically governed dependencies, in which relations between words are sensitive to syntactic structure rather than linear ordering. While re-current network models often fail to generalize in a hierarchically sensitive way (McCoy et al.,2020) when trained on ambiguous data, the improvement in performance of newer Trans-former language models (Vaswani et al., 2017)on a range of syntactic benchmarks trained on large data sets (Goldberg, 2019; Warstadtet al., 2019) opens the question of whether these models might exhibit hierarchical generalization in the face of impoverished data.In this paper we examine patterns of structural generalization for Transformer sequence-to-sequence models and find that not only do Transformers fail to generalize hierarchically across a wide variety of grammatical mapping tasks, but they exhibit an even stronger preference for linear generalization than comparable recurrent networks
△ Less
Submitted 24 September, 2021;
originally announced September 2021.
-
Semantic Knowledge Discovery and Discussion Mining of Incel Online Community: Topic modeling
Authors:
Hamed Jelodar,
Richard Frank
Abstract:
Online forums provide a unique opportunity for online users to share comments and exchange information on a particular topic. Understanding user behaviour is valuable to organizations and has applications for social and security strategies, for instance, identifying user opinions within a community or predicting future behaviour. Discovering the semantic aspects in Incel forums are the main goal o…
▽ More
Online forums provide a unique opportunity for online users to share comments and exchange information on a particular topic. Understanding user behaviour is valuable to organizations and has applications for social and security strategies, for instance, identifying user opinions within a community or predicting future behaviour. Discovering the semantic aspects in Incel forums are the main goal of this research; we apply Natural language processing techniques based on topic modeling to latent topic discovery and opinion mining of users from a popular online Incel discussion forum. To prepare the input data for our study, we extracted the comments from Incels.co. The research experiments show that Artificial Intelligence (AI) based on NLP models can be effective for semantic and emotion knowledge discovery and retrieval of useful information from the Incel community. For example, we discovered semantic-related words that describe issues within a large volume of Incel comments, which is difficult with manual methods.
△ Less
Submitted 21 April, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research
Authors:
Frederik Hauser,
Marco Häberle,
Daniel Merling,
Steffen Lindner,
Vladimir Gurevich,
Florian Zeiger,
Reinhard Frank,
Michael Menth
Abstract:
Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane application programming interfaces (APIs) which may be leveraged by user-defined software-defined networking (SDN) control. This offers great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or f…
▽ More
Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane application programming interfaces (APIs) which may be leveraged by user-defined software-defined networking (SDN) control. This offers great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community, and it is supported by various software and hardware platforms. In the first part of this paper we give a tutorial of data plane programming models, the P4 programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we categorize a large body of literature of P4-based applied research into different research domains, summarize the contributions of these papers, and extract prototypes, target platforms, and source code availability. For each research domain, we analyze how the reviewed works benefit from P4's core features. Finally, we discuss potential next steps based on our findings.
△ Less
Submitted 4 August, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Sequence-to-Sequence Networks Learn the Meaning of Reflexive Anaphora
Authors:
Robert Frank,
Jackson Petty
Abstract:
Reflexive anaphora present a challenge for semantic interpretation: their meaning varies depending on context in a way that appears to require abstract variables. Past work has raised doubts about the ability of recurrent networks to meet this challenge. In this paper, we explore this question in the context of a fragment of English that incorporates the relevant sort of contextual variability. We…
▽ More
Reflexive anaphora present a challenge for semantic interpretation: their meaning varies depending on context in a way that appears to require abstract variables. Past work has raised doubts about the ability of recurrent networks to meet this challenge. In this paper, we explore this question in the context of a fragment of English that incorporates the relevant sort of contextual variability. We consider sequence-to-sequence architectures with recurrent units and show that such networks are capable of learning semantic interpretations for reflexive anaphora which generalize to novel antecedents. We explore the effect of attention mechanisms and different recurrent unit types on the type of training data that is needed for success as measured in two ways: how much lexical support is needed to induce an abstract reflexive meaning (i.e., how many distinct reflexive antecedents must occur during training) and what contexts must a noun phrase occur in to support generalization of reflexive interpretation to this noun phrase?
△ Less
Submitted 1 November, 2020;
originally announced November 2020.
-
Industrial Topics in Urban Labor System
Authors:
Jaehyuk Park,
Morgan R. Frank,
Lijun Sun,
Hyejin Youn
Abstract:
Categorization is an essential component for us to understand the world for ourselves and to communicate it collectively. It is therefore important to recognize that classification system are not necessarily static, especially for economic systems, and even more so in urban areas where most innovation takes place and is implemented. Out-of-date classification systems would potentially limit furthe…
▽ More
Categorization is an essential component for us to understand the world for ourselves and to communicate it collectively. It is therefore important to recognize that classification system are not necessarily static, especially for economic systems, and even more so in urban areas where most innovation takes place and is implemented. Out-of-date classification systems would potentially limit further understanding of the current economy because things constantly change. Here, we develop an occupation-based classification system for the US labor economy, called industrial topics, that satisfy adaptability and representability. By leveraging the distributions of occupations across the US urban areas, we identify industrial topics - clusters of occupations based on their co-existence pattern. Industrial topics indicate the mechanisms under the systematic allocation of different occupations. Considering the densely connected occupations as an industrial topic, our approach characterizes regional economies by their topical composition. Unlike the existing survey-based top-down approach, our method provides timely information about the underlying structure of the regional economy, which is critical for policymakers and business leaders, especially in our fast-changing economy.
△ Less
Submitted 17 September, 2020;
originally announced September 2020.
-
Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling
Authors:
Yiding Hao,
Simon Mendelsohn,
Rachel Sterneck,
Randi Martinez,
Robert Frank
Abstract:
By positing a relationship between naturalistic reading times and information-theoretic surprisal, surprisal theory (Hale, 2001; Levy, 2008) provides a natural interface between language models and psycholinguistic models. This paper re-evaluates a claim due to Goodkind and Bicknell (2018) that a language model's ability to model reading times is a linear function of its perplexity. By extending G…
▽ More
By positing a relationship between naturalistic reading times and information-theoretic surprisal, surprisal theory (Hale, 2001; Levy, 2008) provides a natural interface between language models and psycholinguistic models. This paper re-evaluates a claim due to Goodkind and Bicknell (2018) that a language model's ability to model reading times is a linear function of its perplexity. By extending Goodkind and Bicknell's analysis to modern neural architectures, we show that the proposed relation does not always hold for Long Short-Term Memory networks, Transformers, and pre-trained models. We introduce an alternate measure of language modeling performance called predictability norm correlation based on Cloze probabilities measured from human subjects. Our new metric yields a more robust relationship between language model quality and psycholinguistic modeling performance that allows for comparison between models with different training configurations.
△ Less
Submitted 8 September, 2020;
originally announced September 2020.
-
Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts
Authors:
Ryan J. Gallagher,
Morgan R. Frank,
Lewis Mitchell,
Aaron J. Schwartz,
Andrew J. Reagan,
Christopher M. Danforth,
Peter Sheridan Dodds
Abstract:
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or…
▽ More
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences. Through several case studies, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
△ Less
Submitted 5 August, 2020;
originally announced August 2020.
-
Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks
Authors:
R. Thomas McCoy,
Robert Frank,
Tal Linzen
Abstract:
Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation a…
▽ More
Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure.
△ Less
Submitted 10 January, 2020;
originally announced January 2020.
-
Open Sesame: Getting Inside BERT's Linguistic Knowledge
Authors:
Yongjie Lin,
Yi Chern Tan,
Robert Frank
Abstract:
How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of the…
▽ More
How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally-defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically-oriented encoding on higher layers. We conclude then that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora.
△ Less
Submitted 4 June, 2019;
originally announced June 2019.
-
Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
Authors:
William Merrill,
Gigi Felice Stark,
Robert Frank
Abstract:
We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger's ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first…
▽ More
We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger's ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first principal component and time. We show that temporal information encoded in the model can be used to predict novel sentences' years of composition relatively well. Comparisons to a feedforward baseline suggest that the temporal change learned by the LSTM is syntactic rather than purely lexical. Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.
△ Less
Submitted 9 July, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Finding Syntactic Representations in Neural Stacks
Authors:
William Merrill,
Lenny Khazan,
Noah Amsel,
Yiding Hao,
Simon Mendelsohn,
Robert Frank
Abstract:
Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities. It has, however, proven difficult to assess the degree to which such a bias is effective, as the operation of the differentiable stack is not always interpretable. In this paper, we attempt to detect the presence of latent representations of hie…
▽ More
Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities. It has, however, proven difficult to assess the degree to which such a bias is effective, as the operation of the differentiable stack is not always interpretable. In this paper, we attempt to detect the presence of latent representations of hierarchical structure through an exploration of the unsupervised learning of constituency structure. Using a technique due to Shen et al. (2018a,b), we extract syntactic trees from the pushing behavior of stack RNNs trained on language modeling and classification objectives. We find that our models produce parses that reflect natural language syntactic constituencies, demonstrating that stack RNNs do indeed infer linguistically relevant hierarchical structure.
△ Less
Submitted 4 June, 2019;
originally announced June 2019.
-
Syntax-aware Neural Semantic Role Labeling with Supertags
Authors:
Jungo Kasai,
Dan Friedman,
Robert Frank,
Dragomir Radev,
Owen Rambow
Abstract:
We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models…
▽ More
We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.
△ Less
Submitted 3 April, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Context-Free Transductions with Neural Stacks
Authors:
Yiding Hao,
William Merrill,
Dana Angluin,
Robert Frank,
Noah Amsel,
Andrew Benz,
Simon Mendelsohn
Abstract:
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover in…
▽ More
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex networks often find approximate solutions by using the stack as unstructured memory.
△ Less
Submitted 8 September, 2018;
originally announced September 2018.
-
A Tight Lower Bound for Clock Synchronization in Odd-Ary M-Toroids
Authors:
Reginald Frank,
Jennifer L. Welch
Abstract:
Synchronizing clocks in a distributed system in which processes communicate through messages with uncertain delays is subject to inherent errors. Prior work has shown upper and lower bounds on the best synchronization achievable in a variety of network topologies and assumptions about the uncertainty on the message delays. However, until now there has not been a tight closed-form expression for th…
▽ More
Synchronizing clocks in a distributed system in which processes communicate through messages with uncertain delays is subject to inherent errors. Prior work has shown upper and lower bounds on the best synchronization achievable in a variety of network topologies and assumptions about the uncertainty on the message delays. However, until now there has not been a tight closed-form expression for the optimal synchronization in $k$-ary $m$-cubes with wraparound, where $k$ is odd. In this paper, we prove a lower bound of $\frac{1}{4}um\left(k-\frac{1}{k}\right)$, where $k$ is the (odd) number of processes in the each of the $m$ dimensions, and $u$ is the uncertainty in delay on every link. Our lower bound matches the previously known upper bound.
△ Less
Submitted 13 July, 2018;
originally announced July 2018.
-
End-to-end Graph-based TAG Parsing with Neural Networks
Authors:
Jungo Kasai,
Robert Frank,
Pauli Xu,
William Merrill,
Owen Rambow
Abstract:
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs. Our best end-to-end parser, which jointly performs supertagging, POS tagging, and parsing, outperforms the previously reported best results by more than 2.2 LAS and UAS points. The graph-based parsing architecture allows for global inference and rich feature representation…
▽ More
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs. Our best end-to-end parser, which jointly performs supertagging, POS tagging, and parsing, outperforms the previously reported best results by more than 2.2 LAS and UAS points. The graph-based parsing architecture allows for global inference and rich feature representations for TAG parsing, alleviating the fundamental trade-off between transition-based and graph-based parsing systems. We also demonstrate that the proposed parser achieves state-of-the-art performance in the downstream tasks of Parsing Evaluation using Textual Entailments (PETE) and Unbounded Dependency Recovery. This provides further support for the claim that TAG is a viable formalism for problems that require rich structural analysis of sentences.
△ Less
Submitted 27 April, 2018; v1 submitted 18 April, 2018;
originally announced April 2018.
-
Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks
Authors:
R. Thomas McCoy,
Robert Frank,
Tal Linzen
Abstract:
Syntactic rules in natural language typically need to make reference to hierarchical sentence structure. However, the simple examples that language learners receive are often equally compatible with linear rules. Children consistently ignore these linear explanations and settle instead on the correct hierarchical one. This fact has motivated the proposal that the learner's hypothesis space is cons…
▽ More
Syntactic rules in natural language typically need to make reference to hierarchical sentence structure. However, the simple examples that language learners receive are often equally compatible with linear rules. Children consistently ignore these linear explanations and settle instead on the correct hierarchical one. This fact has motivated the proposal that the learner's hypothesis space is constrained to include only hierarchical rules. We examine this proposal using recurrent neural networks (RNNs), which are not constrained in such a way. We simulate the acquisition of question formation, a hierarchical transformation, in a fragment of English. We find that some RNN architectures tend to learn the hierarchical rule, suggesting that hierarchical cues within the language, combined with the implicit architectural biases inherent in certain RNNs, may be sufficient to induce hierarchical generalizations. The likelihood of acquiring the hierarchical generalization increased when the language included an additional cue to hierarchy in the form of subject-verb agreement, underscoring the role of cues to hierarchy in the learner's input.
△ Less
Submitted 8 June, 2018; v1 submitted 25 February, 2018;
originally announced February 2018.
-
Symplectomorphic registration with phase space regularization by entropy spectrum pathways
Authors:
Vitaly L. Galinsky,
Lawrence R. Frank
Abstract:
The ability to register image data to a common coordinate system is a critical feature of virtually all imaging studies that require multiple subject analysis, combining single subject data from multiple modalities, or both. However, in spite of the abundance of literature on the subject and the existence of several variants of registration algorithms, their practical utility remains problematic,…
▽ More
The ability to register image data to a common coordinate system is a critical feature of virtually all imaging studies that require multiple subject analysis, combining single subject data from multiple modalities, or both. However, in spite of the abundance of literature on the subject and the existence of several variants of registration algorithms, their practical utility remains problematic, as commonly acknowledged even by developers of these methods because the complexity of the problem has resisted a general, flexible, and robust theoretical and computational framework.
To address this issue, we present a new registration method that is similar in spirit to the current state-of-the-art technique of diffeomorphic mapping, but is more general and flexible. The method utilizes a Hamiltonian formalism and constructs registration as a sequence of symplectomorphic maps in conjunction with a novel phase space regularization based on the powerful entropy spectrum pathways (ESP) framework.
The method is demonstrated on the three different magnetic resonance imaging (MRI) modalities routinely used for human neuroimaging applications by mapping between high resolution anatomical (HRA) volumes, medium resolution diffusion weighted MRI (DW-MRI) and HRA volumes, and low resolution functional MRI (fMRI) and HRA volumes. The typical processing time for high quality mapping ranges from less than a minute to several minutes on a modern multi core CPU for typical high resolution anatomical (~256x256x256 voxels) MRI volumes.
△ Less
Submitted 15 June, 2017;
originally announced June 2017.
-
Small cities face greater impact from automation
Authors:
Morgan R. Frank,
Lijun Sun,
Manuel Cebrian,
Hyejin Youn,
Iyad Rahwan
Abstract:
The city has proven to be the most successful form of human agglomeration and provides wide employment opportunities for its dwellers. As advances in robotics and artificial intelligence revive concerns about the impact of automation on jobs, a question looms: How will automation affect employment in cities? Here, we provide a comparative picture of the impact of automation across U.S. urban areas…
▽ More
The city has proven to be the most successful form of human agglomeration and provides wide employment opportunities for its dwellers. As advances in robotics and artificial intelligence revive concerns about the impact of automation on jobs, a question looms: How will automation affect employment in cities? Here, we provide a comparative picture of the impact of automation across U.S. urban areas. Small cities will undertake greater adjustments, such as worker displacement and job content substitutions. We demonstrate that large cities exhibit increased occupational and skill specialization due to increased abundance of managerial and technical professions. These occupations are not easily automatable, and, thus, reduce the potential impact of automation in large cities. Our results pass several robustness checks including potential errors in the estimation of occupational automation and sub-sampling of occupations. Our study provides the first empirical law connecting two societal forces: urban agglomeration and automation's impact on employment.
△ Less
Submitted 21 September, 2017; v1 submitted 16 May, 2017;
originally announced May 2017.
-
The Lexicocalorimeter: Gauging public health through caloric input and output on social media
Authors:
S. E. Alajajian,
J. R. Williams,
A. J. Reagan,
S. C. Alajajian,
M. R. Frank,
L. Mitchell,
J. Lahne,
C. M. Danforth,
P. S. Dodds
Abstract:
We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric…
▽ More
We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of "caloric input", "caloric output", and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases outperforms both its constituent quantities, is tunable to specific health and well-being measures such as diabetes rates, has the capability of providing a real-time signal reflecting a population's health, and has the potential to be used alongside traditional survey data in the development of public policy and collective self-awareness. Because our Lexicocalorimeter is a linear superposition of principled phrase scores, we also show we can move beyond correlations to explore what people talk about in collective detail, and assist in the understanding and explanation of how population-scale conditions vary, a capacity unavailable to black-box type methods.
△ Less
Submitted 10 January, 2017; v1 submitted 17 July, 2015;
originally announced July 2015.
-
Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics
Authors:
P. S. Dodds,
E. M. Clark,
S. Desu,
M. R. Frank,
A. J. Reagan,
J. R. Williams,
L. Mitchell,
K. D. Harris,
I. M. Kloumann,
J. P. Bagrow,
K. Megerdoomian,
M. T. McMahon,
B. F. Tivnan,
C. M. Danforth
Abstract:
We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English…
▽ More
We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English component of our study compares well statistically with two related surveys, that no survey design influence is apparent, and that estimates of measurement error do not explain the positivity biases reported in our work and that of others. We further demonstrate that for the frequency dependence of positivity---of which we explored the nuances in great detail in [1]---Garcia et al. did not perform a reanalysis of our data---they instead carried out an analysis of a different, statistically improper data set and introduced a nonlinearity before performing linear regression.
△ Less
Submitted 28 May, 2015; v1 submitted 25 May, 2015;
originally announced May 2015.
-
Constructing a taxonomy of fine-grained human movement and activity motifs through social media
Authors:
Morgan R. Frank,
Jake Ryland Williams,
Lewis Mitchell,
James P. Bagrow,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely acti…
▽ More
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely activities as a function of time of day and day of week, capitalizing on both the content and geolocation of messages. We subsequently characterize people's transition pattern motifs and demonstrate that spatial information is encoded in word choice.
△ Less
Submitted 11 May, 2015; v1 submitted 28 September, 2014;
originally announced October 2014.
-
Human language reveals a universal positivity bias
Authors:
Peter Sheridan Dodds,
Eric M. Clark,
Suma Desu,
Morgan R. Frank,
Andrew J. Reagan,
Jake Ryland Williams,
Lewis Mitchell,
Kameron Decker Harris,
Isabel M. Kloumann,
James P. Bagrow,
Karine Megerdoomian,
Matthew T. McMahon,
Brian F. Tivnan,
Christopher M. Danforth
Abstract:
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias i…
▽ More
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
△ Less
Submitted 15 June, 2014;
originally announced June 2014.
-
Shadow networks: Discovering hidden nodes with models of information flow
Authors:
James P. Bagrow,
Suma Desu,
Morgan R. Frank,
Narine Manukyan,
Lewis Mitchell,
Andrew Reagan,
Eric E. Bloedorn,
Lashon B. Booker,
Luther K. Branting,
Michael J. Smith,
Brian F. Tivnan,
Christopher M. Danforth,
Peter S. Dodds,
Joshua C. Bongard
Abstract:
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in n…
▽ More
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in networks by the effect of their absence on predictions of the speed with which information flows through the network. We use Symbolic Regression (SR) to learn models relating information flow to network topology. These models show localized, systematic, and non-random discrepancies when applied to test networks with intentionally masked nodes, demonstrating the ability to detect the presence of missing nodes and where in the network those nodes are likely to reside.
△ Less
Submitted 20 December, 2013;
originally announced December 2013.
-
Monotonicity of a relative Rényi entropy
Authors:
Rupert L. Frank,
Elliott H. Lieb
Abstract:
We show that a recent definition of relative Rényi entropy is monotone under completely positive, trace preserving maps. This proves a recent conjecture of Müller-Lennert et al.
We show that a recent definition of relative Rényi entropy is monotone under completely positive, trace preserving maps. This proves a recent conjecture of Müller-Lennert et al.
△ Less
Submitted 1 October, 2013; v1 submitted 22 June, 2013;
originally announced June 2013.
-
An Evolutionary Algorithm Approach to Link Prediction in Dynamic Social Networks
Authors:
Catherine A. Bliss,
Morgan R. Frank,
Christopher M. Danforth,
Peter Sheridan Dodds
Abstract:
Many real world, complex phenomena have underlying structures of evolving networks where nodes and links are added and removed over time. A central scientific challenge is the description and explanation of network dynamics, with a key test being the prediction of short and long term changes. For the problem of short-term link prediction, existing methods attempt to determine neighborhood metrics…
▽ More
Many real world, complex phenomena have underlying structures of evolving networks where nodes and links are added and removed over time. A central scientific challenge is the description and explanation of network dynamics, with a key test being the prediction of short and long term changes. For the problem of short-term link prediction, existing methods attempt to determine neighborhood metrics that correlate with the appearance of a link in the next observation period. Recent work has suggested that the incorporation of topological features and node attributes can improve link prediction. We provide an approach to predicting future links by applying the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to optimize weights which are used in a linear combination of sixteen neighborhood and node similarity indices. We examine a large dynamic social network with over $10^6$ nodes (Twitter reciprocal reply networks), both as a test of our general method and as a problem of scientific interest in itself. Our method exhibits fast convergence and high levels of precision for the top twenty predicted links. Based on our findings, we suggest possible factors which may be driving the evolution of Twitter reciprocal reply networks.
△ Less
Submitted 13 August, 2014; v1 submitted 23 April, 2013;
originally announced April 2013.
-
Happiness and the Patterns of Life: A Study of Geolocated Tweets
Authors:
Morgan R. Frank,
Lewis Mitchell,
Peter S. Dodds,
Christopher M. Danforth
Abstract:
The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movem…
▽ More
The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the 'hedonometer', we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.
△ Less
Submitted 12 September, 2013; v1 submitted 4 April, 2013;
originally announced April 2013.
-
The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place
Authors:
Lewis Mitchell,
Kameron Decker Harris,
Morgan R. Frank,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) ann…
▽ More
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-level measures such as obesity rates.
△ Less
Submitted 18 May, 2013; v1 submitted 13 February, 2013;
originally announced February 2013.
-
Using Single Layer Networks for Discrete, Sequential Data: An Example from Natural Language Processing
Authors:
Caroline Lyon,
Ray Frank
Abstract:
A natural language parser which has been successfully implemented is described. This is a hybrid system, in which neural networks operate within a rule based framework. It can be accessed via telnet for users to try on their own text. (For details, contact the author.) Tested on technical manuals, the parser finds the subject and head of the subject in over 90% of declarative sentences.
The ne…
▽ More
A natural language parser which has been successfully implemented is described. This is a hybrid system, in which neural networks operate within a rule based framework. It can be accessed via telnet for users to try on their own text. (For details, contact the author.) Tested on technical manuals, the parser finds the subject and head of the subject in over 90% of declarative sentences.
The neural processing components belong to the class of Generalized Single Layer Networks (GSLN). In general, supervised, feed-forward networks need more than one layer to process data. However, in some cases data can be pre-processed with a non-linear transformation, and then presented in a linearly separable form for subsequent processing by a single layer net. Such networks offer advantages of functional transparency and operational speed.
For our parser, the initial stage of processing maps linguistic data onto a higher order representation, which can then be analysed by a single layer network. This transformation is supported by information theoretic analysis.
△ Less
Submitted 23 September, 1997;
originally announced September 1997.
-
From Regular to Context Free to Mildly Context Sensitive Tree Rewriting Systems: The Path of Child Language Acquisition
Authors:
Robert Frank
Abstract:
Current syntactic theory limits the range of grammatical variation so severely that the logical problem of grammar learning is trivial. Yet, children exhibit characteristic stages in syntactic development at least through their sixth year. Rather than positing maturational delays, I suggest that acquisition difficulties are the result of limitations in manipulating grammatical representations. I…
▽ More
Current syntactic theory limits the range of grammatical variation so severely that the logical problem of grammar learning is trivial. Yet, children exhibit characteristic stages in syntactic development at least through their sixth year. Rather than positing maturational delays, I suggest that acquisition difficulties are the result of limitations in manipulating grammatical representations. I argue that the genesis of complex sentences reflects increasing generative capacity in the systems generating structural descriptions: conjoined clauses demand only a regular tree rewriting system; sentential embedding uses a context-free tree substitution grammar; modification requires TAG, a mildly context-sensitive system.
△ Less
Submitted 4 November, 1994;
originally announced November 1994.