SUTRA: Developments in Multilingual LLMs and Their Effectiveness
Table of Links
Abstract and 1 Introduction
2 Related Work
3 SUTRA Approach
3.1 What is SUTRA?
3.2 Architecture
3.3 Training Data
4 Training Multilingual Tokenizers
5 Multilingual MMLU
5.1 Massive Multitask Language Understanding
5.2 Extending MMLU to Multiple Languages and 5.3 Consistent Performance Across Languages
5.4 Comparison with Leading Models for Multilingual Performance
6 Quantitative Evaluation for Real-Time Queries
7 Discussion, Conclusion, and References
Language Models and Multilingual LLMs: LLMs have witnessed remarkable progress, particularly with the development of models such as GPT-3 [Brown et al., 2020] and BERT [Devlin et al., 2018], which set new benchmarks in language understanding and generation. These models learn complex patterns from vast quantities of data and generate coherent text, but they have been trained predominantly on English-language data, which remains a fundamental limitation. In response to the need to support global linguistic diversity, research has expanded to multilingual LLMs. Pioneering works such as mBERT [Devlin et al., 2018] and XLM-R [Conneau et al., 2020] demonstrated strong cross-lingual transfer learning capabilities. However, these models often struggle to balance performance across languages, especially for those underrepresented in training datasets [Conneau et al., 2020]. Moreover, as the number of languages grows, the scalability and efficiency of these models often degrade, calling for more specialized architectures to handle linguistic diversity effectively [Smith et al., 2021].
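To make the shared-encoder idea behind cross-lingual transfer concrete, here is a minimal sketch (our illustration, not part of the original work) that loads the publicly available xlm-roberta-base checkpoint and compares mean-pooled embeddings of the same sentence in English and Hindi; the choice of model, sentences, and pooling strategy is purely illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# XLM-R uses a single shared vocabulary and encoder for ~100 languages,
# which is what enables the cross-lingual transfer discussed above.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

sentences = {
    "en": "The weather is nice today.",
    "hi": "आज मौसम अच्छा है।",
}

embeddings = {}
with torch.no_grad():
    for lang, text in sentences.items():
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)
        embeddings[lang] = hidden.mean(dim=1).squeeze(0)   # mean-pooled sentence vector

similarity = torch.cosine_similarity(embeddings["en"], embeddings["hi"], dim=0)
print(f"en-hi cosine similarity: {similarity.item():.3f}")
```

The degree of similarity gives only a rough sense of cross-lingual alignment; as noted above, such alignment tends to be weaker for languages underrepresented in the pretraining data.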
Neural Machine Translation: Neural machine translation (NMT) has been integral to progress in multilingual model performance. Early NMT systems were limited by the complexity of their architectures and the quality of their translations, especially for low-resource languages [Wu et al., 2019]. Recent studies revisit the fundamental challenges of machine translation in the context of advanced large language models (LLMs). The work of Koehn and Knowles [2017] offers insight into the continued relevance of challenges such as domain mismatch, rare-word prediction, and translation of long sentences, even as LLMs have shown significant improvements in these areas. In addition, a study by Son and Kim [2023] explores LLM translation performance from the user's perspective, highlighting their ability to improve translation of long sentences while identifying persistent challenges around domain mismatch and rare-word prediction. The work of Wu et al. [2016] on Google's neural machine translation system serves as a benchmark for progress in the field, narrowing the gap between human and machine translation. More recently, Costa-jussà et al. [2022] demonstrated that a mixture-of-experts architecture can be applied effectively to neural machine translation, yielding substantial gains in translation performance for a variety of low-resource languages.
Mixture of Experts: The mixture-of-experts (MoE) paradigm has emerged as a promising architecture for managing the computational costs associated with scaling large language models (LLMs). Recent studies have explored its benefits in this context. Zhou et al. [2022] proposed a mixture of experts with expert-choice routing, which dynamically allocates tokens among experts so that each expert can focus on its area of specialization. Similarly, Zoph [2022] investigated the design of effective sparse expert models, highlighting the importance of carefully balancing the number and size of experts to optimize performance. In addition, Ott et al. [2022] introduced the OPT family of open pre-trained transformer models, which leverage MoE to achieve significant improvements in efficiency and scalability compared with dense models. Moreover, Zheng et al. [2019] explored the application of MoE to Chinese-language datasets, indicating the potential of this approach to enhance language understanding tasks. Collectively, these studies suggest that MoE can serve as an effective choice for building highly capable and efficient LLMs.
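As a rough illustration of expert-choice routing, the PyTorch layer below is a minimal sketch under our own simplifying assumptions, not the implementation used by Zhou et al. [2022] or by SUTRA: each expert claims a fixed number of tokens per batch, which is the property that keeps per-expert load balanced. The class name, capacity factor, and expert sizes are all hypothetical.

```python
import torch
import torch.nn as nn

class ExpertChoiceMoE(nn.Module):
    """Minimal sketch of an expert-choice mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, capacity_factor: float = 2.0):
        super().__init__()
        self.n_experts = n_experts
        self.capacity_factor = capacity_factor
        self.router = nn.Linear(d_model, n_experts)   # token-to-expert affinity scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                  # (n_tokens, d_model)
        scores = self.router(tokens).softmax(dim=-1)        # affinity of every token to every expert
        # Each expert selects its own top-`capacity` tokens, instead of tokens selecting experts;
        # this fixes the per-expert load, which is the key idea of expert-choice routing.
        capacity = max(1, int(self.capacity_factor * tokens.size(0) / self.n_experts))
        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            weight, idx = scores[:, expert_id].topk(capacity)    # tokens this expert claims
            out.index_add_(0, idx, weight.unsqueeze(-1) * expert(tokens[idx]))
        return out.reshape_as(x)

# Example: a batch of 2 sequences, 6 tokens each, hidden width 16.
layer = ExpertChoiceMoE(d_model=16, d_hidden=32)
print(layer(torch.randn(2, 6, 16)).shape)   # torch.Size([2, 6, 16])
```

In a full model, a layer like this would stand in for the dense feed-forward block of a transformer layer; one reason the expert-choice variant is attractive is that fixing each expert's capacity largely removes the need for a separate load-balancing loss.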
Multimodal LLMs: Researchers are also exploring the capabilities of large multimodal language models that can process and generate content across modalities such as text, images, and video. For example, the work of Dai et al. [2019] investigated the use of multimodal models for tasks such as image captioning and visual question answering, demonstrating their ability to leverage cross-modal information to enhance performance. Similarly, the study by Nichols and colleagues [2008] explored the application of multimodal models in a computational linguistics context, highlighting their ability to surface insights from diverse data sources. In addition, recent advances in multimodal machine translation, as discussed by Persh [2021], demonstrate the benefits of incorporating visual information into language models to improve translation quality.
Online LLMs: Modern large language models such as Llama 2, GPT-3.5, and GPT-4 are designed as general-purpose, open-domain chatbots capable of engaging in extended dialogues on a wide variety of topics. However, they face a significant limitation: their training data is frozen in time, resulting in a knowledge cutoff date. Because of this, these models sometimes generate plausible but incorrect responses, reducing the reliability of their output, as noted by Vu et al. [2023] and Press et al. [2022]; the inaccuracy is often tied to the outdated information encoded in the model's parameters. Table 1 provides a detailed list of the knowledge cutoff dates of the major models. While this can be partially remedied through additional training with human feedback or by incorporating knowledge-intensive tasks, these solutions fall short of accommodating real-time updates, such as changes in stock prices [Komeili et al., 2021]. In-context learning offers a promising alternative, allowing fresh data to be injected directly into the model's prompt to guide response generation. Although there are ongoing efforts to augment LLMs with internet search results, effectively leveraging this external data to improve the accuracy of LLM outputs remains an open area of development. In this context, SUTRA stands out by providing a structured approach to response augmentation, offering the ability to learn, reason, and interpret information from diverse knowledge sources.
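The sketch below is a minimal, hypothetical illustration of the in-context approach described above: freshly retrieved snippets are stamped with a retrieval time and prepended to the prompt before generation. The search_web function is a stand-in for a real search or news API, and none of this reflects SUTRA's actual retrieval pipeline.

```python
from datetime import datetime, timezone

def search_web(query: str) -> list[str]:
    """Hypothetical retrieval step; a real system would call a search or news API here."""
    return [
        "Snippet 1: example headline returned by the search backend.",
        "Snippet 2: another freshly retrieved passage relevant to the query.",
    ]

def build_augmented_prompt(question: str, snippets: list[str]) -> str:
    """Prepend time-stamped retrieved context so the model grounds its answer in fresh data
    rather than in whatever it memorised before its knowledge cutoff."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    retrieved_at = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"Context retrieved at {retrieved_at}:\n{context}\n\n"
        f"Using only the context above, answer the question and cite snippet numbers.\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    question = "What is the latest closing price of the example stock?"
    prompt = build_augmented_prompt(question, search_web(question))
    print(prompt)   # this augmented prompt would then be sent to the language model
```

Asking the model to cite snippet numbers from the injected context is a simple way to keep its answer traceable to the fresh sources rather than to stale parametric knowledge.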