Guide to adding new Wiktionaries #205

Open
Vuizur opened this issue Jan 24, 2023 · 13 comments

Comments

@Vuizur
Contributor

Vuizur commented Jan 24, 2023

Hi everyone,

I thought it would be really cool to have a guide for adding new other-language Wiktionaries. I have been trying to do some work on the Russian Wiktionary and wrote down my current understanding.

Guide to adding new languages

  1. Create the necessary language data in wiktextract/data.

    • First, create languages.json, which maps language codes to language names (for the English Wiktionary, it maps "en" to "English"). These files are generated by get_languages.py. The best way to generate them varies by Wiktionary: you either have to expand some templates or parse source code (see the existing examples).
    • Manually create:
      • pos_subtitles.json: The translated names of the parts of speech. (For the Russian Wiktionary this might be problematic, because the POS appears at the beginning of a string containing all sorts of grammatical information, so one would have to use a regex or similar.)
      • linkage_subtitles.json: Contains the translated names of the synonym/antonym/... sections.
      • other_subtitles.json: Contains the translated names (in lowercase) of the inflection/etymology sections.
      • zh_pron_tags.json: Not sure exactly what it does, but the file has to exist and contain at least an empty dictionary {}.
      • form_of_templates.json:
  2. Run the program using

wiktwords --all --all-languages --out data.json --dump-file-language-code <yourlangcode> enwiktionary-20201201-pages-articles.xml.bz2 

For Russian: wiktwords --all --all-languages --out data.json --dump-file-language-code ru ruwiktionary-20230101-pages-articles-multistream.xml
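For the Russian POS problem mentioned in step 1, one option is to match a known list of POS names at the very start of the grammar string. Below is a minimal sketch, assuming a hypothetical name-to-code mapping (the real names would live in pos_subtitles.json, whose exact format is not covered here) and an illustrative input string; it is not wiktextract's actual code.

```python
import re
from typing import Optional

# Hypothetical mapping from lowercase Russian POS names to wiktextract-style
# POS codes. The real list belongs in pos_subtitles.json.
RU_POS_NAMES = {
    "существительное": "noun",
    "прилагательное": "adj",
    "глагол": "verb",
    "наречие": "adv",
}

# Try longer names first, in case one name is a prefix of another, and require
# a word boundary so that e.g. "Глаголы" does not match "глагол".
_POS_RE = re.compile(
    r"^\s*("
    + "|".join(re.escape(name) for name in sorted(RU_POS_NAMES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_pos(grammar_text: str) -> Optional[str]:
    """Return the POS code for the name at the start of grammar_text, or None."""
    match = _POS_RE.match(grammar_text)
    if match is None:
        return None
    return RU_POS_NAMES[match.group(1).lower()]


# Illustrative input in the style of a Russian grammar line.
print(extract_pos("Существительное, неодушевлённое, мужской род"))  # prints: noun
```

Building the alternation from the mapping keeps the regex and the data in sync; multi-word POS names would work the same way, since every name is escaped and the longest one is tried first.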

I am currently still a bit confused about the "compounds" key in other_subtitles.json. Which section exactly does it refer to? I cannot seem to find it in the Russian Wiktionary.
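The manually created files from step 1 can be bootstrapped with a small script, so that a first run at least finds them. This is a sketch under assumptions: the per-language subdirectory (data_dir/lang_code), the empty placeholder shapes and the heading-to-field format of linkage_subtitles.json are guesses to check against the existing files in wiktextract/data; only the empty {} for zh_pron_tags.json comes from the guide. The Russian linkage headings are the Russian Wiktionary's section names (Синонимы, Антонимы, Гиперонимы, ...), lowercased. languages.json is left out because get_languages.py generates it.

```python
import json
from pathlib import Path

# Placeholder contents for the manually created files. Apart from the empty {}
# for zh_pron_tags.json, these shapes are assumptions: compare them with the
# existing files in wiktextract/data before relying on them.
PLACEHOLDERS = {
    "pos_subtitles.json": {},
    # Assumed format: lowercase section heading -> linkage field name.
    "linkage_subtitles.json": {
        "синонимы": "synonyms",
        "антонимы": "antonyms",
        "гиперонимы": "hypernyms",
        "гипонимы": "hyponyms",
        "холонимы": "holonyms",
        "меронимы": "meronyms",
    },
    "other_subtitles.json": {},
    "zh_pron_tags.json": {},
    "form_of_templates.json": [],
}


def scaffold_language_data(data_dir: Path, lang_code: str) -> Path:
    """Create data_dir/lang_code and write any placeholder file that is missing."""
    target = data_dir / lang_code
    target.mkdir(parents=True, exist_ok=True)
    for filename, content in PLACEHOLDERS.items():
        path = target / filename
        if path.exists():
            continue  # never overwrite a file that was already edited by hand
        path.write_text(
            json.dumps(content, ensure_ascii=False, indent=2) + "\n",
            encoding="utf-8",
        )
    return target
```

For example, scaffold_language_data(Path("wiktextract/data"), "ru") would create the Russian files (the path is an assumption); existing files are skipped, so the script is safe to rerun.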

@xxyzz
Collaborator

xxyzz commented Jan 25, 2023

Here is an example of the compounds section: https://en.wiktionary.org/wiki/polku#Compounds

The zh_pron_tags.json file contains tags for the pronunciations of different Chinese dialects. Example page: https://en.wiktionary.org/wiki/我#Pronunciation

The form_of_templates.json file is used to add the "form of" tag for non-English Wiktionary dump files; it was added in #179.
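To illustrate the idea, here is a minimal sketch of how such a list could be applied: load the template names once, then tag a sense whenever one of its templates appears in the set. The JSON-list file format, the "form-of" tag spelling and the dict-based sense are assumptions, not the code from #179. Names are stripped because the parser can keep trailing whitespace in template names (compare 'сущ uk f ina ' and 'родств-блок\n' in the debug output further down).

```python
import json
from pathlib import Path
from typing import Iterable, Set


def load_form_of_templates(path: Path) -> Set[str]:
    """Load template names, assuming the file holds a JSON list of strings."""
    with open(path, encoding="utf-8") as f:
        return {name.strip() for name in json.load(f)}


def tag_form_of(sense: dict, template_names: Iterable[str], form_of_templates: Set[str]) -> dict:
    """Add a "form-of" tag to sense if any of its templates is a form-of template."""
    # Strip whitespace: parsed template names can keep trailing spaces/newlines.
    if any(name.strip() in form_of_templates for name in template_names):
        tags = sense.setdefault("tags", [])
        if "form-of" not in tags:  # keep the tag list free of duplicates
            tags.append("form-of")
    return sense
```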

The namespace.json file in the wikitextprocessor repo also needs to be created with get_namespaces.py and then inspected manually.
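For the manual inspection step, it can help to compare the generated file with what the wiki itself reports. The sketch below uses the standard MediaWiki siteinfo API (meta=siteinfo with siprop=namespaces|namespacealiases) rather than get_namespaces.py, and its summary is only meant for eyeballing; it is not necessarily the structure namespace.json expects. The endpoint URL assumes the Russian Wiktionary.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Russian Wiktionary's standard MediaWiki API endpoint.
API_URL = "https://ru.wiktionary.org/w/api.php"


def fetch_siteinfo(api_url: str = API_URL) -> dict:
    """Fetch namespaces and their aliases via the MediaWiki siteinfo API."""
    query = urllib.parse.urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces|namespacealiases",
        "format": "json",
        "formatversion": "2",
    })
    request = urllib.request.Request(
        f"{api_url}?{query}", headers={"User-Agent": "namespace-check-sketch"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_namespaces(siteinfo: dict) -> dict:
    """Map namespace id -> local name, canonical (English) name and aliases."""
    result = {}
    for ns in siteinfo["query"]["namespaces"].values():
        result[ns["id"]] = {
            "name": ns["name"],
            "canonical": ns.get("canonical"),  # may be absent, e.g. for the main namespace
            "aliases": [],
        }
    for alias in siteinfo["query"].get("namespacealiases", []):
        if alias["id"] in result:
            result[alias["id"]]["aliases"].append(alias["alias"])
    return result
```

summarize_namespaces(fetch_siteinfo()) returns the live data; the parsing is kept separate from the network call so it can also be checked offline against a saved response.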

@kristian-clausal
Collaborator

When you think you've got it all ironed out, put it in the README in a pull request and I'll merge it!

@Vuizur
Contributor Author

Vuizur commented Jan 31, 2023

I haven't quite gotten it to work yet; my current version prints a huge number of errors. Here is a small selection:

неділя: DEBUG: UNIMPLEMENTED top-level template: -uk- {} at ['неділя', '-uk-']
Senin: DEBUG: UNIMPLEMENTED top-level template: неделя id {} at ['Senin', 'неделя id']
Senin: DEBUG: unexpected top-level node: <BOLD(){} 'Senin'> at ['Senin']
неділя: DEBUG: UNIMPLEMENTED top-level template: неделя uk {} at ['неділя', 'неделя uk']
Senin: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['lang=id']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['понедельник']){} >, '\n\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=', <URL([<URL([]){} >]){} >, '\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n'], ['полн=\n']){} >, '\n\n'>> at ['Senin']
Senin: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} > at ['Senin']
неділя: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ uk f ina 2a'], ['слоги=', <TEMPLATE(['по-слогам'], ['не'], ['ді'], ['ля']){} >], ['неді́л'], []){} >, '\n\n'> at ['неділя']
неділя: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], ['neˈɟiʎɑ'], ['neˈɟiʎi']){} >, ' ', <TEMPLATE(['медиа'], ['Uk-неділя.ogg']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['воскресенье']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['день']){} >, ', ', <LINK(['тиждень']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=недільний\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ['неділя']
неділя: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:неделя'], ['uk']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['uk'], ['морфо'], ['пример']){} >, '\n', <TEMPLATE(['Категория'], ['язык=uk'], [], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['6'], ['uk']){} >>> at ['неділя']
Selasa: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'selasa'} at ['Selasa', '-id-']
eignarfall: DEBUG: UNIMPLEMENTED top-level template: -is- {} at ['eignarfall', '-is-']
Selasa: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['неделя id']){} >, '\n', <TEMPLATE(['сущ id'], ['слоги=', <TEMPLATE(['по-слогам'], ['Se'], ['la'], ['sa']){} >, '\n']){} >, '\n\n'> at ['Selasa']
Selasa: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], [], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['вторник']){} >, ' ', <TEMPLATE(['пример'], [<TEMPLATE(['выдел'], ['Selasa']){} >, ' adalah hari 2. dalam satu pekan.'], ['перевод=', <TEMPLATE(['выдел'], ['Вторник']){} >, <TEMPLATE(['-']){} >, 'второй день недели.']){} >, '\n'>, <LEVEL5(['Синонимы']){} '\n\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LINK(['hari']){} >, '\n'>, <LEVEL5(['Гипонимы']){} '\n\n', <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок'], [], ['\n'], ['существительные=\n'], ['прилагательные=\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n'>>> at ['Selasa']
Selasa: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\n\n', <TEMPLATE(['unfinished'], ['id']){} >, '\n', <TEMPLATE(['длина слова'], ['6'], ['lang=id']){} >> at ['Selasa']
þágufall: DEBUG: UNIMPLEMENTED top-level template: -is- {} at ['þágufall', '-is-']
þágufall: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ is hk sb 01 ö'], ['þáguf'], ['ll'], ['слоги=', <TEMPLATE(['по слогам'], ['þágufall']){} >]){} >, '\n\n', <TEMPLATE(['морфо'], ['прист1='], ['корень1='], ['суфф1='], ['оконч=']){} >, '\n\n'> at ['þágufall']
eignarfall: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ is hk sb 01 ö'], ['eignarf'], ['ll'], ['слоги=', <TEMPLATE(['по слогам'], ['eignarfall']){} >]){} >, '\n\n', <TEMPLATE(['морфо'], ['прист1='], ['корень1='], ['суфф1='], ['оконч=']){} >, '\n\n'> at ['eignarfall']
eignarfall: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['w:Международный фонетический алфавит'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'ˈeiknarˌfatl'>, '&#93;'>>, <TEMPLATE(['main other'], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <TEMPLATE(['лингв.'], ['is']){} >, ', ', <TEMPLATE(['грам.'], ['is']){} >, ' ', <LINK(['родительный падеж']){} >, ', ', <LINK(['генитив']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['nefnifall']){} >, ', ', <LINK(['þolfall']){} >, ', ', <LINK(['þágufall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['fall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n']){} >, '\n\n'>>> at ['eignarfall']
þágufall: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['w:Международный фонетический алфавит'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'ˈθauːʏˌfatl'>, '&#93;'>>, <TEMPLATE(['main other'], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <TEMPLATE(['лингв.'], ['is']){} >, ', ', <TEMPLATE(['грам.'], ['is']){} >, ' ', <LINK(['дательный падеж']){} >, ', ', <LINK(['датив']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['nefnifall']){} >, ', ', <LINK(['þolfall']){} >, ', ', <LINK(['eignarfall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['fall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n']){} >, '\n\n'>>> at ['þágufall']
eignarfall: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:'], ['is']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>, <TEMPLATE(['improve'], ['is'], ['морфо'], ['пример'], ['синонимы'], ['этимология']){} >, '\n', <TEMPLATE(['Категория'], ['язык=is'], ['Падежи']){} >, '\n', <TEMPLATE(['длина слова'], ['10'], ['is']){} >>> at ['eignarfall']
þágufall: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:'], ['is']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>, <TEMPLATE(['improve'], ['is'], ['морфо'], ['пример'], ['синонимы'], ['этимология']){} >, '\n', <TEMPLATE(['Категория'], ['язык=is'], ['Падежи']){} >, '\n', <TEMPLATE(['длина слова'], ['8'], ['is']){} >>> at ['þágufall']
заседать: DEBUG: HTML tag <span> not properly closed at ['заседать'] parsing Произношение
started on line 138, detected on line 138
картофель: DEBUG: UNIMPLEMENTED top-level template: semiprotected {} at ['картофель', 'semiprotected']
заседать: DEBUG: no corresponding start tag found for </span> at ['заседать'] parsing Произношение
картофель: DEBUG: UNIMPLEMENTED top-level template: lang {1: ''} at ['картофель', 'wikipedia', 'Википедия', 'ARGVAL-1', 'lang']
Jumat: DEBUG: no corresponding start tag found for </li> at ['Jumat'] parsing Произношение
Jumat: DEBUG: no corresponding start tag found for </ul> at ['Jumat'] parsing Произношение
картофель: DEBUG: UNIMPLEMENTED top-level template: Википедия {1: '', 2: '', 3: 'картофель', 4: 'картофель', 5: ''} at ['картофель', 'wikipedia', 'Википедия']
Rabu: DEBUG: no corresponding start tag found for </li> at ['Rabu'] parsing Произношение
Rabu: DEBUG: no corresponding start tag found for </ul> at ['Rabu'] parsing Произношение
картофель: DEBUG: UNIMPLEMENTED top-level template: слово дня {1: '1', 2: '4', 3: '2009'} at ['картофель', 'слово дня']
картофель: DEBUG: UNIMPLEMENTED top-level template: -ru- {} at ['картофель', '-ru-']
картофель: DEBUG: UNIMPLEMENTED top-level template: Лексема в Викиданных {1: 'L115433'} at ['картофель', 'Лексема в Викиданных']
картофель: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ ru m ina 2a\n'], ['основа=карто́фел\n'], ['слоги=', <TEMPLATE(['по-слогам'], ['кар'], ['то́'], ['фель']){} >, '\n'], ['st=1\n']){} >, '\n\n', <TEMPLATE(['морфо-ru'], ['картофель'], ['и=т']){} >, '\n\n'> at ['картофель']
Jumat: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'jumat'} at ['Jumat', '-id-']
картофель: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(span){'class': 'rutr'} >, <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['Справка:МФА для русского языка'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'kɐrˈtofʲɪlʲ'>, '&#93;&nbsp;', <HTML(table){'class': 'audiotable', 'style': 'vertical-align: middle; display:inline-block; list-style:none;line-height: 1em; border-spacing: 0;'} <HTML(tr){} <HTML(td){'class': 'audiofile'} <LINK(['Файл:Ru-картофель.ogg'], ['noicon']){} >>, <HTML(td){'class': 'audiometa', 'style': 'font-size: 80%;', 'valign': 'top'} '(', <LINK([':Файл:Ru-картофель.ogg'], ['файл']){} >, ')'>>>>>, <TEMPLATE(['main other'], []){} >, '</span>\n\n', <LEVEL6(['Семантические свойства']){} '\n', <TEMPLATE(['илл'], ['Bl%C3%BChende_Kartoffel.JPG'], ['Картофель [1]']){} >, '\n', <TEMPLATE(['илл'], ['Potatoes.jpg'], ['Картофель [2]']){} >, '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <TEMPLATE(['ботан.'], ['ru']){} >, '  ', <LINK(['вид']){} >, ' рода ', <LINK(['паслён']){} >, '; многолетнее травянистое клубненосное ', <LINK(['растение']){} >, ', видоизмённые подземные органы вегетативного размножения которого - важный пищевой продукт ', <TEMPLATE(['пример'], [<TEMPLATE(['выдел'], ['Картофель']){} >, '\xa0— родом из Южной Америки, в Западную Европу ввезён в 16 в., в России известен с конца 17-го, а распространён правительственными мерами в 19.']){} >, ' ', <TEMPLATE(['пример'], ['Почернела ботва у ', <TEMPLATE(['выдел'], ['картофеля']){} >, ', пожелтел горох, начали обваливаться засыхавшие листья.'], ['Д.\xa0Н.\xa0Мамин-Сибиряк'], ['Зелёная война'], ['1910']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['клубень'], ['клубни']){} >, ' картофеля [1], ', <LINK(['кушанье']){} >, ' из них ', <TEMPLATE(['пример'], ['Копать ', <TEMPLATE(['выдел'], ['картофель']){} >, '.']){} >, ' ', <TEMPLATE(['пример'], ['Котлеты из 
картофеля.']){} >, ' ', <TEMPLATE(['пример'], ['Жареный ', <TEMPLATE(['выдел'], ['картофель']){} >, '.']){} >, ' ', <TEMPLATE(['пример'], ['Картофель в мундире.']){} >, ' ', <TEMPLATE(['пример'], ['Пьер не ел целый день, и запах ', <TEMPLATE(['выдел'], ['картофеля']){} >, ' показался ему необыкновенно приятным.'], ['Л.\xa0Н.\xa0Толстой'], ['Война и мир'], ['1867–1869']){} >, ' ', <TEMPLATE(['пример'], ['Даже сидеть в кухне и чистить с Дарьюшкой ', <TEMPLATE(['выдел'], ['картофель']){} >, ' или выбирать сор из гречневой крупы ему казалось интересно.'], ['А.\xa0П.\xa0Чехов'], ['Палата 6'], ['1892']){} >, '\n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['паслён клубненосный']){} >, '; ', <TEMPLATE(['разг.'], ['-']){} >, ': ', <LINK(['картошка']){} >, '; ', <TEMPLATE(['уст.'], ['-']){} >, ': ', <LINK(['земляное яблоко']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['картошка']){} >, ', ', <LINK(['картофелина']){} >, ', ', <LINK(['картошина']){} >, ', ', <LINK(['второй хлеб']){} >, '; ', <TEMPLATE(['ист.'], ['-']){} >, ': ', <LINK(['чёртово яблоко']){} >, '\n'>>, '\n'>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' —\n'>, <LIST_ITEM(#){} ' —\n'>>, '\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['вид']){} >, ', ', <LINK(['растение']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['клубень']){} >, ', ', <LINK(['еда']){} >, '\n'>>, '\n'>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' —\n'>, <LIST_ITEM(#){} ' ', <LINK(['картофелина']){} >, '\n'>>, '\n', <LEVEL6(['Холонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' —\n'>, <LIST_ITEM(#){} ' —\n'>>, '\n'>, <LEVEL6(['Меронимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' —\n'>, <LIST_ITEM(#){} ' ', <LINK(['глазок']){} >, '\n'>>, '\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=картошечка, картоха\n'], ['имена-собственные=\n'], ['существительные=картофелина, картошка\n'], ['прилагательные=картофельный\n'], ['глаголы=\n'], 
['наречия=\n']){} >, '\n\n'>>> at ['картофель']
Jumat: DEBUG: UNIMPLEMENTED top-level template: неделя id {} at ['Jumat', 'неделя id']
Kamis: DEBUG: no corresponding start tag found for </li> at ['Kamis'] parsing Произношение
Rabu: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'rabu'} at ['Rabu', 'Cf']
Jumat: DEBUG: unexpected top-level node: <BOLD(){} 'Jumat'> at ['Jumat']
Kamis: DEBUG: no corresponding start tag found for </ul> at ['Kamis'] parsing Произношение
Jumat: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['lang=id']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['пятница']){} >, '\n\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=', <URL([<URL([]){} >]){} >, '\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n'], ['полн=\n']){} >, '\n\n'>> at ['Jumat']
Jumat: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} > at ['Jumat']
Rabu: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'rabu'} at ['Rabu', '-id-']
Rabu: DEBUG: UNIMPLEMENTED top-level template: неделя id {} at ['Rabu', 'неделя id']
картофель: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nОт ', <TEMPLATE(['этимология:картофель'], ['да']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' ', <LINK(['картофель фри']){} >, '\n'>>, '\n'>, <LEVEL5(['Перевод']){} '\n', <TEMPLATE(['перев-блок'], ['\n'], ['az=', <LINK(['kartof']){} >, '\n'], ['ain=\n'], ['sq=', <LINK(['patatja']){} >, '\n'], ['en=', <LINK(['potato']){} >, '\n'], ['an=', <LINK(['trunfa']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['hy=', <LINK(['կարտոֆիլ']){} >, ' (kartofil)\n'], ['ast=', <LINK(['pataca']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['af=', <LINK(['aartappel']){} >, '\n'], ['eu=', <LINK(['patata']){} >, '\n'], ['ba=', <LINK(['бәрәңге']){} >, ', ', <LINK(['картуф']){} >, '\n'], ['be=', <LINK(['бульба']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['bg=', <LINK(['картоф']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['bs=\n'], ['br=\n'], ['bua=', <LINK(['хартаабха']){} >, '\n'], ['hu=', <LINK(['burgonya']){} >, '\n'], ['vep=\n'], ['vo=', <LINK(['pötet']){} >, '\n'], ['vi=', <LINK(['khoai tây']){} >, '\n'], ['vro=', <LINK(['maaupin']){} >, ', ', <LINK(['kardok']){} >, ', ', <LINK(['kardohk']){} >, ', ', <LINK(["kardol'"]){} >, ', ', <LINK(['kartli']){} >, ', ', <LINK(["kartol'"]){} >, '\n'], ['gl=', <LINK(['pataca']){} >, '\n'], ['gd=', <LINK(['buntàta']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['el=', <LINK(['πατάτα']){} >, ' ', <TEMPLATE(['f']){} >, '; ', <LINK(['γεώμηλο']){} >, ' ', <TEMPLATE(['n']){} >, '\n'], ['ka=', <LINK(['კარტოფილი']){} >, '\n'], ['da=', <LINK(['kartoffel']){} >, '\n'], ['sgs=', <LINK(['bolbė']){} >, '\n'], ['he=', <LINK(['תפוח אדמה']){} >, ' (tapuah adama) ', <TEMPLATE(['m']){} >, '\n'], ['io=', <LINK(['terpomo']){} >, '\n'], ['id=', <LINK(['kentang']){} >, '\n'], ['ia=', <LINK(['patata']){} >, '\n'], ['is=', <LINK(['kartafla']){} >, ' ', <TEMPLATE(['f']){} >, ', ', <LINK(['jarðepli']){} >, ' ', <TEMPLATE(['n']){} >, ' (редк.)\n'], ['es=', 
<LINK(['patata']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['it=(картофелина) ', <LINK(['patata']){} >, ' ', <TEMPLATE(['f']){} >, ', (растение) ', <LINK(['patate']){} >, ' ', <TEMPLATE(['мн.ч.']){} >, '\n'], ['kk=', <LINK(['картоп']){} >, '\n'], ['krl=', <LINK(['kartohku']){} >, '\n'], ['ca=', <LINK(['patatera']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['ky=', <LINK(['жералма']){} >, '\n'], ['zh-tw=', <LINK(['馬鈴薯']){} >, ', ', <LINK(['马铃薯']){} >, ' (mǎlíngshǔ)\n'], ['zh=', <LINK(['土豆']){} >, ' (tǔdòu)\n'], ['kv=', <LINK(['картупель']){} >, '\n'], ['ko=', <LINK(['감자']){} >, ' (gamja)\n'], ['co=\n'], ['crh=', <LINK(['qartop']){} >, '\n'], ['la=', <LINK(['Solnum tubersum']){} >, '\n'], ['lv=', <LINK(['kartupelis']){} >, '; ', <LINK(['kartupeļi']){} >, ' ', <TEMPLATE(['мн.']){} >, '\n'], ['lt=', <LINK(['bulvė']){} >, '\n'], ['mk=', <LINK(['компир']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['mg=', <LINK(['ovy']){} >, '\n'], ['ms=', <LINK(['kentang']){} >, '\n'], ['mt=', <LINK(['patata']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['mdf=\n'], ['mn=', <LINK(['төмс']){} >, '\n'], ['gv=\n'], ['nah=', <LINK(['tlālcamohtli']){} >, '\n'], ['de=', <LINK(['Kartoffel']){} >, ' ', <TEMPLATE(['f']){} >, " =, -n, ''регион.'' ", <LINK(['Erdapfel']){} >, ' ', <TEMPLATE(['m']){} >, ' -s, -äpfel\n'], ['nl=', <LINK(['aardappel']){} >, '\n'], ['nog=', <LINK(['ералма']){} >, '\n'], ['no=', <LINK(['potet']){} >, ', ', <LINK(['jordeple']){} >, '\n'], ['os=', <LINK(['картоф']){} >, '\n'], ['fa=', <LINK(['سیب\u200cزمینی']){} >, ' (sib-zamini)\n'], ['pl=', <LINK(['kartofel']){} >, ', ', <LINK(['ziemniak']){} >, '\n'], ['ppol=\n'], ['pt=', <LINK(['batata']){} >, ' ', <TEMPLATE(['f']){} >, '\n'], ['ro=', <LINK(['cartof']){} >, '\n'], ['sa=', <LINK(['आलू']){} >, ' (ālū)\n'], ['sr=', <LINK(['кромпир']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['sr-l=\n'], ['sk=', <LINK(['zemiaky']){} >, ' ', <TEMPLATE(['мн.']){} >, '\n'], ['sl=', <LINK(['krompir']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['slovio-c=\n'], 
['slovio-l=\n'], ['sw=', <LINK(['kiazi']){} >, ', ', <LINK(['viazi']){} >, ' ', <TEMPLATE(['мн.']){} >, '\n'], ['cu=\n'], ['tl=', <LINK(['patatas']){} >, '\n'], ['th=', <LINK(['มันฝรั่ง']){} >, ' (man fà-ràng)\n'], ['tt=', <LINK(['бәрәңге']){} >, '\n'], ['art=\n'], ['kim=', <LINK(['һортооӄа']){} >, ', ', <LINK(['һортоопӄа']){} >, '\n'], ['tr=', <LINK(['patates']){} >, '\n'], ['tk=\n'], ['uz=\n'], ['uk=', <LINK(['картопля']){} >, ' ', <TEMPLATE(['f']){} >, ', диал.: ', <LINK(['бульба']){} >, ', ', <LINK(['бараболя']){} >, ', ', <LINK(['картох']){} >, ', ', <LINK(['картоха']){} >, '\n'], ['ur=', <LINK(['آلو']){} >, ' (ālū)\n'], ['fo=', <LINK(['epli']){} >, ' ', <TEMPLATE(['n']){} >, '\n'], ['fi=', <LINK(['peruna']){} >, '\n'], ['fr=', <LINK(['pomme de terre']){} >, '\n'], ['fy=', <LINK(['jirpel']){} >, '\n'], ['hi=', <LINK(['आलू']){} >, ' (ālū)\n'], ['hr=', <LINK(['krumpir']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['cs=', <LINK(['brambor']){} >, ' ', <TEMPLATE(['m']){} >, '\n'], ['sv=', <LINK(['potatis']){} >, '\n'], ['eo=', <LINK(['terpomo']){} >, '\n'], ['et=', <LINK(['kartul']){} >, '\n'], ['ja=', <LINK(['じゃがいも']){} >, ' (jagaimó), ', <LINK(['馬鈴薯']){} >, ' (', <LINK(['ばれいしょ']){} >, ', barēsho)\n'], ['sah=', <LINK(['хортуоппуй']){} >, '\n']){} >, '\n\n\n', <TEMPLATE(['improve'], ['ru'], ['переводы']){} >, '\n', <TEMPLATE(['Категория'], ['язык=ru'], ['��артофель'], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['9'], ['ru']){} >>> at ['картофель']
Rabu: DEBUG: unexpected top-level node: <BOLD(){} 'Rabu'> at ['Rabu']
Rabu: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n', <TEMPLATE(['длина слова'], ['4'], ['id']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['среда']){} >, ' ', <HTML(i){} 'день недели'>, '\n\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n'], ['полн=\n']){} >, '\n\n'>> at ['Rabu']
Rabu: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} > at ['Rabu']
Sabtu: DEBUG: no corresponding start tag found for </li> at ['Sabtu'] parsing Произношение
Sabtu: DEBUG: no corresponding start tag found for </ul> at ['Sabtu'] parsing Произношение
Kamis: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'kāmis, kamış, kamis'} at ['Kamis', 'Cf']
Kamis: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'kamis'} at ['Kamis', '-id-']
Kamis: DEBUG: UNIMPLEMENTED top-level template: неделя id {} at ['Kamis', 'неделя id']
Kamis: DEBUG: unexpected top-level node: <BOLD(){} 'Kamis'> at ['Kamis']
Kamis: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['id']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['четверг']){} >, '\n\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n'], ['полн=\n']){} >, '\n\n'>> at ['Kamis']
Minggu: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'minggu'} at ['Minggu', 'Cf']
Kamis: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} > at ['Kamis']
Minggu: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'minggu'} at ['Minggu', '-id-']
Minggu: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['неделя id']){} >, '\n', <TEMPLATE(['сущ id'], ['слоги=Ming·gu\n']){} >, '\n\n'> at ['Minggu']
Minggu: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], ['miŋgu'], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['воскресенье']){} >, ' ', <TEMPLATE(['пример'], ['Minggu adalah hari pertama dalam satu pekan.']){} >, '\n\n'>, <LEVEL5(['Синонимы']){} '\n\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LINK(['hari']){} >, '\n\n'>, <LEVEL5(['Гипонимы']){} '\n\n', <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок'], [], ['\n'], ['существительные=\n'], ['прилагательные=\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ['Minggu']
Minggu: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\n\n', <TEMPLATE(['stub'], ['id']){} >, '\n', <TEMPLATE(['длина слова'], ['6'], ['id']){} >> at ['Minggu']
Sabtu: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'sabtu'} at ['Sabtu', 'Cf']
Sabtu: DEBUG: UNIMPLEMENTED top-level template: -id- {1: 'sabtu'} at ['Sabtu', '-id-']
Sabtu: DEBUG: UNIMPLEMENTED top-level template: неделя id {} at ['Sabtu', 'неделя id']
Sabtu: DEBUG: unexpected top-level node: <BOLD(){} 'Sabtu'> at ['Sabtu']
Sabtu: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['id']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LINK(['суббота']){} >, '\n\n'>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n'], ['полн=\n']){} >, '\n\n'>> at ['Sabtu']
Sabtu: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} > at ['Sabtu']
п'ятниця: DEBUG: UNIMPLEMENTED top-level template: -uk- {} at ["п'ятниця", '-uk-']
п'ятниця: DEBUG: UNIMPLEMENTED top-level template: неделя uk {} at ["п'ятниця", 'неделя uk']
п'ятниця: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ uk f ina '], ['слоги=', <TEMPLATE(['по-слогам'], ["п'я́т"], ['ни'], ['ця']){} >], ["п'я́тниц"], []){} >, '\n\n', <TEMPLATE(['морфо '], ['прист1='], ['корень1='], ['суфф1='], ['оконч=']){} >, '\n\n'> at ["п'ятниця"]
п'ятниця: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], ['ˈpjɑtnɪt͡sʲɑ'], ['pjɑtnɪˈt͡sʲi']){} >, ' ', <TEMPLATE(['медиа'], ["Uk-п'ятниця.ogg"]){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['пятница']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['день']){} >, ', ', <LINK(['тиждень']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ["числительные=п'ять, п'ятий\n"], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ["п'ятниця"]
п'ятниця: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:пятница'], ['uk']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['uk'], ['морфо'], ['пример']){} >, '\n', <TEMPLATE(['Категория'], ['язык=uk'], [], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['8'], ['uk']){} >>> at ["п'ятниця"]
четвер: DEBUG: UNIMPLEMENTED top-level template: -uk- {} at ['четвер', '-uk-']
четвер: DEBUG: UNIMPLEMENTED top-level template: неделя uk {} at ['четвер', 'неделя uk']
na chuj: DEBUG: UNIMPLEMENTED top-level template: offensive {} at ['na chuj', 'offensive']
четвер: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ uk m ina '], ['слоги=', <TEMPLATE(['по-слогам'], ['чет'], ['ве́р']){} >], ['четвер'], []){} >, '\n\n', <TEMPLATE(['морфо '], ['прист1='], ['корень1='], ['суфф1='], ['оконч=']){} >, '\n\n'> at ['четвер']
четвер: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], ['ʧetˈwɛr'], ['ʧetwerˈɦɪ']){} >, ' ', <TEMPLATE(['медиа'], ['Uk-четвер.ogg']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['четверг']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['день']){} >, ', ', <LINK(['тиждень']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' -\n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=четвертий\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ['четвер']
na chuj: DEBUG: UNIMPLEMENTED top-level template: wikify {} at ['na chuj', 'wikify']
четвер: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:четверг'], ['uk']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['uk'], ['морфо'], ['пример']){} >, '\n', <TEMPLATE(['Категория'], ['язык=uk'], [], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['6'], ['uk']){} >>> at ['четвер']
na chuj: DEBUG: UNIMPLEMENTED top-level template: -pl- {} at ['na chuj', '-pl-']
na chuj: DEBUG: unexpected top-level node: <BOLD(){} 'na chuj'> at ['na chuj']
na chuj: DEBUG: unexpected top-level node: <LEVEL6(['Семантические свойства']){} '\n\n'> at ['na chuj']
na chuj: DEBUG: unexpected top-level node: <LEVEL6(['Значение']){} '\n\n', <LINK(['на хуй']){} >, ', ', <LINK(['зачем']){} >> at ['na chuj']
þolfall: DEBUG: UNIMPLEMENTED top-level template: -is- {} at ['þolfall', '-is-']
þolfall: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ is hk sb 01 ö'], ['þolf'], ['ll'], ['слоги=', <TEMPLATE(['по слогам'], ['þolfall']){} >]){} >, '\n\n', <TEMPLATE(['морфо'], ['прист1='], ['корень1='], ['суфф1='], ['оконч=']){} >, '\n\n'> at ['þolfall']
þolfall: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['w:Международный фонетический алфавит'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'ˈθɔl.vatl'>, '&#93;'>>, <TEMPLATE(['main other'], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <TEMPLATE(['лингв.'], ['is']){} >, ', ', <TEMPLATE(['грам.'], ['is']){} >, ' ', <LINK(['винительный падеж']){} >, ', ', <LINK(['аккузатив']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, ' \n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['nefnifall']){} >, ', ', <LINK(['þágufall']){} >, ', ', <LINK(['eignarfall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['fall']){} >, '\n'>>, '\n'>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' —\n'>>, '\n', <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n']){} >, '\n\n'>>> at ['þolfall']
заседать: DEBUG: UNIMPLEMENTED top-level template: -ru- {} at ['заседать', '-ru-']
þolfall: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:'], []){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>, <TEMPLATE(['improve'], ['is'], ['морфо'], ['пример'], ['синонимы'], ['этимология']){} >, '\n', <TEMPLATE(['Категория'], ['язык=is'], ['Падежи']){} >, '\n', <TEMPLATE(['длина слова'], ['7'], ['is']){} >>> at ['þolfall']
заседать: DEBUG: UNIMPLEMENTED top-level template: Омонимы {1: 'ru', 2: '2'} at ['заседать', 'Омонимы']
заседать: DEBUG: unexpected top-level node: <LEVEL6([<TEMPLATE(['заголовок'], ['I']){} >]){} '\n\n'> at ['заседать']
заседать: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['спряжения\n'], ['вид=н\n'], ['Я =заседа́ю\n'], ['Я (прош.) =заседа́л<br />заседа́ла\n'], ['Мы =заседа́ем\n'], ['Мы (прош.) =заседа́ли\n'], ['Мы (повел.) =\n'], ['Ты =заседа́ешь\n'], ['Ты (прош.) =заседа́л<br />заседа́ла\n'], ['Ты (повел.)=заседа́й\n'], ['Вы =заседа́ете\n'], ['Вы (прош.) =заседа́ли\n'], ['Вы (повел.)=заседа́йте\n'], ['Он/она/оно =заседа́ет\n'], ['Он/она/оно (прош.)=заседа́л<br />заседа́ла<br />заседа́ло\n'], ['Они =заседа́ют\n'], ['Они (прош.)=заседа́ли\n'], ['ПричНаст =заседа́ющий\n'], ['ПричПрош =заседа́вший\n'], ['ДеепрНаст =заседа́я\n'], ['ДеепрПрош =заседа́в, заседа́вши\n'], ['ПричСтрад =заседа́емый\n'], ['ПричСтрадПрош =—\n'], ['Будущее = буду/будешь… заседа́ть\n'], ['Прич = \n'], ['Деепр = \n'], ['НП=\n'], ['безличный=\n'], ['многократный=\n']){} >, <HTML(b){} 'за', <HTML(span){'class': 'hyph', 'style': 'color:lightgreen;'} '-'>, 'се', <HTML(span){'class': 'hyph', 'style': 'color:lightgreen;'} '-'>, 'да́ть'>, '\n\n', <LINK(['глагол'], ['Глагол']){} >, ', ', <LINK(['несовершенный вид']){} >, ', ', <LINK(['Категория:Русские лексемы']){} >, <LINK(['Категория:Русские глаголы несовершенного вида']){} >, ' ', <LINK(['переходный глагол'], ['переходный']){} >, <LINK(['Категория:Переходные глаголы']){} >, ',    тип спряжения по ', <LINK(['Викисловарь:Использование словаря Зализняка'], ['классификации А.&#160;Зализняка']){} >, '&#160;—&#32;1a.', <LINK(['Категория:Глаголы, спряжение 1a']){} >, '\n\n', <TEMPLATE(['морфо-ru'], ['заседа'], ['+ть'], ['и=т']){} >, '\n\n'> at ['заседать']
заседать: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(span){'class': 'rutr'} >, <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['Справка:МФА для русского языка'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'zəsʲɪˈdatʲ'>, '&#93;'>>, <TEMPLATE(['main other'], []){} >, '</span>', <LINK(['Категория:Нужна аудиозапись произношения/ru']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' собравшись, коллективно обсуждать ', <TEMPLATE(['пример'], []){} >, '\n'>, <LIST_ITEM(#){} ' участвовать в заседании ', <TEMPLATE(['пример'], []){} >, '\n'>, <LIST_ITEM(#){} ' ', <TEMPLATE(['устар.'], ['ru']){} >, ', ', <TEMPLATE(['разг.'], ['ru']){} >, ' сидеть ', <TEMPLATE(['пример'], ['В одном углу комнаты накрыт был стол с огромным самоваром, и за ним ', <TEMPLATE(['выдел'], ['заседала']){} >, ' пожилая дама, та самая Клеопатра Платоновна.'], ['А. К. Толстой'], ['Упырь'], ['1841']){} >, '\n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ['заседать']
заседать: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:'], ['да']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL5(['Перевод']){} '\n', <TEMPLATE(['перев-блок'], ['коллективно обсуждать'], ['\n'], ['de=', <LINK(['tagen']){} >, '\n']){} >, '\n\n', <TEMPLATE(['перев-блок'], ['коллективно обсуждать'], ['\n'], ['de=', <LINK(['hängenbleiben']){} >, '\n']){} >, '\n\n\n', <TEMPLATE(['improve'], ['ru'], ['примеры'], ['синонимы'], ['гиперонимы'], ['этимология'], ['переводы']){} >, '\n', <TEMPLATE(['Категория'], ['язык=ru'], [], [], []){} >, '\n\n', <LEVEL6([<TEMPLATE(['заголовок'], ['II']){} >]){} '\n\n'>>, <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['спряжения\n'], ['вид=н\n'], ['Я =заседа́ю\n'], ['Я (прош.) =заседа́л<br />заседа́ла\n'], ['Мы =заседа́ем\n'], ['Мы (прош.) =заседа́ли\n'], ['Мы (повел.) =\n'], ['Ты =заседа́ешь\n'], ['Ты (прош.) =заседа́л<br />заседа́ла\n'], ['Ты (повел.)=заседа́й\n'], ['Вы =заседа́ете\n'], ['Вы (прош.) 
=заседа́ли\n'], ['Вы (повел.)=заседа́йте\n'], ['Он/она/оно =заседа́ет\n'], ['Он/она/оно (прош.)=заседа́л<br />заседа́ла<br />заседа́ло\n'], ['Они =заседа́ют\n'], ['Они (прош.)=заседа́ли\n'], ['ПричНаст =заседа́ющий\n'], ['ПричПрош =заседа́вший\n'], ['ДеепрНаст =заседа́я\n'], ['ДеепрПрош =заседа́в, заседа́вши\n'], ['ПричСтрад =заседа́емый\n'], ['ПричСтрадПрош =—\n'], ['Будущее = буду/будешь… заседа́ть\n'], ['Прич = \n'], ['Деепр = \n'], ['НП=\n'], ['безличный=\n'], ['многократный=\n']){} >, <HTML(b){} 'за', <HTML(span){'class': 'hyph', 'style': 'color:lightgreen;'} '-'>, 'се', <HTML(span){'class': 'hyph', 'style': 'color:lightgreen;'} '-'>, 'да́ть'>, '\n\n', <LINK(['глагол'], ['Глагол']){} >, ', ', <LINK(['несовершенный вид']){} >, ', ', <LINK(['Категория:Русские лексемы']){} >, <LINK(['Категория:Русские глаголы несовершенного вида']){} >, ' ', <LINK(['переходный глагол'], ['переходный']){} >, <LINK(['Категория:Переходные глаголы']){} >, ',    тип спряжения по ', <LINK(['Викисловарь:Использование словаря Зализняка'], ['классификации А.&#160;Зализняка']){} >, '&#160;—&#32;1a.', <LINK(['Категория:Глаголы, спряжение 1a']){} >, '\n\n', <TEMPLATE(['морфо-ru'], ['за-'], ['сед'], ['-а'], ['+ть']){} >, '\n\n'>> at ['заседать']
заседать: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <HTML(span){'class': 'rutr'} >, <HTML(ul){'class': 'transcription', 'style': 'margin-left:0; list-style:none;'} <HTML(li){} <LINK(['Справка:МФА для русского языка'], ['МФА']){} >, ':&nbsp;&#91;', <HTML(span){'class': 'IPA', 'style': 'white-space: nowrap;'} 'zəsʲɪˈdatʲ'>, '&#93;'>>, <TEMPLATE(['main other'], []){} >, '</span>', <LINK(['Категория:Нужна аудиозапись произношения/ru']){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <HTML(i){} <HTML(span){'style': 'background-color:#BBEEEE;'} >>, ' глубоко вонзаясь куда-либо, оставаться там; застревать ', <TEMPLATE(['пример'], []){} >, '\n'>, <LIST_ITEM(#){} ' ', <TEMPLATE(['разг.'], ['ru']){} >, ' ', <LINK(['отпечатываться']){} >, ' в памяти, в особенности ', <LINK(['невольно']){} >, ' ', <TEMPLATE(['пример'], []){} >, '\n'>, <LIST_ITEM(#){} '\n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['застревать']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['запоминаться']){} >, ', ', <LINK(['отпечатываться']){} >, ', ', <LINK(['оставаться']){} >, ', ', <LINK(['откладываться']){} >, ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['глаголы=\n'], ['наречия=\n']){} >, '\n\n'>>> at ['заседать']
заседать: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nОт ', <TEMPLATE(['этимология:'], ['ru']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} '  \n\n'>>>, <LEVEL5(['Перевод']){} '\n', <TEMPLATE(['перев-блок'], ['\n'], ['abq=\n'], ['ab=\n'], ['av=\n'], ['ave=\n'], ['agh=\n'], ['aja=\n'], ['ady=\n'], ['az=\n'], ['ay=\n'], ['ain=\n'], ['ain.kana=\n'], ['ain.lat=\n'], ['sq=\n'], ['als=\n'], ['ale=\n'], ['alt=\n'], ['en=\n'], ['ar=\n'], ['an=\n'], ['arc.jud=\n'], ['arc.syr=\n'], ['arn=\n'], ['hy=\n'], ['asm=\n'], ['ast=\n'], ['af=\n'], ['bar=\n'], ['bm=\n'], ['eu=\n'], ['ba=\n'], ['be=\n'], ['bn=\n'], ['bg=\n'], ['bs=\n'], ['br=\n'], ['bua=\n'], ['cy=\n'], ['wa=\n'], ['hu=\n'], ['vep=\n'], ['hsb=\n'], ['vot=\n'], ['vo=\n'], ['wo=\n'], ['vro=\n'], ['vi=\n'], ['gag=\n'], ['haw=\n'], ['ht=\n'], ['gl=\n'], ['ze=\n'], ['kl=\n'], ['el=\n'], ['ka=\n'], ['gn=\n'], ['gu=\n'], ['gd=\n'], ['dar=\n'], ['prs=\n'], ['da=\n'], ['dv=\n'], ['ang=\n'], ['grc=\n'], ['bat-smg=\n'], ['zza=\n'], ['zu=\n'], ['he=\n'], ['yi=\n'], ['io=\n'], ['id=\n'], ['ia=\n'], ['iu=\n'], ['ik=\n'], ['ga=\n'], ['is=\n'], ['es=\n'], ['it=\n'], ['kbd=\n'], ['kk=\n'], ['xal=\n'], ['kn=\n'], ['kaa=\n'], ['krc=\n'], ['krl=\n'], ['ca=\n'], ['csb=\n'], ['qu=\n'], ['ky=\n'], ['zh=\n'], ['zh-tw=\n'], ['zh-cn=\n'], ['kom=\n'], ['koi=\n'], ['kok=\n'], ['kw=\n'], ['ko=\n'], ['co=\n'], ['xh=\n'], ['crh=\n'], ['ku=\n'], ['km=\n'], ['lad=\n'], ['lo=\n'], ['la=\n'], ['lez=\n'], ['lv=\n'], ['li=\n'], ['ln=\n'], ['lt=\n'], ['lb=\n'], ['mk=\n'], ['mg=\n'], ['ms=\n'], ['ml=\n'], ['mt=\n'], ['mi=\n'], ['chm=\n'], ['mdf=\n'], ['mo=\n'], ['mn=\n'], ['gv=\n'], ['nv=\n'], ['gld=\n'], ['nah=\n'], ['na=\n'], ['nio=\n'], ['nap=\n'], ['de=\n'], ['yrk=\n'], ['nl=\n'], ['dsb=\n'], ['no=\n'], ['oc=\n'], ['os=\n'], ['pa=\n'], ['pap=\n'], ['fa=\n'], ['pl=\n'], ['pt=\n'], ['ps=\n'], ['pms=\n'], ['rap=\n'], ['rm=\n'], ['ro=\n'], ['sjd=\n'], ['sa=\n'], ['sc=\n'], ['se=\n'], 
['sr=\n'], ['sr-l=\n'], ['scn=\n'], ['sk=\n'], ['sl=\n'], ['slovio-c=\n'], ['slovio-l=\n'], ['so=\n'], ['chu.cyr=\n'], ['chu.glag=\n'], ['sw=\n'], ['tab=\n'], ['tl=\n'], ['tg=\n'], ['ty=\n'], ['th=\n'], ['ta=\n'], ['tt=\n'], ['tt.cyr=\n'], ['tt.lat=\n'], ['te=\n'], ['art=\n'], ['tpi=\n'], ['kim=\n'], ['tn=\n'], ['tyv=\n'], ['tr=\n'], ['tk=\n'], ['udm=\n'], ['ug=\n'], ['uz=\n'], ['uk=\n'], ['ur=\n'], ['fo=\n'], ['fi=\n'], ['fr=\n'], ['fy=\n'], ['fur=\n'], ['kjh=\n'], ['ha=\n'], ['hi=\n'], ['hr=\n'], ['rom=\n'], ['ce=\n'], ['cs=\n'], ['cv=\n'], ['sv=\n'], ['cjs=\n'], ['sco=\n'], ['ewe=\n'], ['myv=\n'], ['eo=\n'], ['et=\n'], ['jv=\n'], ['sah=\n'], ['ja=\n']){} >, '\n\n', <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['ru'], ['пример'], ['гиперонимы'], ['этимология'], ['перевод']){} >, '\n', <TEMPLATE(['Категория'], ['язык=ru'], []){} >, '\n', <TEMPLATE(['длина слова'], ['8'], ['ru']){} >>>> at ['заседать']
цеста: DEBUG: no corresponding start tag found for </li> at ['цеста'] parsing Произношение
цеста: DEBUG: no corresponding start tag found for </ul> at ['цеста'] parsing Произношение
цеста: DEBUG: UNIMPLEMENTED top-level template: -sr- {} at ['цеста', '-sr-']
morze: DEBUG: UNIMPLEMENTED top-level template: -pl- {} at ['morze', '-pl-']
цеста: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ sr f 1'], [<TEMPLATE(['по-слогам'], ['цес'], ['та']){} >], ['основа=цест']){} >, '\n\n', <TEMPLATE(['морфо'], [], ['цест'], [], ['а']){} >, '\n\n'> at ['цеста']
morze: DEBUG: unexpected top-level node: <LEVEL5(['Морфологические и синтаксические свойства']){} '\n', <TEMPLATE(['сущ pl n e*'], ['слоги=mor-ze'], ['morz'], ['mórz']){} >, '\n\n', <TEMPLATE(['морфо'], ['прист1='], ['корень1=morz'], ['суфф1='], ['оконч=e']){} >, '\n\n'> at ['morze']
цеста: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n</li></ul>', <TEMPLATE(['main other'], [<LINK(['Категория:Нужно произношение']){} >]){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['дорога']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['улица']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' ', <LINK(['шоссе']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>>, '\n'>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['пут']){} >, '\n'>>, '\n'>, <LEVEL5(['Антонимы']){} '\n\n', <LEVEL6(['Родственные слова']){} '\n\n'>>> at ['цеста']
цеста: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\n\n', <TEMPLATE(['unfinished'], ['sr']){} >, '\n', <TEMPLATE(['Категория'], ['язык=sr'], ['Дороги'], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['lang=sr']){} >> at ['цеста']
morze: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], ['ˈmɔʒɛ'], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <LINK(['море']){} >, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=morski\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n']){} >, '\n\n'>>> at ['morze']
morze: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:море'], ['pl']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['pl'], ['пример'], ['синонимы'], ['гиперонимы']){} >, '\n', <TEMPLATE(['Категория'], ['язык=pl'], ['Море']){} >, '\n', <TEMPLATE(['длина слова'], ['5'], ['pl']){} >>> at ['morze']
октомври: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'Октомври'} at ['октомври', 'Cf']
септември: DEBUG: UNIMPLEMENTED top-level template: -bg- {} at ['септември', '-bg-']
октомври: DEBUG: UNIMPLEMENTED top-level template: -bg- {} at ['октомври', '-bg-']
ŝi: DEBUG: UNIMPLEMENTED top-level template: Cf {1: 'si'} at ['ŝi', 'Cf']
септември: DEBUG: UNIMPLEMENTED top-level template: месяцы bg {} at ['септември', 'месяцы bg']
октомври: DEBUG: UNIMPLEMENTED top-level template: месяцы bg {} at ['октомври', 'месяцы bg']
септември: DEBUG: UNIMPLEMENTED top-level template: длина слова {1: '9', 2: 'bg'} at ['септември', 'длина слова']
ŝi: DEBUG: UNIMPLEMENTED top-level template: wikify {} at ['ŝi', 'wikify']
@kristian-clausal
Collaborator

Those seem mostly harmless. DEBUG is mostly used for messages that are less than actual errors: things that are good to know in case something is actually wrong, messages used to collect data for other purposes, or cases where something has been recovered from in some way. The "no corresponding start tag" message is really common, and the unimplemented top-level templates come from page/parse_top_level_template(), where we parse any templates that appear before the "contents" of the actual language sections, if there are any. Mostly, we ignore templates like that.
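To see which of these DEBUG messages dominate a run, one could tally the top-level templates reported as UNIMPLEMENTED. This is a minimal sketch that assumes log lines look like the ones pasted in this thread ("word: DEBUG: message at [...]"); the function and pattern names are hypothetical and not part of wiktextract.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log shape, copied from the lines in this thread:
#   "четвер: DEBUG: UNIMPLEMENTED top-level template: -uk- {} at [...]"
LOG_LINE = re.compile(r"^(?P<word>.+?): (?P<level>DEBUG|WARNING|ERROR): (?P<msg>.*)$")
PREFIX = "UNIMPLEMENTED top-level template: "


def count_top_level_templates(lines: Iterable[str]) -> Counter:
    """Count how often each unimplemented top-level template name appears."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("msg").startswith(PREFIX):
            continue  # tracebacks, other DEBUG messages, etc.
        rest = m.group("msg")[len(PREFIX):]
        # The template name ends where its printed argument dict starts.
        name = rest.split(" {", 1)[0]
        counts[name] += 1
    return counts
```

Running it over a saved log and printing counts.most_common(20) would surface frequent names such as language-header templates that probably should not be ignored.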

@Vuizur
Contributor Author

Vuizur commented Feb 1, 2023

Thanks for the help.

The error might be in the data files, I will have to keep looking.

I also can't seem to get the program to run to completion (using WSL2). It processes a large number of entries, and then I get a broken pipe error, something like:

поморити: DEBUG: unexpected top-level node: <LEVEL3(['Произношение']){} '\n', <TEMPLATE(['transcriptions'], [], []){} >, '\n\n', <LEVEL6(['Семантические свойства']){} '\n\n'>, <LEVEL6(['Значение']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' ', <HTML(i){} <HTML(span){'style': 'background-color:#BBEEEE;'} >>, ' ', <TEMPLATE(['пример'], [], ['перевод=']){} >, '\n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Синонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Антонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гиперонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>>, <LEVEL5(['Гипонимы']){} '\n', <LIST(#){} <LIST_ITEM(#){} ' \n'>, <LIST_ITEM(#){} ' \n\n'>>, <LEVEL6(['Родственные слова']){} '\n', <TEMPLATE(['родств-блок\n'], ['умласк=\n'], ['уничиж=\n'], ['увелич=\n'], ['имена-собственные=\n'], ['существительные=\n'], ['прилагательные=\n'], ['числительные=\n'], ['местоимения=\n'], ['глаголы=\n'], ['наречия=\n'], ['предикативы=\n'], ['предлоги=\n']){} >, '\n\n'>>> at ['поморити']
поморити: DEBUG: unexpected top-level node: <LEVEL3(['Этимология']){} '\nПроисходит от ', <TEMPLATE(['этимология:'], ['uk']){} >, '\n\n', <LEVEL6(['Фразеологизмы и устойчивые сочетания']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n'>>>, <LEVEL6(['Библиография']){} '\n', <LIST(*){} <LIST_ITEM(*){} ' \n\n\n'>>, <TEMPLATE(['improve'], ['uk'], ['морфо'], ['транскрипция'], ['значение'], ['синонимы'], ['гиперонимы'], ['этимология']){} >, '\n', <TEMPLATE(['Категория'], ['язык=uk'], [], [], []){} >, '\n', <TEMPLATE(['длина слова'], ['8'], ['uk']){} >>> at ['поморити']
Process ForkPoolWorker-27:
Process ForkPoolWorker-29:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
Process ForkPoolWorker-21:
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
BrokenPipeError: [Errno 32] Broken pipe
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
@xxyzz
Collaborator

xxyzz commented Feb 1, 2023

You shouldn't ignore these "unexpected top-level node" errors; they usually mean that the language or POS header is not expanded. For example, the -uk- template is the Ukrainian language header. Please see tatuylonen/wikitextprocessor#15
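A quick way to check whether a page's language headers are template-based is to match the heading line directly. This is a minimal sketch assuming Russian Wiktionary language headings have the form "= {{-uk-}} =" (any number of "=" signs around a {{-code-}} template); the pattern and function name are hypothetical, and the real fix is expanding the template as described in the linked issue.

```python
import re
from typing import Optional

# Assumed heading shape: "= {{-uk-}} =", "= {{-bat-smg-}} =", ...
RU_LANG_HEADING = re.compile(r"^=+\s*\{\{-([a-z][a-z-]*?)-\}\}\s*=+$")


def ru_heading_lang_code(line: str) -> Optional[str]:
    """Return the language code of a {{-code-}} heading, or None."""
    m = RU_LANG_HEADING.match(line.strip())
    return m.group(1) if m else None
```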

The broken pipe happens because the forked processes are killed by the Linux OOM killer; you need more RAM or swap.
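Before a long run, it may help to check how much memory the machine (or the WSL2 VM, whose limits are set in .wslconfig) actually has available. This is a minimal sketch that parses Linux /proc/meminfo text; the per-worker memory figure is a placeholder you would have to measure yourself, and the function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        parts = value.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def max_workers(info: Dict[str, int], per_worker_kb: int) -> int:
    """Rough upper bound on worker processes that fit in free RAM plus swap."""
    budget = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return max(1, budget // per_worker_kb)
```

Usage on Linux: `with open("/proc/meminfo") as f: info = parse_meminfo(f.read())`. If the result is smaller than the number of worker processes being forked, OOM kills and broken pipes are likely.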

@leyan

leyan commented Mar 19, 2023

@Vuizur Did you make progress on your Russian wiktionary parsing?

@xxyzz, @kristian-clausal: Is multilingual Wiktionary parsing an actual goal of the project? I am trying to work on the French Wiktionary, but even with the few JSON files that translate some resources, most of the logic seems tightly coupled to the English Wiktionary: it expects certain templates in certain places, hardcodes categories, etc. There is a fr folder in the data folder, but from what I have seen, it cannot work with the French Wiktionary as things stand today. Is it a long-term goal to separate the generic logic from the specific handling of each Wiktionary?

@Vuizur
Contributor Author

Vuizur commented Mar 19, 2023

Hmm, I haven't gotten it to work yet (but I also didn't have that much time recently).

I think Wiktextract works pretty decently with the Chinese Wiktionary because they have taken a lot of templates from the English one, which makes the two comparatively similar. Other language Wiktionaries like German or Russian haven't done this as much (as far as I could tell), so there it is pretty hard to get it to work, because everything is different. The HTML layouts are different, and the data is probably also structured a bit differently. I guess it would be a real challenge to adapt the Wiktextract code to work for all Wiktionaries.

So one probably has to write custom code for each Wiktionary, but of course it would be smartest to reuse the Wiktextract code where it makes sense. (I don't know the code in detail, but examples might be the code that creates the forms array with grammar tags from a Wiktionary table, or the code that creates the mappings between language names and language codes.) It would probably also make a lot of sense to keep the JSON output consistent.
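The language name/code mapping is one piece that looks reusable across editions. This is a minimal sketch assuming a languages.json that maps each code either to one name or to a list of names (the exact format per edition is an assumption); it builds the reverse name-to-code lookup. The function names are hypothetical.

```python
import json
from typing import Dict, List, Union


def invert_languages(code_to_names: Dict[str, Union[str, List[str]]]) -> Dict[str, str]:
    """Build a name -> code lookup from a code -> name(s) mapping."""
    name_to_code: Dict[str, str] = {}
    for code, names in code_to_names.items():
        if isinstance(names, str):
            names = [names]
        for name in names:
            # If two codes share a name, keep the first one seen.
            name_to_code.setdefault(name, code)
    return name_to_code


def load_name_to_code(path: str) -> Dict[str, str]:
    """Load a languages.json-style file and invert it."""
    with open(path, encoding="utf-8") as f:
        return invert_languages(json.load(f))
```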

DBnary already parses a pretty significant number of different Wiktionary XML dumps to extract data from them. However, in comparison to wiktextract it has several disadvantages: As far as I can tell, it only parses German entries from the German Wiktionary (and Russian for the Russian one), for example, missing potentially useful data. And it doesn't expand templates, so you lose the table data/inflections. (And it uses RDF, which feels very complicated to me compared to Wiktextract's JSON.)

So I think there are two ways to get high-quality data from each Wiktionary:

  1. Reimplement some of the Wiktextract code, following the same general approach: XML dump -> templates expanded with wikitextprocessor -> JSON.
  2. Take the Wiktionary HTML dump -> parse the HTML with BeautifulSoup or something similar -> JSON. A while ago I needed some data from the Russian Wiktionary's tables for another project, so I went this route with this project: https://github.com/Vuizur/ruwiktionary-htmldump-parser .
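For approach 1, the very first step is reading page titles and wikitext out of the XML dump. wikitextprocessor already handles this in practice; the following is only a standalone sketch using the standard library's ElementTree, assuming the usual MediaWiki export layout (page, title, revision, text) with a versioned namespace that we strip.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def _local(tag: str) -> str:
    # "{http://www.mediawiki.org/xml/export-0.10/}page" -> "page"
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for every <page> element in a MediaWiki dump."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page's children; dumps are many GB
```

A compressed dump can be streamed with `bz2.open(path, "rb")` and passed straight to iter_pages; expanding the templates inside each text is where wikitextprocessor comes in.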

I haven't tried 1). For 2), the difficulty depends on how logically the specific Wiktionary structures its HTML, but I was generally really happy with the results relative to the time I invested (though one could probably spend forever on small fixes).
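For approach 2, the core task is turning an inflection table into a forms array. The sketch below swaps BeautifulSoup for the standard library's html.parser to stay dependency-free. It assumes a simple table whose first row holds column headers and whose rows start with a header cell; the output mimics wiktextract's "form"/"tags" objects, but the tags here are raw header text, not normalized tags. Real Wiktionary tables (rowspan, nested markup) will need more handling.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableCellParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate cells outside an explicit <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_forms(html: str) -> List[Dict[str, object]]:
    """Turn a simple header-row/header-column table into form objects."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    forms: List[Dict[str, object]] = []
    for row in body:
        if not row:
            continue
        row_tag, cells = row[0], row[1:]
        for col, cell in enumerate(cells, start=1):
            # Skip empty cells and placeholder dashes (U+2014).
            if cell and cell != "\u2014" and col < len(header):
                forms.append({"form": cell, "tags": [row_tag, header[col]]})
    return forms
```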

@xxyzz
Collaborator

xxyzz commented Mar 19, 2023

The current code can extract some basic data like POS, definitions and example sentences once those config JSON files are added for non-English dump files. The pronunciation and forms parsing code is still mostly hard-coded for the English Wiktionary. I think it's inevitable to write separate parsing code for each language; DBnary seems to use this approach. The downside of parsing HTML is that if the MediaWiki theme changes, the parsing code also needs to change.
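Since basic extraction depends on those config files being present, a small pre-flight check can catch missing or malformed ones before a multi-hour run. This is a minimal sketch: the file list is taken from the guide at the top of this issue, and "pos_subtitles.json" is assumed to carry the .json extension like the others. The function is hypothetical, not a wiktextract command.

```python
import json
from pathlib import Path
from typing import List

# From the guide at the top of this issue (".json" on pos_subtitles is assumed).
REQUIRED_FILES = [
    "languages.json",
    "pos_subtitles.json",
    "linkage_subtitles.json",
    "other_subtitles.json",
    "zh_pron_tags.json",
    "form_of_templates.json",
]


def check_language_data(data_dir: Path) -> List[str]:
    """Return a list of problems with the config files in data_dir."""
    problems: List[str] = []
    for name in REQUIRED_FILES:
        path = data_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {name}: {exc}")
    return problems
```

For example, check_language_data(Path("wiktextract/data/ru")) should return an empty list once every file exists and parses.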

@kristian-clausal
Collaborator

The long-long-long-term plan is to attempt to decouple wiktextract's core from the individual Wiktionary language editions. However, the more I see how different Wiktionaries can be from each other, the more it seems like a morass...

The code in Wikitextprocessor should definitely be decoupled, and it mostly is.

As it is, Tatu and I are trying to make at least en.wiktionary.org work, mainly because it's by far the most useful one to tackle. But even that is a moving target, because en.wiktionary is not static.

In the meantime, if you want to make another Wiktionary work with wiktextract, you have to actually put a lot of work into it, much like the work we've put into making just en.wiktionary function.

@leyan

leyan commented Mar 20, 2023

Thank you for your answers!

It seems my best bet is to fork the code and do what I can for the French Wiktionary first, then maybe see if some parts can be merged back afterwards? At least for basic data, I think it would be good to have something working in the main wiktextract repo (currently, it fails at the very beginning, during the initial stage of recognizing languages, because the French Wiktionary uses templates inside second-level headers).
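To illustrate the header problem: French Wiktionary headings such as "== {{langue|fr}} ==" and "=== {{S|nom|fr}} ===" carry the language and section type as template arguments instead of plain text. This is a minimal regex sketch under that assumption; the patterns and function name are hypothetical, and a proper solution would expand or interpret the templates through wikitextprocessor rather than regex-match raw lines.

```python
import re
from typing import Optional, Tuple

# Assumed heading shapes: "== {{langue|fr}} ==" and "=== {{S|nom|fr}} ===".
LANGUE = re.compile(r"^==\s*\{\{langue\|([^|}]+)\}\}\s*==$")
SECTION = re.compile(r"^={3,}\s*\{\{S\|([^|}]+)(?:\|[^}]*)?\}\}\s*={3,}$")


def classify_heading(line: str) -> Optional[Tuple[str, str]]:
    """Return ("language", code) or ("section", name), or None if unrecognized."""
    line = line.strip()
    m = LANGUE.match(line)
    if m:
        return ("language", m.group(1).strip())
    m = SECTION.match(line)
    if m:
        return ("section", m.group(1).strip())
    return None
```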

As an aside, DBnary now seems to handle entries in languages other than the main one: http://kaiko.getalp.org/about-dbnary/eager-to-meet-the-exolexica/. I will also have a look at what is done there, but with Java + RDF, it is going to be more of a struggle...

@empiriker
Contributor

empiriker commented Mar 27, 2023

Having gained some insight from starting to parse the French Wiktionary, I want to add my thoughts to the discussion.

General remarks
Judging by the English and French projects, the different Wiktionaries can differ considerably. On the bright side, they all follow the same general structure (sections and such), which, with some modifications (e.g. where the French Wiktionary uses templates inside second-level headers), can be mapped to a standard structure. On the other hand, how information within a section is organized can vary widely. The French Wiktionary makes heavy use of templates, whereas the English Wiktionary tends to encode more information in plain-text conventions (necessitating a lot of diverse logic to extract it). Consequently, program logic that parses a particular section for one Wiktionary project will not work, or will work badly (leaving a lot of extractable information on the table), for a different Wiktionary project.

This leads me to my next point.

Organizing the code base
Unsurprisingly, the current code base is built around accommodating the English Wiktionary. Consequently, many parts of the code are tightly coupled to its conventions. My main struggle so far has been finding the right access points to insert extraction logic for the French project (trying to make as much use of the existing code as possible without risking adverse effects on the parsing results for French).

If this repo wants to support parsing different Wiktionary projects, it would really benefit from clearly separating the parts that deal with the general structure of a Wiktionary page from the parts that rely on each project's internal conventions for each section. Otherwise, each contributor parsing a different Wiktionary project will choose a different access point (or, worse, get discouraged from trying at all), and the repo will turn into a huge mess of project-specific code hidden behind flags.
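One possible shape for that separation, sketched here purely as a hypothetical design and not as wiktextract's actual architecture: generic code walks the page structure and looks up a section extractor by (edition, section), falling back to a shared "*" extractor. Edition-specific logic then lives in registered functions instead of scattered language-code checks. All names below are assumptions.

```python
from typing import Callable, Dict, Tuple

Extractor = Callable[[str], dict]
_REGISTRY: Dict[Tuple[str, str], Extractor] = {}


def extractor(edition: str, section: str):
    """Register a section extractor for one edition ("*" means any edition)."""
    def wrap(fn: Extractor) -> Extractor:
        _REGISTRY[(edition, section)] = fn
        return fn
    return wrap


def extract_section(edition: str, section: str, text: str) -> dict:
    """Prefer the edition-specific extractor, then the generic one."""
    fn = _REGISTRY.get((edition, section)) or _REGISTRY.get(("*", section))
    return fn(text) if fn is not None else {}


@extractor("*", "etymology")
def generic_etymology(text: str) -> dict:
    return {"etymology_text": text.strip()}


@extractor("fr", "example")
def fr_example(text: str) -> dict:
    return {"examples": [{"text": text.strip()}]}
```

The generic page walker only needs to know section names; which edition overrides what is visible in one place.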

Of course, it's always an option to just let it grow organically and reorganize later.

One format to serve them all?
I read in some existing issues that this repo considers Wiktionary a "moving target" and therefore does not want to enforce a static output format for the parsing results. While I agree with this, the Readme's description of the current format may do a good enough job of helping consumers of the parsed data figure out what each field stands for, but it is not so good at helping contributors (parsing other Wiktionary projects) figure out where each type of extracted information should go, especially since (linguistic) conventions and terms may differ between Wiktionary projects rooted in different linguistic traditions.

Additionally, different Wiktionary projects may be more or less detailed in the information they provide in an organized manner. For example, the French Wiktionary often uses the template {{exemple}} to provide example sentences. This template has (optional) fields for translations, transliterations, source information and links. It would be a shame not to make use of this and capture the different types of information. However, the current "example" object only has the fields "text", "ref", "english", "roman", "note" and "type". These can capture most of what the {{exemple}} template provides, but a new field "link" would need to be added, and the field "english" should probably be renamed, since the translation would be in French.
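A sketch of what that mapping could look like, assuming the template's arguments arrive as a dict with positional arguments under int keys (as in the template-argument dicts printed in the debug log earlier in this thread). The argument names sens, tr, source and lien, and the output field names "translation" and "link", are assumptions for illustration; check the {{exemple}} documentation on fr.wiktionary before relying on them.

```python
from typing import Dict, Union

ArgKey = Union[int, str]


def exemple_to_example(args: Dict[ArgKey, str]) -> Dict[str, str]:
    """Map assumed {{exemple}} arguments to an example object."""
    field_map: Dict[ArgKey, str] = {
        1: "text",             # the example sentence itself
        "sens": "translation", # proposed replacement for "english"
        "tr": "roman",         # transliteration
        "source": "ref",
        "lien": "link",        # the proposed new field
    }
    example: Dict[str, str] = {}
    for arg, field in field_map.items():
        value = args.get(arg, "").strip()
        if value:
            example[field] = value
    return example
```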

This is just one example. The bigger question here is to which extent the divergence of output formats for parsing different Wiktionary projects is acceptable and if yes how can these differences be made transparent?

Final thought
I am aware that I offer few answers here and mostly raise questions. However, I feel that it's not my place to propose any kind of coherent policy on these issues, especially since I don't yet know how much I can contribute beyond my own needs in extracting data from the French Wiktionary. I look forward to hearing the thoughts and views of the great people who have brought this project to its current state.

Cheers.

@kristian-clausal
Collaborator

kristian-clausal commented Mar 28, 2023

The only way to know what other Wiktionary projects need is to implement them and see what kind of output they 'should' generate. "english" could easily be renamed if we can figure out a good term for it (or we could use "french" when it's French), and just adding things to previously existing fields should not break things too much.

I believe it is futile to try to standardize any sort of format at this point (it might even be futile at any point...), so the simplest thing is just to do whatever you need for French and then implement or unify things later.

Similarly for separating the code, it might be simplest just to let you and xxyzz wrestle with the Chinese and French Wiktionaries, see where you needed to put your "if lang_code" checks, and start planning based on that.

Labels
None yet
5 participants