Error converting .jsonl to .index for the Japanese part of the English Wiktionary #572

Open
franzmondlichtmann opened this issue Jul 7, 2024 · 0 comments

franzmondlichtmann commented Jul 7, 2024

OS: EndeavourOS with the latest updates (Arch Linux with the Calamares installer)
Python setup: Micromamba with Python 3.10
Shell: fish

pyglossary was installed with pip.
I took the Wiktionary .jsonl files from the kaikki.org site.
The conversion worked for the Spanish part of the English Wiktionary, but when I try it with the Japanese part I get an error:

laptop02@laptop02-pc ~/Downloads> pyglossary kaikki.org-dictionary-Japanese.jsonl kaikki.org-dictionary-Japanese.index
[INFO] Writing to DictOrg file '/home/laptop02/Downloads/kaikki.org-dictionary-Japanese.index'
[ERROR] Exception while calling plugin's write function
Traceback (most recent call last):
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 908, in _write
    self._writeEntries(writerList, filename)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 842, in _writeEntries
    for entry in self:
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 393, in _readersEntryGen
    yield from self._applyEntryFiltersGen(reader)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 407, in _applyEntryFiltersGen
    for entry in gen:
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 156, in __iter__
    yield self.makeEntry(json_loads(line))
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 208, in makeEntry
    self.writeSenseList(_hf, data.get("senses"))  # type: ignore
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 313, in writeSenseList
    self.makeList(
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 653, in makeList
    processor(hf, el)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 612, in writeSense
    self.writeSenseExamples(hf, sense.get("examples"))
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 392, in writeSenseExamples
    self.writeSenseExample(hf, example)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 369, in writeSenseExample
    hf.write(text)
  File "src/lxml/serializer.pxi", line 1660, in lxml.etree._IncrementalFileWriter.write
TypeError: got invalid input value of type <class 'list'>, expected string or Element
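
The traceback ends with `hf.write(text)` in `writeSenseExample` receiving a list instead of a string, which suggests that some example in the Japanese file carries a non-string value. A minimal sketch for locating such records follows. It does not use pyglossary at all; it only reads the JSONL directly. The `senses` and `examples` keys come from the traceback, while the `word` key, the assumption that each example is a JSON object, and the helper names `find_non_string_example_fields` and `scan_file` are my own guesses for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def find_non_string_example_fields(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (line_no, word, key, type_name) for every example field
    whose value is not a plain string.

    Assumed layout (not confirmed by the plugin source): each line is a
    JSON object with an optional "senses" list, each sense has an
    optional "examples" list, and each example is a dict.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        data = json.loads(line)
        word = str(data.get("word", ""))  # "word" key is an assumption
        for sense in data.get("senses") or []:
            for example in sense.get("examples") or []:
                if not isinstance(example, dict):
                    yield line_no, word, "<example>", type(example).__name__
                    continue
                for key, value in example.items():
                    if value is not None and not isinstance(value, str):
                        yield line_no, word, key, type(value).__name__


def scan_file(path: str, limit: int = 5) -> None:
    """Print the first `limit` suspicious fields found in a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for count, hit in enumerate(find_non_string_example_fields(f)):
            if count >= limit:
                break
            print(hit)
```

Calling `scan_file("kaikki.org-dictionary-Japanese.jsonl")` should print the line number, headword, field name, and value type of the first few offending example fields, which may help narrow down what the plugin fails to handle.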