# jdepp-python
**Repository Path**: mirrors_bokuweb/jdepp-python
## Basic Information
- **Project Name**: jdepp-python
- **Description**: Python binding for J.DepP(C++ implementation of Japanese Dependency Parsers)
- **Primary Language**: Unknown
- **License**: BSD-2-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-08
- **Last Updated**: 2026-01-31
## README
# jdepp-python
Python binding for J.DepP (a C++ implementation of Japanese dependency parsers): https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/
## Install
```
$ python -m pip install jdepp
```
### Precompiled model files
`pip install` does not install the model (dictionary) files.
Precompiled model files (MeCab POS tagging, trained on the KNBC corpus) are available from
https://github.com/lighttransport/jdepp-python/releases/tag/v0.1.0
The precompiled KNBC model file is licensed under the 3-clause BSD license.
### Build configuration
* MeCab-style POS format: `FEATURE_SEP ','`
* See `jdepp/typedef.h` for more info about the `#ifdef` macros.
## Example
Download and extract the precompiled model file:
```bash
$ wget https://github.com/lighttransport/jdepp-python/releases/download/v0.1.0/knbc-mecab-jumandic-2ndpoly.tar.gz
$ tar xvf knbc-mecab-jumandic-2ndpoly.tar.gz
```
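If `wget` and `tar` are not available (for example on Windows), the same two steps can be sketched with the Python standard library. The helper names `download`, `extract`, and `main` are hypothetical and not part of jdepp; the URL is the one used in the command above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
MODEL_URL = (
    "https://github.com/lighttransport/jdepp-python/releases/download/"
    "v0.1.0/knbc-mecab-jumandic-2ndpoly.tar.gz"
)


def download(url: str, dest: Path) -> Path:
    """Download url to dest, skipping the request if dest already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, target_dir: Path) -> list[str]:
    """Extract a .tar.gz archive into target_dir and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (absolute paths etc.).
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names


def main() -> None:
    """Hypothetical entry point: fetch and unpack into the current directory."""
    archive = download(MODEL_URL, Path("knbc-mecab-jumandic-2ndpoly.tar.gz"))
    for name in extract(archive, Path(".")):
        print(name)
```

Calling `main()` downloads the archive once and lists the extracted files, which helps confirm where the model directory ended up before passing its path to `load_model`.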
```py
import jdepp
model_path = "model/knbc"
parser = jdepp.Jdepp()
parser.load_model(model_path)
# NOTE: MeCab format: surface + TAB + feature (7 comma-separated fields).
# The separator after each surface form must be a literal TAB character.
input_postagged = """吾輩 名詞,普通名詞,*,*,吾輩,わがはい,代表表記:我が輩/わがはい カテゴリ:人
は 助詞,副助詞,*,*,は,は,*
猫 名詞,普通名詞,*,*,猫,ねこ,*
である 判定詞,*,判定詞,デアル列基本形,だ,である,*
。 特殊,句点,*,*,。,。,*
名前 名詞,普通名詞,*,*,名前,なまえ,*
は 助詞,副助詞,*,*,は,は,*
まだ 副詞,*,*,*,まだ,まだ,*
ない 形容詞,*,イ形容詞アウオ段,基本形,ない,ない,*
。 特殊,句点,*,*,。,。,*
EOS
"""
sent = parser.parse_from_postagged(input_postagged)
print(sent)
```
### Print in tree
```py
print(jdepp.to_tree(str(sent)))
```
```
# S-ID: 1; J.DepP
0: 吾輩は━━┓
1: 猫である。━━┓
2: 名前は━━┫
3: まだ━━┫
4: ない。EOS
```
### Graphviz dot export
`jdepp.to_dot` exports the dependency graph in Graphviz dot format.
```py
dot_text = jdepp.to_dot(str(sent))
# feed output text to graphviz viewer, e.g. https://dreampuf.github.io/GraphvizOnline/
```
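As an alternative to an online viewer, the dot text can be saved to a file and rendered locally. The sketch below assumes the Graphviz `dot` command-line tool may or may not be installed; `write_dot` and `render_svg` are hypothetical helper names, and the sample string only stands in for the output of `jdepp.to_dot`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_dot(dot_text: str, path="sentence.dot") -> Path:
    """Write dot text to a UTF-8 file and return its path."""
    out = Path(path)
    out.write_text(dot_text, encoding="utf-8")
    return out


def render_svg(dot_file: Path):
    """Render dot_file to SVG with the Graphviz CLI; return None if `dot` is missing."""
    if shutil.which("dot") is None:
        return None
    svg = dot_file.with_suffix(".svg")
    subprocess.run(["dot", "-Tsvg", str(dot_file), "-o", str(svg)], check=True)
    return svg


# Stand-in for the string returned by jdepp.to_dot(...).
sample = "digraph G { a -> b; }"

with tempfile.TemporaryDirectory() as tmp:
    dot_file = write_dot(sample, Path(tmp) / "sentence.dot")
    print(dot_file.name)
```

Writing the file as UTF-8 matters here because the node labels produced from Japanese input are non-ASCII.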
See [examples/](examples) for more details.
## POS tagged input format
MeCab style: each line is surface + TAB + feature (7 comma-separated fields).
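A common mistake is pasting input in which the TAB has been turned into spaces. The following sketch checks a string against the format described above before it is handed to the parser. `check_postagged` is a hypothetical helper, not part of jdepp, and the requirement that the input ends with an `EOS` line is an assumption taken from the examples in this README. It also assumes that feature fields do not themselves contain commas.

```python
def check_postagged(text: str, n_fields: int = 7) -> list[str]:
    """Return a list of problems found in MeCab-style input (empty if none)."""
    problems = []
    lines = text.splitlines()
    # Assumption from the README examples: the input is terminated by "EOS".
    if not lines or lines[-1] != "EOS":
        problems.append("input does not end with an EOS line")
    for lineno, line in enumerate(lines, start=1):
        if line in ("EOS", ""):
            continue
        # Split only at the first TAB: surface on the left, feature on the right.
        surface, tab, feature = line.partition("\t")
        if not tab:
            problems.append(f"line {lineno}: no TAB between surface and feature")
            continue
        count = len(feature.split(","))
        if count != n_fields:
            problems.append(
                f"line {lineno}: expected {n_fields} feature fields, got {count}"
            )
    return problems


example = "猫\t名詞,普通名詞,*,*,猫,ねこ,*\nEOS\n"
assert check_postagged(example) == []
```

An empty result means the input at least has the expected shape; it does not guarantee that the POS tags match the tag set the model was trained on.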
### With jagger
You can use jagger-python for POS tagging.
```py
import jagger
import jdepp
jagger_model_path = "model/kwdlc/patterns"
tokenizer = jagger.Jagger()
tokenizer.load_model(jagger_model_path)
text = "吾輩は猫である。名前はまだない。"
toks = tokenizer.tokenize(text)
pos_tagged_input = ""
for tok in toks:
    # Each token becomes one MeCab-style line: surface + TAB + feature
    pos_tagged_input += tok.surface() + '\t' + tok.feature() + '\n'
pos_tagged_input += "EOS\n"
jdepp_model_path = "model/knbc"
parser = jdepp.Jdepp()
parser.load_model(jdepp_model_path)
sent = parser.parse_from_postagged(pos_tagged_input)
print(sent)
```
## Build standalone C++ app + training a model
If you just want to use J.DepP from the CLI (e.g. for batch processing),
you can build a standalone C++ app using CMake.
We modified the J.DepP source code to improve portability (e.g. our version works well on Windows).
Training a model from the Python binding is not yet supported.
For the time being, you can train a model using the standalone C++ jdepp app.
### Standalone Python module (for developers)
This is for developer use cases.
To build the Python module for end users, use setup.py (pyproject.toml) instead.
Install the pybind11 development kit:
```
$ python -m pip install pybind11
```
Then invoke CMake with `-DJDEPP_WITH_PYTHON` and `pybind11_DIR` set:
```
$ pybind11_DIR=/path/to/pybind11 cmake -DJDEPP_WITH_PYTHON=1 ...
```
### Releasing
* tag it: `git tag vX.Y.Z`
* push tag: `git push --tags`
Versioning is done automatically through `setuptools_scm`.
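Because the package version is derived from the git tag, a malformed tag leads to an unexpected version string. A small pre-push check might look like the sketch below. `version_from_tag` is a hypothetical helper, and the mapping shown (dropping the leading `v`) reflects how `setuptools_scm` is commonly understood to treat an exact `vX.Y.Z` tag, not something this repository documents.

```python
import re

# Tags are expected to look like vX.Y.Z with numeric components.
TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def version_from_tag(tag: str) -> str:
    """Return 'X.Y.Z' for a tag 'vX.Y.Z'; raise ValueError otherwise."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"tag {tag!r} does not match vX.Y.Z")
    return ".".join(m.groups())


print(version_from_tag("v0.1.0"))
```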
## TODO
- [ ] WASM build
- [ ] Training API support
- [ ] Integrate the jagger POS tagger as a built-in (standalone) POS tagger in J.DepP
  - https://github.com/lighttransport/jagger-python
- [ ] mmap (or shared memory) loading of dictionary data to save memory with Python multiprocessing
## License
jdepp-python is licensed under the 2-clause BSD license.
J.DepP (https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/) is licensed under a GPLv2/LGPLv2.1/BSD triple license.
## Third-party licenses
* pecco, cedar, opal (subcomponents of J.DepP): GPLv2/LGPLv2.1/BSD triple license. We choose the BSD license.
* io-util: MIT license.
* optparse: Unlicense https://github.com/skeeto/optparse