Hi! Thanks for your interest in contributing to NLTK.
We use GitHub to host our code repositories and issues. The NLTK organization on GitHub has many repositories, which lets us manage issues and development separately. The most important are nltk/nltk, the main repository with the library code, and nltk/nltk_data, which holds the data packages that can be downloaded with nltk.downloader.
NLTK consists of the functionality that the Python/NLP community is motivated to contribute. Some priority areas for development are listed in the NLTK Wiki.
We use Git as our version control system, so the best way to contribute is to learn how to use it and put your changes on a Git repository. There's plenty of documentation about Git -- you can start with the Pro Git book.
To set up your local development environment for contributing to the main repository nltk/nltk:
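- Fork the nltk/nltk repository on GitHub to your account;
- Clone your fork locally (git clone https://github.com/<your-github-username>/nltk.git);
- Run cd nltk to get to the root directory of the nltk code base;
- Install the dependencies (pip install -r pip-req.txt);
- Install the pre-commit hooks (pre-commit install);
- Download the NLTK data (python -m nltk.downloader all);
- Add a remote that points to the upstream nltk/nltk repository on GitHub (git remote add upstream https://github.com/nltk/nltk.git). You will need this upstream link when updating your local repository with all the latest contributions.

We use the famous gitflow to manage our branches.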
Summary of our git branching model:
- Go to the develop branch (git checkout develop);
- Get the latest work from the upstream nltk/nltk repository (git pull upstream develop);
- Create a new branch off develop with a descriptive name (for example: feature/portuguese-sentiment-analysis, hotfix/bug-on-downloader). You can do it by switching to the develop branch (git checkout develop) and then creating a new branch (git checkout -b name-of-the-new-branch);
- Commit your changes on that branch (git add files-changed, git commit -m "Add some change");
- Run the tests (tox -e py312 if you are on Python 3.12);
- Add your name to the AUTHORS.md file as a contributor;
- Push the branch to your fork on GitHub (git push origin branch-name);
- Create a pull request on GitHub (asking us to merge your branch into our develop branch).

Some tips when working with Git:
- Anything in the develop branch should be deployable (no failing tests);
- Never use git add . because it can add unwanted files;
- Avoid git commit -a unless you know what you're doing;
- Check every change with git diff before adding it to the index (staging area) and with git diff --cached before committing;
- If you have push access to the main repository, do not commit directly to develop: your access should be used only to accept pull requests; if you want to make a new feature, you should use the same process as other developers so your code will be reviewed.

Code guidelines:
- Give identifiers readable names (x is always wrong);
- When manipulating strings, prefer f-string formatting (f'{a} = {b}') or new-style formatting ('{} = {}'.format(a, b)) instead of the old-style formatting ('%s = %s' % (a, b));
- All #TODO comments should be turned into issues (use our GitHub issue system);
- Run all tests before pushing (just execute tox) so you will know if your changes broke something.

See also our developer's guide.
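As a quick illustration of the string formatting guideline, the sketch below builds the same message in all three styles. The names a and b follow the snippets in the guideline; their values are made up for this example.

```python
# Hypothetical values, chosen only for illustration.
a = "tokens"
b = 42

# Preferred: f-string formatting.
with_fstring = f"{a} = {b}"

# Also fine: new-style formatting with str.format.
with_format = "{} = {}".format(a, b)

# Discouraged: old-style percent formatting.
with_percent = "%s = %s" % (a, b)

# All three styles produce the same string.
print(with_fstring)
print(with_fstring == with_format == with_percent)
```

The f-string keeps the values next to the text they fill in, which is usually the easiest form to read and review.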
You should write tests for every feature you add or bug you fix. Having automated tests for our code lets us make big changes without worry: the tests will tell us whether a change introduced bugs or removed features. Without tests we would be blind, and every change would carry the fear of breaking something.
For a better design of your code, we recommend using a technique called test-driven development, where you write your tests before writing the actual code that implements the desired feature.
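A minimal sketch of that test-first workflow, using a hypothetical helper count_words that is not part of NLTK: the test is written first and states the expected behaviour, then the simplest implementation that makes it pass is added. The docstring examples are assumed to double as doctests.

```python
def test_count_words():
    # Written before the implementation: it pins down the expected behaviour.
    assert count_words("NLTK is a toolkit") == 4
    assert count_words("  extra   spaces  ") == 2
    assert count_words("") == 0


def count_words(text):
    """Count whitespace-separated words in ``text``.

    Hypothetical helper used only for this example.

    >>> count_words("NLTK is a toolkit")
    4
    >>> count_words("")
    0
    """
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())
```

Saved in a test module, pytest would collect test_count_words by its name prefix, and the docstring examples could be checked with python -m doctest on the same file.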
You can use pytest to run your tests, no matter which type of test it is:
cd nltk/test
pytest util.doctest # doctest
pytest unit/translate/test_nist.py # unittest
pytest # all tests
Deprecated: NLTK previously used Cloudbees and Travis for continuous integration.
NLTK now uses GitHub Actions for continuous integration. See GitHub's documentation for details.
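The .github/workflows/ci.yaml file configures the CI:
- The on: section defines which events trigger the workflow.
- The cache_nltk_data job:
  - Downloads the nltk source code.
  - Loads nltk_data via cache.
  - If the data is not cached, downloads it through nltk.download('all').
- The test job:
  - Tests against the supported Python versions (3.8, 3.9, 3.10, 3.11, 3.12).
  - Tests on ubuntu-latest and macos-latest.
  - Relies on the cache_nltk_data job to ensure that nltk_data is available.
  - Downloads the nltk source code.
  - Installs dependencies via pip install -U -r requirements-ci.txt.
  - Uses the nltk_data loaded via cache_nltk_data.
  - Runs pytest --numprocesses auto -rsx nltk/test.
- The pre-commit job:
  - Downloads the nltk source code.
  - Runs pre-commit on all files in the repository (similar to pre-commit run --all-files).

To test with tox locally:
First set up a new virtual environment, see https://docs.python-guide.org/dev/virtualenvs/
Then run tox -e py312.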
For example, using pipenv:
git clone https://github.com/nltk/nltk.git
cd nltk
pipenv install -r pip-req.txt
pipenv install tox
tox -e py312
We have three mailing lists on Google Groups. Please feel free to contact us through the nltk-dev mailing list if you have any questions or suggestions. Every contribution is very welcome!
Happy hacking! (;