# TexPrax

### Lorenz Stangier`*`, Ji-Ung Lee`*`, Yuxi Wang, Marvin Müller, Nicholas Frick, Joachim Metternich, and Iryna Gurevych

#### [UKP Lab, TU Darmstadt](https://www.informatik.tu-darmstadt.de/ukp/ukp_home/index.en.jsp)

#### [PTW, TU Darmstadt](https://www.ptw.tu-darmstadt.de/institut_ptw/index.de.jsp)

`*` Both authors contributed equally.

This repository contains code and data from our TexPrax demo [paper](https://aclanthology.org/2022.aacl-demo.2/) published at AACL 2022.

> **Abstract:** Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. This has led to a lot of work-relevant information that is disseminated through those channels and needs to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate _problems_, _causes_, and _solutions_ that occur in work-related chats. TexPrax uses a chatbot to directly engage the employees to provide lightweight annotations on their conversation and ease their documentation work. To comply with data privacy and security regulations, we use an end-to-end message encryption and give our users full control over their data which has various advantages over conventional annotation tools.
> We evaluate TexPrax in a user-study with German factory employees who ask their colleagues for solutions on problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances with code-switching, varying abbreviations for the same entity, and dialects which NLP systems should be able to handle.

* **Contact**
    * Ji-Ung Lee (ji-ung.lee@tu-darmstadt.de)
    * UKP Lab: http://www.ukp.tu-darmstadt.de/
    * PTW: https://www.ptw.tu-darmstadt.de/
    * TU Darmstadt: http://www.tu-darmstadt.de/

> Drop us a line or report an issue if something is broken (and shouldn't be) or if you have any questions.
>
> For license information, please see the LICENSE and README files.

The code for the [TexPrax](https://texprax.de/) project consists of three components:

* recorder-bot
* texpraxconnector
* examples

The modification of the matrix-synapse server (`synapserecording`) has been removed from the main branch with the port to python 3.10. It is still available in the branch `python3.7`.

A detailed description and installation instructions can be found below. A demo video of the project can be found [here](https://nextcloud.ukp.informatik.tu-darmstadt.de/index.php/s/EcQxDwAEeNT4w8n).

> Disclaimer: This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

## Citing the paper

```
@inproceedings{stangier-etal-2022-texprax,
    title = "{T}ex{P}rax: A Messaging Application for Ethical, Real-time Data Collection and Annotation",
    author = {Stangier, Lorenz and Lee, Ji-Ung and Wang, Yuxi and M{\"u}ller, Marvin and Frick, Nicholas and Metternich, Joachim and Gurevych, Iryna},
    booktitle = "Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2022",
    address = "Taipei, Taiwan",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.aacl-demo.2",
    pages = "9--16",
}
```

## Data

An anonymized version of the collected data, including annotations, can be downloaded from [tudatalib](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3534) or via [huggingface-datasets](https://huggingface.co/datasets/UKPLab/TexPrax) (CC-by-NC).

### Recorder Bot

The chatbot that keeps track of messages, provides label suggestions, and collects feedback via reactions.

### Texprax Connector

Example code to exchange data with an external dashboard via HTTP requests. Please check the branch ```remote-storage``` to see an implementation that utilizes remote storage.

## How to set up TexPrax

Detailed instructions on how to set up the TexPrax messaging and recording system.

### Setting up Synapse

Clone the repository:

```git clone https://github.com/UKPLab/TexPrax.git```

Set up your python environment:

```
conda create --name=texprax-demo python=3.10
conda activate texprax-demo
```

Install the synapse server first:

```
pip install matrix-synapse
```

Now we need to create a config file via:

```
python -m synapse.app.homeserver -c homeserver.yaml --generate-config --server-name= --report-stats=
```

Both `--server-name` and `--report-stats` need a value; the bot example further down uses the server name `texprax-demo`. The command creates a ```homeserver.yaml``` file in the current directory.

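Before starting the server, it can help to confirm that the generated file carries the server name you intended. The following is a minimal sketch, assuming that `homeserver.yaml` keeps `server_name` and `report_stats` as unindented top-level keys. The helper `read_top_level_scalars` is hypothetical and not part of TexPrax or Synapse; it only reads simple `key: value` lines and is not a YAML parser.

```python
import re
from pathlib import Path


def read_top_level_scalars(yaml_text: str) -> dict:
    """Collect unindented `key: value` pairs; nested blocks and comments are skipped."""
    values = {}
    for line in yaml_text.splitlines():
        # Indented lines, comments, and list items cannot be top-level keys.
        if not line or line[0] in " \t#-":
            continue
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if not match:
            continue
        key, raw = match.group(1), match.group(2)
        # Drop a trailing comment (naive: a quoted value containing " #" is cut short).
        raw = re.sub(r"\s+#.*$", "", raw).strip()
        if raw:  # an empty value means a nested block follows
            values[key] = raw.strip("\"'")
    return values


if __name__ == "__main__":
    # Assumption: the script is run from the directory holding homeserver.yaml.
    config_path = Path("homeserver.yaml")
    if config_path.exists():
        settings = read_top_level_scalars(config_path.read_text(encoding="utf-8"))
        print("server_name:", settings.get("server_name"))
        print("report_stats:", settings.get("report_stats"))
    else:
        print("homeserver.yaml not found in the current directory")
```

Values are returned as plain strings, so `report_stats` is reported exactly as written in the file. For anything beyond this quick check, a real YAML library is the safer choice.
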
Now you can start the homeserver via ```synctl start```.

You can check whether the installation is running by going to [http://localhost:8008](http://localhost:8008) in your browser. For further steps, please follow the instructions in the [official synapse documentation](https://matrix.org/docs/projects/server/synapse).

### Registering a new user

1. Go to your ```homeserver.yaml``` location.
2. Add a new user via

    ```
    register_new_matrix_user -c homeserver.yaml http://localhost:8008
    ```

    Note: Make sure that you are in the correct python environment, e.g., ```conda activate texprax-demo```
3. Go to [Element](https://app.element.io/)
4. Go to Sign In, and ```Edit``` the homeserver from [matrix.org](matrix.org) to [http://localhost:8008](http://localhost:8008)
5. Sign in with your credentials

### Setting up the recorder bot

Note: You can set up the bot independently of your synapse server, for instance, using a new env:

```
conda create --name=texprax-bot python=3.10
conda activate texprax-bot
```

OLM is required for encryption. Install it via:

    git clone https://gitlab.matrix.org/matrix-org/olm.git olm
    cd olm
    cmake . -Bbuild
    cmake --build build

Now go to the recorder-bot folder: ```cd recorder-bot``` and install the requirements: ```pip install -r requirements.txt```. Make sure that you are in the correct python environment, e.g., ```conda activate texprax-bot```.

If there are issues with python-olm, try this:

    pip install python-olm --extra-index-url https://gitlab.matrix.org/api/v4/projects/27/packages/pypi/simple

Now we need to create a config file with the respective paths etc. You can use ```sample.config.yaml``` as your base file. We also need to add a new account for the bot (follow the steps above to create a new account). As an example, we will use the username ```bot``` with the password ```bot```.

Setting bot credentials (```config.yaml```):

    matrix:
      user_id: "@bot:texprax-demo"
      user_password: "bot"
      homeserver_url: "http://localhost:8008"

The default storage location of your messages will be ```./store```. You will also have to supply a ```message_path``` (line 34 in ```config.yaml```):

    message_path: ".store/messages.json"

To use the models finetuned on German dialog data, download them from [tudatalib](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3534) and put them into a models folder:

    mkdir models
    cd models
    wget -q --show-progress https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3534/sequence_classification_model.zip
    wget -q --show-progress https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3534/token_classification_model.zip
    unzip sequence_classification_model.zip
    unzip token_classification_model.zip

Now add them to the ```config.yaml```:

    sequence_model_path: "models/sequence_classification_model"
    token_model_path: "models/token_classification_model"

We further set the language of the bot to German by setting:

    language_file_path: "language_files/DE.txt"

Finally, run the bot via:

```
LD_LIBRARY_PATH=/olm/build/ python autorecorderbot_start
```

After the bot is running, you can add it to your room like any other user. The bot's id in this example will be: `@bot:texprax-demo`

### Synapserecording (old version)

A modified Synapse instance that automatically invites the bot into newly created rooms.

**Important**: This requires some features that are only available in an older (deprecated) version that uses python 3.7. Please switch to the branch ```python3.7``` for this.
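
Returning to the recorder bot on the main branch: a missing model folder or language file is easy to overlook after editing ```config.yaml```. The sketch below checks that the path-valued settings named in this README point to something that exists. It is a minimal sketch under assumptions: the helper `missing_paths` is hypothetical and not part of the recorder bot, the settings are passed in as a plain dictionary, and relative paths are resolved against a base directory you supply (for example the recorder-bot folder).

```python
from pathlib import Path

# Path-valued settings named in this README's config.yaml walkthrough.
# Whether the real config nests them under other keys is not checked here.
PATH_KEYS = ("sequence_model_path", "token_model_path", "language_file_path")


def missing_paths(settings: dict, base_dir: Path) -> list:
    """Return the keys whose value is unset or does not exist under base_dir."""
    missing = []
    for key in PATH_KEYS:
        value = settings.get(key)
        if not value or not (base_dir / value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Values copied from the configuration example in this README.
    example = {
        "sequence_model_path": "models/sequence_classification_model",
        "token_model_path": "models/token_classification_model",
        "language_file_path": "language_files/DE.txt",
    }
    problems = missing_paths(example, Path.cwd())
    print("missing:", problems if problems else "none")
```

Run it from the directory you start the bot from, so that relative paths resolve the same way they would for the bot.
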