# document-processor

**Repository Path**: liliy111/document-processor

## Basic Information

- **Project Name**: document-processor
- **Default Branch**: master
- **Created**: 2026-04-07
- **Last Updated**: 2026-04-07

## README

Introduction
==================
Processes documents to prepare train/test data for the 'libsvm' tool. Terms for the feature vector are selected with the CHI (chi-square) statistic, and their weight values are then computed with TF-IDF.

How To
==================
Computing data for the libsvm tool involves 2 phases: train and test.

* For train

  Program entry class: org.shirdrn.document.processor.TrainDocumentProcessorDriver

  Configuration file: config-train.properties

* For test

  Program entry class: org.shirdrn.document.processor.TestDocumentProcessorDriver

  Configuration file: config-test.properties

FAQ
==================
* If you choose to use the ICTCLAS Chinese analyzer, be sure to copy the file 'NLPIR_JNI.dll' to the directory 'C:\Windows\System32' on Windows 7 (64-bit Windows 7 is assumed by default). For more about ICTCLAS, see http://ictclas.nlpir.org/downloads.

Contact
==================
* Website: www.shiyanjun.cn
* Email : shirdrn@gmail.com
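
Example Sketch
==================
The pipeline described in the Introduction (CHI term selection, then TF-IDF weighting, then output for libsvm) can be illustrated with a small, self-contained Java sketch. This is not the project's actual code: the class `ChiTfIdfSketch`, its method names, and the specific formulas used here (the 2x2 contingency-table chi-square and `tf * ln(N / df)`) are assumptions chosen for illustration, and the project may compute these values differently. The only part taken from libsvm itself is the sparse line format `label index:value ...` with ascending feature indices.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of document-processor.
class ChiTfIdfSketch {

    /**
     * Chi-square score of a term for one class, from a 2x2 contingency table:
     * a = docs in the class containing the term,
     * b = docs outside the class containing the term,
     * c = docs in the class without the term,
     * d = docs outside the class without the term.
     */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + c) * (b + d) * (a + b) * (c + d);
        if (denom == 0) {
            return 0.0; // degenerate table: the term carries no class information
        }
        double diff = (double) a * d - (double) c * b;
        return n * diff * diff / denom;
    }

    /** TF-IDF weight, assuming tf = count / docLength and idf = ln(N / df). */
    static double tfIdf(int termCount, int docLength, int totalDocs, int docFreq) {
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docFreq);
        return tf * idf;
    }

    /** Formats one document as a libsvm line: "label index:value ...", indices ascending. */
    static String toLibsvmLine(int label, Map<Integer, Double> weights) {
        StringBuilder sb = new StringBuilder().append(label);
        // TreeMap sorts the feature indices in ascending order.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(weights).entrySet()) {
            sb.append(' ').append(e.getKey()).append(':')
              .append(String.format(Locale.ROOT, "%.6f", e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A term that appears in every document of one class and in no others.
        System.out.println("chi = " + chiSquare(10, 0, 0, 10));

        // Weights for two selected terms (feature indices 1 and 3) of one document.
        Map<Integer, Double> weights = new TreeMap<>();
        weights.put(3, tfIdf(2, 10, 100, 10));
        weights.put(1, tfIdf(1, 10, 100, 50));
        System.out.println(toLibsvmLine(1, weights));
    }
}
```

In this sketch, terms with the highest chi-square scores per class would be kept as features, each kept term would be assigned a 1-based feature index, and every document would become one line of the train or test file.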