# html2docx

**Repository Path**: nicefeiniu/html2docx

## Basic Information

- **Project Name**: html2docx
- **Description**: Convert html to docx
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-01-18
- **Last Updated**: 2024-01-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# htmldocx

Convert HTML to docx.

Dependencies: `python-docx` & `bs4`

### To install

`pip install git+ssh://git@gitee.com/nicefeiniu/html2docx.git`

### Usage

Add strings of HTML to an existing `docx.Document` object:

```
from docx import Document
from htmldocx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()
# do stuff to document

html = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html, document)

# do more stuff to document
document.save('your_file_name')
```

Convert files directly:

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
```

Convert files from a string:

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)
```

Change table styles

Tables are not styled by default. Use the `table_style` attribute on the parser to set a table style. The style is applied to all tables.

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
```

To add borders to tables, use the `TableGrid` style:

```
new_parser.table_style = 'TableGrid'
```

Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template

Change default paragraph style

No style is applied to paragraphs by default. Use the `paragraph_style` attribute on the parser to set a default paragraph style. The style is applied to all paragraphs. If additional styling (color, background color, alignment...) is defined in the HTML, it is applied after the paragraph style.

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.paragraph_style = 'Quote'
```

Default paragraph styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#paragraph-styles-in-default-template

### UPDATE 24-01-18

1. Image width can be set through the `style` attribute of the image tag. For example:

   ```html
   images/2024/01/18/火箭.png
   ```

   This image is placed in the Word document with its width set to the available width of the Word page.

   The width can also be given in `px`. For example:

   ```html
   images/2024/01/18/火箭.png
   ```

   In that case the image width in Word is set to `(image width in px / 794) * available width of the Word page`.

2. How to set the font

   ```python
   from docx import Document
   from docx.shared import Pt
   from html2docx import HtmlToDocx

   default_font_name = '宋体'   # SimSun
   default_font_size = Pt(12)   # 12 pt, the Chinese size 小四 (Small Four)

   doc = Document()

   # Set the default font on the document's 'Normal' style
   default_font = doc.styles['Normal'].font
   default_font.name = default_font_name
   default_font.size = default_font_size

   new_parser = HtmlToDocx()
   html = '<h1>Hello world</h1>'
   new_parser.add_html_to_document(html, doc)

   # do more stuff to document
   doc.save('your_file_name')
   ```
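
The pixel-width rule from item 1 of the update note can be illustrated with a small stand-alone calculation. This is only a sketch of the stated formula, not the library's actual code: the function name `scaled_image_width` and its parameters are hypothetical and are not part of the html2docx API. The constant 794 is taken from the note; it is roughly the width of an A4 page at 96 DPI, which may be where the value comes from.

```python
def scaled_image_width(px_width: float,
                       available_width: float,
                       reference_px: float = 794.0) -> float:
    """Apply the rule (px width / 794) * available width.

    Hypothetical helper for illustration only. The result is in the
    same unit as `available_width` (for example inches).
    """
    if px_width < 0:
        raise ValueError("px_width must be non-negative")
    # Fraction of the 794 px reference width, scaled to the usable width
    return (px_width / reference_px) * available_width


# Example: a 397 px image on a page with 6.0 inches of usable width
half_width = scaled_image_width(397, 6.0)   # 397 / 794 = 0.5 of the width
full_width = scaled_image_width(794, 6.0)   # exactly the available width
print(half_width, full_width)
```

The sketch does not clamp values above 794 px, since the note does not say how wider images are handled.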