# html2docx
**Repository Path**: nicefeiniu/html2docx
## Basic Information
- **Project Name**: html2docx
- **Description**: Convert html to docx
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-01-18
- **Last Updated**: 2024-01-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# htmldocx
Convert html to docx
Dependencies: `python-docx` & `bs4`
### To install
`pip install git+ssh://git@gitee.com/nicefeiniu/html2docx.git`
### Usage
Add HTML strings to an existing `docx.Document` object:
```python
from docx import Document
from htmldocx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()
# do stuff to document

# The HTML must be a single valid string literal
html = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html, document)

# do more stuff to document
document.save('your_file_name')
```
Convert files directly
```python
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
```
Convert an HTML string
```python
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)
```
Change table styles
Tables are not styled by default. Use the `table_style` attribute on the parser to set a table
style. The style is used for all tables.
```python
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
```
To add borders to tables, use the `TableGrid` style:
```python
new_parser.table_style = 'TableGrid'
```
Default table styles are listed here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template
Change default paragraph style
No style is applied to paragraphs by default. Use the `paragraph_style` attribute on the parser
to set a default paragraph style. The style is used for all paragraphs. If additional styling
(color, background color, alignment...) is defined in the HTML, it is applied after the
paragraph style.
```python
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.paragraph_style = 'Quote'
```
Default paragraph styles are listed here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#paragraph-styles-in-default-template
### UPDATE 24-01-18
1. Image width can be set through the `style` attribute of the image tag.
For example:
```html
```
This image is placed in the Word document with its width set to the document's available width.
The width can also be specified in `px`.
For example:
```html
```
In that case the image width in Word is set to ((image width in px) / 794) * the available width of the Word document.
2. How to set the font
```python
from docx import Document
from docx.shared import Pt
from htmldocx import HtmlToDocx

default_font_name = '宋体'  # SimSun
default_font_size = Pt(12)  # 12 pt, the Chinese size "小四" (Xiao Si)

doc = Document()
# Set the font on the 'Normal' style of the document
default_font = doc.styles['Normal'].font
default_font.name = default_font_name
default_font.size = default_font_size

new_parser = HtmlToDocx()
html = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html, doc)

# do more stuff to document
doc.save('your_file_name')
```
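The pixel rule from item 1 of this update, ((image width in px) / 794) * available width, is plain arithmetic and can be sketched without the library. This is only an illustration: the names `scaled_image_width` and `REFERENCE_WIDTH_PX` are hypothetical and not part of `htmldocx`. Reading 794 as the A4 page width at 96 DPI (210 mm is about 793.7 px) is an assumption, since the README gives only the number.

```python
# Divisor quoted in the README; assumed here to be the A4 width at 96 DPI.
REFERENCE_WIDTH_PX = 794


def scaled_image_width(image_px: float, available_width: float) -> float:
    """Return the image width in the same unit as available_width.

    Hypothetical helper that mirrors the README formula:
    (image width in px / 794) * available width.
    """
    return (image_px / REFERENCE_WIDTH_PX) * available_width


# A 397 px image on a page with a 6.0 inch text area (illustrative numbers).
print(scaled_image_width(397, 6.0))  # 3.0
```

A 794 px image fills the available width exactly. The README does not say whether wider images are clamped, so this sketch does not clamp them.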