# selectolax **Repository Path**: mirrors_rushter/selectolax ## Basic Information - **Project Name**: selectolax - **Description**: Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors). - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2022-01-06 - **Last Updated**: 2026-02-09 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ![selectolax logo](docs/logo.png) --- A fast HTML5 parser with CSS selectors, written in Cython, using [Modest](https://github.com/lexborisov/Modest/) and [Lexbor](https://github.com/lexbor/lexbor) engines. --- [![PyPI - Version](https://img.shields.io/pypi/v/selectolax?logo=pypi&label=Pypi&logoColor=fff)](https://pypi.org/project/selectolax) [![PyPI Total Downloads](https://static.pepy.tech/badge/selectolax)](https://pepy.tech/projects/selectolax) [![CI](https://img.shields.io/github/actions/workflow/status/rushter/selectolax/pythonpackage.yml?branch=master&logo=githubactions&label=CI)](https://github.com/rushter/selectolax/actions/workflows/pythonpackage.yml?query=branch%3Amaster+event%3Apush) [![Python Versions](https://img.shields.io/pypi/pyversions/selectolax?logo=python&logoColor=fff&label=Python)](https://pypi.org/project/selectolax) [![GitHub License](https://img.shields.io/github/license/rushter/selectolax?logo=github&label=License)](https://github.com/rushter/selectolax/blob/master/LICENSE) --- ## Installation From PyPI using pip: ```bash pip install selectolax ``` If installation fails due to compilation errors, you may need to install [Cython](https://github.com/cython/cython): ```bash pip install selectolax[cython] ``` This usually happens when you try to install an outdated version of selectolax on a newer version of Python. Development version from GitHub: ```bash git clone --recursive https://github.com/rushter/selectolax cd selectolax pip install -r requirements_dev.txt python setup.py install ``` How to compile selectolax while developing: ```bash make clean make dev ``` ## Basic examples Here are some basic examples to get you started with selectolax: Parsing HTML and extracting text: ```python In [1]: from selectolax.lexbor import LexborHTMLParser ...: ...: html = """ ...:

Hi there

...:

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

...:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

...: """ ...: tree = LexborHTMLParser(html) In [2]: tree.css_first('h1#title').text() Out[2]: 'Hi there' In [3]: tree.css_first('h1#title').attributes Out[3]: {'id': 'title', 'data-updated': '20201101'} In [4]: [node.text() for node in tree.css('.post')] Out[4]: ['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'] ``` ### Using advanced CSS selectors ```python In [1]: html = "

link

text

" ...: selector = "div > :nth-child(2n+1):not(:has(a))" In [2]: for node in LexborHTMLParser(html).css(selector): ...: print(node.attributes, node.text(), node.tag) ...: print(node.parent.tag) ...: print(node.html) ...: {'id': 'p1'} p div

{'id': 'p5'} text p div

text

``` #### Using `lexbor-contains` CSS pseudo-class to match text ```python from selectolax.lexbor import LexborHTMLParser html = "

hello

lexbor is AwesOme

" parser = LexborHTMLParser(html) # Case-insensitive search results = parser.css('p:lexbor-contains("awesome" i)') # Case-sensitive search results = parser.css('p:lexbor-contains("AwesOme")') assert len(results) == 1 assert results[0].text() == "lexbor is AwesOme" ``` * [More examples](https://selectolax.readthedocs.io/en/latest/examples.html) ### Available backends Selectolax supports two backends: `Modest` and `Lexbor`. By default, all examples use the `Lexbor` backend. Most of the features between backends are almost identical, but there are some differences. As of 2024, the preferred backend is `Lexbor`. The `Modest` backend is still available for compatibility reasons and the underlying C library that selectolax uses is not maintained anymore. To use `lexbor`, just import the parser and use it in the similar way to the `HTMLParser`. ```python In [1]: from selectolax.lexbor import LexborHTMLParser In [2]: html = """ ...: Hi there ...:

2021-08-15

...: """ In [3]: parser = LexborHTMLParser(html) In [4]: parser.root.css_first("#updated").text() Out[4]: '2021-08-15' ``` ## Simple Benchmark * Extract title, links, scripts and a meta tag from main pages of top 754 domains. See `examples/benchmark.py` for more information. | Package | Time | |-------------------------------|-----------| | Beautiful Soup (html.parser) | 61.02 sec.| | lxml / Beautiful Soup (lxml) | 9.09 sec. | | html5_parser | 16.10 sec.| | selectolax (Modest) | 2.94 sec. | | selectolax (Lexbor) | 2.39 sec. | ## Links * [selectolax API reference and examples](https://selectolax.readthedocs.io/en/latest/index.html) * [Video introduction to web scraping using selectolax](https://youtu.be/HpRsfpPuUzE) * [How to Scrape 7k Products with Python using selectolax and httpx](https://www.youtube.com/watch?v=XpGvq755J2U) * [Modest introduction](https://lexborisov.github.io/Modest/) * [Modest benchmark](https://lexborisov.github.io/benchmark-html-parsers/) * [Python benchmark](https://rushter.com/blog/python-fast-html-parser/) * [Another Python benchmark](https://www.peterbe.com/plog/selectolax-or-pyquery) * [Universal interface to lxml and selectolax](https://github.com/lorien/domselect) ## License * Modest engine — [LGPL2.1](https://github.com/lexborisov/Modest/blob/master/LICENSE) * lexbor engine — [Apache-2.0 license](https://github.com/lexbor/lexbor?tab=Apache-2.0-1-ov-file#readme) * selectolax - [MIT](https://github.com/rushter/selectolax/blob/master/LICENSE) ## Contributors Thanks to all the contributors of selectolax!