# PyCQuery

**Repository Path**: mirrors_naver/PyCQuery

## Basic Information

- **Project Name**: PyCQuery
- **Description**: Python for CQuery : Python DB-API and SQLAlchemy interfaces for Hive 3.x
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-02-20
- **Last Updated**: 2026-01-18

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PyCQuery

PyCQuery is a collection of Python DB-API and SQLAlchemy interfaces for Hive 3.x. In addition to what [PyHive](https://github.com/dropbox/PyHive) offers, it supports the features below.

- Dynamic service discovery through ZooKeeper
- Kerberos authentication with a pure-Python Kerberos library, [minikerberos](https://github.com/skelsec/minikerberos)

## Requirements

- Python 3.6 or above
- A [HiveServer2](https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2) instance for Hive 3.x

## Install

`pip3 install PyCQuery`

## Usage

### DB-API

```py
from pycquery import hive

cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
cursor.close()
```

### DB-API (asynchronous)

```py
from pycquery import hive
from TCLIService.ttypes import TOperationState

cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_data LIMIT 10', async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
cursor.close()
```

### SQLAlchemy

First install this package to register it with SQLAlchemy (see ``setup.py``).

```py
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('hive://localhost:10000/default')
logs = Table('my_data', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=logs).scalar())
```

Note: query generation functionality is not fully tested, but raw SQL should work without problems.
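For raw SQL, one option is to execute the statement directly through the engine. A minimal sketch, assuming the same `my_data` table and HiveServer2 endpoint used in the examples above:

```py
from sqlalchemy import create_engine, text

engine = create_engine('hive://localhost:10000/default')

# Run a raw HiveQL statement through the SQLAlchemy connection
# instead of relying on query generation.
with engine.connect() as connection:
    result = connection.execute(text('SELECT * FROM my_data LIMIT 10'))
    for row in result:
        print(row)
```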
### Passing session configuration

```py
# DB-API
hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'})

# SQLAlchemy
create_engine(
    'hive://user@host:10000/database',
    connect_args={'configuration': {'hive.exec.reducers.max': '123'}},
)

# SQLAlchemy with LDAP
create_engine(
    'hive://user:password@host:10000/database',
    connect_args={'auth': 'LDAP'},
)
```

### Kerberos Authentication with HTTP Thrift

- For a Single HiveServer2 Instance

```py
from pycquery import hive
from pycquery_krb.common import conf

params = {
    'host': 'localhost',
    'port': '10000',
    'http_path': 'cliservice',
    'service_mode': 'http',
    'kerberos_service_name': 'hive/localhost',
    'auth': 'KERBEROS',
    'username': 'user1@EXAMPLE.COM',
    'keytab_file': 'keytab-file-for-username',
    'krb_conf': conf.KerberosConf.from_osenv()  # load krb5.conf from the path set by KRB5_CONFIG or /etc/krb5.conf
    # 'krb_conf': conf.KerberosConf.from_file('other-krb5-conf-file')  # load krb5.conf from your path
}

if __name__ == "__main__":
    conn = hive.connect(**params)
    try:
        cursor = conn.cursor()
        try:
            cursor.execute('SELECT * FROM my_data LIMIT 10')
            print(cursor.fetchall())
        finally:
            cursor.close()
    finally:
        conn.close()
```

- For Multiple HiveServer2 Instances with ZooKeeper Discovery

```py
from pycquery import hive
from pycquery_krb.common import conf

params = {
    'host': 'zk1.host, zk2.host, zk3.host',  # list of ZooKeeper server hosts
    'port': '2181',  # ZooKeeper server port
    'is_zookeeper': True,
    'http_path': 'cliservice',
    'service_mode': 'http',
    'kerberos_service_name': 'hive/localhost',
    'auth': 'KERBEROS',
    'username': 'user1@EXAMPLE.COM',
    'keytab_file': 'keytab-file-for-username',
    'krb_conf': conf.KerberosConf.from_osenv()  # load krb5.conf from the path set by KRB5_CONFIG or /etc/krb5.conf
    # 'krb_conf': conf.KerberosConf.from_file('other-krb5-conf-file')  # load krb5.conf from your path
}

if __name__ == "__main__":
    conn = hive.connect(**params)
    try:
        cursor = conn.cursor()
        try:
            cursor.execute('SELECT * FROM my_data LIMIT 10')
            print(cursor.fetchall())
        finally:
            cursor.close()
    finally:
        conn.close()
```

### Kerberos Authentication with Binary Thrift

- It requires the `sasl` and `thrift-sasl` packages. The pure-Python Kerberos library is not supported yet.

## Updating TCLIService

The TCLIService module is autogenerated from a ``TCLIService.thrift`` file. To update it, run the ``generate.py`` script: ``python generate.py``. When no argument is given, the version for Hive 3.0 is downloaded.

## License

```
Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```