# PyCQuery

**Repository Path**: mirrors_naver/PyCQuery

## Basic Information

- **Project Name**: PyCQuery
- **Description**: Python for CQuery : Python DB-API and SQLAlchemy interfaces for Hive 3.x
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-02-20
- **Last Updated**: 2026-01-18

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PyCQuery

PyCQuery is a collection of Python DB-API and SQLAlchemy interfaces for Hive 3.x. In addition to what [PyHive](https://github.com/dropbox/PyHive) offers, it supports the features below.

- Dynamic service discovery through ZooKeeper
- Kerberos authentication with a pure-Python Kerberos library, [minikerberos](https://github.com/skelsec/minikerberos)

## Requirements

- Python 3.6 or above
- A [HiveServer2](https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2) instance for Hive 3.x

## Install

`pip3 install PyCQuery`

## Usage

### DB-API

```py
from pycquery import hive

cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
cursor.close()
```

### DB-API (asynchronous)

```py
from pycquery import hive
from TCLIService.ttypes import TOperationState

cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_data LIMIT 10', async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
cursor.close()
```

### SQLAlchemy

First install this package to register it with SQLAlchemy (see ``setup.py``).

```py
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('hive://localhost:10000/default')
logs = Table('my_data', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=logs).scalar())
```

Note: query generation functionality is not fully tested, but raw SQL should work without problems.
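For raw SQL, one option is to execute the statement directly through the engine. A minimal sketch, assuming the same `my_data` table and HiveServer2 endpoint used in the examples above:

```py
from sqlalchemy import create_engine, text

engine = create_engine('hive://localhost:10000/default')

# Run a raw HiveQL statement through the SQLAlchemy connection
# instead of relying on query generation.
with engine.connect() as connection:
    result = connection.execute(text('SELECT * FROM my_data LIMIT 10'))
    for row in result:
        print(row)
```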
### Passing session configuration

```py
# DB-API
hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'})

# SQLAlchemy
create_engine(
    'hive://user@host:10000/database',
    connect_args={'configuration': {'hive.exec.reducers.max': '123'}},
)

# SQLAlchemy with LDAP
create_engine(
    'hive://user:password@host:10000/database',
    connect_args={'auth': 'LDAP'},
)
```

### Kerberos Authentication with HTTP Thrift

- For a Single HiveServer2 Instance

```py
from pycquery import hive
from pycquery_krb.common import conf

params = {
    'host': 'localhost',
    'port': '10000',
    'http_path': 'cliservice',
    'service_mode': 'http',
    'kerberos_service_name': 'hive/localhost',
    'auth': 'KERBEROS',
    'username': 'user1@EXAMPLE.COM',
    'keytab_file': 'keytab-file-for-username',
    'krb_conf': conf.KerberosConf.from_osenv()  # load krb5.conf from the path set by KRB5_CONFIG or /etc/krb5.conf
    # 'krb_conf': conf.KerberosConf.from_file('other-krb5-conf-file')  # load krb5.conf from your path
}

if __name__ == "__main__":
    conn = hive.connect(**params)
    try:
        cursor = conn.cursor()
        try:
            cursor.execute('SELECT * FROM my_data LIMIT 10')
            print(cursor.fetchall())
        finally:
            cursor.close()
    finally:
        conn.close()
```

- For Multiple HiveServer2 Instances with ZooKeeper Discovery

```py
from pycquery import hive
from pycquery_krb.common import conf

params = {
    'host': 'zk1.host, zk2.host, zk3.host',  # list of ZooKeeper server hosts
    'port': '2181',  # ZooKeeper server port
    'is_zookeeper': True,
    'http_path': 'cliservice',
    'service_mode': 'http',
    'kerberos_service_name': 'hive/localhost',
    'auth': 'KERBEROS',
    'username': 'user1@EXAMPLE.COM',
    'keytab_file': 'keytab-file-for-username',
    'krb_conf': conf.KerberosConf.from_osenv()  # load krb5.conf from the path set by KRB5_CONFIG or /etc/krb5.conf
    # 'krb_conf': conf.KerberosConf.from_file('other-krb5-conf-file')  # load krb5.conf from your path
}

if __name__ == "__main__":
    conn = hive.connect(**params)
    try:
        cursor = conn.cursor()
        try:
            cursor.execute('SELECT * FROM my_data LIMIT 10')
            print(cursor.fetchall())
        finally:
            cursor.close()
    finally:
        conn.close()
```

### Kerberos Authentication with Binary Thrift

- It requires the `sasl` and `thrift-sasl` packages. The pure-Python Kerberos library is not supported yet.

## Updating TCLIService

The TCLIService module is autogenerated from a ``TCLIService.thrift`` file. To update it, run the ``generate.py`` script: ``python generate.py``. When no argument is given, the version for Hive 3.0 is downloaded.

## License

```
Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```