# aliyun-datahub-sdk-python
**Repository Path**: aliyun/aliyun-datahub-sdk-python
## Basic Information
- **Project Name**: aliyun-datahub-sdk-python
- **Description**: Datahub Python SDK http://pydatahub.readthedocs.io
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-08
- **Last Updated**: 2025-06-18
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Datahub Python SDK
[PyPI](https://pypi.python.org/pypi/pydatahub) [Docs](http://pydatahub.readthedocs.io/zh_CN/latest/) [License](https://github.com/aliyun/aliyun-datahub-sdk-python/blob/master/LICENSE)
-----------------
An elegant way to access the DataHub API from Python. [Documentation](https://aliyun-datahub-sdk-python.readthedocs.io/en/latest/)
## Installation
The quick way:
```shell
$ sudo pip install pydatahub
```
The dependencies will be installed automatically.
Or from source code:
```shell
$ virtualenv pydatahub_env
$ source pydatahub_env/bin/activate
$ git clone https://github.com/aliyun/aliyun-datahub-sdk-python.git
$ cd aliyun-datahub-sdk-python
$ python setup.py install
```
If python-dev is not installed, the build fails with an error such as 'Python.h: No such file or directory'. [See this](https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory) for how to install it.
When installing on Windows, if you see an error such as 'Microsoft Visual C++ XX.0 is required', download and install the required compiler [here](https://wiki.python.org/moin/WindowsCompilers).
If no network is available, the requirements are bundled in the `dependency` folder; install them in this order:
```shell
$ cd dependency
$ pip install -r first.txt
$ pip install -r second.txt
```
## Python Version
Tested on Python 2.7, 3.3, 3.4, 3.5, 3.6 and PyPy; Python 3.6 is recommended.
## Dependencies
* setuptools (>=39.2.0)
* requests (>=2.4.0)
* simplejson (>=3.3.0)
* six (>=1.1.0)
* enum34 (>=1.1.5 for python_version < '3.4')
* crcmod (>=1.7)
* lz4 (>=2.0.0)
* cprotobuf (>=0.1.9)
* funcsigs (>=1.0.2)
* atomic (>=0.7.0)
* rwlock (>=0.0.6)
* urllib3 (>=1.26.10)
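To check whether an existing environment already satisfies the minimum versions listed above, a small helper can compare version strings. This is a minimal Python 3 sketch, not part of pydatahub: it assumes plain dotted numeric versions (trailing text such as `rc1` inside a component is ignored), and the hypothetical `MINIMUMS` table copies only a few entries from the list.

```python
import re


def parse_version(text):
    """Turn a dotted version like '2.4.0' into a tuple of ints (2, 4, 0).

    Assumption: only the leading digits of each component matter; parsing
    stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in text.split('.'):
        match = re.match(r'\d+', piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is >= the required minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2.4' compares equal to '2.4.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical subset of the minimums listed above.
MINIMUMS = {'requests': '2.4.0', 'six': '1.1.0', 'lz4': '2.0.0'}


def check_installed(minimums=MINIMUMS):
    """Report each package as 'ok', 'too old' or 'missing' (needs Python 3.8+)."""
    from importlib.metadata import version, PackageNotFoundError
    report = {}
    for name, minimum in minimums.items():
        try:
            ok = meets_minimum(version(name), minimum)
            report[name] = 'ok' if ok else 'too old'
        except PackageNotFoundError:
            report[name] = 'missing'
    return report


print(check_installed())
```

`importlib.metadata` only exists on Python 3.8 and later; on the older interpreters listed above, `pkg_resources` from setuptools would be needed to look up installed versions instead.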
## Run Tests
- Install tox:
```shell
$ pip install -U tox
```
- Fill in `datahub/tests/datahub.ini` with your configuration.
- Run the tests:
```shell
$ tox
```
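The tests read their settings from `datahub/tests/datahub.ini`. The section and key names the test suite actually expects are defined in the repository and are not listed here; the sketch below uses hypothetical names (`[datahub]`, `access_id`, `access_key`, `endpoint`) only to illustrate reading such a file with the standard-library `configparser` and failing early when a value is left empty.

```python
import configparser

# Hypothetical layout: check the real datahub.ini in the repository for the
# section and key names the test suite uses.
SAMPLE_INI = """
[datahub]
access_id = your-access-id
access_key = your-secret-access-key
endpoint = https://your-end-point
"""

REQUIRED_KEYS = ('access_id', 'access_key', 'endpoint')


def load_datahub_config(text, section='datahub'):
    """Parse INI text and return the required keys as a dict.

    Raises KeyError naming the missing section or the first missing/empty key,
    so a half-filled config fails before any network call is attempted.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError('missing section [%s]' % section)
    values = {}
    for key in REQUIRED_KEYS:
        value = parser.get(section, key, fallback='').strip()
        if not value:
            raise KeyError('missing or empty key: %s' % key)
        values[key] = value
    return values


print(load_datahub_config(SAMPLE_INI)['endpoint'])
```

Values loaded this way could then be passed to `DataHub(access_id, access_key, endpoint=...)` in the same positional order used in the Usage section.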
## Usage
```python
import os

from datahub import DataHub

dh = DataHub('**your-access-id**', '**your-secret-access-key**', endpoint='**your-end-point**')
# with security token
# dh = DataHub('**your-access-id**', '**your-secret-access-key**', endpoint='**your-end-point**', security_token='**your-security-token**')

# ============================= create project =============================
project_name = 'my_project_name'
comment = 'my project'
dh.create_project(project_name, comment)

# ============================= get project =============================
project_result = dh.get_project(project_name)
print(project_result)

# ============================= create tuple topic =============================
from datahub.models import RecordSchema, FieldType

tuple_topic_name = 'tuple_topic_test'
shard_count = 3
life_cycle = 7  # data retention, in days
comment = 'tuple topic'
record_schema = RecordSchema.from_lists(
    ['bigint_field', 'string_field', 'double_field', 'bool_field', 'time_field'],
    [FieldType.BIGINT, FieldType.STRING, FieldType.DOUBLE, FieldType.BOOLEAN, FieldType.TIMESTAMP])
dh.create_tuple_topic(project_name, tuple_topic_name, shard_count, life_cycle, record_schema, comment)

# ============================= create blob topic =============================
blob_topic_name = 'blob_topic_test'
shard_count = 3
life_cycle = 7
comment = 'blob topic'
# blob topics have no schema, so use create_blob_topic instead of create_tuple_topic
dh.create_blob_topic(project_name, blob_topic_name, shard_count, life_cycle, comment)

# ============================= get topic =============================
topic_result = dh.get_topic(project_name, tuple_topic_name)
print(topic_result)
print(topic_result.record_schema)

# ============================= list shard =============================
shards_result = dh.list_shard(project_name, tuple_topic_name)
print(shards_result)

# ============================= put tuple records =============================
from datahub.models import TupleRecord

# putting records by shard is recommended
records0 = []
record0 = TupleRecord(schema=record_schema, values=[1, 'yc1', 10.01, True, 1455869335000000])
record0.put_attribute('AK', '47')
records0.append(record0)
put_result = dh.put_records_by_shard(project_name, tuple_topic_name, '0', records0)
# alternatively, set shard_id on each record and call put_records:
# records0 = []
# record0 = TupleRecord(schema=record_schema, values=[1, 'yc1', 10.01, True, 1455869335000000])
# record0.shard_id = '0'
# record0.put_attribute('AK', '47')
# records0.append(record0)
# put_result = dh.put_records(project_name, tuple_topic_name, records0)
print(put_result)

# ============================= put blob records =============================
from datahub.models import BlobRecord

# putting records by shard is recommended
root_path = '.'  # adjust to the directory that contains tests/resources/datahub.png
with open(os.path.join(root_path, 'tests/resources/datahub.png'), 'rb') as f:
    data = f.read()
records1 = []
record1 = BlobRecord(blob_data=data)
record1.put_attribute('a', 'b')
records1.append(record1)
put_result = dh.put_records_by_shard(project_name, blob_topic_name, '0', records1)
# alternatively, set shard_id on each record and call put_records:
# records1 = []
# record1 = BlobRecord(blob_data=data)
# record1.shard_id = '0'
# record1.put_attribute('a', 'b')
# records1.append(record1)
# put_result = dh.put_records(project_name, blob_topic_name, records1)
print(put_result)

# ============================= get cursor =============================
from datahub.models import CursorType

cursor_result = dh.get_cursor(project_name, blob_topic_name, '0', CursorType.OLDEST)
print(cursor_result)

# ============================= get blob records =============================
limit = 10
blob_cursor_result = dh.get_cursor(project_name, blob_topic_name, '0', CursorType.OLDEST)
get_result = dh.get_blob_records(project_name, blob_topic_name, '0', blob_cursor_result.cursor, limit)
print(get_result)
print(get_result.records)
print(get_result.records[0])

# ============================= get tuple records =============================
limit = 10
tuple_cursor_result = dh.get_cursor(project_name, tuple_topic_name, '0', CursorType.OLDEST)
get_result = dh.get_tuple_records(project_name, tuple_topic_name, '0', record_schema, tuple_cursor_result.cursor, limit)
print(get_result)
print(get_result.records)
print(get_result.records[0].values)
```
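The `TIMESTAMP` value in the tuple example, `1455869335000000`, has 16 digits, which is consistent with microseconds since the Unix epoch (it corresponds to 2016-02-19 08:08:55 UTC). Assuming that unit, hypothetical helpers like the ones below build such values from `datetime` objects with integer arithmetic, which avoids the rounding that `dt.timestamp() * 1e6` can introduce. This is a Python 3 sketch, not an SDK function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(dt):
    """Convert a timezone-aware datetime to integer microseconds since the epoch.

    Assumption: the TIMESTAMP field expects microseconds, as the sample value suggests.
    """
    if dt.tzinfo is None:
        raise ValueError('use a timezone-aware datetime to avoid local-time ambiguity')
    delta = dt - EPOCH
    # timedelta is normalized to days/seconds/microseconds; combine them as integers.
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds


def from_epoch_micros(micros):
    """Inverse of to_epoch_micros, returning an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


print(to_epoch_micros(datetime(2016, 2, 19, 8, 8, 55, tzinfo=timezone.utc)))  # 1455869335000000
```

Rejecting naive datetimes is a deliberate choice: a naive value would otherwise be silently interpreted in some implicit time zone.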
## Examples
See more examples in the [examples](https://github.com/aliyun/aliyun-datahub-sdk-python/tree/master/examples) directory.
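The Usage example reads a single batch of at most `limit` records from one cursor. Reading further into a shard generally means repeating the read with the cursor returned by the previous call. The sketch below shows that loop against a hypothetical `fetch(cursor, limit)` callable that returns `(records, next_cursor)`; with pydatahub, `fetch` would wrap `get_tuple_records` or `get_blob_records`, and the name of the next-cursor field on the result should be checked in the SDK documentation. An in-memory fake stands in for the service so the loop runs on its own.

```python
def read_all(fetch, start_cursor, limit=10, max_batches=1000):
    """Follow cursors until a batch comes back empty.

    fetch(cursor, limit) -> (records, next_cursor) is a hypothetical interface.
    An empty batch means the reader has caught up; a long-running consumer
    would sleep and retry instead of stopping. max_batches bounds the loop.
    """
    cursor = start_cursor
    collected = []
    for _ in range(max_batches):
        records, next_cursor = fetch(cursor, limit)
        if not records:
            break
        collected.extend(records)
        cursor = next_cursor
    return collected


def make_fake_shard(records):
    """Build an in-memory fetch function where a cursor is just a list index."""
    def fetch(cursor, limit):
        batch = records[cursor:cursor + limit]
        return batch, cursor + len(batch)
    return fetch


fetch = make_fake_shard(['r%d' % i for i in range(25)])
print(len(read_all(fetch, 0, limit=10)))  # 25
```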
## Release
Update the [changelog](https://github.com/aliyun/aliyun-datahub-sdk-python/tree/master/changelog.rst), then use [bumpversion](https://github.com/peritus/bumpversion) to bump the version:
1. bugfix: `bumpversion patch`
2. small feature: `bumpversion minor`
3. breaking change: `bumpversion major`
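The three levels follow the usual `major.minor.patch` convention, where bumping one part resets every lower part to zero. As a rough illustration of what each command does to the version string (bumpversion itself also rewrites the version in the files listed in its configuration), here is a hypothetical `bump` function:

```python
def bump(version, part):
    """Return the next 'major.minor.patch' version for the given part.

    Bumping a part resets every lower part to zero, e.g. minor: 2.3.7 -> 2.4.0.
    """
    major, minor, patch = (int(x) for x in version.split('.'))
    if part == 'major':
        return '%d.0.0' % (major + 1)
    if part == 'minor':
        return '%d.%d.0' % (major, minor + 1)
    if part == 'patch':
        return '%d.%d.%d' % (major, minor, patch + 1)
    raise ValueError('part must be major, minor or patch')


for part in ('patch', 'minor', 'major'):
    print(part, bump('2.3.7', part))  # patch 2.3.8 / minor 2.4.0 / major 3.0.0
```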
## Contributing
For a development install, clone the repository and then install from source:
```shell
$ git clone https://github.com/aliyun/aliyun-datahub-sdk-python.git
$ cd aliyun-datahub-sdk-python
$ python setup.py install
```
## License
Licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)