
[h5py] Introduction to HDF5 and a quick guide to using h5py

복만 2021. 12. 14. 17:54

Manual: https://docs.h5py.org/en/stable/

 


 

h5py is a package for working with the HDF5 data format in Python.

With HDF5, you can easily manage large amounts of data, such as NumPy arrays.

 

 

 

About HDF5

HDF stands for Hierarchical Data Format.

Every object in HDF5 has its own 'name', and objects are organized hierarchically, as illustrated at the following link.

https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5

Think of the file system of a typical operating system: just as it consists of folders and files, each level of the HDF5 hierarchy is separated by the / separator (POSIX-style).

 

 

In the HDF5 structure, each folder is called a 'Group'. Just as the root folder is itself a folder, the HDF file you create is itself a group.

The role of files is played by 'Dataset' and 'Attribute'.

A Dataset is the data stored in an HDF5 file. For example, it can be a NumPy array.

An Attribute can be thought of as metadata for the data. It lets you keep information about the data alongside the data itself.
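To make the folder/file analogy concrete, here is a minimal sketch that builds a tiny hierarchy and lists the paths in it. The file name and the names measurements, temperature, and unit are hypothetical, chosen only for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "structure_demo.hdf5")

with h5py.File(path, "w") as f:
    # The file itself is the root group, whose name is "/".
    root_name = f.name

    # A group plays the role of a folder.
    measurements = f.create_group("measurements")

    # A dataset holds the actual data, here a small NumPy array.
    dset = measurements.create_dataset(
        "temperature", data=np.array([20.5, 21.0, 19.8])
    )
    dset_path = dset.name

    # An attribute stores metadata on the object it is attached to.
    dset.attrs["unit"] = "celsius"

    # visit() calls the function with the path of every object below the root.
    names = []
    f.visit(names.append)

print(root_name)
print(dset_path)
print(sorted(names))
```

The paths collected by visit() are relative to the group it is called on, which is why they carry no leading slash here.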

 

 

 

Install

It can be installed with conda or pip.

conda install h5py
pip install h5py

 

 

 

Import & create HDF file

In Python, import it as follows.

import h5py

 

Working with an HDF5 file is much like working with an ordinary Python file object.

You can create an HDF5 file in w mode as follows.

with h5py.File('myfile.hdf5', 'w') as f:
    ...

 

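The available modes are as follows.

r: read-only; the file must exist (default)
r+: read/write; the file must exist
w: create the file, truncating it if it already exists
w- or x: create the file; fail if it already exists
a: read/write if the file exists, create it otherwise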
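As a rough illustration of how the file modes behave in practice, the sketch below opens the same file with w, r, a, and w-. The file name is hypothetical and lives in a temporary directory. The comments reflect the mode meanings given in the h5py documentation.

```python
import os
import tempfile

import h5py

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.hdf5")

# "w" creates the file, truncating it if it already exists.
with h5py.File(path, "w") as f:
    f.create_group("group_a")

# "r" opens an existing file read-only.
with h5py.File(path, "r") as f:
    keys_read_only = list(f.keys())

# "a" opens for read/write, creating the file if it does not exist.
with h5py.File(path, "a") as f:
    f.create_group("group_b")
    keys_after_append = sorted(f.keys())

# "w-" (or "x") creates the file but fails if it already exists.
try:
    h5py.File(path, "w-")
    exclusive_create_failed = False
except OSError:
    exclusive_create_failed = True

print(keys_read_only, keys_after_append, exclusive_create_failed)
```

Using w- instead of w is a cheap guard against accidentally overwriting an existing file.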

 

 

 

Group

A group is created as follows.

group_a = f.create_group('group_a')

 

You can check a group's name as shown below, and you can also create a group under another group.

# Check the group name
print(group_a.name)
>> /group_a

# Create a subgroup
group_b = group_a.create_group('group_b')
print(group_b.name)
>> /group_a/group_b

 

Created groups are managed in the same way as a Python dictionary.

Each group's name acts as a key, so you can access a group by its name,

and many of the usual dictionary features are available as well.

# Access a group
group_b = f['/group_a/group_b']  # only works for groups that already exist

# List the groups
print(list(f.keys()))
>> ['group_a']
print(list(group_a.keys()))
>> ['group_b']

# Delete a group
del f['/group_a/group_b']
print(list(group_a.keys()))
>> []
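Beyond keys() and del, a few other dictionary-style operations are handy when exploring a file. The following is a small sketch, using a hypothetical file in a temporary directory, of membership tests, get(), items(), and len() on groups.

```python
import os
import tempfile

import h5py

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.hdf5")

with h5py.File(path, "w") as f:
    group_a = f.create_group("group_a")
    group_a.create_group("group_b")

    # Membership test, like `key in dict`; nested paths work too.
    has_b = "group_a/group_b" in f
    has_c = "group_a/group_c" in f

    # get() returns None instead of raising KeyError for a missing name.
    missing = f.get("group_c")

    # items() yields (name, object) pairs for the direct children.
    children = {name: type(obj).__name__ for name, obj in group_a.items()}

    # len() counts the direct children of a group.
    n_children = len(group_a)

print(has_b, has_c, missing, children, n_children)
```

Checking with in or get() before indexing avoids the KeyError that f['name'] raises for a group that does not exist.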

 

 

 

Dataset

A dataset is created as follows. The first argument is the dataset's name, and you can also specify data, dtype, and so on. The example below creates the dataset in group_a, but you can also create one at the root (f.create_dataset).

import numpy as np

data = np.arange(100)
group_a.create_dataset('data_a', data=data, dtype=np.int16)

 

A created dataset can be accessed by name, just like a group. Because data_a was created inside group_a, its full path is used here.

data_a = f['/group_a/data_a']  # equivalent to group_a['data_a']
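Once you have the dataset object, you can inspect it and read from it with NumPy-style indexing. The sketch below recreates the dataset from the example above in a hypothetical temporary file, then reopens the file read-only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "dataset_demo.hdf5")

with h5py.File(path, "w") as f:
    group_a = f.create_group("group_a")
    group_a.create_dataset("data_a", data=np.arange(100), dtype=np.int16)

with h5py.File(path, "r") as f:
    data_a = f["/group_a/data_a"]

    # Shape and dtype are available without reading the data itself.
    shape = data_a.shape
    dtype_name = str(data_a.dtype)

    # Slicing reads only the requested elements into a NumPy array.
    first_five = data_a[:5]

    # [...] reads the whole dataset into memory.
    everything = data_a[...]

print(shape, dtype_name, first_five)
```

Note that the arrays returned by slicing remain usable after the file is closed, whereas the dataset object itself does not.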

 

 

 

Attribute

Like a dataset, an attribute plays the role of a file, but it is accessed through .attrs.

An attribute is added to a group as follows. This is almost the same as creating a dataset, and you can specify dtype here as well.

group_a.attrs.create('attr_a', data='hello')

 

An attribute can also be created directly by assigning to a key, without calling create.

group_a.attrs['attr_b'] = 'hi'
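Reading attributes back works through the same .attrs interface. Here is a minimal sketch, assuming a hypothetical temporary file. The numeric attribute scale is an extra added only for illustration.

```python
import os
import tempfile

import h5py

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.hdf5")

with h5py.File(path, "w") as f:
    group_a = f.create_group("group_a")
    group_a.attrs.create("attr_a", data="hello")
    group_a.attrs["attr_b"] = "hi"
    group_a.attrs["scale"] = 0.5  # hypothetical numeric attribute

with h5py.File(path, "r") as f:
    attrs = f["group_a"].attrs

    # .attrs behaves like a dictionary keyed by attribute name.
    attr_names = sorted(attrs.keys())
    attr_a = attrs["attr_a"]
    scale = float(attrs["scale"])

    # Membership test for an attribute that was never created.
    has_missing = "missing" in attrs

print(attr_names, attr_a, scale, has_missing)
```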