python - Iteratively writing to HDF5 Stores in Pandas


Pandas has the following examples of how to store Series, DataFrames, and Panels in HDF5 files:

Prepare data:

    In [1142]: store = HDFStore('store.h5')

    In [1143]: index = date_range('1/1/2000', periods=8)

    In [1144]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

    In [1145]: df = DataFrame(randn(8, 3), index=index,
       ......:                columns=['a', 'b', 'c'])
       ......:

    In [1146]: wp = Panel(randn(2, 5, 4), items=['item1', 'item2'],
       ......:            major_axis=date_range('1/1/2000', periods=5),
       ......:            minor_axis=['a', 'b', 'c', 'd'])
       ......:

Save in the store:

    In [1147]: store['s'] = s

    In [1148]: store['df'] = df

    In [1149]: store['wp'] = wp

Inspect what's in the store:

    In [1150]: store
    Out[1150]:
    <class 'pandas.io.pytables.HDFStore'>
    File path: store.h5
    /df            frame        (shape->[8,3])
    /s             series       (shape->[5])
    /wp            wide         (shape->[2,5,4])

Close the store:

    In [1151]: store.close()
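The session above relies on names imported into the top-level namespace and on Panel, which was removed in pandas 0.25. As a minimal sketch of the same steps on a recent pandas (assuming PyTables is installed, and using a temporary directory that is not part of the original example), the Series and DataFrame can be written and read back like this:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: write into a throwaway directory instead of the working directory
path = os.path.join(tempfile.mkdtemp(), "store.h5")

index = pd.date_range("2000-01-01", periods=8)
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=["a", "b", "c"])

# The context manager closes the file even if an error occurs inside the block
with pd.HDFStore(path, mode="w") as store:
    store["s"] = s    # same as store.put("s", s)
    store["df"] = df
    keys = sorted(store.keys())

print(keys)

# Reopen read-only to confirm that the frame round-trips
with pd.HDFStore(path, mode="r") as store:
    df_back = store["df"]

print(df_back.shape)
```

Using `with` replaces the explicit `store.close()` call, so the file is not left open if something fails between opening and closing.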

Questions:

  1. In the code above, when is the data actually written to disk?

  2. Say I want to add thousands of large DataFrames living in .csv files to a single .h5 file. I would need to load them and add them to the .h5 file one by one, since I cannot afford to have them all in memory at once; they would take too much memory. Is this possible with HDF5? What would be the correct way to do it?

  3. The pandas documentation says the following:

    "These stores are not appendable once written (though you can simply remove them and rewrite). Nor are they queryable; they must be retrieved in their entirety."

    What does it mean by not appendable nor queryable? Also, shouldn't it say once closed instead of once written?

Answers:

  1. As soon as the statement is executed, e.g. store['df'] = df. The close call just closes the actual file (which will be closed for you if the process exits, but that will print a warning message).

  2. Read the section http://pandas.pydata.org/pandas-docs/dev/io.html#storing-in-table-format

    It is generally not a good idea to put a lot of nodes in an .h5 file. You probably want to append and create a smaller number of nodes.

    You can just iterate through your .csv files and store/append them one by one. Something like:

        import pandas as pd

        for f in files:  # files: list of .csv paths
            df = pd.read_csv(f)
            df.to_hdf('file.h5', key=f)  # one node per file, named after f

    would be one way (creating a separate node for each file).

  3. Not appendable - once you write it, you can only retrieve it all at once, e.g. you cannot select a sub-section.

    If you have a table, then you can do things like:

        pd.read_hdf('my_store.h5', 'a_table_node', where='index > 100')

    which is like a database query, only getting part of the data.

    Thus, a plain store node (as written by store['df'] = df) is neither appendable nor queryable, while a table is both.
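To tie answers 2 and 3 together, here is a minimal sketch of appending many .csv files to a single table node without holding them all in memory. The function name, the node name "data" and the chunk size are illustrative assumptions, not part of the original answer. It also assumes that every file has the same columns with the same dtypes, and that PyTables is installed.

```python
import pandas as pd


def append_csvs_to_table(csv_paths, h5_path, key="data", chunksize=100_000):
    """Append every CSV in csv_paths to one table node, chunk by chunk.

    Hypothetical helper: assumes all files share columns and dtypes.
    Returns the total number of rows written.
    """
    rows_written = 0
    # mode="w" starts from an empty file and overwrites any existing one
    with pd.HDFStore(h5_path, mode="w") as store:
        for path in csv_paths:
            # chunksize keeps only part of each file in memory at a time
            for chunk in pd.read_csv(path, chunksize=chunksize):
                # Give the rows one continuous integer index across all files
                chunk.index = pd.RangeIndex(rows_written, rows_written + len(chunk))
                # append() writes in table format, so the node stays
                # appendable and can be queried with a where clause
                store.append(key, chunk)
                rows_written += len(chunk)
    return rows_written
```

After a call such as `append_csvs_to_table(["a.csv", "b.csv"], "file.h5")` (hypothetical file names), part of the data can be read back with `pd.read_hdf("file.h5", "data", where="index > 100")`. If the files contain string columns whose lengths vary between chunks, the `min_itemsize` argument of `append` will probably be needed as well; the sketch leaves it out.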

