TupleDict

Introduction

TupleDict: nested mapping representation.

A TupleDict represents a dict-of-dict…-of-dict (that is, a recursive dict of a given depth) as a table of tuples, padded with None values.

Typically, we want to represent the following recursive dictionary:

>>> dictofdict = {"a":{"b":{}, "c":{"d":3}}, "d":{}}

as the following dictionary, keyed by padded tuples:

>>> tupledict = {
... ("a", "b", None): None,
... ("a", "c", "d"): 3,
... ("d", None, None): None
... }

This transformation makes it simple to represent a dict-(of-dict)* within a tabular data store. Supporting all dict-like operations on top of that representation requires some maintenance work, which this module provides.
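As a rough illustration of the transformation described above, the following sketch flattens a recursive dictionary into a dictionary keyed by padded tuples. The helper name to_tupledict and its height and null parameters are hypothetical and are not part of the module. The sketch assumes that non-dict values only occur at depth height.

```python
def to_tupledict(recdict, height, null=None):
    """Flatten a recursive dict of depth `height` into {tuplekey: tuplevalue}.

    Hypothetical helper for illustration only; not the module's API.
    """
    table = {}

    def walk(node, prefix):
        for key, child in node.items():
            path = prefix + (key,)
            if len(path) == height:
                # Full-length key: the child is stored as the value.
                table[path] = child
            elif not isinstance(child, dict):
                raise ValueError(f"leaf above depth {height}: {path!r}")
            elif child:
                # Non-empty branch: keep descending.
                walk(child, path)
            else:
                # Empty branch: pad the key with null and store null.
                table[path + (null,) * (height - len(path))] = null

    walk(recdict, ())
    return table


dictofdict = {"a": {"b": {}, "c": {"d": 3}}, "d": {}}
tupledict = to_tupledict(dictofdict, height=3)
```

With height=3, tupledict reproduces the three entries of the example above: the empty branches "a"/"b" and "d" become padded keys with a None value, and the leaf 3 is stored under its full-length key.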

Fold and Unfold functions

The operations of converting a recursive dict into a tuple-dict and vice versa are called unfold() and fold(), respectively.

networkdisk.tupledict.unfold(recdict, maxdepth=-1)

Enumerate the tuplerows of a recursive dictionary, as tuples.

The yielded tuples may have different lengths. If maxdepth is nonnegative, only levels up to depth maxdepth are unfolded; deeper levels are kept unchanged (as references to the original objects). The default for maxdepth is -1, meaning unbounded. The enumeration is depth-first. Calling this on an empty dictionary yields the empty tuple.

Parameters:
recdict:

a recursive dictionary

maxdepth: int, default=-1

the max depth to unfold the dictionary. If -1, then unbounded.
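The behaviour described above can be approximated by a short recursive generator. This is only a sketch written to be consistent with the documented examples, not the module's implementation; the name unfold_sketch is hypothetical.

```python
def unfold_sketch(recdict, maxdepth=-1):
    """Yield tuplerows of `recdict`, depth-first (illustrative sketch only)."""
    if maxdepth == 0 or not isinstance(recdict, dict):
        # Depth budget exhausted, or a leaf value: keep the object as-is.
        yield (recdict,)
    elif not recdict:
        # An empty dictionary contributes the empty tuple.
        yield ()
    else:
        for key, value in recdict.items():
            # A negative maxdepth never reaches 0, hence "unbounded".
            for tail in unfold_sketch(value, maxdepth - 1):
                yield (key,) + tail


recdict = {"a": {"aa": {"aaa": True, "aab": False}, "ab": False}, "b": False, "c": {}}
rows = list(unfold_sketch(recdict))
shallow = list(unfold_sketch(recdict, maxdepth=1))
```

Here rows contains tuples of different lengths, such as ('a', 'aa', 'aaa', True) and ('c',), while shallow stops after one level and keeps the nested dictionaries as the last coordinate.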

Examples

>>> from networkdisk.tupledict import fold, unfold
>>> recdict = { 'a': { 'aa': { 'aaa': True, 'aab': False }, 'ab': False }, 'b': False, 'c': {} }
>>> sorted(unfold(recdict))
[('a', 'aa', 'aaa', True), ('a', 'aa', 'aab', False), ('a', 'ab', False), ('b', False), ('c',)]
>>> sorted(unfold(recdict, maxdepth=0))
[({'a': {'aa': {'aaa': True, 'aab': False}, 'ab': False}, 'b': False, 'c': {}},)]
>>> sorted(unfold({}, maxdepth=0))
[({},)]
>>> sorted(unfold(recdict, maxdepth=1))
[('a', {'aa': {'aaa': True, 'aab': False}, 'ab': False}), ('b', False), ('c', {})]

networkdisk.tupledict.fold(tuples, maxdepth=-1, default=None)

Recursively transform an iterable of tuples into a recursive dictionary. Empty tuples are ignored. If maxdepth is given, the tuples are expected to have length at most maxdepth plus one and to satisfy the TupleDict condition, namely that no two tuples of length maxdepth differ only on their last coordinate. Furthermore, the maxdepth-th coordinate is considered a leaf value, not to be unfolded. If tuples is empty or contains only empty tuples, the result is an empty dictionary when maxdepth is not equal to 0, and the value of the parameter default (None unless specified) when maxdepth is 0.

Parameters:
tuples:

an iterable of tuples;

maxdepth: int, default=-1

a bound (positive integer) on the tuple length minus one, or -1 (default) meaning “unbounded”.

default:

a value for missing leaves (default is None).

Examples

>>> recdict = fold([(1, 2, 3), (), (1, 3, 4), (1, 3, 5), (0,), (), (1, 4, 3)])
>>> type(recdict) is dict
True
>>> sorted(recdict.items())
[(0, {}), (1, {2: {3: {}}, 3: {4: {}, 5: {}}, 4: {3: {}}})]
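For the unbounded case shown in the example above, the folding step can be sketched with nested setdefault calls. This sketch deliberately leaves out the maxdepth and default parameters, whose exact handling is not reproduced here; the name fold_unbounded is hypothetical.

```python
def fold_unbounded(tuples):
    """Fold tuples into nested dicts, treating every coordinate as a key.

    Illustrative sketch of the unbounded case only (no maxdepth, no default).
    """
    root = {}
    for row in tuples:
        node = root
        for key in row:
            # Create the nested level on first use, then descend into it.
            node = node.setdefault(key, {})
        # An empty tuple never enters the loop, so it is ignored.
    return root


folded = fold_unbounded([(1, 2, 3), (), (1, 3, 4), (1, 3, 5), (0,), (), (1, 4, 3)])
```

In this sketch every coordinate becomes a key and the innermost levels are empty dictionaries, which matches the documented output for the same input.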

TupleDictView

class networkdisk.tupledict.ReadOnlyTupleDictView(rowstore, address, cache_level=0, **kwargs)
class networkdisk.tupledict.ReadWriteTupleDictView(rowstore, address, cache_level=0, **kwargs)

RowStores

class networkdisk.tupledict.BaseAbstractRowStore

Abstract class for the TupleDict back end, which stores rows.

An abstract class whose implementations serve as the base of a TupleDict. Implementations are expected to provide selection of partial _rows_ in a data store.

Notes

Abstract methods:

  • select: the class core method, which allows fine control over row selection, e.g., selection under condition, projection, and aggregation.

Abstract properties:

  • height: the height of the tupleDict tree. It is therefore the length of the rows minus one (the last coordinate being the _tuplevalue_).

  • lazy: a Boolean specifying whether partial _tuplekey_ membership should be checked on reading or not. When False, some TupleDictViews might have invalid addresses.

Terminology:

  • the _height_ is the depth of the represented recursive dict

  • a _row_ is a tuple of fixed length height plus one

  • a _tuplekey_ is a row prefix of length height

  • a _tuplevalue_ is a row last coordinate

  • _Null_ is a special value, indicating void coordinates

  • a _trimmed row_ is a row prefix obtained by dropping off Null values from the right

Semantic restrictions:

  1. All rows have length height+1

  2. No two rows are equal

  3. No two rows differ on their last coordinate only

  4. In each row, the Null coordinates form a suffix of the row

  5. No trimmed row is a prefix of another trimmed row

  6. No row can have Null value on its last coordinate only
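The six restrictions can be checked mechanically on a small set of rows. The checker below is a hypothetical sketch that reads each restriction literally; the names trim and violated_restrictions are not part of the module, and rows are assumed to be hashable tuples with None as the Null value.

```python
def trim(row, null=None):
    """Return the trimmed row: drop null coordinates from the right."""
    end = len(row)
    while end > 0 and row[end - 1] is null:
        end -= 1
    return tuple(row[:end])


def violated_restrictions(rows, height, null=None):
    """Return the sorted numbers of the restrictions that `rows` violates."""
    rows = [tuple(row) for row in rows]
    distinct = set(rows)
    trimmed = {trim(row, null) for row in distinct}
    broken = set()

    if any(len(row) != height + 1 for row in rows):
        broken.add(1)
    if len(distinct) != len(rows):
        broken.add(2)
    # Distinct rows sharing a tuplekey differ on their last coordinate only.
    if len({row[:-1] for row in distinct}) != len(distinct):
        broken.add(3)
    # A null surviving the trim sits before a non-null coordinate.
    if any(x is null for row in trimmed for x in row):
        broken.add(4)
    if any(t != u and u[:len(t)] == t for t in trimmed for u in trimmed):
        broken.add(5)
    if any(row and row[-1] is null and all(x is not null for x in row[:-1])
           for row in distinct):
        broken.add(6)
    return sorted(broken)


good = [("a", "b", None, None), ("a", "c", "d", 3), ("d", None, None, None)]
bad = [("a", "b", None, None), ("a", "b", "c", 1)]
good_report = violated_restrictions(good, height=3)
bad_report = violated_restrictions(bad, height=3)
```

The rows in good satisfy all six restrictions, so good_report is empty. In bad, the trimmed row ('a', 'b') is a prefix of ('a', 'b', 'c', 1), so bad_report flags restriction 5.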

Class attributes:

  • _View: the class to use for providing a tupledict (recursive dict of bounded depth) interface. The view class is initialized with a pointer to the rowstore instance, as well as a tuple of length less than height, called address and representing a row prefix. See the method view below.

  • Null: a value (default is None) to use for padding incomplete rows.

class networkdisk.tupledict.ReadWriteAbstractRowStore

A read/write version of BaseAbstractRowStore. In addition to the select method of the parent class, it should provide two methods for inserting or deleting data.

Abstract methods:

  • select (see BaseAbstractRowStore)

  • insert for inserting a partial row

  • delete for deleting all rows starting with a row_prefix

Abstract properties:

  • height, lazy (see ReadOnlyAbstractRowStore)

Methods:

  • bulk_insert

  • bulk_insert_onepass

  • bulk_insert_reiterable
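To make the roles of select, insert, and delete more concrete, here is a toy in-memory store. It is a standalone sketch and does not subclass the networkdisk abstract classes: the class name ToyRowStore and all method signatures are assumptions, and the real signatures may differ. It only maintains restrictions 2, 3, and 5 on insertion and performs no other validation.

```python
class ToyRowStore:
    """In-memory stand-in for a read/write row store (hypothetical API)."""

    Null = None  # padding value, mirroring the documented class attribute

    def __init__(self, height):
        self.height = height
        self.rows = []

    def _trim(self, row):
        end = len(row)
        while end > 0 and row[end - 1] is self.Null:
            end -= 1
        return tuple(row[:end])

    def select(self, row_prefix=()):
        """Return every stored row starting with `row_prefix`."""
        prefix = tuple(row_prefix)
        return [row for row in self.rows if row[:len(prefix)] == prefix]

    def insert(self, partial_row):
        """Insert a partial row, padded with Null up to length height+1."""
        trimmed = self._trim(tuple(partial_row))
        if len(trimmed) > self.height + 1:
            raise ValueError("row longer than height + 1")
        if not trimmed or self.select(trimmed):
            return  # nothing to add: empty, or already covered by stored rows
        padded = trimmed + (self.Null,) * (self.height + 1 - len(trimmed))
        self.rows = [
            row for row in self.rows
            if row[:-1] != padded[:-1]  # same tuplekey: value is overwritten
            and trimmed[:len(self._trim(row))] != self._trim(row)  # shorter placeholder
        ]
        self.rows.append(padded)

    def delete(self, row_prefix):
        """Delete all rows starting with `row_prefix` (no placeholder is re-added)."""
        prefix = tuple(row_prefix)
        self.rows = [row for row in self.rows if row[:len(prefix)] != prefix]

    def bulk_insert(self, partial_rows):
        for partial_row in partial_rows:
            self.insert(partial_row)


store = ToyRowStore(height=3)
store.bulk_insert([("a", "b"), ("a", "c", "d", 3), ("d",)])
store.insert(("d", "e", "f", 1))  # replaces the padded placeholder row for "d"
under_d = store.select(("d",))
store.delete(("a",))
remaining = store.select()
```

One design choice in this sketch is that inserting a longer row removes any padded placeholder row that it extends, which is one way to keep restriction 5 satisfied. A real back end would typically express the same selection and deletion as queries against its data store.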