go map学习探秘

兔吱吱吱

努力！奋斗！

整体结构

go的map实现，是一个hash map，对于hash冲突使用的是拉链法解决。比较不同的是，bucket下使用的是bmap这个结构，一个bmap会保存8个key。如果超过了8个，会重新申请一个bmap空间，通过指针overflow关联。

在超过负载的情况下，会进行扩容，扩容分成两种，一种是等量扩容，一种是扩展到原有的二倍。

extra用作gc遍历的时候优化和map内部的overflow的复用。

hmap

我们常规make map就是返回这个hmap结构的指针。所以我们把map作为函数的入参时，会之际直接对原有的map进行修改。

type hmap struct {
	count     int //map key的总数
	flags     uint8 //map状态的标记，写状态等
	B         uint8  //bucket数量为2^b
	noverflow uint16 //溢出的map的数量，B小于15是精准值，其他情况是近似值
	hash0     uint32 //hash随机值，算hash的是用到

	buckets    unsafe.Pointer //buckets指针
	oldbuckets unsafe.Pointer //只有在扩容的时候才会用到
	nevacuate  uintptr        //扩容位置计数

	extra *mapextra //map的扩展信息
}

bmap

一个bmap保存8个元素，内存中排列的方式是按照tophash，key，value，overflow的顺序方式。8个key集中存放一起，8个value集中存放一起，这么做的目的是在 map[int64]int8的情况下减少内存使用。

tophash除了有快速比较的功能以外，还有用于标记该bmap的状态

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt values.
	// NOTE: packing all the keys together and then all the values together makes the
	// code a bit more complicated than alternating key/value/key/value/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

mapextra

如果map的key，value都不包含指针，我们为了减少gc的时候扫描全部的map，我们会对该map做个特殊的标记。但是我们上面提到bmap.overflow本身是个指针，所以我们把所有overflow的bmap统一存储到mapextra结构中，用于gc扫描。所以只有在key，value都不包含指针的情况下，这个overflow才会有用。

nextOverflow是为了overflow的复用

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	nextOverflow *bmap
}

如何定位一个Key

定位bucket
遍历bucket下的所有元素
定位对应的key

1.定位bucket

计算对应的hash，这里我们引入一个hash的随机种子

 hash := alg.hash(key, uintptr(h.hash0))

找到对应的bucket

bucket的长度一定为2^B，在这种情况下，我们可以把取模运算转化为位运算，降低计算的耗时

m := bucketMask(h.B) //获取bucket的掩码
//掩码为（1<<B)-1
func bucketMask(b uint8) uintptr {
	return bucketShift(b) - 1
} 
//通过hash&m得到第几个bucket，乘上bucketsize得到对应的地址
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))

2.遍历bucket下的所有元素

这里遍历该bucket下的所有bmap，通过overflow找到对应的溢出桶

每个for 8次，遍历所有的元素

for ; b != nil; b = b.overflow(t) {
	for i := uintptr(0); i < bucketCnt; i++ {
    }
}

也是通过地址来获取overflow

func (b *bmap) overflow(t *maptype) *bmap {
	return *(**bmap)(add(unsafe.Pointer(b), uintptr(t.bucketsize)-sys.PtrSize))
}

3.定位对应的key

先通过tophash快速适错，tophash相同才会，继续比较key

      for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
                        //先比较tophash是否正确
			if b.tophash[i] != top {
				continue
			}
                        //通过计算出来地址，来获取对应的key
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey {
				k = *((*unsafe.Pointer)(k))
			}
                        //如果key相同，则取出对应的value，返回
			if alg.equal(key, k) {
				v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				if t.indirectvalue {
					v = *((*unsafe.Pointer)(v))
				}
				return v
			}
		}
	}

扩容

扩容的条件：

1.负载因子过大

如果当前count数量 > bucket的数量*6.5，则需要进行2倍的扩容

func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

2.溢出桶过多

noverflow表示当前map溢出桶的个数，如果B很大，这里是个近似值

func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

扩容开始

hashGrow做了一些扩容初始工作

func hashGrow(t *maptype, h *hmap) {
	bigger := uint8(1)
        //如果不是负载过的，则是等size增长
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	oldbuckets := h.buckets
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
        //buckets保存到oldbuckets中
        //在h.bucket创建新的buckets
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0
       //extra相关的转移操作
	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}
}

迁移

每次在查询的时候，会对该个bucket先进行迁移，每次会迁移2个bucket

func growWork(t *maptype, h *hmap, bucket uintptr) {
	//迁移当前bucket
	evacuate(t, h, bucket&h.oldbucketmask())

	//如果还需要扩容，nevacuate是一个扩容桶的计数，会扩容nevacuate bucket，
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

发布于 2019-11-11 20:20

哈希函数

Map

Go 语言