google架构师对Innodb存储结构分析翻译[1]

2840阅读 0评论2013-12-17 zxszcaijin
分类:Mysql/postgreSQL

Innodb’s data storage model uses “spaces”, often called “tablespaces” in the context of mysql, and sometimes called “file spaces” in innodb itself. A space may consist of multiple actual files at the operating system level (e.g. Ibdata1, ibdata2, etc.) But it is just a single logical file — multiple physical files are just treated as though they were physically concatenated together.

Innodb的数据存储模型通常使用“space”来标识,在mysql上下文中通常称作”tablespace”,在innodb内部有时候也被称作”file space”。在操作系统层面,一个表空间或许有多个实际的物理文件组成,但在逻辑上他们被看做一个统一的整体 —多个物理文件好像物理的连接在一起从而形成一个文件。

Each space in innodb is assigned a 32-bit integer space ID, which is used in many different places to refer to the space. Innodb always has a “system space”, which is always assigned the space ID of 0. The system space is used for various special bookkeeping that innodb requires. Through mysql, innodb currently only supports additional spaces in the form of “file per table” spaces, which create an .ibd file for each mysql table. Internally, this .ibd file is actually a fully functional space which could contain multiple tables, but in the implementation with mysql, they will only contain a single table.
Innodb会为每个表空间分配一个32-bit的整型的space ID,它被用作很多不同的地方来指向同一个space。Innodb总是会有一个”system space(系统表空间)”,该表空间也总是会用space ID为0来标识。系统表空间会存储各种innodb所需要的一些特殊记录。在mysql中,除了系统表空间外,innodb目前只支持以”file per table(innodb_file_per_table=on)”形式的额外的表空间,这种新式的表空间会为每一个mysql的表创建一个.ibd文件。在内部,这个.ibd文件实际上一个功能齐全的,可以包含多个表的文件,但是在mysql的实现上,它只包含一个表。

Pages

Each space is divided into pages, normally 16 kib each (this can differ for two reasons: if the compile-time define UNIV_PAGE_SIZE is changed, or if innodb compression is used). Each page within a space is assigned a 32-bit integer page number, often called “offset”, which is actually just the page’s offset from the beginning of the space (not necessarily the file, for multi-file spaces). So, page 0 is located at file offset 0, page 1 at file offset 16384, and so on. (The astute may remember that innodb has a limit of 64tib of data; this is actually a limit per space, and is due primarily to the page number being a 32-bit integer combined with the default page size: 232 x 16 kib = 64 tib.)

每一个空间被划分成很多的页,每个页的大小为16KB (有两种原因可能导致这个大小不是16KB:[1] 在编译的时候UNIV_PAGE_SIZE定义被改变. [2] innodb使用了压缩的格式存储.).每一个page被分配一个32位整型的page number,通常被称作”偏移”,page number实际只是标识这个页在同一个space id标识的表空间中的起始位置的偏移量.(不一定是一个文件,也有可能是多个文件)。简而言之,page 0标识的是偏移量为0的页,page 1标识偏移量为 1*16k=1*16384的页,等等.如果细心回忆,你可能还记得,Innodb对每一个表空间的数据总量限制为64TB,这主要是由于page number是一个32位的整型,所以它所能标识最大的表空间的总大小为: 232 x 16 kib = 64 tib(在使用默认页为16KB的情况下)。

A page is laid out as follows:
 一个页的基本结构如下:



Every page has a 38-byte FIL header and 8-byte FIL trailer (FIL is a shortened form of “file”). The header contains a field which is used to indicate the page type, which determines the structure of the rest of the page. The structure of the FIL header and trailer are:

每一个页包含一个38字节的文件头和一个8字节的文件尾。文件头包含一个被用于表示个文件类型的区域,该类型决定了这个页剩下空间的结构(除了头和尾以外剩下的空间)。文件头和文件尾的结构如下所示:




The FIL header and trailer contain the following structures (not in order):

文件头和文件尾包含如下的结构(不是按照存储的顺序描述):

Space files

A space file is just a concatenation of many (up to  232) pages. For more efficient management, pages are grouped into blocks of 1 mib (64 contiguous pages with the default page size of 16 kib), and called an “extent”. Many structures then refer only to extents to allocate pages within a space.Innodb needs to do some bookkeeping to keep track of all of the pages, extents, and the space itself, so a space file has some mandatory super-structure:

一个表空间文件由许多的page组合而成(最多为 232).为了更有效的管理,page被组织成一个1MB的block(64个连续的page,每个page使用默认的16KB的空间),被称作”extent”.许多结构只用extent来为表空间分配存储空间。
Innodb需要做一些记录来跟踪所有的page,extent和表空间自身,所以表空间文件有一些必须的超级结构。



 

The first page (page 0) in a space is always an FSP_HDR or “file space header” page. The FSP_HDR page contains (confusingly) an FSP header structure, which tracks things like the size of the space and lists of free, fragmented, and full extents. (A more detailed discussion of free space management is reserved for a future post.)

表空间的第一个page(page 0)总是一个FSP_HDR或者说是一个包含表空间元数据信息的page。 FSP_HDR page包含了(容易混淆)一个FSP的头结构,用来跟踪记录诸如表空间的大小,free lists的信息,页分裂的信息,和所有extent的信息。(更多关于空间空间的管理详细讨论在保留到以后的文章中会加以分析)。

An FSP_HDR page only has enough space internally to store bookkeeping information for 256 extents (or 16,384 pages, 256 mib), so additional space must be reserved every 16,384 pages for bookkeeping information in the form of an XDES page. The structure of XDES and FSP_HDR pages is identical, except that the FSP header structure is zeroed-out in XDES pages. These additional pages are allocated automatically as a space file grows.

一个FSP_HDR 页的内部只有足够的空间来记录256个extent的信息(或者16384个page,256MB),所以额外的XDES page空必须为每16384个页预留空间来记录相关的信息。,FSP头部结构是一个清零的XDES PAGE,除此之外XDES的结构和FSP_HDR的结构相同。额外的页会在表空间的文件增长的同时被自动的分配。

The third page in each space (page 2) will be an INODE page, which is used to store lists related to file segments (groupings of extents plus an array of singly-allocated “fragment” pages). Each INODE page can store 85 INODE entries, and each index requires two INODE entries. (A more detailed discussion of INODE entries and file segments is reserved for a future post.)

Alongside each FSP_HDR or XDES page will also be an IBUF_BITMAP page, which is used for bookkeeping information related to insert buffering, and is outside the scope of this post.

每个表空间的第三个page(page 2)将会是一个INODE page,它用于存储相关文件segment的list(由extents和一些单独分配的”片段”的集合组成)。每一个INNODE page可以保存85个INNODE条目,而且每个INDEX PAGE需要至少两个INNODE条目。(更多关于INNODE条目和文件 segments的讨论在以后发表的文章中会做详细的讨论。)
除了每个FSP_HDR或者XDES page,还包括一个IBUF_BITMAP page,被用于记录关于insert buffering的信息,这已经超出了本文要讨论的范围

The system space

 The system space (space 0) is special in innodb, and contains quite a few pages allocated at fixed page numbers to store a wide range of information critical to innodb’s operation. Since the system space is a space like any other, it has the  required FSP_HDR, IBUF_BITMAP, and INODE pages allocated as its first three pages. After that, it is a bit special:

 系统表空间(space id为0的空间)在Innodb中是一个特殊的表空间,而且包含一些固定page number的page来存储关于innodb操作至关重要的信息。因为系统包空间和其他表空间很像,它也包括FSP_HDR,IBUF_BITMAP,和INNODE pages三个page,这三个page都分配在该表
 空间的前三个page中。从第三个页之后,它就有点特别了:



The following pages are allocated:
下面是系统表空间page的信息:

All other pages are allocated on an as-needed basis to indexes, rollback segments, undo logs, etc.

所有其他的页按需分配,包括index,回滚段,undo日志,等等..


Per-table space files

Innodb offers a "file per table" mode, which will create a file (which as explained above is actually a space) for each mysql table created. A better name for this feature may be "space per table" rather than "file per table". The .ibd file created for each table has the typical space file structure:

Innodb提供了一个”file per table”的模式,它可以为每一个mysql表创建一个文件(在上面的解释中实际被称作表空间).相比”file per table”而言,”space per table”或许是一个更好的名称.每一个表都会创建一个.ibd文件,该典型的表空间文件的结构如下:

Ignoring "fast index creation" which adds indexes at runtime, after the requisite 3 initial pages, the next pages allocated in the space will be the root pages of each index in the table, in the order they were defined in the table creation. Page 3 will be the root of the clustered index, Page 4 will be the root of the first secondary key, etc.
忽略”fast index creation”会在运行时加索引的情况,在必需的前三个初始化page之后的page分配的是索引组织表的根索引节点,顺序的记录了表创建定义的一些信息。   Page 3是索引组织表的根节点,Page 4将会是第一个二级索引的根节点,等等.

Since most of innodb's bookkeeping structures are stored in the system space, most pages allocated in a per-table space will be of type INDEX and store table data.

因为大部分innodb的记录结构都存放在系统表空间,每个page被分配给每个表空间(非系统表空间)将会是一个INDEX类型,同时存储了表的数据。

What's next?

 Next we'll look at free space management within innodb: extent descriptors, file segments (inodes), and lists.

 下一节,我们探索innodb中关于空闲空间的管理:extent描述信息,file段(inodes),和链表



 Ignoring "fast index creation" which adds indexes at runtime, after the requisite 3 initial pages, the next pages allocated in the space will be the root pages of each index in the table, in the order they were defined in the table creation. Page 3 will be the root of  the clustered index, Page 4 will be the root of the first secondary key, etc.

 忽略”fast index creation”会在运行时加索引的情况,在必需的前三个初始化page之后的page,分配的是索引组织表的根索引节点,顺序的记录了表创建定义的一些信息。Page 3是索引组织表的根节点,Page 4将会是第一个二级索引的根节点,等等.
 Since most of innodb's bookkeeping structures are stored in the system space, most pages allocated in a per-table space will be of type INDEX and store table data.

 因为大部分innodb的记录结构都存放在系统表空间,每个page被分配给每个表空间(非系统表空间)将会是一个INDEX类型,同时存储了表的数据。

What's next?

Next we'll look at free space management within innodb: extent descriptors, file segments (inodes), and lists.

下一节,我们探索innodb中关于空闲空间的管理:extent描述信息,file段(inodes),和链表




 该文章翻译自google高级架构师Jeremy Cole
 原文:  http://blog.jcole.us/2013/01/03/the-basics-of-innodb-space-file-layout/
 翻译当中可能有很多不准确的地方,希望看到的朋友能够指出来。



上一篇:Mysql诊断案例一则
下一篇:Mysql 计数表的设计