ffmpeg: AVPicture and AVFrame - leiminchn - ChinaUnix Blog

1. AVPicture overview

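AVPicture is the structure that stores picture data. It can hold up to four components, the last of which is alpha.
An AVPicture can be created on the stack.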

2. AVPicture definition

typedef struct AVPicture {
    uint8_t* data[AV_NUM_DATA_POINTERS];    ///< pointers to the image data planes
    int linesize[AV_NUM_DATA_POINTERS];     ///< number of bytes per line
} AVPicture;

3. AVPicture fields

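    uint8_t* data[AV_NUM_DATA_POINTERS]: pointers to the decoded data planes. For video these hold YUV/RGB data; in an AVFrame, which has the same field, they hold PCM data for audio.
    int linesize[AV_NUM_DATA_POINTERS]: the size in bytes of each line. For video, the line size should be a multiple of 16 or 32 bytes (depending on the CPU), so it does not necessarily equal the width of the picture and is usually larger.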
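
To make the role of linesize concrete, here is a minimal sketch in plain C that does not link against ffmpeg. The helper name plane_at is hypothetical, and it assumes one byte per sample (as in the Y plane of yuv420p) and a positive line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ffmpeg function.
 * Returns a pointer to the sample at column x, row y of one plane.
 * Assumes one byte per sample. Rows are linesize bytes apart, which
 * may be more than the visible width because of padding. */
static uint8_t *plane_at(uint8_t *data, int linesize, int x, int y)
{
    assert(data != NULL && linesize > 0);
    assert(x >= 0 && x < linesize && y >= 0);
    return data + (size_t)y * (size_t)linesize + (size_t)x;
}
```

The point of the sketch is that the row step is linesize, not the picture width; stepping by width would drift into the padding after the first row.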

4. AVPicture data layout

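   YUV420p:
      data[0] holds all the Y data, data[1] all the U data, and data[2] all the V data.
      linesize[0] gives the size of each line of Y data, linesize[1] the size of each line of U data, and linesize[2] the size of each line of V data.
      Line size of the Y plane (one byte per sample): line = width + padding (to reach 16/32-byte alignment, depending on the CPU). The U and V planes are half as wide, so their lines are width/2 + padding.
      The Y plane has height lines; the U and V planes each have height/2 lines (rounded up when height is odd). The total data size is therefore buff = linesize[0] * height + linesize[1] * height/2 + linesize[2] * height/2.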
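
The size formula above can be sketched in plain C. This does not call ffmpeg; yuv420p_size and align_up are hypothetical names, samples are assumed to be 8 bits, and the alignment is passed in by the caller rather than detected from the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static int align_up(int n, int align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: total bytes for an 8-bit yuv420p picture whose
 * lines are padded to `align` bytes. Chroma dimensions are rounded up
 * so that odd widths and heights are covered. Fills linesize[0..2]. */
static size_t yuv420p_size(int width, int height, int align, int linesize[3])
{
    assert(width > 0 && height > 0);
    assert(align > 0 && (align & (align - 1)) == 0);

    int chroma_w = (width + 1) / 2;
    int chroma_h = (height + 1) / 2;

    linesize[0] = align_up(width, align);     /* Y */
    linesize[1] = align_up(chroma_w, align);  /* U */
    linesize[2] = align_up(chroma_w, align);  /* V */

    return (size_t)linesize[0] * (size_t)height
         + (size_t)linesize[1] * (size_t)chroma_h
         + (size_t)linesize[2] * (size_t)chroma_h;
}
```

For a 100x50 picture with 32-byte alignment this gives line sizes of 128, 64 and 64 bytes, so the padded buffer is noticeably larger than the 7500 bytes of visible samples.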

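   RGB:
      data[0] holds all the RGB(A) data; data[1], data[2] and data[3] are unused, so linesize[1], linesize[2] and linesize[3] are unused as well.
      Line size: line = width * 3 without an alpha channel (e.g. rgb24); line = width * 4 with an alpha channel (e.g. rgb32). linesize[0] may include alignment padding on top of that.
      Total data size: buff = linesize[0] * height.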
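
Because linesize[0] may exceed width * 3 (or width * 4), a packed RGB picture usually has to be copied line by line when a tightly packed buffer is needed. The following is a minimal sketch in plain C; rgb_copy_tight is a hypothetical name and nothing here comes from ffmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a packed RGB picture out of a padded buffer
 * into a tightly packed one. bytes_per_pixel is 3 for rgb24, 4 for rgb32.
 * Only width * bytes_per_pixel bytes of each source line are pixels;
 * the rest of the line, up to src_linesize, is padding and is skipped. */
static void rgb_copy_tight(uint8_t *dst, const uint8_t *src, int src_linesize,
                           int width, int height, int bytes_per_pixel)
{
    size_t row_bytes = (size_t)width * (size_t)bytes_per_pixel;

    assert(dst != NULL && src != NULL);
    assert(src_linesize > 0 && (size_t)src_linesize >= row_bytes);

    for (int y = 0; y < height; y++) {
        memcpy(dst + (size_t)y * row_bytes,
               src + (size_t)y * (size_t)src_linesize,
               row_bytes);
    }
}
```

The destination must hold width * bytes_per_pixel * height bytes; the source occupies linesize[0] * height bytes, matching the formula above.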

   PCM:




1. AVFrame overview

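AVFrame stores decoded raw audio or video data.
An AVFrame must be allocated with av_frame_alloc(). That function allocates only the AVFrame itself; the data buffers are managed separately. It must be freed with av_frame_free(). An AVFrame can be allocated once and reused; av_frame_unref() releases the references it holds and resets it before the next use.
Its buffers are usually reference-counted by ffmpeg through the AVBuffer API.
sizeof(AVFrame) is not part of the public ABI, so an AVFrame must not be created on the stack.
AVPicture is a subset of AVFrame.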
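
The idea of reference-counted buffers can be pictured with a small model. This is not the AVBuffer implementation and calls none of its functions; the RcBuf type and the rc_* names are hypothetical, and the sketch is single-threaded for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a reference-counted buffer; NOT ffmpeg's AVBuffer. */
typedef struct RcBuf {
    uint8_t *data;
    size_t size;
    int refcount;
} RcBuf;

/* Allocate a zeroed buffer with one owner. Returns NULL on failure. */
static RcBuf *rc_alloc(size_t size)
{
    RcBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->refcount = 1;
    return b;
}

/* Take another reference to the same data; nothing is copied. */
static RcBuf *rc_ref(RcBuf *b)
{
    assert(b != NULL && b->refcount > 0);
    b->refcount++;
    return b;
}

/* Drop one reference and clear the caller's pointer.
 * The data is freed only when the last reference goes away.
 * Returns the number of references that remain. */
static int rc_unref(RcBuf **pb)
{
    RcBuf *b = *pb;
    int left;

    *pb = NULL;
    assert(b != NULL && b->refcount > 0);
    left = --b->refcount;
    if (left == 0) {
        free(b->data);
        free(b);
    }
    return left;
}
```

In this model, several holders can share the same picture data, and releasing one of them only frees the data once no other holder remains. That is why an AVFrame can be reused safely while something else still holds the previous data.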

2. AVFrame definition

typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    /**
     * pointer to the picture/channel planes.
     * This might be different from the first allocated byte
     *
     * Some decoders access areas outside 0,0 - width,height, please
     * see avcodec_align_dimensions2(). Some filters and swscale can read
     * up to 16 bytes beyond the planes, if these filters are to be used,
     * then 16 extra bytes must be allocated.
     */
    uint8_t* data[AV_NUM_DATA_POINTERS];

    /**
     * For video, size in bytes of each picture line.
     * For audio, size in bytes of each plane.
     *
     * For audio, only linesize[0] may be set. For planar audio, each channel
     * plane must be the same size.
     *
     * For video the linesizes should be multiples of the CPUs alignment
     * preference, this is 16 or 32 for modern desktop CPUs.
     * Some code requires such alignment other code can be slower without
     * correct alignment, for yet other it makes no difference.
     *
     * @note The linesize may be larger than the size of usable data -- there
     * may be extra padding present for performance reasons.
     */
    int linesize[AV_NUM_DATA_POINTERS];

    /**
     * pointers to the data planes/channels.
     *
     * For video, this should simply point to data[].
     *
     * For planar audio, each channel has a separate data pointer, and
     * linesize[0] contains the size of each channel buffer.
     * For packed audio, there is just one data pointer, and linesize[0]
     * contains the total size of the buffer for all channels.
     *
     * Note: Both data and extended_data should always be set in a valid frame,
     * but for planar audio with more channels that can fit in data,
     * extended_data must be used in order to access all channels.
     */
    uint8_t** extended_data;

    /**
     * width and height of the video frame
     */
    int width, height;

    /**
     * number of audio samples (per channel) described by this frame
     */
    int nb_samples;

    /**
     * format of the frame, -1 if unknown or unset
     * Values correspond to enum AVPixelFormat for video frames,
     * enum AVSampleFormat for audio)
     */
    int format;

    /**
     * 1 -> keyframe, 0-> not
     */
    int key_frame;

    /**
     * Picture type of the frame.
     */
    enum AVPictureType pict_type;

#if FF_API_AVFRAME_LAVC
    attribute_deprecated
    uint8_t* base[AV_NUM_DATA_POINTERS];
#endif

    /**
     * Sample aspect ratio for the video frame, 0/1 if unknown/unspecified.
     */
    AVRational sample_aspect_ratio;

    /**
     * Presentation timestamp in time_base units (time when frame should be shown to user).
     */
    int64_t pts;

    /**
     * PTS copied from the AVPacket that was decoded to produce this frame.
     */
    int64_t pkt_pts;

    /**
     * DTS copied from the AVPacket that triggered returning this frame. (if frame threading isn't used)
     * This is also the Presentation time of this AVFrame calculated from
     * only AVPacket.dts values without pts values.
     */
    int64_t pkt_dts;

    /**
     * picture number in bitstream order
     */
    int coded_picture_number;

    /**
     * picture number in display order
     */
    int display_picture_number;

    /**
     * quality (between 1 (good) and FF_LAMBDA_MAX (bad))
     */
    int quality;

#if FF_API_AVFRAME_LAVC
    attribute_deprecated
    int reference;

    /**
     * QP table
     */
    attribute_deprecated
    int8_t* qscale_table;

    /**
     * QP store stride
     */
    attribute_deprecated
    int qstride;

    attribute_deprecated
    int qscale_type;

    /**
     * mbskip_table[mb]>=1 if MB didn't change
     * stride= mb_width = (width+15)>>4
     */
    attribute_deprecated
    uint8_t* mbskip_table;

    /**
     * motion vector table
     * @code
     * example:
     * int mv_sample_log2= 4 - motion_subsample_log2;
     * int mb_width= (width+15)>>4;
     * int mv_stride= (mb_width << mv_sample_log2) + 1;
     * motion_val[direction][x + y*mv_stride][0->mv_x, 1->mv_y];
     * @endcode
     */
    int16_t(*motion_val[2])[2];

    /**
     * macroblock type table
     * mb_type_base + mb_width + 2
     */
    attribute_deprecated
    uint32_t* mb_type;

    /**
     * DCT coefficients
     */
    attribute_deprecated
    short* dct_coeff;

    /**
     * motion reference frame index
     * the order in which these are stored can depend on the codec.
     */
    attribute_deprecated
    int8_t* ref_index[2];
#endif

    /**
     * for some private data of the user
     */
    void* opaque;

    /**
     * error
     */
    uint64_t error[AV_NUM_DATA_POINTERS];

#if FF_API_AVFRAME_LAVC
    attribute_deprecated
    int type;
#endif

    /**
     * When decoding, this signals how much the picture must be delayed.
     * extra_delay = repeat_pict / (2*fps)
     */
    int repeat_pict;

    /**
     * The content of the picture is interlaced.
     */
    int interlaced_frame;

    /**
     * If the content is interlaced, is top field displayed first.
     */
    int top_field_first;

    /**
     * Tell user application that palette has changed from previous frame.
     */
    int palette_has_changed;

#if FF_API_AVFRAME_LAVC
    attribute_deprecated
    int buffer_hints;

    /**
     * Pan scan.
     */
    attribute_deprecated
    struct AVPanScan* pan_scan;
#endif

    /**
     * reordered opaque 64bit (generally an integer or a double precision float
     * PTS but can be anything).
     * The user sets AVCodecContext.reordered_opaque to represent the input at
     * that time,
     * the decoder reorders values as needed and sets AVFrame.reordered_opaque
     * to exactly one of the values provided by the user through AVCodecContext.reordered_opaque
     * @deprecated in favor of pkt_pts
     */
    int64_t reordered_opaque;

#if FF_API_AVFRAME_LAVC
    /**
     * @deprecated this field is unused
     */
    attribute_deprecated
    void* hwaccel_picture_private;

    attribute_deprecated
    struct AVCodecContext* owner;

    attribute_deprecated
    void* thread_opaque;

    /**
     * log2 of the size of the block which a single vector in motion_val represents:
     * (4->16x16, 3->8x8, 2-> 4x4, 1-> 2x2)
     */
    uint8_t motion_subsample_log2;
#endif

    /**
     * Sample rate of the audio data.
     */
    int sample_rate;

    /**
     * Channel layout of the audio data.
     */
    uint64_t channel_layout;

    /**
     * AVBuffer references backing the data for this frame. If all elements of
     * this array are NULL, then this frame is not reference counted. This array
     * must be filled contiguously -- if buf[i] is non-NULL then buf[j] must
     * also be non-NULL for all j < i.
     *
     * There may be at most one AVBuffer per data plane, so for video this array
     * always contains all the references. For planar audio with more than
     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
     * this array. Then the extra AVBufferRef pointers are stored in the
     * extended_buf array.
     */
    AVBufferRef* buf[AV_NUM_DATA_POINTERS];

    /**
     * For planar audio which requires more than AV_NUM_DATA_POINTERS
     * AVBufferRef pointers, this array will hold all the references which
     * cannot fit into AVFrame.buf.
     *
     * Note that this is different from AVFrame.extended_data, which always
     * contains all the pointers. This array only contains the extra pointers,
     * which cannot fit into AVFrame.buf.
     *
     * This array is always allocated using av_malloc() by whoever constructs
     * the frame. It is freed in av_frame_unref().
     */
    AVBufferRef** extended_buf;

    /**
     * Number of elements in extended_buf.
     */
    int nb_extended_buf;

    AVFrameSideData** side_data;

    int nb_side_data;

/**
 * @defgroup lavu_frame_flags AV_FRAME_FLAGS
 * Flags describing additional frame properties.
 *
 * @{
 */

/**
 * The frame data may be corrupted, e.g. due to decoding errors.
 */
#define AV_FRAME_FLAG_CORRUPT (1 << 0)

/**
 * @}
 */

    /**
     * Frame flags, a combination of @ref lavu_frame_flags
     */
    int flags;

    /**
     * MPEG vs JPEG YUV range.
     * It must be accessed using av_frame_get_color_range() and
     * av_frame_set_color_range().
     * - encoding: Set by user
     * - decoding: Set by libavcodec
     */
    enum AVColorRange color_range;

    enum AVColorPrimaries color_primaries;

    enum AVColorTransferCharacteristic color_trc;

    /**
     * YUV colorspace type.
     * It must be accessed using av_frame_get_colorspace() and
     * av_frame_set_colorspace().
     * - encoding: Set by user
     * - decoding: Set by libavcodec
     */
    enum AVColorSpace colorspace;

    enum AVChromaLocation chroma_location;

    /**
     * frame timestamp estimated using various heuristics, in stream time base
     * Code outside libavcodec should access this field using:
     * av_frame_get_best_effort_timestamp(frame)
     * - encoding: unused
     * - decoding: set by libavcodec, read by user.
     */
    int64_t best_effort_timestamp;

    /**
     * reordered pos from the last AVPacket that has been input into the decoder
     * Code outside libavcodec should access this field using:
     * av_frame_get_pkt_pos(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_pos;

    /**
     * duration of the corresponding packet, expressed in
     * AVStream->time_base units, 0 if unknown.
     * Code outside libavcodec should access this field using:
     * av_frame_get_pkt_duration(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int64_t pkt_duration;

    /**
     * metadata.
     * Code outside libavcodec should access this field using:
     * av_frame_get_metadata(frame)
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    AVDictionary *metadata;

    /**
     * decode error flags of the frame, set to a combination of
     * FF_DECODE_ERROR_xxx flags if the decoder produced a frame, but there
     * were errors during the decoding.
     * Code outside libavcodec should access this field using:
     * av_frame_get_decode_error_flags(frame)
     * - encoding: unused
     * - decoding: set by libavcodec, read by user.
     */
    int decode_error_flags;
#define FF_DECODE_ERROR_INVALID_BITSTREAM   1
#define FF_DECODE_ERROR_MISSING_REFERENCE   2

    /**
     * number of audio channels, only used for audio.
     * Code outside libavcodec should access this field using:
     * av_frame_get_channels(frame)
     * - encoding: unused
     * - decoding: Read by user.
     */
    int channels;

    /**
     * size of the corresponding packet containing the compressed
     * frame. It must be accessed using av_frame_get_pkt_size() and
     * av_frame_set_pkt_size().
     * It is set to a negative value if unknown.
     * - encoding: unused
     * - decoding: set by libavcodec, read by user.
     */
    int pkt_size;

    /**
     * Not to be accessed directly from outside libavutil
     */
    AVBufferRef* qp_table_buf;
} AVFrame;
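
The comment on extended_data distinguishes planar audio (one pointer per channel) from packed audio (one pointer, channels interleaved). A minimal sketch of that difference in plain C follows; audio_sample_s16 is a hypothetical name, the planes argument merely plays the role of extended_data, and samples are assumed to be signed 16-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read sample i of channel ch from 16-bit audio.
 * planes plays the role of extended_data.
 * Planar: planes[ch] is that channel's own buffer, indexed by sample.
 * Packed: planes[0] holds all channels interleaved sample by sample. */
static int16_t audio_sample_s16(uint8_t **planes, int planar,
                                int channels, int ch, int i)
{
    assert(planes != NULL && channels > 0);
    assert(ch >= 0 && ch < channels && i >= 0);

    if (planar)
        return ((const int16_t *)planes[ch])[i];
    return ((const int16_t *)planes[0])[(size_t)i * (size_t)channels + (size_t)ch];
}
```

With planar data the channel selects the pointer; with packed data the channel is an offset inside each group of interleaved samples.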

3. AVFrame fields

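  uint8_t* data[AV_NUM_DATA_POINTERS]: array of pointers to the decoded data. See the AVPicture field description. When decoding, set by ffmpeg's decoding functions; when encoding, set by the caller (for example, av_image_copy can copy decoded data into the AVFrame that is about to be encoded).
  int linesize[AV_NUM_DATA_POINTERS]: the size of each line of data. See the AVPicture field description. When decoding, set by ffmpeg's decoding functions; when encoding, set by the caller together with data.
  uint8_t** extended_data: pointers to the data planes/channels. For video it simply points to data[]; for planar audio with more channels than fit in data[], it must be used to reach every channel. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int width: width of the video frame. When decoding, set by ffmpeg for the caller to read; when encoding, the caller should set it.
  int height: height of the video frame. When decoding, set by ffmpeg for the caller to read; when encoding, the caller should set it.
  int nb_samples: number of audio samples (per channel) in this frame. When decoding, set by ffmpeg's decoding functions; when encoding, set by the caller.
  int format: format of the frame; -1 if unset or unknown. For video it is a value of enum AVPixelFormat (yuv420p, rgb24, etc.); for audio, a value of enum AVSampleFormat. When decoding, set by ffmpeg for the caller to read; when encoding, the caller should set it.
  int key_frame: key-frame flag: 1 for a key frame, 0 otherwise. Set by ffmpeg when decoding and when encoding.
  enum AVPictureType pict_type: picture type, a value of enum AVPictureType (I, B, P, etc.). Set by ffmpeg when decoding and when encoding.
  uint8_t* base[AV_NUM_DATA_POINTERS]: deprecated.
  AVRational sample_aspect_ratio: sample aspect ratio of the video frame; the rational 0/1 means unknown or unspecified. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int64_t pts: presentation timestamp, in time_base units. When decoding, set by ffmpeg's decoding functions; when encoding, set by the caller.
  int64_t pkt_pts: presentation timestamp copied from the AVPacket that was decoded to produce this frame. For display, rely on this field rather than on pts. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int64_t pkt_dts: decoding timestamp copied from the AVPacket. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int coded_picture_number: picture number in bitstream order. Set by ffmpeg when decoding and when encoding.
  int display_picture_number: picture number in display order. Set by ffmpeg when decoding and when encoding.
  int quality: quality, between 1 (good) and FF_LAMBDA_MAX (bad). Set by ffmpeg when decoding and when encoding.
  int reference: indicates whether the picture is used as a reference. Set by ffmpeg when decoding and when encoding.
  int8_t* qscale_table: QP table. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int qstride: QP store stride. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int qscale_type:
  uint8_t* mbskip_table: macroblock skip table. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int16_t (*motion_val[2])[2]: motion vector table. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  uint32_t* mb_type: macroblock type table. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  short* dct_coeff: DCT coefficients. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int8_t* ref_index[2]: motion reference frame index. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  void* opaque: private data of the user. When decoding, set by the user; not used when encoding.
  uint64_t error[AV_NUM_DATA_POINTERS]: error. Not used when decoding; when encoding, set by ffmpeg.
  int type: type of the buffer. When decoding and when encoding, set by whoever allocates the structure.
  int repeat_pict: how much the picture must be delayed: extra_delay = repeat_pict / (2 * fps). When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int interlaced_frame: interlaced-content flag. When decoding, set by ffmpeg for the caller to read (default 0); when encoding, set by the caller.
  int top_field_first: for interlaced content, whether the top field is displayed first. When decoding, set by ffmpeg for the caller to read; when encoding, set by the caller.
  int palette_has_changed: tells the caller whether the palette has changed since the previous frame. When decoding, set by ffmpeg for the caller to read (default 0); not used when encoding.
  int buffer_hints: suggested buffer type; meaningful when non-zero. When decoding, set by ffmpeg for the caller to read (default 0); not used when encoding.
  struct AVPanScan* pan_scan: pan-scan information. When decoding, set by ffmpeg for the caller to read; when encoding, set by the caller.
  int64_t reordered_opaque: reordered opaque 64-bit value. The caller supplies values through AVCodecContext.reordered_opaque and the decoder reorders them as needed. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  void* hwaccel_picture_private: hardware accelerator private data. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  struct AVCodecContext* owner: pointer to the AVCodecContext that last called ff_thread_get_buffer(). Set by ffmpeg when decoding and when encoding.
  void* thread_opaque: frame-specific information used with multithreading. Set by ffmpeg when decoding and when encoding.
  uint8_t motion_subsample_log2: log2 of the size of the block that a single vector in motion_val represents (4 -> 16x16, 3 -> 8x8, 2 -> 4x4, 1 -> 2x2). When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int sample_rate: audio sample rate. When decoding, set by ffmpeg for the caller to read; not used when encoding.
  uint64_t channel_layout: audio channel layout. When decoding, set by ffmpeg for the caller to read; not used when encoding.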
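  AVBufferRef* buf[AV_NUM_DATA_POINTERS]: AVBuffer references backing the data of this frame. If every element is NULL, the frame is not reference counted. There is at most one AVBuffer per data plane.
  AVBufferRef** extended_buf: for planar audio that needs more than AV_NUM_DATA_POINTERS AVBufferRef pointers, holds the references that do not fit into buf.
  int nb_extended_buf: number of elements in extended_buf.
  AVFrameSideData** side_data:
  int nb_side_data:
  int flags: frame flags, a combination of AV_FRAME_FLAG_* values such as AV_FRAME_FLAG_CORRUPT.
  enum AVColorRange color_range: MPEG vs JPEG YUV range. Access it with av_frame_get_color_range() and av_frame_set_color_range(). When encoding, set by the user; when decoding, set by libavcodec.
  enum AVColorPrimaries color_primaries:
  enum AVColorTransferCharacteristic color_trc:
  enum AVColorSpace colorspace: YUV colorspace type. Access it with av_frame_get_colorspace() and av_frame_set_colorspace(). When encoding, set by the user; when decoding, set by libavcodec.
  enum AVChromaLocation chroma_location:
  int64_t best_effort_timestamp: frame timestamp estimated using various heuristics, in the stream time base. Code outside libavcodec should access it with av_frame_get_best_effort_timestamp(frame). When decoding, set by ffmpeg for the caller to read; not used when encoding.
  int64_t pkt_pos: reordered pos from the last AVPacket that has been input into the decoder. Code outside libavcodec should access it with av_frame_get_pkt_pos(frame). When decoding, read by the user; not used when encoding.
  int64_t pkt_duration: duration of the corresponding packet, in AVStream->time_base units; 0 if unknown. Code outside libavcodec should access it with av_frame_get_pkt_duration(frame). When decoding, read by the user; not used when encoding.
  AVDictionary* metadata: metadata. Code outside libavcodec should access it with av_frame_get_metadata(frame). When encoding, set by the user; when decoding, set by libavcodec.
  int decode_error_flags: decode error flags, a combination of FF_DECODE_ERROR_xxx flags, set when the decoder produced a frame but errors occurred during decoding. Code outside libavcodec should access it with av_frame_get_decode_error_flags(frame). When decoding, set by libavcodec for the user to read; not used when encoding.
  int channels: number of audio channels; used for audio only. Code outside libavcodec should access it with av_frame_get_channels(frame). When decoding, read by the user; not used when encoding.
  int pkt_size: size of the corresponding packet containing the compressed frame; negative if unknown. Access it with av_frame_get_pkt_size() and av_frame_set_pkt_size(). When decoding, set by libavcodec for the user to read; not used when encoding.
  AVBufferRef* qp_table_buf: not to be accessed directly from outside libavutil.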
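
Two of the fields above carry small formulas: pts is counted in time_base units, and repeat_pict gives extra_delay = repeat_pict / (2 * fps). The sketch below works both out in plain C. The Rational type is a hypothetical stand-in for AVRational, and both function names are made up for illustration; neither is an ffmpeg call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ffmpeg's AVRational (a num/den pair). */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Convert a timestamp counted in time_base units to seconds. */
static double pts_to_seconds(int64_t pts, Rational time_base)
{
    assert(time_base.den != 0);
    return (double)pts * time_base.num / time_base.den;
}

/* Extra display delay in seconds signalled by repeat_pict:
 * extra_delay = repeat_pict / (2 * fps). */
static double extra_delay_seconds(int repeat_pict, double fps)
{
    assert(fps > 0.0);
    return repeat_pict / (2.0 * fps);
}
```

For example, with a time base of 1/90000 a pts of 45000 corresponds to half a second. A real player would also have to handle frames whose timestamp is unset, which this sketch leaves out.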