StreamGraph是flink四层执行图中的第一层图,代码在org.apache.flink.streaming.api.graph包中,第一层graph主要做的事情是将所有的stransformation添加到DAG中,并设置并行度,设置slot槽位
具体涉及到的transformation大概有11个,继承图如下
首先我们来看一下如何获取StreamTransformation
publicDataStreamSource addSource(SourceFunction function, String sourceName, TypeInformation typeInfo) { if (typeInfo == null) { if (function instanceof ResultTypeQueryable) { typeInfo = ((ResultTypeQueryable ) function).getProducedType(); } else { try { typeInfo = TypeExtractor.createTypeInfo( SourceFunction.class, function.getClass(), 0, null, null); } catch (final InvalidTypesException e) { typeInfo = (TypeInformation ) new MissingTypeInfo(sourceName, e); } } } boolean isParallel = function instanceof ParallelSourceFunction; clean(function); StreamSource sourceOperator; if (function instanceof StoppableFunction) { sourceOperator = new StoppableStreamSource<>(cast2StoppableSourceFunction(function)); } else { sourceOperator = new StreamSource<>(function); } return new DataStreamSource<>(this, typeInfo, sourceOperator, isParallel, sourceName); }
最终返回的是DataStreamSource,内部封装了SourceTransformation,下面看一下DataStream的类图结构
可以看到DataStreamSource是DataStream的子类
DataStreamSource是DataStream的数据流抽象,StreamSource是StreamOperator的抽象,在 flink 中一个 DataStream 封装了一次数据流转换,一个 StreamOperator 封装了一个函数接口,比如 map、reduce、keyBy等。下面我们在看一下StreamOperator的类图关系
可以看到StreamMap/StreamFlatMap都是operator的子类,下面来看一段具体生成operator和transformation的代码
/** * Applies a Map transformation on a {@link DataStream}. The transformation * calls a {@link MapFunction} for each element of the DataStream. Each * MapFunction call returns exactly one element. The user can also extend * {@link RichMapFunction} to gain access to other features provided by the * {@link org.apache.flink.api.common.functions.RichFunction} interface. * * @param mapper * The MapFunction that is called for each element of the * DataStream. * @param* output type * @return The transformed {@link DataStream}. */ public SingleOutputStreamOperator map(MapFunction mapper) { TypeInformation outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(), Utils.getCallLocationName(), true); return transform("Map", outType, new StreamMap<>(clean(mapper))); }/** * Method for passing user defined operators along with the type * information that will transform the DataStream. * * @param operatorName * name of the operator, for logging purposes * @param outTypeInfo * the output type of the operator * @param operator * the object containing the transformation logic * @param * type of the return stream * @return the data stream constructed */ @PublicEvolving public SingleOutputStreamOperator transform(String operatorName, TypeInformation outTypeInfo, OneInputStreamOperator operator) { // read the output type of the input Transform to coax out errors about MissingTypeInfo transformation.getOutputType(); OneInputTransformation resultTransform = new OneInputTransformation<>( this.transformation, operatorName, operator, outTypeInfo, environment.getParallelism()); @SuppressWarnings({ "unchecked", "rawtypes" }) SingleOutputStreamOperator returnStream = new SingleOutputStreamOperator(environment, resultTransform); getExecutionEnvironment().addOperator(resultTransform); return returnStream; }
到这里基本说完了DataStream和StreamOperator,包含transformation的产生,DataStream的操作等,下一篇我们在来说一下transformation的转换