pyspark.sql.streaming.DataStreamReader#
- class pyspark.sql.streaming.DataStreamReader(spark)[source]#
- Interface used to load a streaming - DataFramefrom external storage systems (e.g. file systems, key-value stores, etc). Use- SparkSession.readStreamto access this.- New in version 2.0.0. - Changed in version 3.5.0: Supports Spark Connect. - Notes - This API is evolving. - Examples - >>> spark.readStream <...streaming.readwriter.DataStreamReader object ...> - The example below uses Rate source that generates rows continuously. After that, we operate a modulo by 3, and then writes the stream out to the console. The streaming query stops in 3 seconds. - >>> import time >>> df = spark.readStream.format("rate").load() >>> df = df.selectExpr("value % 3 as v") >>> q = df.writeStream.format("console").start() >>> time.sleep(3) >>> q.stop() - Methods - csv(path[, schema, sep, encoding, quote, ...])- Loads a CSV file stream and returns the result as a - DataFrame.- format(source)- Specifies the input data source format. - json(path[, schema, primitivesAsString, ...])- Loads a JSON file stream and returns the results as a - DataFrame.- load([path, format, schema])- Loads a data stream from a data source and returns it as a - DataFrame.- option(key, value)- Adds an input option for the underlying data source. - options(**options)- Adds input options for the underlying data source. - orc(path[, mergeSchema, pathGlobFilter, ...])- Loads a ORC file stream, returning the result as a - DataFrame.- parquet(path[, mergeSchema, pathGlobFilter, ...])- Loads a Parquet file stream, returning the result as a - DataFrame.- schema(schema)- Specifies the input schema. - table(tableName)- Define a Streaming DataFrame on a Table. - text(path[, wholetext, lineSep, ...])- Loads a text file stream and returns a - DataFramewhose schema starts with a string column named "value", and followed by partitioned columns if there are any.- xml(path[, rowTag, schema, ...])- Loads a XML file stream and returns the result as a - DataFrame.