The DataFrameReader and DataFrameWriter interfaces let you read data from structured external data sources into a Spark DataFrame,
and write a DataFrame's data back out in a specific format.
pyspark.sql.DataFrameReader (PySpark 3.5.0 documentation, spark.apache.org): Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc). Use SparkSession.read to access this. Changed in version 3.4.0: Supports Spark Connect.
pyspark.sql.DataFrameWriter (PySpark 3.5.0 documentation, spark.apache.org): Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc). Use DataFrame.write to access this. Changed in version 3.4.0: Supports Spark Connect.
Supported file formats include csv, json, orc, and parquet, among others.
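The following example reads and writes a csv file.
# schema: a StructType defined beforehand (e.g. with pyspark.sql.types)
df = spark.read.csv("data.csv", header=True, schema=schema)
# save() creates a directory named data_copy.csv holding part files; keep the header row
df.write.format("csv").option("header", True).save("data_copy.csv")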