์ปฌ๋ผ Column
์คํํฌ ๋ฐ์ดํฐํ๋ ์์์๋ Column์ ์ด๋ฆ์ ์ด์ฉํด ๋ค์ํ ์ฐ์ฐ์ ์ํํ ์ ์๋ค.
Reference: pyspark.sql.Column — PySpark 3.5.0 documentation (spark.apache.org)
Pyspark์์ column์ ์ ๊ทผํ๋ ๋ฐฉ์์ ์ฌ๋ฌ ๊ฐ์ง๊ฐ ์๋๋ฐ, ํ๋๋ col("columnName") ํจ์๋ฅผ ์ฌ์ฉํ๋ ๊ฒ,๋ค๋ฅธ ํ๋๋ df.columnName์ ์ฌ์ฉํ๋ ๊ฒ์ด๋ค.
๋ค์์ Column์ ์ด์ฉํ ์ฐ์ฐ์ ๋ช๊ฐ์ง ์์์ด๋ค.
from pyspark.sql.functions import col, concat

df.withColumn("colABC", concat(col("colA"), col("colB"), col("colC")))  # create a column colABC by concatenating colA, colB, colC
df.select(col("colA"))  # select only colA
df.sort(col("colA").desc())  # sort by colA in descending order
df.sort(df.colA.desc())  # equivalent to the line above
๋ก์ฐ Row
์คํํฌ์ Row๋ ์์๊ฐ ์๋ ํ๋์ ์งํฉ ๊ฐ์ฒด๋ผ๊ณ ๋ณผ ์ ์๋ค. ๋ฐ๋ผ์ ์ธ๋ฑ์ค๋ฅผ ์ด์ฉํ์ฌ ์ ๊ทผํ ์ ์๋ค.
Reference: pyspark.sql.Row — PySpark 3.5.0 documentation (spark.apache.org)
from pyspark.sql import Row

row = Row(6, "text", ["a", "b"])
row[1]  # 'text'
๋ค์๊ณผ ๊ฐ์ด Row๋ค์ ๋ฐ์ดํฐํ๋ ์์ผ๋ก ๋ง๋ค ์ ์๋ค.
rows = [Row("Alice", 11), Row("Bob", 8)]
df = spark.createDataFrame(rows, ["Name", "Age"])
'๐ฝ Language & Frameworks > Spark' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
[๋ฌ๋ ์คํํฌ] ๋ฐ์ดํฐํ๋ ์ ์ฐ์ฐ๊ณผ ์ ์ฒ๋ฆฌ (1) | 2023.11.20 |
---|---|
[๋ฌ๋ ์คํํฌ] ๋ฐ์ดํฐํ๋ ์ ์ฝ๊ณ ๋ด๋ณด๋ด๊ธฐ (0) | 2023.11.19 |
[๋ฌ๋ ์คํํฌ] ๋ฐ์ดํฐํ๋ ์ ์คํค๋ง (0) | 2023.09.03 |
[๋ฌ๋ ์คํํฌ] ์คํํฌ ์ฐ์ฐ์ ์ข ๋ฅ (0) | 2023.09.03 |