Spark DataFrame Join
Prepare Data
case class Person(id:Int, name:String)
case class Info(id:Int, age:Int)
val personDF = sc.makeRDD(Seq(Person(1,"A"), Person(2, "B"))).toDF("id", "name")
val infoDF = sc.makeRDD(Seq(Info(1,11), Info(3, 33))).toDF("id", "age")
samples
just join
personDF.join(infoDF).show() +---+----+---+---+ | id|name| id|age| +---+----+---+---+ | 1| A| 1| 11| | 1| A| 3| 33| | 2| B| 1| 11| | 2| B| 3| 33| +---+----+---+---+
inner join using column
personDF.join(infoDF, "id").show() +---+----+---+ | id|name|age| +---+----+---+ | 1| A| 11| +---+----+---+
left outer join
personDF.join(infoDF, Seq("id"), "left_outer" ).show() +---+----+----+ | id|name| age| +---+----+----+ | 1| A| 11| | 2| B|null| +---+----+----+
outer join
personDF.join(infoDF, Seq("id"), "outer" ).show() +---+----+----+ | id|name| age| +---+----+----+ | 1| A| 11| | 2| B|null| | 3|null| 33| +---+----+----+
right outer join
personDF.join(infoDF, Seq("id"), "right_outer" ).show() +---+----+---+ | id|name|age| +---+----+---+ | 1| A| 11| | 3|null| 33| +---+----+---+
Last updated