You can't do this in a nice way, I'm afraid. Think about how it works under the hood: the data to be counted is split into chunks and sent off to different processes, each process counts its own chunk, and then a single reducer adds the partial counts up at the end. While a process is counting, it doesn't know the total size, so it can't add that field to its records. The only way is to go back and add it to the data once the total size is known (i.e. a join).
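To make the two-step nature concrete, here is a minimal sketch in plain Scala collections (not Scalding, and not distributed). The object and method names are made up for illustration. It only shows that the counts exist after a full pass over the data, and that attaching them is a second pass that looks each record's group up again:

```scala
object GroupSizeSketch {
  // Pass 1: count the records in each group, keyed by the second tuple element (id2).
  def groupSizes(records: Seq[(String, String)]): Map[String, Int] =
    records.groupBy(_._2).map { case (id2, rows) => id2 -> rows.size }

  // Pass 2: go back over every record and attach its group's size (the "join").
  def withGroupSize(records: Seq[(String, String)]): Seq[(String, String, Int)] = {
    val sizes = groupSizes(records)
    records.map { case (id1, id2) => (id1, id2, sizes(id2)) }
  }
}
```

For example, `GroupSizeSketch.withGroupSize(Seq(("a", "x"), ("b", "x"), ("c", "y")))` gives `("a", "x", 2)`, `("b", "x", 2)` and `("c", "y", 1)`.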
If each group fits in memory (and you can configure the memory), you can collect each group into a list alongside its size and then flatten the list back out:
Tsv(args("input"), ('id1, 'id2))
  // For each id2, compute the group size and collect the group's records into a list.
  .groupBy('id2)(_.size.toList[(String, String)](('id1, 'id2) -> 'list))
  // Emit one row per collected record, tagged with its group's size.
  .flatMapTo[(Iterable[(String, String)], Int), (String, String, Int)](('list, 'size) -> ('id1, 'id2, 'size)) {
    case (list, size) => list.map(record => (record._1, record._2, size))
  }
  .write(Tsv(args("output")))
But if your system doesn't have enough memory to hold a whole group, you will have to use an expensive join.
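A rough sketch of what the join version could look like with the fields API, assuming the same two-column input. The job name `GroupSizeJoinJob` and the intermediate field `'groupId` are invented for illustration. The rename is there because an inner join needs distinct field names on the two sides:

```scala
import com.twitter.scalding._

// Hypothetical job; the "input" and "output" args follow the snippet above.
class GroupSizeJoinJob(args: Args) extends Job(args) {
  val records = Tsv(args("input"), ('id1, 'id2)).read

  // One row per group: ('groupId, 'size).
  val sizes = records
    .groupBy('id2) { _.size }
    .rename('id2 -> 'groupId)

  // Join the sizes back onto every record, then drop the duplicate key.
  records
    .joinWithSmaller('id2 -> 'groupId, sizes)
    .project('id1, 'id2, 'size)
    .write(Tsv(args("output")))
}
```

The sizes pipe has one row per distinct id2, so it is normally the smaller side, which is why `joinWithSmaller` is used here. Note that `size` produces a Long.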
Remark:
You can use Tsv instead of TextLine followed by mapTo and splitting.