It turns out that parallel computation is easy (and has been studied by Dongarra and the LAPACK folks).  If you have two data chunks reduced to $$R_1$$ and $$Q_1^TY_1$$, and $$R_2$$ and $$Q_2^TY_2$$, just treat each $$R$$ as an $$X$$ and each $$Q^TY$$ as a $$Y$$ to merge the QR decompositions.
So, if you have a file system that can feed $$M$$ separate QR decomposition processes, you can do the QR decomposition in $$M$$ parallel processes and then $$\log M$$ sets of parallel merges.