The only thing he demonstrates is that the MBTI can't predict one's occupation, which we already knew. Predicting one's occupation is the major "business" aspect of the MBTI (which certainly would have helped boost his ascension, not into the ranks of research but into administration at a different university), but that's by no means its purpose.
Construct validity concerns how well a test measures latent constructs, things that can't physically be measured. You can't just whip out someone's attitude or belief toward something and put a ruler next to it. Because of this, we have to measure operationalized variables that serve as proxies for estimating the latent construct.
Statistical Structure
He makes the serious mistake of putting the MBTI in the context of a bimodal distribution, which is B.S. The type data can't even be described by a curve. He's reading something into the data that isn't there.
The data is neither ordinal nor continuous; it's categorical. According to the MBTI, one is either fully option A or fully option B on each dichotomy. One is either an introvert or an extrovert, and these dichotomies determine one's functions, the order in which they are preferred, etc. Someone who tests 51% introvert has exactly the same functions in the same order as someone who tests 99% introvert. He's assuming the data is continuous, i.e. that introversion sits on a measurable scale of 1-100, when in fact the MBTI predicts that introverted functions will be expressed at a certain frequency under different sets of conditions.
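To make the dichotomy point concrete, here's a minimal Python sketch of collapsing preference percentages into a four-letter type. The scoring scheme (one percentage per dichotomy, a hard 50% cut point) and the names `POLES` and `type_code` are hypothetical illustrations, not the official MBTI scoring procedure:

```python
# Hypothetical scoring: one percentage per dichotomy, giving the share of
# answers that favor the FIRST letter listed for that dichotomy in POLES.
# The 50% cut point is an illustrative assumption, not official MBTI scoring.
POLES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def type_code(first_letter_pct: list[float]) -> str:
    """Collapse four percentage scores into a four-letter type code."""
    if len(first_letter_pct) != len(POLES):
        raise ValueError("expected one score per dichotomy")
    letters = []
    for (first, second), pct in zip(POLES, first_letter_pct):
        # Only the side of the cut point survives; the distance from it is discarded.
        # A score of exactly 50 falls to the second letter (an arbitrary choice).
        letters.append(first if pct > 50 else second)
    return "".join(letters)


print(type_code([51, 70, 70, 20]))  # 51% introvert -> INTP
print(type_code([99, 70, 70, 20]))  # 99% introvert -> INTP
```

Both the 51% and the 99% introvert come out as INTP; everything about how far past the cut point they fall is thrown away.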
Reliability
Picture an archer shooting at a target: validity refers to the distance of a given measure (arrow) from the truth (the center), and reliability refers to repeatability, i.e. whether the arrows hit the same spot over and over.
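The archer analogy can be sketched numerically. This is only a geometric illustration of the analogy, not a psychometric estimator; `bias_and_spread` is a hypothetical helper, and the arrow positions are made up:

```python
from math import dist, hypot
from statistics import mean


def bias_and_spread(hits: list[tuple[float, float]]) -> tuple[float, float]:
    """Summarize arrow positions on a target centered at (0, 0).

    bias:   distance from the center to the average landing point
            (lower = closer to the truth, i.e. more valid)
    spread: mean distance of each arrow from that average point
            (lower = more repeatable, i.e. more reliable)
    """
    cx = mean(x for x, _ in hits)
    cy = mean(y for _, y in hits)
    bias = hypot(cx, cy)
    spread = mean(dist((x, y), (cx, cy)) for x, y in hits)
    return bias, spread


# Tight grouping, off center: reliable but not valid.
print(bias_and_spread([(4, 4), (5, 4), (3, 4), (4, 5), (4, 3)]))
# Centered on average but scattered: no bias, poor repeatability.
print(bias_and_spread([(3, 0), (-3, 0), (0, 3), (0, -3)]))
```

The two cases pull apart the two ideas: a tight cluster in the wrong place has low spread and high bias, while a scattered pattern around the bullseye has zero bias and high spread.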
Validity and reliability are increased by triangulating operationalized variables and by repeated sampling, both of which the MBTI does quite well. That's why the real assessment takes so $^%*(& long to complete: there are so many questions.
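The link between the number of questions and reliability is usually quantified with the Spearman-Brown prediction formula from classical test theory. A minimal sketch; it assumes the added items are parallel to the existing ones, and the starting reliability of 0.5 is a made-up number for illustration:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by `length_factor`,
    assuming the added items are parallel to the existing ones."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical starting reliability of 0.5 for a short form.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.5, k), 3))
```

Under those assumptions, doubling the length takes a reliability of 0.5 to about 0.67, and quadrupling it takes it to 0.8. The formula only covers the "more items" half of the argument; it says nothing about validity.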
He states that a single test-retest sample (two administrations total) is enough to establish reliability, which is the absolute biggest load of bullshit I've ever heard in reference to scientific inquiry. Reliability can only be established after several repeated measures.
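Here's a minimal sketch of what checking type consistency over several administrations, rather than a single retest pair, could look like. The data and the name `type_stability` are hypothetical, and raw percent agreement doesn't correct for chance agreement the way a statistic like Cohen's kappa would:

```python
from itertools import combinations


def type_stability(sessions: list[list[str]]) -> tuple[float, float]:
    """sessions[i][t] is respondent i's type code on administration t.

    Returns (share of respondents who got the same type every time,
             mean percent agreement across all pairs of administrations).
    """
    if not sessions:
        raise ValueError("need at least one respondent")
    n_admin = len(sessions[0])
    if n_admin < 2 or any(len(s) != n_admin for s in sessions):
        raise ValueError("every respondent needs the same number (>= 2) of administrations")
    always_same = sum(len(set(s)) == 1 for s in sessions) / len(sessions)
    pair_agreement = [
        sum(s[a] == s[b] for s in sessions) / len(sessions)
        for a, b in combinations(range(n_admin), 2)
    ]
    return always_same, sum(pair_agreement) / len(pair_agreement)


# Hypothetical data: four respondents, three administrations each.
sessions = [
    ["INTP", "INTP", "INTP"],
    ["ENFJ", "INFJ", "ENFJ"],
    ["ISTJ", "ISTJ", "ISTJ"],
    ["ESFP", "ESFP", "ESTP"],
]
print(type_stability(sessions))
```

With only two administrations there is exactly one pair, so both numbers collapse to the same single figure, which is the single test-retest comparison criticized above. Adding administrations is what lets them diverge.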
More importantly, using standard error as a statistic implies that the data is ordinal or continuous, which it isn't. He himself acknowledges this in the second-to-last paragraph of the reliability section. And even if the data were as he envisions, a massive standard error wouldn't prove that the proposed relationship giving rise to the functions and their order of preference doesn't exist; it would only suggest that the variables being used as proxies haven't been well operationalized.
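If the standard error in question is the standard error of measurement from classical test theory, its usual formula shows what it presupposes:

```latex
\mathrm{SEM} = \sigma_X \sqrt{1 - r_{XX'}}
```

Here $\sigma_X$ is the standard deviation of the observed scores and $r_{XX'}$ is the test's reliability coefficient (for example, a test-retest correlation). Both terms require numeric scores on a scale where a standard deviation is meaningful, which is exactly the assumption being disputed here.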
Validity
Factor analysis is not the appropriate method of analysis because it does not account for the order of preference of the functions. The structure is nothing close to the four simple factorial categories he describes: there are at least 8 distinctly different functions (Ti, Te, Ni, Ne, Si, Se, Fi, Fe), and that's before accounting for their order of preference, which yields 16 types (INTP, ESFJ, etc.). The dataset will show variation because different types share the same functions in a different order of preference. The man needs to do some research on cluster analysis, because the MBTI can't be pigeonholed into his 2x2 factorial box.
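The shared-functions-in-a-different-order point can be made concrete. Below is a minimal Python sketch that derives a four-function order from a type code. It assumes the alternating-attitude stack commonly used in cognitive-function discussions (dominant and auxiliary set by the E/I and J/P letters, tertiary opposite the auxiliary in the dominant's attitude, inferior opposite the dominant in the other attitude); other versions of the theory assign the tertiary's attitude differently, and `function_stack` and `ALL_TYPES` are hypothetical names:

```python
from itertools import product

OPPOSITE = {"T": "F", "F": "T", "S": "N", "N": "S"}
OTHER_ATTITUDE = {"e": "i", "i": "e"}


def function_stack(mbti_type: str) -> list[str]:
    """Dominant, auxiliary, tertiary, inferior functions for a four-letter type,
    under the alternating-attitude stack model assumed in the lead-in."""
    code = mbti_type.upper()
    if len(code) != 4 or any(c not in pair for c, pair in zip(code, ("EI", "SN", "TF", "JP"))):
        raise ValueError(f"not a valid type code: {mbti_type!r}")
    attitude, perceiving, judging, lifestyle = code
    # J/P marks which function is extraverted: the judging one (T/F) for J,
    # the perceiving one (S/N) for P. The other one is introverted.
    if lifestyle == "J":
        outer, inner = judging + "e", perceiving + "i"
    else:
        outer, inner = perceiving + "e", judging + "i"
    # E/I decides which of the two leads.
    dom, aux = (outer, inner) if attitude == "E" else (inner, outer)
    tert = OPPOSITE[aux[0]] + dom[1]                 # opposite of aux, dominant's attitude
    inf = OPPOSITE[dom[0]] + OTHER_ATTITUDE[dom[1]]  # opposite of dom, other attitude
    return [dom, aux, tert, inf]


ALL_TYPES = ["".join(letters) for letters in product("EI", "SN", "TF", "JP")]

for t in ("INTP", "ENTP", "ESFJ"):
    print(t, "-".join(function_stack(t)))
print(len(ALL_TYPES), "types,",
      len({f for t in ALL_TYPES for f in function_stack(t)}), "distinct functions")
```

INTP and ENTP come out as Ti-Ne-Si-Fe and Ne-Ti-Fe-Si: the same four functions in a different order, drawn from 8 functions across 16 types. That ordered, categorical structure is what a four-dimension factor model has no slot for.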
I'll cut off my wall-of-text tangent here.