Research

The following page introduces a selection of studies and samples that have been used to establish the reliability and the validity of 4G. If you would like more information on the criterion validity, or any other aspect of 4G, please complete the form below for a summary or full copy of the test manual.

This summary introduces a new model or theory of psychology, 4G. The model differs in key respects from other Jungian-based approaches such as Myers Briggs (Briggs Myers 1998) and shows empirical links to the Five Factor Model (FFM) and Big 5 approaches advocated by Costa and McCrae (1992) and Goldberg (1990) respectively.

The summary is intended to present an overview of 4G in both a theoretical and empirical sense. The construction of the theory and accompanying instrument has been carried out by Charles Foster, whilst the empirical research and authoring of this document has been undertaken by Bruce Lewin. Given this split of tasks and the theoretical starting point, the authors have been conscious of a need to balance the theoretical model with the statistical rigors of classical test theory and accompanying empirical evidence.

4G is constructed of 8 scales. The 8 scales measure Intuition (4G N), Sensing (4G S), Thinking (4G T), Feeling (4G F), Extroversion (4G E), Introversion (4G I), Irrational (4G Irr) and Rational (4G Rat).

Reliability

The figures for Cronbach’s Alpha provide a measure of the internal consistency of the scales, or units of measurement, within 4G. An acceptable level is generally considered to be at least 0.7.

Sample (n) = 332
Scale     Alpha Coefficient
4G N      0.88
4G S      0.83
4G T      0.84
4G F      0.91
4G E      0.85
4G I      0.89
4G Irr    0.88
4G Rat    0.89
Business people and volunteers sourced via the internet
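As a rough illustration of how an alpha coefficient of this kind is usually calculated, the sketch below applies the standard Cronbach's Alpha formula to item-level responses. The function name and the sample responses are hypothetical; the 4G item data is not reproduced on this page, so the figures in the table cannot be recomputed here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for one scale.

    `responses` is a list of rows, one per respondent, each row holding
    that respondent's scores on the k items of the scale.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents answering a 4-item scale (1-5 ratings).
example = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(example):.2f}")
```

Alpha rises as the items within a scale vary together, which is why it is read as a measure of internal consistency.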

The Standard Error of Measurement (SEM) estimates the level of random error within an instrument. In other words, if 4G were administered repeatedly, how much would the scores fluctuate through random measurement error alone?

Sample (n) = 332
Scale     Standard Deviation    Standard Error of Measurement
4G N      8.90                  3.02
4G S      7.61                  3.16
4G T      7.55                  2.99
4G F      9.18                  2.83
4G E      6.62                  2.53
4G I      7.29                  2.45
4G Irr    6.93                  2.44
4G Rat    7.03                  2.36
Business people and volunteers sourced via the internet
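In classical test theory the SEM is commonly derived from a scale's standard deviation and its reliability as SEM = SD × √(1 − reliability). The sketch below applies that formula to the figures published above, on the assumption that Cronbach's Alpha was the reliability coefficient used; the page does not say which one was. Small differences from the reported values are to be expected, since the published alphas are rounded to two decimal places.

```python
import math

# (standard deviation, Cronbach's Alpha, reported SEM), copied from the tables above.
SCALES = {
    "4G N": (8.90, 0.88, 3.02),
    "4G S": (7.61, 0.83, 3.16),
    "4G T": (7.55, 0.84, 2.99),
    "4G F": (9.18, 0.91, 2.83),
    "4G E": (6.62, 0.85, 2.53),
    "4G I": (7.29, 0.89, 2.45),
    "4G Irr": (6.93, 0.88, 2.44),
    "4G Rat": (7.03, 0.89, 2.36),
}


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


for name, (sd, alpha, reported) in SCALES.items():
    computed = standard_error_of_measurement(sd, alpha)
    print(f"{name:7s} computed={computed:.2f} reported={reported:.2f}")
```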

Test-retest reliability is the third aspect of 4G’s reliability. It shows whether the scales in the instrument are consistent and stable over time. It is obtained by correlating the scores from period 1 with those from period 2; figures greater than 0.70 are considered acceptable.

Sample (n) = 159
Scale     1 Month Test-Retest Coefficient
4G N      0.73**
4G S      0.77**
4G T      0.79**
4G F      0.83**
4G E      0.77**
4G I      0.81**
4G Irr    0.78**
4G Rat    0.75**
Business people and volunteers sourced via the internet
* Correlation is Significant at the 0.05 level
** Correlation is Significant at the 0.01 level
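A test-retest coefficient of this kind is normally a Pearson correlation between the scores the same people obtain on the two occasions. The following is a minimal sketch of that calculation; the function name and the scores are hypothetical and are not taken from the 4G sample.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical scores for six people on one scale, one month apart.
period_1 = [31, 24, 40, 28, 35, 22]
period_2 = [29, 26, 38, 30, 33, 25]
print(f"test-retest r = {pearson_r(period_1, period_2):.2f}")
```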

Validity

Whilst reliability seeks to demonstrate that an instrument is consistent in what it measures, validity is concerned with the question “is this instrument measuring what you think it is measuring?” It is a way of judging the use of the instrument and its practical applications. The first example of the validity of 4G comes from construct validity, shown here through the correlations between the 8 scales.

Sample (n) = 332
4G N 4G S 4G T 4G F 4G E 4G I 4G Irr 4G Rat
4G N 1
4G S -0.22** 1
4G T 0.22** 1
4G F -0.36** 1
4G E 0.33** 1
4G I -0.76** 1
4G Irr 0.44** -0.26** 0.25** 1
4G Rat 0.36** 0.36** -0.24** 0.23** -0.72** 1
Business people and volunteers sourced via the internet
* Correlation is Significant at the 0.05 level
** Correlation is Significant at the 0.01 level

The second example of the validity of 4G is demonstrated through the use of factor analysis. Factor analysis is a way of simplifying a large amount of data and extracting underlying themes or variables. In order for 4G to be considered valid, each of the 8 scales must reflect a factor that is consistent with the theory, and there must be a well-reasoned argument for the results.

Sample (n) = 332
Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6
4G N 0.98 0.24 0.24 0.42
4G S 0.99 -0.31
4G T 0.22 0.99 -0.36
4G F -0.36 0.99
4G E 0.39 0.21 0.90 0.29
4G I -0.95
4G Irr 0.45 0.93
4G Rat -0.30 0.35 -0.26 -0.92
Business people and volunteers sourced via the internet
Extraction Method - Principal Component Analysis
Rotation Method - Oblimin with Kaiser Normalization

The third aspect of validity is concurrent validity, which attempts to assess how good 4G is at predicting the scores of other instruments. The term concurrent validity is used when both instruments are filled out at the same time. For the purpose of establishing the validity of 4G against other instruments, the NEO IPIP was chosen as a measure of the FFM (IPIP 2001). A shorter 120-item version was used, as opposed to the standard 300-item version (Johnson 2001).

Sample (n) = 133
NEO O NEO C NEO E NEO A NEO N
4G N 0.38** -0.33*
4G S 0.39**
4G T 0.21* -0.20* -0.28**
4G F 0.41**
4G E 0.60** -0.34**
4G I -0.50**
4G Irr 0.39** -0.52**
4G Rat -0.42** 0.58**
Business people and volunteers sourced via the internet
* Correlation is Significant at the 0.05 level
** Correlation is Significant at the 0.01 level
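To show how the significance markers relate to the size of a correlation and the sample size, the sketch below estimates a two-tailed p-value using the Fisher z-transformation and a normal approximation. This is an illustrative approximation, not necessarily the test used to produce the table; statistical packages typically use an exact t-test, so borderline values may differ slightly. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-tailed p-value for a Pearson correlation.

    Uses the Fisher z-transformation; requires -1 < r < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significance_marker(r, n):
    """Return '**' (p < 0.01), '*' (p < 0.05) or '' (not significant)."""
    p = approx_p_value(r, n)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# A few coefficients from the table above, with the sample size of 133.
for r in (0.60, -0.28, 0.21):
    print(f"r = {r:+.2f}  p ~ {approx_p_value(r, 133):.4f}  {significance_marker(r, 133)}")
```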

Fairness

One other important psychometric property is that of fairness (freedom from systematic bias). In other words, is 4G biased against one particular subgroup or another? This is of greatest importance when considering the legal implications of psychometrics and the use of instruments in the workplace. It is generally accepted that, in the context of selection, the four-fifths (4/5ths) rule, referred to here as the 80/20 rule, can be applied: if the proportion of the minority group reaching a cut-off point is less than 80% of the proportion of the majority group, then the assessment instrument can be said to give rise to an adverse impact. This is reported in the UK by Baron and Miles (2002) and in the US via the Uniform Guidelines on Employee Selection Procedures (1978). For the purpose of this document, subgroups are defined via their age, sex and ethnic majority or minority membership.
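The rule described above can be sketched as a simple ratio of pass rates. The function names and the candidate counts below are hypothetical and are used only to illustrate the arithmetic; they are not 4G data.

```python
def selection_rate(passed, total):
    """Proportion of a group reaching the cut-off point."""
    return passed / total


def impact_ratio(minority_rate, majority_rate):
    """Minority pass rate expressed as a fraction of the majority pass rate."""
    return minority_rate / majority_rate


def has_adverse_impact(minority_rate, majority_rate, threshold=0.8):
    """True when the minority rate is below 80% of the majority rate."""
    return impact_ratio(minority_rate, majority_rate) < threshold


# Hypothetical example: 80 of 100 majority candidates reach the cut-off,
# compared with 30 of 50 minority candidates.
majority = selection_rate(80, 100)
minority = selection_rate(30, 50)
print(f"impact ratio = {impact_ratio(minority, majority):.2f}")
print("adverse impact" if has_adverse_impact(minority, majority) else "no adverse impact")
```

In this hypothetical case the minority pass rate of 60% is 75% of the majority pass rate of 80%, which falls below the four-fifths threshold.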

On the basis of the definition above, 4G offers no significant bias against any subgroup. Furthermore, the small levels of bias that reside within the instrument around sex and race are replicated in other studies and are felt to be consistent with those findings. In the UK, Anderson and Ones (2002) write “using data from 504 participants, we found, encouragingly, that there were no large gender differences across the three inventories examined. The standard deviations of the male and female groups studied were also quite similar. Ethnic group differences were slightly larger, but still not large enough to cause concerns over adverse impact.” The 4G data on sex shows a small level of bias around the 4G T (6.4% to men) and 4G F (3.9% to women) scales, consistent with the findings of Feingold (1994), who states “we can expect to find the largest differences… favouring women to occur in the Agreeableness domain”. Regarding race, there is a small difference of 1.83% favouring the ethnic majority on 4G N and a figure of -3.03% on 4G S, but this is felt to be well within the 80/20 rule as above.

Accordingly, 4G can be judged to be psychometrically fair when considering the possibility of bias on the basis of age, sex and race. Furthermore, the small differences found within the instrument are well within both the standard error of measurement and the standard deviations of the scales, and are mirrored by similar differences found by other researchers.

Request a copy of the 4G Test Manual

References

  • Anderson, N. & Ones, D. (2002). Gender and ethnic group differences on personality scales in selection: Some British data. Journal of Occupational and Organizational Psychology, 75, 3, 255 – 276.
  • Baron, H. & Miles, A. (2002). Personality Questionnaires: Ethnic Trends and Selection. BPS Occupational Psychology Conference. January.
  • Briggs Myers, I., McCaulley, M., Quenk, N., Hammer, A. (1998). MBTI Manual (A guide to the development and use of the Myers Briggs type indicator). Consulting Psychologists Press; 3rd edition.
  • Costa, P., & McCrae, R. (1992). The NEO Personality Inventory – Revised manual. Odessa, FL, Psychological Assessment Resources.
  • Feingold, A. (1994). Gender differences in personality: a meta-analysis. Psychological Bulletin, 116, 429 – 456.
  • Goldberg, L. (1990). An alternative "description of personality": the big-five factor structure. Journal of Personality and Social Psychology, 59, 6, 1216 – 1229.
  • Johnson, J. (2001). Screening Massively Large Data Sets For Non-Responsiveness In Web-Based Personality Inventories. A lecture given to the Bielefeld-Groningen Personality Research Group, University of Groningen, The Netherlands, 9th May. Website accessed 16th January 2005. http://www.personal.psu.edu/faculty/j/5/j5j/papers/screening.html
  • International Personality Item Pool (2001). A Scientific Collaboratory for the Development of Advanced Measures of Personality Traits and Other Individual Differences. Website accessed 2nd January 2005. http://ipip.ori.org.
  • Uniform Guidelines on Employee Selection Procedures (1978). 41 CFR 60-3.4 – Information on impact. US Department of Labor. Website accessed 5th January. http://www.dol.gov/dol/allcfr/ESA/Title_41/Part_60-3/41CFR60-3.4.htm
