Week 11 : Putting it all together - replicating Power Poses!

This week we will explore some data replicating a famous but contentious experiment. We will combine your skills from across the course to explore different aspects of the study.

Learning objectives for this session
	Quantitative Methods
	Revision of t-tests

	Data Skills
	Revision of reproducible analyses

	Open Science
	Working with published scientific data

	Ethics
	Questionable research practices
	Constructive criticism

1. The Dataset

We will be working with a replication of a vary famous study showing that our body posture can have far reaching impacts on our attitudes and even our physiology! amazing…

From the abstract of the original paper:

Carney, D. R., Cuddy, A. J. C., & Yap, A. J. (2010). Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance. Psychological Science, 21(10), 1363-1368. https://doi.org/10.1177/0956797610383437

Humans and other animals express power through open, expansive postures, and they express powerlessness through closed, contractive postures. But can these postures actually cause power? The results of this study confirmed our prediction that posing in high-power nonverbal displays (as opposed to low-power nonverbal displays) would cause neuroendocrine and behavioral changes for both male and female participants: High-power posers experienced elevations in testosterone, decreases in cortisol, and increased feelings of power and tolerance for risk; low-power posers exhibited the opposite pattern. In short, posing in displays of power caused advantaged and adaptive psychological, physiological, and behavioral changes, and these findings suggest that embodiment extends beyond mere thinking and feeling, to physiology and subsequent behavioral choices. That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications. (Carney, Cuddy, and Yap 2010)

This is a compelling result! Perhaps not so suprising that power poses would change our mood - but to cause a change hormones is amazing.

The authors quickly became famous with this result. The second author Amy Cuddy gave high profile talks (TED talk with 74 million views: Video & transcript), wrote books and appeared widely in the media as a result of this work.

Amy Cuddy at Pop Tech 2011 - Photo by Erik Hersman.

These results attracted several replications attempts and the following debate became so intense that it was attracting attention in the mainstream media - including articles summarising the story in the New York Times magazine (Dominus 2017) and the British Psychological Society (Loncar 2021). It even has it’s own Wikipedia page (Wikipedia 2025) and lead to some very awkward photos of british politicians (Wheeler 2018).

We’ll be focusing on one of these replications that tried to faithfully repeat the original methodology (Ranehill et al. 2015) with a large, independent data sample.

2. The Challenge.

We will look at how the replication might have been planned, analyse the data and make a shareable version of the analysis.

We will also use this story to think about some ethical issues around good scientific practices and how we can encourage collaborative and constructive scientific debate.

3. Understanding Power Posing and planning a replication.

Let’s understand the original experiment in a bit more detail. Read through the introduction and methods section of (Carney, Cuddy, and Yap 2010) and answer the following questions.

Check Your Understanding

Question

Read the introduction (you can focus on the hypotheses) and methods section of (Carney, Cuddy, and Yap 2010) and answer the following questions.

Introduction

The authors predict that high-power poses will lead to in testosterone.

The authors predict that high-power poses will lead to in the stress hormone cortisol.

The authors predict that high-power poses will lead to in subjective feelings of power.

The authors predict that high-power poses will lead to in tolerance to risk.

Methods Section

What is the design of this study? .

How were participants assigned to groups? .

What is the independent variable (or grouping factor)? ?

Which of these is NOT a dependent variable of interest? ?

Are participants told the true purpose of the study?

Did the experimentors know which condition the participant they were testing was in?

Check Your Understanding

Read the results section of (Carney, Cuddy, and Yap 2010). Identify the reported statistics for the high-power/low-power pose groups for the following effects: Testosterone, Cortisol and Feelings of Power.

Enter the reported effect sizes (r values) in the table below and use this online tool to convert them to Cohen’d D values.

Finally - use jPower in Jamovi to estimate the sample size you would need to have 90% power to detect this effect again with an type 1 error rate of 0.05. We covered this in Week 5 Section 7

Hypothesis Test	Reported Effect Size (r)	Converted Effect Size (Cohen’s d)	Sample size per group for 90% power
Testosterone
Cortisol
Feelings of power

(Carney, Cuddy, and Yap 2010) report the following in their results.

As hypothesized, high-power poses caused an increase in testosterone compared with low-power poses, which caused a decrease in testosterone, F(1, 39) = 4.29, p < .05; r = .34 (Fig. 3). Also as hypothesized, high-power poses caused a decrease in cortisol compared with low-power poses, which caused an increase in cortisol, F(1, 38) = 7.45, p < .02; r = .43 Finally, high-power posers reported feeling significantly more “powerful” and “in charge” (M = 2.57, SD = 0.81) than low-power posers did (M = 1.83, SD = 0.81), F(1, 41) = 9.53, p < .01; r = .44.

(Carney, Cuddy, and Yap 2010) reported ANOVAs with r values as effect sizes - but remember that we can convert the effect sizes to Cohen’s d using the online tool.

Hypothesis Test	Reported Effect Size (r)	Converted Effect Size (Cohen’s d)	Sample size per group for 90% power
Testosterone	`0.34`	`0.723`	`42`
Cortisol	`0.43`	`0.953`	`25`
Feelings of power	`0.44`	`0.980`	`23`

Data Skills - The replication study has a planned sample size of 100 - is this enough to have a 90% chance of detecting the effect if is is there?

Yes, this should be plenty. The smallest reported effect in the original paper is for Testosterone, which suggests that two groups of 42 participants should be sufficient.

4. Analysing the replication data.

Key step

Make sure you have downloaded the dataset and loaded it into Jamovi before starting this section.

Take a moment to have a look through the dataset paying particular attention to the ‘Variables’ tab.

Next, we’re going to look at the dataset from (Ranehill et al. 2015) from the replication study. The research team were very careful to closely replicate the original methods.

There were a small number of changes in the replication compared to the original study:

Whereas participants in Carney et al.’s study held each position for 1 min, we extended this time to 3 min.
Second, Carney et al. manually configured participants’ positions before leaving the room, whereas participants in our study received instructions on a computer. As in Carney et al.’s study, participants were recorded while posing; the recordings verified that participants complied with task instructions. Because the instructions were given via computer, the experimenter was blind to experimental condition, which negated potential experimenter effects.
Third, the filler task in our study involved constructing words from letters and spaces; in Carney et al.’s study, the task was to form impressions of faces.
Finally, we did not use deception, but informed participants briefly in the consent form that the study investigated whether physical position influences hormone levels and behavior (the instructions and consent form are in the Supplemental Material).

These are reasonable changes that aim to improve the orignal design (blinded experimentors, longer pose duration) without making substantial changes to its purpose.

Let’s take a look at the data.

Check Your Understanding

In the original study, (Carney, Cuddy, and Yap 2010) reported that:

High-power posers reported feeling significantly more “powerful” and “in charge” (M = 2.57, SD = 0.81) than low-power posers did (M = 1.83, SD = 0.81), F(1, 41) = 9.53, p < .01; r = .44.

Can you replicate this analysis using the powerful and hp_condition (1 is ‘high power pose’ and 0 is ‘low power pose’) variables in the replication dataset from (Ranehill et al. 2015)? (Carney, Cuddy, and Yap 2010) used an ANOVA but you should use a t-test.

Use the results to fill in the gaps in the reporting below.

The results the original finding about subjective feeling. High-power posers (M = , SD = ) reported feeling low-power posers did (M = , SD = ), t() = , p = ; Cohen’s d = .

High-power posers reported feeling significantly more “powerful” (M = 2.22, SD = 0.753) than low-power posers did (M = 1.978, SD = 0.663), t(193) = -2.40, p = 0.017; Cohen’s d = -0.344.

Data Skills - Compare your results above to those reported in the replication paper - are there any differences?

No! oddly they do not… (Ranehill et al. 2015) report the following:

Using two-tailed t tests, we replicated Carney et al.’s finding that participants in the high-power condition self-reported, on average, higher feelings of power than did participants in the low-power condition (mean difference = 0.245, 95% confidence interval, or CI = [0.044, 0.446]), t(193) = 2.399, p = .017, Cohen’s d = 0.344. This suggests that the power-posing manipulation worked.

Most of the reported values are the same, but but notice that the sign of the t-value is different!

We can confirm that this is a weird variable issue by checking hte descriptive statistics. The direction of our effect is definitely the same as what is reported in the paper.

Key step - open after answering the previous question

Something strange has happened - it seems like (Ranehill et al. 2015) must have computed the t-test the other way around compared to our analysis. They have a positive t-value and ours is negative.

Remember that the direction of a t-test (ie whether the t-statistics is positive or negative) doesn’t effect the p-values but does change the interpretation of the result a little.

This is because the order of subtraction makes a difference to the answer - if we do 2 + 3 or 3 + 2 we get the same answer however 2 - 3 and 3 - 2 are not the same.

The means of our groups are 1.98 for low power and 2.22 for high power poses. (Ranehill et al. 2015) report a positive t-value of 2.399 - so they must have run the equivalent of high power minus low power (2.22 - 1.98) and have a positive t-value indicating that the mean for high power is larger.

In contrast, our negative t-value indicates we have run low power minus high power (1.98 - 2.22) and have a negative t-value indicating that the mean for low power is smaller.

The order of the conditions in the t-test is controlled from the data varaible view. Double click on hp_condition in the ‘Variables’ tab, or on the column header of the ‘Data’ tab. We should need a ‘Nominal’ variable with two ‘Levels’ - 0 and 1. The conditions in the t-test are sorted in the order of this ‘Levels’ list.

You can reorder the levels to put 1 first and 0 second - which should update your t-test live to match (Ranehill et al. 2015)’s result.

We could make this even easier to remember by changing the names of the Levels to something more memorable. This is often a good idea when data has used an abstract numerical coding. Statistics is about precision and anything we can do to make the interpretation easier is a good thing.

Your adapted result should match the following:

Check Your Understanding

Can you replicate the core findings from (Carney, Cuddy, and Yap 2010) using the data from (Ranehill et al. 2015)?

Make sure you have completed the ‘Key step’ above before starting this exercise!

Run t-tests to analyse the effect of high vs low power posing on the following variables: powerful, testosterone_diff, cortisol_diff and num_risky_gain_pc as dependent variables.

Read through the results of the original paper (Carney, Cuddy, and Yap 2010) - does the second study confirm the findings of the original?

Variable	t-value	p-value	Effect size	Replicates (Carney, Cuddy, and Yap 2010)?
Feelings of power
Risk Taking (Gain)
Testosterone
Cortisol

Variable	t-value	p-value	Effect size	Replicates (Carney, Cuddy, and Yap 2010)?
Feelings of power	`2.40`	`0.017`	`0.344`	Yes
Risk Taking (Gain)	`-1.24`	`0.215`	`-0.176`	No
Testosterone	`-1.40`	`0.162`	`-0.200`	No
Cortisol	`-1.10`	`0.272`	`-0.157`	No

5. Assumption Checking.

We have the results of the replication! but before going any further we should be sure that the assumptions of our test are met.

Check Your Understanding

Check the assumptions of the high-power vs low-power t-tests for the following variables: powerful, testosterone_diff, cortisol_diff and num_risky_gain_pc.

Are the assumptions of Normality and Homogeneity of Variance confirmed or violated?

Variable	Normality	Homogeneity of Variance	Preferred Test
Powerful
Testosterone
Cortisol
Risk Taking (Gain)

Variable	Normality	Homogeneity of Variance	Preferred Test
Powerful	Violated	Violated	Mann-Whitney U
Testosterone	Violated	Confirmed	Mann-Whitney U
Cortisol	Violated	Confirmed	Mann-Whitney U
Risk Taking (Gain)	Violated	Confirmed	Mann-Whitney U

(Ranehill et al. 2015) do something unusual with their assumption checking. The don’t explicitly check whether the assumptions are met or not, rather they report both parametric and non-parametric statistics for all their tests (Non-parametric tests are reported in supplemental materials) and state that the conclusions would be the same irrespective of which test is would have been chosen (please do not do this in your reports!).

6. Interpreting the results.

It can be challenging to interpret the results of a null replication. If both the original study and replication show the same finding then things are much more straightforward….

To interpret a failed replication we have to consider two possibilities:

The original study was a false positive and the replication is a correct rejection.
The original study was a true positive and the replication is a false negative.

One of our experiments is not accurate.

It is very very difficult to separate these options with 100% confidence… Often a second replication is the way to converge on an answer, and perhaps a third and fourth would be better still.

One thing we do know is that if the effect size from the original study were accurate, then the replication should have had enough participants to see it again.

The same is not true the other way around. Take a look at the figures below:

Original and replication effect size with 95% confidence intervals for that effect size. Figures by Data Colada: https://datacolada.org/37#identifier_0_910

The replication confidence interval is much smaller than the original and even crosses zero for all three key variables. This strongly indicates a very tiny or zero effect.

The replication effect is so small that, if it is accurate, then the original study with its smaller sample would only have a 5.9% chance of detecting it. (Analysis by Data Colada).

So which study do we trust? Ultimately that your decision as an individual scientist. Consider what you know about both studies and evaulate which you find more compelling.

In this case, there is a substantial consensus in the field that the original effects on hormones was likely a false positive, though the increase in the subjective feelings of power after a power pose replicates well. The replications have several advantages including larger sample sizes, blinded and objective experimentors, and longer pose durations.

The original authors have engaged in the debate and made several points in defense of their work. The original study was famous by the time the replication was carried out - perhaps the participants had heard of the original study. Similarly, the replication study participants were told what was being measured in advance which might have effected their feelings during hte study. Finally, the original study was in hte USA and hte replication in Switzerland - perhaps there is a cultural component to power posing?

7. Ethics & Collaboration.

The original paper attracted huge scientific and media interest

61 news outlets
33 blogs
2 policy sources
110 X users
1 peer review site
11 Facebook pages
8 Wikipedia pages
1 Redditor
11 YouTube creators
1 Bluesky user

data from Altmetric, see here for full summary.

This was a mix of constructive debate, harsh criticism and genuine interest. Unfortunately a section of this debate became aggressive and personal towards Amy Cuddy to the extent that it had a serious impact on her life, health and career.

In addition to her fame and influence, Cuddy’s critics seemed obsessed with profits from her book and her fees for speaking engagements. Dollar amounts frequently emerged in conversations where academics discussed the research, and recently, one of her most vocal critics said her speaking engagement fees made him feel “ill.” One Ivy League professor tweeted, “You can listen to her speak about her irreplicable results for $25-$40,000.” In reality, her book and speaking topics have little to do with the power posing studies.

Many of the negative comments were personal in nature and aimed at Cuddy directly. “They were comparing me to Harvey Weinstein,” Cuddy says, recalling a post to the PsychMAP Facebook group which the site owners were eventually persuaded to remove. As if that’s not bad enough, Cuddy also received death threats via direct messages on social media. “I became an untouchable in my field,” she adds, describing how colleagues were shunned for merely friending her on Facebook.

Although Cuddy says Harvard wanted her to stay, she left her full-time position, because she no longer felt safe doing her work (she remains a faculty member at Harvard Business School in Executive Education). “My dean wanted me to stay, but I later told him, I couldn’t stay, because I couldn’t endure this abuse. It was relentless,” says Cuddy. Her experiences inspired her new book (still a work in progress) entitled, Bullies, Bystanders and Bravehearts aimed at helping others who find themselves facing similar experiences.

Exerpts from (Elsesser 2024).

This has led to a follow-up discussion […] consisting of two general camps: those who think Cuddy is being treated unfairly, and those who believe that complaints about bullying or harassment are a distraction from the most serious problem here, which is the elevation of shoddy research to a place of such profitable prominence.

Except from (Singal 2017)

There is perhaps a tension here between our ethical obligation to act with integrity in research and our obligation for act with respect (BPS, n.d.). Or perhaps some people are just bullies.

Have a read through these sources and take a moment to do a quick web search for yourself.

Share what you think about this situation on the padlet board below. what do you think of the situation surrounding this debate? how can we do better in future?

Taking and receiving criticsm is a huge part of progressive constructive science but can be challenging to do well.

8. Summary.

Replication is the gold standard for scientific progress but it is rarely as straightforward as we’d like it to be!

A wide range of issues can arise - including statistical approaches, experimental design and sometimes interpersonal conflict. We have to navigate all of these points with high ethical standards for the scientific community to progress.

Thank you for your time this semester - I hope you have enjoyed these computer practical sessions!

References

BPS. n.d. Code of Ethics and Conduct - BPS Explore. https://explore.bps.org.uk/content/report-guideline/bpsrep.2021.inf94.

Carney, Dana R., Amy J. C. Cuddy, and Andy J. Yap. 2010. “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance.” Psychological Science 21 (10): 1363–68. https://doi.org/10.1177/0956797610383437.

Dominus, Susan. 2017. “When the Revolution Came for Amy Cuddy.” The New York Times. The New York Times. https://www.nytimes.com/2017/10/18/magazine/when-the-revolution-came-for-amy-cuddy.html.

Elsesser, Kim. 2024. “The Debate on Power Posing Continues: Here’s Where We Stand.” Forbes. Forbes Magazine. https://www.forbes.com/sites/kimelsesser/2020/10/02/the-debate-on-power-posing-continues-heres-where-we-stand/.

Loncar, Tom. 2021. BPS. The British Psychological Society. https://www.bps.org.uk/psychologist/decade-power-posing-where-do-we-stand.

Ranehill, Eva, Anna Dreber, Magnus Johannesson, Susanne Leiberg, Sunhae Sul, and Roberto A. Weber. 2015. “Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women.” Psychological Science 26 (5): 653–56. https://doi.org/10.1177/0956797614553946.

Singal, Jesse. 2017. “How Should We Talk about Amy Cuddy, Death Threats, and the Replication Crisis?” The Cut. The Cut. https://web.archive.org/web/20240509073115/https://www.thecut.com/2017/04/amy-cuddy-death-threats.html.

Wheeler, Brian. 2018. “When Is It OK to Do a ‘Power Pose’?” BBC News. BBC. https://www.bbc.co.uk/news/uk-politics-43960452.

Wikipedia. 2025. “Power posing — Wikipedia, the Free Encyclopedia.” http://en.wikipedia.org/w/index.php?title=Power%20posing&oldid=1270473243.