Pessin Research Paper


Trust and Fertility Dynamics

Corresponding author: Arnstein Aassve, Carlo F. Dondena Centre for Research on Social Dynamics and Department of Policy Analysis and Public Management, Università Bocconi, Via Guglielmo Röntgen 1, 20136 Milan, Italy; arnstein.aassve@unibocconi.it; Tel. +39 02 5836 5657; Fax +39 02 5836 2798.


The publisher's final edited version of this article is available at Soc Forces


We argue that the divergence in fertility trends across advanced societies is shaped by the interaction between long-standing differences in generalized trust and the increase in women’s educational attainment. Our argument builds on the idea that trust increases individuals’ and couples’ willingness to outsource childcare beyond their extended family. This becomes critically important as women’s rising education increases the demand for combining work and family life. We test our hypothesis using data from the World Values Survey and European Values Study on 36 industrialized countries between 1981 and 2009. Multilevel statistical analyses reveal that the interaction between national-level generalized trust and cohort-level women’s education is positively associated with completed fertility. As education among women expands, high levels of generalized trust moderate fertility decline.

Keywords: Generalized trust, low fertility, women’s education


The concept of generalized trust has recently attracted considerable interest. Although sociologists were the first to emphasize the important role trust plays in our societies (Coleman 1990; Fukuyama 1995; Gambetta 1988), its significance is now acknowledged more broadly across the social sciences (e.g., Alesina and La Ferrara 2002; Aghion, Algan and Cahuc 2008; Bjørnskov 2007). Trust fosters cooperation and acts as a lubricant, easing the way in which transactions are made. High levels of trust correlate positively with the quality of institutions and with political participation, and also enhance civic engagement and social cohesion in general (Knack 2002; Uslaner 2002). Trust is negatively related to corruption (Uslaner 2002), crime and delinquency (Buonanno, Montolio and Vanin 2009), and income inequality (Uslaner 2002), but positively related to the functioning of financial institutions (Guiso, Sapienza and Zingales 2004) and economic growth (e.g., Knack and Keefer 1997).

In this paper we argue that generalized trust also plays a key role in explaining the dynamics of reproduction, in particular fertility, in industrialized societies. Since the baby boom of the sixties and seventies, fertility decline has followed divergent paths across societies. The notion of “lowest-low fertility” emerged in the demographic literature to refer to countries in which the Total Fertility Rate (TFR) fell below 1.3 (Kohler, Billari and Ortega 2002). To the astonishment of researchers and policy makers alike, lowest-low fertility emerged in countries traditionally considered family-oriented, such as Italy, Spain and Greece (Chesnais 1996; Billari and Kohler 2004). Later, fertility fell dramatically in the former communist countries of Central Europe and in East Asia. Germany and Austria have maintained low fertility over a longer period, at times touching lowest-low levels. Fertility dynamics in these countries stand in stark contrast to those of both Northern European and English-speaking OECD countries. In this latter group, although fertility declined from its baby-boom levels, the TFR remained close to the replacement rate of 2.1 children per woman, and more recently it may even have started to rebound (Myrskylä, Kohler and Billari 2009).
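
For readers less familiar with the measure: the TFR is conventionally obtained by summing age-specific fertility rates over the reproductive ages, multiplying by the width of each age group when rates are reported for five-year groups. The Python sketch below illustrates that standard calculation and applies the two thresholds named in the text (1.3 for lowest-low fertility, 2.1 for replacement). The function names and the age-specific rates are hypothetical, chosen only for illustration, and are not drawn from any country's data.

```python
def total_fertility_rate(asfr_per_woman, width=5):
    """Sum age-specific fertility rates (births per woman per year) over
    age groups `width` years wide: the expected number of children per
    woman if current rates held over her whole reproductive life."""
    return width * sum(asfr_per_woman)


def classify_tfr(tfr, lowest_low=1.3, replacement=2.1):
    """Label a TFR using the thresholds mentioned in the text."""
    if tfr < lowest_low:
        return "lowest-low"
    if tfr < replacement:
        return "below replacement"
    return "at or above replacement"


# Hypothetical rates for ages 15-19, 20-24, ..., 45-49 (illustrative only).
rates = [0.005, 0.030, 0.070, 0.080, 0.040, 0.010, 0.001]
tfr = total_fertility_rate(rates)
print(round(tfr, 2), classify_tfr(tfr))
```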

Among the explanations proposed in the literature for fertility decline, we review three main ones (for a complete review see Balbo, Billari and Mills 2013). First, fertility dynamics might be driven by ideational change. Inspired by Inglehart (1971), Lesthaeghe and van de Kaa proposed the “Second Demographic Transition” thesis, which focuses on the rise of post-materialist values and the declining centrality of family and children, with a progressive diffusion across all industrialized societies (Lesthaeghe and van de Kaa 1986; van de Kaa 1987; Lesthaeghe 2010). A second line of research emphasizes the interplay between gender and institutions. Despite women’s increased educational attainment and greater financial independence, traditional gender roles tend to persist, especially in the family sphere. McDonald (2000) and Goldscheider (2000) argue that men have not compensated for the reduction in women’s unpaid housework. Thus, as women enter the labor market in increasing numbers, they face a double burden: housework and childrearing on the one hand, market work on the other. Only institutional change can accommodate changing gender roles. Indeed, Myrskylä, Kohler and Billari (2011) and Arpino, Esping-Andersen and Pessin (2015) both show that recent increases in fertility depend on the degree of gender equality in a given society. A third, related strand of the literature focuses on the role of welfare and family policies. In particular, it is argued that fertility is higher in the Nordic countries because the state provides ample and affordable childcare services, facilitating women’s participation in the labor market while still enabling them to achieve their desired fertility (e.g., Neyer and Andersson 2008).

Taken on their own, none of these three sets of explanations is entirely satisfying. Despite the appeal of the Second Demographic Transition thesis, there is no clear evidence of changing fertility preferences (Rindfuss, Guzzo and Morgan 2003). In fact, fertility trends have not mirrored changes in related behaviors such as marriage, which are clearly more directly influenced by ideas. Desired fertility has remained remarkably constant across time and countries (Bongaarts 2001; Sleebos 2003), so that the gap between preferences and behavior, i.e. the “baby gap”, is seen as a policy challenge (OECD 2011). Moreover, the countries that have progressed farthest in terms of post-modern family attitudes and behaviors are those that now have the highest levels of fertility (e.g., Sobotka 2008; Aassve, Bassi and Sironi 2011). As for gender, empirical tests of the relationship between gender equality and fertility remain limited (Mills 2010), and the effect of the revolution in gender roles on fertility calls for further theoretical elaboration (Esping-Andersen and Billari 2015). In particular, it is not clear why some countries have experienced a shift from low to high gender equity in family-oriented institutions while others have not. It also remains doubtful whether such a shift would affect fertility in the same way in different societal contexts (Myrskylä et al. 2011).

We argue that the broad expansion in women’s education over the last four decades (which we treat as an exogenous structural change) and persistent differences in generalized trust have worked together to produce divergent fertility dynamics across industrialized societies. The key ingredient of our thesis is simple: as women attain higher levels of education and couples seek to combine working life with family formation, traditional childrearing activities need to be outsourced. Societal responses to this new need may differ. In the Nordic countries, an extensive provision of public care institutions for young children (and for the elderly) allows such outsourcing. Yet four decades ago, the male breadwinner model was dominant in the Nordic countries: the family was the key care institution, with men specializing in paid work and women handling the household’s care duties. Today the male breadwinner model is nearly extinct in Nordic societies. Conversely, over the same period, in market-oriented societies such as the U.S. and the U.K., services that allow women to combine work and family have mostly been generated through the market. Paradoxically, fertility is today higher in those countries where the male breadwinner model is disappearing, irrespective of whether active welfare interventions have enabled a better combination of work and family.

We argue that during the period of educational expansion, generalized trust has acted as a catalyst in the process of fertility decline, because trust shapes individuals’ willingness to outsource caring activities. Our thesis resolves the paradox that fertility has remained high in both Social-democratic and Liberal welfare regimes, even though these regimes sit at opposite ends of standard welfare state typologies (e.g., Esping-Andersen 1990). Generalized trust tends to be high in societies characterized by Social-democratic as well as Liberal welfare regimes. In the former, care activities are outsourced to individuals working in public institutions; in the latter, to individuals operating in the market. In Southern and Eastern Europe, where generalized trust is much lower, outsourcing remains limited to the extended family. As long as people do not trust others with care activities, outsourcing remains residual, which is not reconcilable with women wanting both to pursue a working career and to have children. In other words, in countries with low generalized trust the expansion of women’s education is scarcely compatible with the realization of fertility desires.

We test our hypothesis using the World Values Survey and the European Values Study (WVS-EVS). We estimate a series of four-level Poisson regression models (with country, region, birth cohort and individual as levels), which take into account the nested structure of these data. The multilevel approach allows us to address the micro-macro nature of our research question and thus to capture how variation in contextual measures of trust, i.e. average regional and national levels of generalized trust, and in women’s cohort education, i.e. women’s average educational level by birth cohort, may be associated with individual-level completed fertility.

The remainder of this paper is structured as follows. Section 2 provides the background and theoretical arguments for the role of trust in shaping fertility dynamics. Section 3 introduces the data and methods. Section 4 presents and discusses our results, and Section 5 reports several robustness checks. Section 6 concludes by summarizing our key findings and discussing their implications for future research.

Theoretical Background

Generalized Trust and Fertility

Generalized trust refers to the way individuals in a society trust fellow individuals other than those belonging to their own family or those they already know through past interactions. There is considerable variation in generalized trust across societies, but differentials are rather stable over time (Bjørnskov 2007).

Our key argument is that high levels of trust foster an environment conducive to reconciling work and family. We hypothesize that generalized trust affects fertility dynamics through two main channels. The first runs through the association of trust with key parameters such as economic prosperity¹, low corruption, the functioning of democratic systems, and, more broadly, the stability of key institutions in advanced societies. This stability is also beneficial for childbearing. It is therefore not surprising that, in very recent data, trust shows a (strong) positive cross-country correlation with fertility. Furthermore, many of the existing explanations for international differences in fertility concern characteristics that correlate with trust. For instance, as documented by Myrskylä et al. (2009), fertility has recently been increasing in countries where economic prosperity is high. In developed societies where economic prosperity is relatively low, the East European countries being a good example, fertility is very low, and so is trust. Another argument is that high fertility in the Scandinavian countries is maintained by a generous welfare state that provides rather long maternity leaves and substantial financial support for families with young children. Again, trust is high in those countries where welfare support is strong. Likewise, McDonald’s (2000) ideas on gender equity and fertility are consistent with country levels of trust: on average, trust is high in countries where gender equality is high and much lower in countries where gender equality is low. Finally, the countries that have progressed farthest in the Second Demographic Transition, in terms of both attitudes and behaviors, are also those with the highest levels of generalized trust (Aassve et al. 2011).

The second effect is more profound from a dynamic perspective, as it concerns why countries have actually diverged in their fertility paths. The key is that high levels of generalized trust imply a greater predisposition to outsource care activities that were traditionally confined to the family. As women attend higher education in ever greater numbers and aspire to combine work and family, the demand for outsourcing care activities naturally increases. Indeed, several studies show that outsourcing domestic services or goods has become an alternative strategy to housewifery as women’s gender roles change (Raz-Yurovich 2014; De Ruijter, Treas and Cohen 2005). From a supply-side perspective, policy makers and the private sector will create institutions and services, respectively, for outsourcing care activities only if they expect demand for them.

Figure 1 shows the education trends since 1970 for selected industrialized countries.

Figure 1

Changes in women’s enrollment in tertiary education in Bulgaria, Italy, Norway and the United States (1970–2010)

Together with boosting labor force participation, women’s increasing educational attainment is having a dramatic impact on their autonomy, economic independence, attitudes and preferences. A key implication is that their heightened aspirations for ambitious careers are not always easily compatible with childbearing (Brewster and Rindfuss 2000). The focus on the link between education and fertility is certainly not new². Increased education, especially among women, is one of the most robust predictors of fertility decline (Cleland and Wilson 1987), and, as a corollary, work-family incompatibilities have been put forward as an important driver of country differences in fertility (Kalwij 2010; OECD 2011). Our argument differs from earlier studies, however, in proposing that generalized trust may affect the likelihood of outsourcing childcare, and therefore fertility, as women enter higher education. If generalized trust is low, outsourcing will diffuse slowly, hampering the emergence and evolution of high-quality childcare institutions, which are a precondition for preventing fertility from dropping to “lowest-low” levels.

An important implication of our argument is that generalized trust is not a precondition for high fertility in societies with a specialized division of housework, since care activities take place within the household or extended family in any case. In these societies, traditional attitudes prevail and women’s aspirations towards higher education and successful careers are weaker, which leads to high fertility as long as women do not work. Trust becomes critical only when moving towards an egalitarian society, as the outsourcing of care activities becomes an essential part of a gender-equitable society. In countries where generalized trust is high, individuals will endorse institutions that provide activities that traditionally belonged to the family sphere. Where generalized trust is low, care institutions will develop slowly, since individuals and couples do not trust the other individuals and institutions providing care. In these latter situations, fertility decline may become rather deep and long-lasting.

At this point, it is useful to look at the fertility trends for some selected countries. In Figure 2 we have plotted the evolution of TFR for Bulgaria, Italy, Norway and the United States.

Figure 2

Changes in total fertility rate in Bulgaria, Italy, Norway and the United States (1970–2010)

Here we see that TFR dynamics are rather similar for Norway and the US, and likewise for Bulgaria and Italy, the latter two showing a stronger dip than the former. Of the four countries, Norway has the highest level of generalized trust, the US a somewhat lower level, and Bulgaria and Italy a significantly lower one. Taking Figures 1 and 2 together, the suggestion is that the US and Norway can move towards a society in which it is possible to combine work and family more quickly than Bulgaria and Italy. Moreover, since childcare outsourcing takes place more extensively in Norway and the US, fertility does not decline in the same fashion.

Another important implication of this argument is that the willingness to outsource childcare does not necessarily depend on the existence of an extensive welfare state. Insofar as individuals trust others to carry out these care activities reliably and with high quality, outsourcing will take place. Thus, whether care activities are offered publicly or privately may not be critical for fertility. In “Liberal” welfare regimes, trust is placed in the market. If the market can provide childcare that is acceptable to more highly educated mothers (and their often equally educated partners), individuals will be willing to use it as a provider of family-related activities.

The validity of these arguments rests on two main assumptions. The first is that higher levels of trust are associated with a higher willingness to outsource traditional family activities. The second is that generalized trust levels are highly stable over time and therefore precede institutions rather than follow them. In what follows, we discuss the foundations of each of these assumptions.

Generalized Trust and Outsourcing

One of the mechanisms linking generalized trust to fertility rests on the assumption that higher levels of trust increase the willingness to outsource traditional family activities. Inspired by Williamson’s (1975) transaction cost approach, the management and business administration literature has established solid findings supporting the positive role of trust in successful outsourcing relationships between firms and suppliers (e.g., Zaheer and Venkatraman 1995).

While the relationship between trust and outsourcing is well established in the management literature, it has only recently been extended to the family (Raz-Yurovich 2014). De Ruijter, van der Lippe and Raub (2003), using a transaction-cost framework, develop a theoretical model to explain why trust should matter for outsourcing decisions within families. Extending their model, Raz-Yurovich (2014) provides a comprehensive framework for understanding the importance of outsourcing housework and care activities for work-family balance. Like De Ruijter, van der Lippe and Raub (2003), she argues that trust is an essential facilitator of families’ outsourcing decisions.

Turning to the empirical evidence, De Ruijter and van der Lippe (2009) find that, among Dutch couples, the woman’s level of trust positively predicts the outsourcing of childcare. Using the European Social Survey, Carl (2014) finds that generalized trust at both the country and the individual level is positively associated with choosing external childcare. Similarly, El-Attar (2013) finds that less trusting mothers are more likely to stay at home with young children.

The Stability of Generalized Trust over Time

The second assumption on which our argument rests is that generalized trust is stable over time. That is, we assume that current cross-national differences in generalized trust originated before the societal transition from the male breadwinner model to the new egalitarian model dominated by dual-earner couples. The existing literature on generalized trust has taken three main approaches to show that present-day cross-national levels of trust reflect historical differences: (1) using existing survey measures of generalized trust; (2) comparing levels of trust among immigrants with those in their home countries; and (3) applying historical instrumental variables to explain current levels of trust.

While no historical measures of generalized trust exist, the generalized trust question ‘Generally speaking, do you think that most people can be trusted?’ was first asked in the World Values Survey (WVS) in 1981 (for a review see Nannestad 2008). The WVS subsequently extended the set of surveyed countries and repeated the question over several waves. Using the WVS and the Danish Social Capital Project, Bjørnskov (2007) shows that country levels of generalized trust appear to be very stable over time. In our sample, we also find a significant correlation of close to 0.9 between country-level trust in earlier and later waves (see Tables S1 and S2 in the Supplemental Material). Nevertheless, these findings cover only a short period and are not sufficient to fully assess the historical stability of trust within countries.
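
The stability check mentioned here amounts to correlating country-level trust between waves. The Python sketch below computes a Pearson correlation across the countries observed in both waves; the country labels and percentages are hypothetical placeholders, not WVS-EVS figures, and how the "earlier" and "later" waves are paired is an assumption left to the analyst.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical country-level shares of trusting respondents (percent) in an
# earlier and a later wave; these are NOT the WVS-EVS values.
early = {"A": 60.0, "B": 45.0, "C": 30.0, "D": 20.0}
late = {"A": 65.0, "B": 40.0, "C": 32.0, "D": 18.0}

# Only countries observed in both waves enter the correlation.
countries = sorted(early.keys() & late.keys())
r = pearson([early[c] for c in countries], [late[c] for c in countries])
print(round(r, 2))
```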

An alternative approach is to compare the trust levels of second- and higher-generation immigrants with trust in their country of origin. The theoretical idea behind the intergenerational transmission of trust was first developed by Tabellini (2008), who argues that trust, as a cultural trait, is transmitted from parents to children through early socialization. Several studies have shown that country-of-origin levels of trust are highly correlated with current levels of trust among second- and higher-generation immigrants (e.g., Algan and Cahuc 2010; Helliwell, Wang and Xu 2014 for 130 countries). Also, Dohmen et al. (2012) use longitudinal German data to show that trust is strongly transmitted from parents to children and that this mechanism is reinforced by positive assortative mating on trust.

Descriptively, known correlates of cross-national levels of generalized trust, such as income inequality and religious composition, are also found to be persistent (Bjørnskov 2007). Building on this idea, a third approach shows how positive or negative historical events have long-lasting effects on current trust levels. This idea is well illustrated by Banfield (1958), who concludes that centuries of feudalism and servile relationships with local landowners in the South of Italy detached its inhabitants from any form of wider cooperation or association outside the family, creating a pervasive sense of mutual distrust. Similarly, Putnam (1993) argues that the differences between the good institutions of northern Italian cities and the poor institutions of the south trace back to the Middle Ages. Guiso, Sapienza and Zingales (2008) find that the historical experience of having lived in a free city-state can explain at least 50% of the North-South difference in social capital in Italy. Other examples include Durante (2010), who shows that regions that were exposed to adverse climate conditions centuries ago, and therefore had a greater need for cooperation and risk sharing, exhibit higher levels of trust today. Likewise, Nunn and Wantchekon (2011) show that in Africa the descendants of the ethnic groups most heavily exposed to slave-trade raids have lower levels of generalized trust today. Finally, supporting the idea that high levels of trust precede the development of the welfare state, Bergh and Bjørnskov (2011) and Bjørnskov and Svendsen (2013) provide direct evidence of a causal effect of generalized trust on welfare generosity using an instrumental variable approach. Taken together, these results suggest that generalized trust has a strong cultural component and remains a highly persistent trait over time.

Data, Measures, and Methods

The key challenge in our empirical implementation is to test whether generalized trust is associated with higher fertility as education among women expands. For this purpose, we use the World Values Survey and the European Values Study (WVS-EVS), a series of repeated individual-level surveys conducted between 1981 and 2009. The sample size varies but is roughly 1,500 per country per wave, although not all countries participated in all rounds³. The WVS was carried out in five waves, around 1981, 1990, 1995, 2000 and 2005, and the EVS in four waves, around 1981, 1990, 1999 and 2008. For our empirical analysis, we first restrict the sample to men and women aged 40 and over; the dependent variable is the number of children ever born to the respondent. We further exclude respondents for whom the dependent variable (number of children) or our key independent variables of interest (trust and education) are missing⁴. Our final sample comprises 93,213 individuals nested in 36 countries⁵. Table 1 presents means, standard deviations, and minimum and maximum values for all variables included in the analysis.

Dependent variable

To measure individual fertility, respondents were asked in each survey wave: “Have you had any children? If yes, how many?” The variable takes values between 0 and 8, where 0 corresponds to ‘No child’ and 8 to ‘8 children or more’. Our interest lies in completed fertility, that is, the number of children individuals have over their lifetime. To capture it, we limit our sample to respondents aged 40 and above. We choose this age restriction to both maximize our sample size and accurately measure respondents’ completed fertility. While ‘late’ and ‘very late’ fertility have increased in recent years in Western countries, births at age 40 and over represent only about 2–4% of all births (e.g., Billari et al. 2007). Therefore, this sample restriction should capture completed fertility satisfactorily.
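
A minimal sketch of the sample restrictions described for the dependent variable and the analytic sample, assuming a simple hypothetical record type (`Respondent` and its field names are illustrative, not WVS-EVS variable names). It keeps respondents aged 40 and over with non-missing fertility, trust and education, and keeps the children count within the survey's top code of 8. The contextual trust measures are computed from respondents of all ages, so they should not be built from this filtered list.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_CHILDREN_CODE = 8  # '8 children or more' is the top category
MIN_AGE = 40  # completed-fertility threshold used in the text


@dataclass
class Respondent:  # hypothetical record layout
    age: int
    children: Optional[int]  # None when the fertility item is missing
    trust: Optional[int]  # 1 = most people can be trusted, 0 = be careful
    edu_age_code: Optional[int]  # age-at-completion category, 1..10


def analytic_sample(respondents):
    """Keep respondents aged 40+ with non-missing children, trust and education."""
    kept = []
    for r in respondents:
        if r.age < MIN_AGE:
            continue
        if r.children is None or r.trust is None or r.edu_age_code is None:
            continue
        # The survey item is already top-coded at 8; clamp defensively anyway.
        kept.append(replace(r, children=min(r.children, MAX_CHILDREN_CODE)))
    return kept


people = [
    Respondent(age=45, children=2, trust=1, edu_age_code=10),
    Respondent(age=35, children=1, trust=0, edu_age_code=5),  # too young
    Respondent(age=62, children=None, trust=1, edu_age_code=3),  # missing fertility
    Respondent(age=70, children=11, trust=0, edu_age_code=2),  # clamped to 8
]
sample = analytic_sample(people)
print([p.children for p in sample])
```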

Independent variables

Regional- and national-level generalized trust

The generalized trust question is framed as follows: “Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?” In the World Values Survey (WVS), the possible answers are dichotomous: 0 “Can’t be too careful” versus 1 “Most people can be trusted”. We begin the empirical analysis by constructing average measures of generalized trust. We consider both regional and national measures because, as with fertility, trust levels may vary within a country (Tabellini 2010). Each contextual trust variable represents the percentage of respondents who agree with the statement “Most people can be trusted” (trusting respondents). We refer to these measures as regional generalized trust and national generalized trust:

$$\text{Regional generalized trust}_{r,c,t} = \%\ \text{of trusting respondents in region } r \text{, in country } c \text{ and in survey wave } t$$

$$\text{National generalized trust}_{c,t} = \%\ \text{of trusting respondents in country } c \text{ and in survey wave } t$$
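
The two trust definitions translate directly into grouped percentages. The Python sketch below assumes respondents are stored as dictionaries with hypothetical keys (`country`, `region`, `wave`, `trust`), excludes missing trust answers, and computes unweighted shares; the text does not say whether survey weights were applied, so the unweighted choice is an assumption.

```python
from collections import defaultdict


def trust_shares(records, key):
    """Percent of respondents answering 'Most people can be trusted' (coded 1)
    within each group defined by key(record). Unweighted (an assumption)."""
    totals = defaultdict(int)
    trusting = defaultdict(int)
    for rec in records:
        if rec["trust"] is None:  # item non-response is excluded
            continue
        k = key(rec)
        totals[k] += 1
        trusting[k] += rec["trust"]
    return {k: 100.0 * trusting[k] / totals[k] for k in totals}


# Hypothetical respondents of ALL ages: the aggregates use the full sample.
records = [
    {"country": "NO", "region": "NO-1", "wave": 2005, "trust": 1},
    {"country": "NO", "region": "NO-1", "wave": 2005, "trust": 1},
    {"country": "NO", "region": "NO-2", "wave": 2005, "trust": 0},
    {"country": "NO", "region": "NO-2", "wave": 2005, "trust": 1},
    {"country": "IT", "region": "IT-1", "wave": 2005, "trust": 0},
    {"country": "IT", "region": "IT-1", "wave": 2005, "trust": None},
]

# Regional trust indexed by (r, c, t); national trust indexed by (c, t).
regional = trust_shares(records, lambda r: (r["region"], r["country"], r["wave"]))
national = trust_shares(records, lambda r: (r["country"], r["wave"]))
print(regional)
print(national)
```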

Our analytical sample is restricted to respondents aged 40 and over. However, the aggregate trust measures are based on the entire sample, which includes respondents of all ages. This difference in sample selection follows from our theoretical framework: we argue that societal trust levels should lead to higher individual fertility when women’s education is expanding. Consequently, to capture contextual levels of trust accurately, we construct the trust measures from the survey answers of all respondents in the sample. Ideally, we would measure societal trust as it stood before individuals made their childbearing decisions, but such historical data do not exist. Nevertheless, given that trust has been found to be rather stable, we assume that within countries trust levels have not varied much over time.

Figure 3 shows the national generalized trust levels of the 36 countries, averaged across waves. There is clearly large variation across countries. The Nordic countries have by far the highest levels of trust, whereas the Eastern bloc countries have the lowest, although the single country with the lowest trust value is Cyprus. Notably, trust is also rather high in English-speaking countries, and certainly higher than in the Mediterranean countries.

Figure 3

Average trust scores based on the World Values Survey and the European Values Study

Cohort-level women’s education

Our second key variable of interest is cohort-level women’s education. We construct ten-year birth cohorts: <1920, 1920–1929, 1930–1939, 1940–1949, 1950–1959, and 1960–1969. In the WVS-EVS, the only education variable available across all survey waves and for a majority of countries is: “At what age did you (or will you) complete your full time education, either at school or at an institution of higher education? Please exclude apprenticeships.” The variable takes values between 1 and 10, where 1 corresponds to ‘<12 years’ and 10 to ‘21+ years’. For each cohort within each survey wave, cohort-level women’s education is the percentage of women who completed their full-time education at age 21 or above. We refer to this variable as cohort-level women’s education:

$$\text{Cohort-level women's education}_{b,r,c,t} = \%\ \text{of women who completed their full-time education at age 21 or above, in birth cohort } b \text{, in region } r \text{, in country } c \text{ and in survey wave } t$$
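
The same grouping logic yields the cohort-level education measure. In the Python sketch below, the record keys and the string coding of sex are hypothetical; code 10 of the age-at-completion item stands for ‘21+ years’, as described in the text, and the cohort labels follow the ten-year bins listed there (respondents born after 1969 are not expected in the data, so no upper bound is enforced).

```python
from collections import defaultdict

TERTIARY_CODE = 10  # category '21+ years' of the age-at-completion item


def birth_cohort(birth_year):
    """Map a birth year to the ten-year cohorts used in the text."""
    if birth_year < 1920:
        return "<1920"
    start = (birth_year // 10) * 10
    return f"{start}-{start + 9}"


def cohort_womens_education(records):
    """Percent of women in each (cohort, region, country, wave) cell who
    completed full-time education at age 21 or above."""
    totals = defaultdict(int)
    tertiary = defaultdict(int)
    for rec in records:
        if rec["sex"] != "female" or rec["edu_age_code"] is None:
            continue
        cell = (birth_cohort(rec["birth_year"]), rec["region"], rec["country"], rec["wave"])
        totals[cell] += 1
        tertiary[cell] += int(rec["edu_age_code"] == TERTIARY_CODE)
    return {cell: 100.0 * tertiary[cell] / totals[cell] for cell in totals}


# Hypothetical records, for illustration only.
records = [
    {"sex": "female", "birth_year": 1948, "region": "R1", "country": "C1", "wave": 1999, "edu_age_code": 10},
    {"sex": "female", "birth_year": 1945, "region": "R1", "country": "C1", "wave": 1999, "edu_age_code": 6},
    {"sex": "male", "birth_year": 1946, "region": "R1", "country": "C1", "wave": 1999, "edu_age_code": 10},
    {"sex": "female", "birth_year": 1915, "region": "R1", "country": "C1", "wave": 1999, "edu_age_code": 3},
]
print(cohort_womens_education(records))
```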

While the age at completing education does not exactly reflect the level of education attained, belonging to the ‘21 years and above’ category indicates that the respondent obtained some tertiary education. Because it is aggregated by cohort, the measure captures the degree of expansion of women’s education at the time the respondents were having children. The cohort measure is preferred to a period measure here because it better reflects the historical context in which childbearing decisions were made.

Generalized trust x Cohort-level women’s education

The key empirical test lies in the interactions between the contextual measures of generalized trust and women’s education at the cohort level. Our key explanatory variables are thus macro-macro interactions between contextual measures of trust and cohort-level women’s education. We expect a positive and significant interaction: in regions and countries with high levels of generalized trust, completed fertility should be higher as women’s enrollment in higher education expands. We include interactions between regional generalized trust and cohort-level women’s education, as well as between national generalized trust and cohort-level women’s education.
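
To see what a positive interaction implies on the log scale of a Poisson model, consider a simplified predictor that keeps only contextual trust $T$ (regional or national), cohort-level women's education $E$, and their product. The coefficient names $\beta_T$, $\beta_E$ and $\beta_{TE}$ are introduced here for illustration only and are not the paper's notation:

```latex
\begin{aligned}
\ln \mathbb{E}[C] &= \beta_0 + \beta_T\, T + \beta_E\, E + \beta_{TE}\, T E \\
\frac{\partial \ln \mathbb{E}[C]}{\partial E} &= \beta_E + \beta_{TE}\, T \\
\frac{\mathbb{E}[C \mid E + 1]}{\mathbb{E}[C \mid E]} &= \exp\!\left(\beta_E + \beta_{TE}\, T\right)
\end{aligned}
```

Under these assumptions, if $\beta_E < 0$ and $\beta_{TE} > 0$, a one-point rise in women's cohort education lowers expected completed fertility by a smaller proportion where trust is higher, which is the pattern the hypothesis predicts.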

Control variables

We include several individual-level variables that provide alternative explanations for completed fertility: the respondent’s religious attendance, gender, and age at the time of the survey. We also include a set of dummy variables for each survey wave. Religious attendance is a potential confounder because religiosity tends to be associated with higher fertility (e.g., Hayford and Morgan 2008). The variable is based on the survey question “Apart from weddings, funerals and christenings, about how often do you attend religious services these days?”, which has eight possible answers: 1 ‘More than once a week’, 2 ‘Once a week’, 3 ‘Once a month’, 4 ‘Only on special holy days/Christmas/Easter days’, 5 ‘Other specific holy days’, 6 ‘Once a year’, 7 ‘Less often’, and 8 ‘Never, practically never’. We transform the responses into an indicator variable taking the value 1 when respondents attend religious services at least once a month and 0 otherwise. While there should be no substantial difference in fertility by gender, men may underreport their fertility in surveys (Rendall et al. 1999). We therefore control for the respondent’s gender: 1 for ‘woman’ and 0 for ‘man’. We then construct a categorical variable for the respondent’s age: ‘40–49’, ‘50–59’, ‘60–69’, and ‘70+’. Age at survey is calculated as the difference between the survey year and the respondent’s birth year. Finally, we include a set of dummy variables for each wave: EVS 1981–1984, EVS 1989–1993, EVS 1999–2004, EVS 2005–2009, WVS 1989–1993, WVS 1994–1999, WVS 1999–2004, and WVS 2005–2009. The wave dummies control for period effects and differences between surveys.
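
The recodings of the control variables can be expressed as small functions. This Python sketch follows the codings stated in the text (attendance codes 1 to 3 mean at least monthly; age groups from 40–49 to 70+; 1 = woman). The function names, the string used for sex, and returning `None` for missing attendance are assumptions for illustration, since the text does not describe how missing codes were handled.

```python
MONTHLY_OR_MORE = {1, 2, 3}  # 'More than once a week', 'Once a week', 'Once a month'


def attends_monthly(code):
    """1 if attendance is at least monthly, 0 otherwise.
    Missing answers stay None (assumed handling, not stated in the text)."""
    if code is None:
        return None
    return 1 if code in MONTHLY_OR_MORE else 0


def is_woman(sex):
    """Gender indicator: 1 = woman, 0 = man (hypothetical string coding)."""
    return 1 if sex == "female" else 0


def age_group(survey_year, birth_year):
    """Age at survey (survey year minus birth year), grouped as in the text."""
    age = survey_year - birth_year
    if age < 40:
        raise ValueError("the analytic sample is restricted to ages 40+")
    for upper, label in ((50, "40-49"), (60, "50-59"), (70, "60-69")):
        if age < upper:
            return label
    return "70+"


print(attends_monthly(2), attends_monthly(6), age_group(1999, 1941))
```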

The set of control variables is deliberately conservative: we select only variables that are relevant to our outcome. We exclude potentially intermediate variables such as educational attainment, marital status, or employment status, because such “bad controls” may induce selection bias in our regression analysis (Angrist and Pischke 2008, Chapter 3: 64–68; Elwert and Winship 2014).


Given the multilevel nature of our data and hypotheses, we implement a series of four-level Poisson regression models with a natural logarithm as a link function (Skrondal and Rabe-Hesketh 2004: 182–183). The four-level data structure is composed of: 93,213 observations (level-one units) nested within 711 birth cohorts (level-two units) nested within 123 regions (level-three units) nested within 36 countries (level-four units).
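The four-level structure can be made concrete with a small sketch that counts the units at each level. It assumes, as the subscripts b, r and c in the model suggest, that a birth cohort is identified within its region and a region within its country, so the same label under different parents counts as different units. The `Record` type, its fields, and the sample rows are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type: one row per respondent.
Record = namedtuple("Record", ["country", "region", "cohort"])


def count_units(records):
    """Count units at each level of the individual < cohort < region < country hierarchy.

    Regions are identified within countries and cohorts within regions, so the
    same cohort label in two regions counts as two level-two units.
    """
    countries = {r.country for r in records}
    regions = {(r.country, r.region) for r in records}
    cohorts = {(r.country, r.region, r.cohort) for r in records}
    return {
        "individuals": len(records),
        "cohorts": len(cohorts),
        "regions": len(regions),
        "countries": len(countries),
    }


sample = [
    Record("NO", "Oslo", 1950),
    Record("NO", "Oslo", 1950),
    Record("NO", "Bergen", 1950),
    Record("CY", "Nicosia", 1960),
]
print(count_units(sample))
# {'individuals': 4, 'cohorts': 3, 'regions': 3, 'countries': 2}
```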

We use the Poisson distribution because it is better suited than standard linear regression to modeling our dependent variable, the number of children, which is a count variable (see note 6). In its simplest form, our model is specified as follows:

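$$\ln(C_{ibrc}) = \beta_0 + \beta_1 X_{ibrc} + \beta_2 T_{brc} + \beta_3 Z_{rc} + \beta_4 W_c + \delta_c + \varepsilon_{rc} + \gamma_{brc} + e_{ibrc}$$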

where the subscript i denotes the individual, b the birth cohort, r the region, and c the country. The outcome variable $C_{ibrc}$ is the expected number of children for individual i. Variables are defined at each level of the data structure: X at the individual level, T at the birth-cohort level, Z at the region level, and W at the country level. The variance components of the model correspond to the four levels: $\delta_c$ is the country-specific error term, $\varepsilon_{rc}$ the region-specific error term, $\gamma_{brc}$ the birth-cohort-specific error term, and $e_{ibrc}$ the individual-specific error term. We fit our models by Markov chain Monte Carlo (MCMC) estimation as implemented in MLwiN, called from Stata through the runmlwin module. All estimation results are based on 100,000 MCMC samples with a burn-in of 10,000. MCMC estimation tends to give less biased estimates than quasi-likelihood estimation when the outcome variable is discrete and when within-cluster sample sizes are small (Rodriguez 2008). Furthermore, the relative fit of the models can be assessed with the deviance information criterion (DIC; Spiegelhalter et al. 2002). A model with a lower DIC value fits relatively better, but only if the difference exceeds 5 points (Lunn et al. 2012).
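The DIC comparison can be sketched from MCMC output for a Poisson outcome: DIC is the posterior mean deviance plus the effective number of parameters pD, where pD is the mean deviance minus the deviance at a posterior point estimate. This is a minimal illustration, not the MLwiN computation. It assumes the chain is stored as draws of each observation's expected count and plugs in the posterior mean of those expected counts; MLwiN's exact plug-in choice is not described here. The 5-point rule follows the Lunn et al. (2012) guidance cited above, and all function names are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """Deviance D = -2 * log-likelihood of counts y under Poisson means mu."""
    return -2.0 * sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu)
    )


def dic(y, chain, burn_in=0):
    """Return (DIC, pD) from MCMC draws of the expected counts.

    chain: list of draws; each draw lists the expected count for every observation.
    The first `burn_in` draws are discarded. pD = mean deviance minus the deviance
    at the posterior mean of the expected counts; DIC = mean deviance + pD.
    """
    draws = chain[burn_in:]
    d_bar = sum(poisson_deviance(y, mu) for mu in draws) / len(draws)
    mu_bar = [sum(draw[i] for draw in draws) / len(draws) for i in range(len(y))]
    p_d = d_bar - poisson_deviance(y, mu_bar)
    return d_bar + p_d, p_d


def meaningfully_better(dic_a, dic_b, threshold=5.0):
    """True if model A's DIC is lower than model B's by more than `threshold` points."""
    return dic_b - dic_a > threshold
```

Because the Poisson deviance is convex in the expected counts, pD computed this way is never negative.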

The variables included in the empirical analysis are summarized in Table 1. At the individual level, we include religious attendance, gender, age categories, and survey wave indicators. At the birth-cohort level, we use the women’s education measure. At the regional and national levels, we include the average generalized trust measures. One issue with using individual-level and macro-level variables computed from the same sample is that they tend to be highly correlated. To avoid this problem, we center the lower-level variable on the higher-level mean (Skrondal and Rabe-Hesketh 2004: 52). Including only one variable, our model takes the following form:
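Centering a lower-level variable on its higher-level mean (group-mean centering) can be sketched as follows. Which level pairs are centered in the analysis is not spelled out here, so the example, cohort-level shares of highly educated women centered on their country mean, is a hypothetical illustration with made-up values.

```python
from collections import defaultdict


def center_on_group_means(values, groups):
    """Group-mean centering: subtract each higher-level unit's mean from its lower-level values.

    Returns the centered values and the group means (which can enter a model
    separately as higher-level variables).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    centered = [value - means[group] for value, group in zip(values, groups)]
    return centered, means


# Hypothetical cohort-level shares of highly educated women, grouped by country.
edu = [0.10, 0.30, 0.20, 0.60]
country = ["IT", "IT", "NO", "NO"]
centered, means = center_on_group_means(edu, country)
```

After centering, the lower-level variable varies only within groups, which removes its mechanical correlation with the group mean.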

In multilevel analyses, we are primarily interested in the interaction between regional- and country-levels of generalized trust and birth cohort-levels of average women’s education. For example, considering simply average national generalized trust and average birth cohort women’s education, the model would take the following form:

$$\ln(C_{ibrc}) = \beta_0 + \beta_1 T_c + \chi_3 E_{brc} + \alpha\,(T_c \times E_{brc}) + \delta_c + \varepsilon_{rc} + \gamma_{brc} + e_{ibrc}$$

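Here $T_c$ is average national generalized trust and $E_{brc}$ is average women’s education in the birth cohort. A positive value of α indicates that individuals who live in a high-trust country and belong to a birth cohort with high women’s education tend to have more children than the separate effects of the two aggregate variables would imply.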
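Differentiating the interaction model with respect to cohort-level women’s education makes the role of α explicit. This is a worked step on the model above, holding the random terms fixed; it adds no estimates.

$$\frac{\partial \ln(C_{ibrc})}{\partial E_{brc}} = \chi_3 + \alpha\,T_c$$

On the exponentiated scale, a one-unit increase in $E_{brc}$ multiplies the expected number of children by $\exp(\chi_3 + \alpha T_c) = e^{\chi_3}\,(e^{\alpha})^{T_c}$. At $T_c = 0$ the education slope is simply $\chi_3$; when $\chi_3 < 0$ and $\alpha > 0$, the slope becomes less negative as trust rises, and it would change sign only where $T_c > -\chi_3/\alpha$.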


Descriptive analysis

We first describe the historical trends in fertility and women’s education by national level of generalized trust. As shown in Table 2, we classify the countries in our sample into quartiles according to their average levels of generalized trust.

Table 2

Country classification by generalized trust quartile
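The quartile classification described above can be sketched with the standard library. The input format (country mapped to the average share of respondents who say most people can be trusted) is a hypothetical assumption, and so is the cut-point rule: `statistics.quantiles` uses its default exclusive method here, while the rule behind Table 2 is not stated.

```python
import statistics


def trust_quartiles(country_trust):
    """Assign each country to a quartile (1 = lowest, 4 = highest) of average generalized trust.

    country_trust: dict mapping country -> average share of respondents who say
    most people can be trusted (hypothetical input format).
    """
    q1, q2, q3 = statistics.quantiles(country_trust.values(), n=4)

    def quartile(t):
        if t <= q1:
            return 1
        if t <= q2:
            return 2
        if t <= q3:
            return 3
        return 4

    return {country: quartile(t) for country, t in country_trust.items()}
```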

Over the past decades, regardless of trust quartile, women have increased their participation in higher education while fertility levels have declined (Table 3). However, the decline in TFR between 1970 and 2009 is moderate in the fourth quartile, where generalized trust is on average the highest. In the top trust quartile, median fertility reaches about 1.9 children, whereas in the lower trust quartiles the median TFR is about 1.4–1.5 children.

Table 3

Generalized trust, total fertility rate and women’s enrollment in tertiary education

Following an approach similar to Rindfuss, Guzzo and Morgan (2003), we first compute the year-by-year correlation between women’s enrollment in tertiary education and TFR for low-trust and high-trust countries. We then calculate the sensitivity of a change in fertility to a change in women’s education. Countries in the first and second trust quartiles are assigned to the low-trust category, and countries in the third and fourth quartiles to the high-trust category. This simple descriptive illustration shows whether the relationship between women’s education and fertility differs between low- and high-trust countries.
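The year-by-year correlation can be sketched as a cross-country Pearson correlation computed separately for each year and trust group, with quartiles 1–2 forming the low-trust group and 3–4 the high-trust group as described above. The panel layout, the function names, and the minimum of three countries per group and year are hypothetical choices for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (nan if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def yearly_correlations(panel, quartile_of, min_countries=3):
    """Cross-country correlation between women's tertiary enrollment and TFR, by year and trust group.

    panel: dict year -> list of (country, enrollment, tfr) tuples (hypothetical format).
    quartile_of: dict country -> trust quartile; quartiles 1-2 are 'low', 3-4 are 'high'.
    Groups with fewer than `min_countries` countries in a year are skipped.
    """
    result = {}
    for year, rows in sorted(panel.items()):
        for group in ("low", "high"):
            pairs = [
                (enrollment, tfr)
                for country, enrollment, tfr in rows
                if ("low" if quartile_of[country] <= 2 else "high") == group
            ]
            if len(pairs) < min_countries:
                continue
            enrollment, tfr = zip(*pairs)
            result[(year, group)] = pearson(enrollment, tfr)
    return result
```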

For each year between 1971 and 2012, Figure 4 shows the correlation between women’s enrollment in tertiary education and TFR for low and high trust countries. The patterns are drastically different between the two groups. In high trust countries, the correlation coefficient starts at a low of −0.3 in 1971, becomes positive in 1985 and reaches levels around 0.5–0.6 in the 2000s. In low trust countries, the correlation coefficient remains negative throughout the whole period with the exception of the years 1976 and 1977.

Figure 4

Year-by-year correlation between total fertility rate and women’s enrollment in tertiary education by generalized trust levels, 1970–2012

We also use the sensitivity measure proposed by Rindfuss, Guzzo and Morgan (2003):

where t is the last year with observed data and 1 is the first year in which the TFR falls below replacement level. Confirming our initial descriptive analysis, the sensitivity of a change in fertility to a change in women’s enrollment in tertiary education differs substantially among the trust quartiles. Going from the lowest trust quartile to the highest, the average sensitivity of a change in fertility to women’s education is −0.19, −0.15, −0.11, and 0.03. The sensitivity measure is, on average, most negative in low-trust countries and closest to zero in high-trust countries. A 1% increase in women’s education in low-trust countries is associated with a 19% decline in fertility. In contrast, a 1% increase in women’s education in high-trust countries is associated with a 3% decline in fertility.

Taken together, these descriptive findings suggest that the relationship between women’s education and fertility differs drastically between low- and high-trust countries. As expected, high-trust countries appear to provide a better setting than low-trust countries for reconciling work and motherhood, at least descriptively. We now turn to the multilevel analysis for a more rigorous empirical test of our theoretical argument.

Multilevel analysis

Table 4 presents results from the multilevel Poisson models relating completed fertility to the cross-level interactions between generalized trust and women’s education. All coefficients in Table 4 are exponentiated, so values above 1 indicate a positive association and values below 1 a negative one. Model 1 includes the cross-level interaction between regional generalized trust and cohort-level women’s education, Model 2 the cross-level interaction between national generalized trust and cohort-level women’s education, and Model 3 both the regional and national cross-level interactions.

Table 4

Multilevel Poisson models of the association between completed fertility and the interaction of generalized trust and women’s cohort education

The three models in Table 4 show that the variances of the country, region and cohort random terms, though small in magnitude, remain significant after including the key explanatory variables and the control variables. The variation is largest between countries, followed by birth cohorts, whereas the variation of the region random term is rather small.

We now turn to the fixed-part estimates of the multilevel models. As expected, Model 1 shows that the interaction between regional generalized trust and cohort-level women’s education is positively associated (>1) with the number of children. The main-effect terms of this cross-level interaction, regional trust and women’s education, are both negatively associated with the outcome. Thus, when regional generalized trust is zero, individual fertility declines as women’s access to tertiary education expands. Similarly, regional generalized trust is negatively associated with the number of children when women’s education is zero. Taken together with these main effects, the positive and significant cross-level interaction suggests that a regional context of high generalized trust moderates the decrease in fertility as women’s education expands.

Models 2 and 3 in Table 4 also show a positive interaction between the contextual measures of generalized trust and women’s education. Unlike Model 1, however, they suggest that there is no significant association between generalized trust and fertility when cohort-level women’s education is zero. In other words, at the earliest stage of the women’s revolution, national-level generalized trust does not help explain differences in fertility levels. When both the regional and national cross-level interactions are included (Model 3), only the interaction between national generalized trust and cohort-level women’s education remains statistically significant. Model 3 thus suggests that the national cross-level interaction dominates the regional one in explaining the number of children. Taken together, the results show that a context of high generalized trust moderates the decline in individual completed fertility as women’s education expands.

We illustrate the results in Table 4 graphically by predicting the number of children for different combinations of trust and women’s education. Figure 5 shows the average predicted number of children for the cross-level interactions between generalized trust and cohort-level women’s education, based on Model 3 in Table 4. We present two sets of predictions: in the left quadrant, three hypothetical trust scenarios; in the right quadrant, the average predictions for the trust levels of Norway and Cyprus.

Figure 5

Interaction between generalized trust and cohort-level women’s education predicting number of children

In the left quadrant, the average predictions are computed for different values of the women’s education variable in three contexts of generalized trust. The first line (long-dash, blue) represents a hypothetical country where both regional and national trust levels are low (first quartile). The second line (solid, green) shows the association between women’s education and the number of children when both regional and national trust levels are intermediate (median). The third line (short-dash, red) represents the scenario where generalized trust is high (third quartile). In all three scenarios, fertility declines as women’s education increases, but at higher levels of women’s education generalized trust moderates the decline in the average predicted number of children. In line with our theoretical argument, when the expansion of women’s education is at an early stage, different contextual levels of generalized trust do not seem to matter for predicting completed fertility: in the top left quadrant of Figure 5, the three lines almost overlap. Only when a society moves from a traditional to an egalitarian arrangement, as the share of highly educated women increases, does trust begin to matter. According to Figure 5, once the cohort-level measure of women’s education reaches about 20%, the decline in the average predicted number of children is steepest where national generalized trust is low and more moderate where it is high. The differences in effect size across the trust scenarios may seem modest, but small differences in fertility rates have large consequences for population size (e.g., Coleman and Rowthorn 2011, 223–224).
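The scenario predictions can be sketched as follows: trust is fixed at its first quartile, median, and third quartile, and the expected number of children is traced over a grid of cohort-level women’s education. This is a simplified illustration, not a reproduction of Figure 5. The coefficients are hypothetical (not the Table 4 estimates), only a single trust measure is used although Model 3 includes both regional and national interactions, and random terms and controls are set to zero instead of averaging predictions over the sample.

```python
import math
import statistics

# Hypothetical log-scale coefficients for illustration only; NOT the Table 4 estimates.
B0, B_TRUST, B_EDU, ALPHA = 0.8, -0.1, -1.5, 1.5


def predicted_children(trust, edu):
    """Expected children exp(b0 + b1*T + b2*E + alpha*T*E), random terms and controls set to zero."""
    return math.exp(B0 + B_TRUST * trust + B_EDU * edu + ALPHA * trust * edu)


def scenario_curves(country_trust, edu_grid):
    """Prediction curves for low (Q1), intermediate (median) and high (Q3) trust."""
    low, mid, high = statistics.quantiles(country_trust, n=4)
    scenarios = {"low": low, "intermediate": mid, "high": high}
    return {
        name: [predicted_children(t, e) for e in edu_grid]
        for name, t in scenarios.items()
    }


curves = scenario_curves([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [0.0, 0.2, 0.4, 0.6])
for name, values in curves.items():
    print(name, [round(v, 2) for v in values])
```

With these made-up values, all three curves decline as education rises, start almost on top of each other at zero education, and separate as education expands, with the high-trust curve declining least.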

In the right quadrant, we take two existing contexts of generalized trust: Cyprus, the country with the lowest national level of trust in the survey, and Norway, the country with the highest (see Figure 3). The first line (long-dash, blue) represents a country with the average national generalized trust level of Cyprus; the second line (short-dash, red) represents Norway. Figure 5 shows that these two opposite contexts of generalized trust predict very different associations between women’s education and the number of children. For Norway, the average predicted number of children declines only weakly as cohort-level women’s education increases, and it stays around replacement level. For Cyprus, by contrast, the average predicted number of children declines steeply as the share of highly educated women increases.

Overall, the average predictions at very high values of women’s education should be interpreted with caution, as they are based on fewer cases and on the youngest cohort. Nevertheless, these results hint at the role that generalized trust may play for completed fertility as the expansion of women’s education continues.

For the control variables, the estimated coefficients are consistent across the three models in Table 4. Respondents who attend religious services once a month or more have about 11% more children than those who do not. The estimates also indicate that women report more children than men. This result is most likely driven both by men’s underreporting of the number of children and by the slightly earlier age at which women reach their completed fertility compared with men. The number of children also increases significantly with respondents’ age, which suggests that older respondents are more likely to have reached their completed fertility than the youngest age group (40–49). The wave indicators show that, within countries, the number of children has declined in more recent waves relative to the first wave (EVS 1981).

Finally, we compare the relative fit of the models presented in Table 4. Models 1 and 3 are weakly preferred to Model 2, as their DIC values improve by 6 and 4 points, respectively.

Robustness Checks

We briefly assess the robustness of the results presented above and summarize each robustness check in Table 5. First, we compare the precision of the estimated coefficients using alternative estimation methods and statistical specifications (Model “PQL2”). Second, we test the robustness of our results to the addition of further control variables (Model “additional var.”). Third, we detect and exclude influential cases at the highest level of our analysis (Model “w/o influential”). Fourth, we replicate our main results with a stricter age cut-off, excluding respondents under the age of 50 (Model “age > 50”). Fifth, we exclude the Eastern bloc countries to assess to what extent they contribute to our results (Model “w/o Eastern block”). Finally, we re-estimate our main model without the regional level (Model “w/o region level”). Throughout all these analyses, we consistently find a positive and statistically significant cross-level interaction between the degree of expansion of women’s education and contextual levels of generalized trust. Results from the robustness analysis, as well as a more complete explanation of how each test is carried out, can be found in the Supplementary Materials.


We have shown that generalized trust is potentially a key ingredient in explaining the divergence in fertility trends observed in industrialized societies during the last decades. Our hypothesis builds on the idea that trust becomes important for fertility as women enter higher education in greater numbers. Combining work, childbearing and childrearing requires that traditional care activities within the family be outsourced. Without outsourcing, women cannot be expected to combine work and children with ease, which results in fertility decline. Generalized trust increases individuals’ willingness to outsource childcare activities to others. In male breadwinner societies, generalized trust was not relevant for fertility, because care activities were in any case undertaken within the extended family. In higher-trust contexts, the expansion of women’s education increased the demand for outsourcing, generating either policy- or market-based responses.

Our argument has important implications for understanding fertility dynamics and reproduction in modern societies more generally. The interaction between persistent cultural traits, such as generalized trust, and broad structural change, such as the expansion of women’s educational attainment, generates different fertility dynamics. If that is true, the fertility decline experienced by some countries in recent decades may be long lasting, or at least hard to reverse. To some extent, the argument also reconciles the fact that fertility levels are high both in social democratic countries such as Sweden and Norway and in liberal countries such as the US and the UK. In other words, generous public welfare provision, the hallmark of Scandinavian societies, is not a necessary precondition for high fertility.

The arguments put forward also relate to the literature on family ties. Here the idea is that long-standing patterns of family ties matter for demographic behavior (e.g. Reher 1998; Dalla Zuanna 2001), and that the centrality of the family is not necessarily beneficial for fertility. Interestingly, generalized trust is inversely correlated with strong family ties. Taking generalized trust as a starting point nevertheless has greater appeal, both because it is a clearly defined behavioral concept and because it is measured in survey questionnaires.

The former communist countries of Eastern Europe fit the observed pattern in the sense that both fertility and generalized trust are low. In these countries, the fertility decline happened later than in the Mediterranean countries. It was driven by the collapse of communism, which brought about dramatic societal upheavals in many of these states (Billingsley 2010; Sobotka 2002). Since then, women’s labor force participation has declined or stagnated in many former socialist countries (Kotowska and Jozwiak 2003), while women’s enrollment in higher education has continued to increase. The result, as in the Mediterranean countries, is lower fertility.

Our empirical analyses provide fairly robust support for our theoretical argument through both descriptive analysis and multilevel regression techniques. Our findings are robust to different estimation methods and model specifications, additional confounding variables, a stricter age selection, and the exclusion of influential cases. In particular, we consistently find a positive and statistically significant cross-level interaction between the degree of expansion of women’s education and contextual levels of generalized trust.

Our analysis does not come without caveats. Our theoretical arguments build on a dynamic perspective starting from the male breadwinner model of the 1960s and 1970s. Our data, however, start in 1981, and not all countries were included in the WVS-EVS at that time. For example, we cannot empirically address the potential effect of the Soviet experience on generalized trust, as there are no pre-1989 data for those countries. There is consequently a mismatch between our theoretical and empirical arguments. Nor are we able to capture very recent fertility trends. The survey question on generalized trust is binary, and hence rather crude compared with other surveys such as the ESS, where trust is measured on a 0–10 scale. There are also clear outliers that do not fit the argument. The WVS-EVS reports, for instance, low generalized trust in France, whereas fertility there is generally high. Japan and South Korea are two other examples, where fertility is extremely low but generalized trust levels are intermediate. Consequently, we acknowledge that country-specific fertility trends may very well depend on country specificities not captured in our empirical modeling. Further research may shed new light on our argument.


The work has benefited from useful discussions with Leif Andreassen, Bruno Arpino, Joan Carreras Timoneda, Daniela del Boca, Pearl Dykstra, Gøsta Esping-Andersen, Vincenzo Galasso, Tale Hellevik, Chris Flinn, Diego Gambetta, Letizia Mencarini, Werner Raub, Wendy Sigle-Rushton, Thomas Siedler, Liat Raz-Yurovich, Jan van Bavel, Agnese Vitali, and participants at Alp-pop (La Thuile, 2001), AISP-SIS (Ancona, 2001) and at seminars at ISER (Essex) and the Oxford Institute of Population Ageing. Special thanks to Vicky Bancroft. Arnstein Aassve gratefully acknowledges financial support from the European Research Council through the starting grant ERC-2007-StG-201194 (CODEC - Consequences of Demographic Change). Arnstein Aassve and Francesco Billari gratefully acknowledge support from the Italian Ministry of Education, University and Research (PRIN Programme – “Programmi di ricerca di Rilevante Interesse Nazionale”). Léa Pessin gratefully acknowledges financial support during her PhD from the European Research Council through the advanced ERC Grant ERC-2010-AdG-269387 (Family polarization, P.I. Gøsta Esping-Andersen) and during her postdoctoral fellowship from the Eunice Kennedy Shriver National Institute of Child Health and Human Development to the Population Research Institute at The Pennsylvania State University for Population Research Infrastructure (R24HD041025) and Family Demography Training (T-32HD007514).



Arnstein Aassve

Arnstein Aassve is Professor of Demography at the Department of Policy Analysis and Public Management at Bocconi University. His research focuses on population, culture and institutions, and he has recently published in Demography, the European Journal of Population and the European Sociological Review. He was recently awarded a European Research Council advanced grant to study the interaction between institutional quality and cultural traits as a driver of recent demographic change.


Francesco C Billari

Francesco C Billari is Professor of Sociology and Demography at the Department of Sociology and Professorial Fellow at Nuffield College, Oxford University, UK. His research has mainly focused on fertility and family change, the transition to adulthood, life course analysis, population forecasting, agent-based modeling, and the design and analysis of comparative surveys. He is a Fellow of the British Academy and an Honorary President of the European Association for Population Studies.


Léa Pessin

Léa Pessin is an NICHD postdoctoral fellow at the Population Research Institute, Pennsylvania State University. In 2016, she obtained her PhD degree from the Pompeu Fabra University, with a dissertation titled “Changing Gendered Expectations and Diverging Divorce Trends - Three Papers on Gender Norms and Partnership Dynamics.” Her main research interests include women’s work and family, gender norms, partnership and fertility dynamics.


1 The causal relationship between generalized trust and economic growth is not entirely settled. Nevertheless, several recent articles using sophisticated identification techniques have found strong evidence in support of a causal channel from trust to economic growth (see Algan and Cahuc 2013 for a review).

2 We build our argument around women’s educational attainment for both theoretical and empirical reasons. First, we use cohort-level women’s educational attainment as an indicator of the advancement of the gender revolution (Goldin 2006, p.2). While the expansion of women’s access to higher education is shared across industrialized countries, trends in women’s labor force participation, like those in fertility, vary widely across countries. Women’s educational expansion is also of a more exogenous nature than women’s labor force participation. In fact, we consider women’s labor market participation, like fertility, to be an outcome of the process we attempt to describe. With higher educational attainment, women should have better access to the labor market, but whether they can afford to work also depends on the extent to which they can reconcile work and motherhood.

3 As a result, our final sample is an unbalanced repeated cross-section according to data availability.

4 The distribution of missing variables is reported in Table S15 in the Supplementary Materials.

5 Table S3 in the Supplementary Material describes the analytical sample size by country and wave. Countries were selected based on two criteria: (1) whether they were EU or high-income OECD members (World Development Indicators classification) by the last wave of the survey, and (2) whether they had experienced fertility levels around or below replacement. Table S4 in the Supplementary Material lists the regions included in the analysis in detail.

6 The dependent variable shows no evidence of overdispersion (mean = 2.17, variance = 2.03). Only about 12% of the sample is childless, so there is no excess of zero counts.
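The overdispersion check in this note amounts to comparing the variance of the count with its mean; under a Poisson model the two are equal. Applied to the figures above, the ratio is 2.03 / 2.17 ≈ 0.94, slightly below 1. A minimal sketch of the summary, with a hypothetical function name and toy data:

```python
def dispersion_summary(counts):
    """Mean, sample variance, variance-to-mean ratio and share of zeros of a count variable.

    A ratio well above 1 would signal overdispersion relative to the Poisson model.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,
        "ratio": variance / mean,
        "share_zero": sum(1 for c in counts if c == 0) / n,
    }


print(dispersion_summary([0, 1, 2, 3, 4]))
# {'mean': 2.0, 'variance': 2.5, 'ratio': 1.25, 'share_zero': 0.2}
```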

Contributor Information

Arnstein Aassve, Carlo F. Dondena Centre for Research on Social Dynamics and Department of Policy Analysis and Public Management, Università Bocconi, Via Guglielmo Röntgen 1, 20136 Milan, Italy, arnstein.aassve@unibocconi.it, Tel. +39 02 5836 5657, Fax +39 02 5836 2798.

Francesco C. Billari, Department of Sociology and Nuffield College, University of Oxford.

Léa Pessin, Population Research Institute, The Pennsylvania State University.


  • Aassve Arnstein, Sironi Maria, Bassi Vittorio. Explaining Attitudes towards Demographic Behavior. European Sociological Review. 2013;29(2):316–333.
  • Aghion Philippe, Algan Yann, Cahuc Pierre. Can Policy Influence Culture? Minimum Wage and the Quality of Labor Relations. NBER Working Paper 14327; 2008.
  • Alesina Alberto, La Ferrara Eliana. Who Trusts Others? Journal of Public Economics. 2002;85:207–234.
  • Algan Yann, Cahuc Pierre. Inherited Trust and growth. The American Economic Review. 2010;100(5):2060–2092.
  • Algan Yann, Cahuc Pierre. Trust and growth. Annual Review of Economics. 2013;5(1):521–549.
  • Angrist Joshua D, Pischke Jörn-Steffen. Mostly harmless econometrics: An empiricist’s companion. Princeton University Press; 2008.
  • Arpino Bruno, Esping-Andersen Gosta, Pessin Léa. Changes in Gender Role Attitudes and Fertility: A Macro-Level Analysis. European Sociological Review. 2015;31(3):370–382.
  • Balbo Nicoletta, Billari Francesco C, Mills Melinda C. Fertility in Advanced Societies, a Review. European Journal of Population. 2013;29(1):1–38.
  • Banfield Edward C. The Moral Basis of a Backward Society. New York: Free Press; 1958.
  • Bergh A, Bjørnskov C. Historical trust levels predict the current size of the welfare state. Kyklos. 2011;64(1):1–19.
  • Billari FC, Kohler HP. Patterns of low and lowest-low fertility in Europe. Population Studies. 2004;58(2):161–176.
  • Billari Francesco C, Kohler Hans P, Andersson Gunnar, Lundström H. Approaching the Limit: Long-Term Trends in Late and Very Late Fertility. Population and Development Review. 2007;33(1):149–170.
  • Billingsley S. The Post-Communist Fertility Puzzle. Population Research and Policy Review.


Driver behavior impacts traffic safety, fuel/energy consumption and gas emissions. Driver behavior profiling tries to understand and positively impact driver behavior. Usually, driver behavior profiling involves the automated collection of driving data and the application of computer models to generate a classification that characterizes the driver's aggressiveness profile. Different sensors and classification methods have been employed in this task; however, low-cost, high-performance solutions are still a research target. This paper presents an investigation of different Android smartphone sensors and classification algorithms to assess which sensor/method combination enables classification with the highest performance. The results show that specific combinations of sensors and intelligent methods improve classification performance.

Citation: Ferreira J Júnior, Carvalho E, Ferreira BV, de Souza C, Suhara Y, Pentland A, et al. (2017) Driver behavior profiling: An investigation with different smartphone sensors and machine learning. PLoS ONE 12(4): e0174959.

Editor: Houbing Song, West Virginia University, UNITED STATES

Received: August 12, 2016; Accepted: March 17, 2017; Published: April 10, 2017

Copyright: © 2017 Ferreira et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability:

Funding: This work was supported by MCTI/CT-Info/CNPq, process 277440880/2013-0. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Driver behavior strongly impacts traffic safety [1] and causes the vast majority of motor vehicle accidents [2]. In 2010, the total economic cost of motor vehicle crashes in the United States was 242 billion US dollars [3]. This figure covers approximately 33 thousand fatalities, 4 million nonfatal injuries, and 24 million damaged vehicles. Adapting driver behavior might increase overall safety and reduce vehicle fuel/energy consumption and gas emissions [4, 5]. In this context, driver behavior profiling tries to better understand and potentially improve driver behavior, promoting safer and more energy-aware driving.

Driver monitoring and analysis, or driver behavior profiling, is the process of automatically collecting driving data (e.g., speed, acceleration, braking, steering, location) and applying a computational model to them to generate a safety score for the driver. Driving data may be collected by several kinds of sensors, from the general-purpose ones in smartphones to dedicated equipment such as monitoring cameras, telematics boxes, and On-Board Diagnostic (OBD) adapters.

Modern smartphones provide sensors suitable for collecting data for driver profile analysis. Previous work [6–8] shows that properly preprocessed and handled smartphone sensor data are an interesting alternative to conventional black boxes for monitoring driver behavior.

The relevance of driver behavior profiling has grown in the last few years. In the insurance telematics domain, plans such as Usage-Based Insurance (UBI) or Pay-How-You-Drive (PHYD) make car insurance cheaper by rewarding drivers with good driving scores, instead of only considering group-based statistics (e.g., age, gender, marital status) for that purpose. In the freight management domain, automated, continuous, and real-time driver behavior profiling enables managers to run campaigns aimed at improving drivers’ scores and, as a consequence, decreasing accidents, saving resources, and extending vehicle lifetime. Furthermore, notifications of unsafe driving events presented to drivers in real time can help prevent accidents. For example, a smartphone app may notify the driver every time she performs an aggressive turn.

Several driver behavior profiling works [9–14] use smartphone-based sensor fusion to identify aggressive driving events (e.g., aggressive acceleration, aggressive braking) as the basis for calculating a driver score. Another work [15] uses vehicle sensor data to provide driving tips and assess fuel consumption as a function of driver profile. The machine learning algorithms (MLAs) employed in these papers come down to fuzzy logic or variations of Dynamic Time Warping (DTW). DTW is an algorithm for finding similar patterns in temporal series, even when they differ in speed or length; it was originally employed in speech recognition [16]. We believe that other MLAs and sensor combinations can be applied to the task of identifying aggressive driving events with promising results. In this context, to the best of our knowledge, no work quantitatively assesses and compares the performance of combinations of smartphone sensors and MLAs in a real-world experiment.
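As an illustration of the DTW idea, here is a minimal Python sketch of the classic DTW distance between two 1-D series. It is not the specific DTW variant used in [10, 11, 13]: the absolute-difference local cost, the unconstrained warping path, and the function name dtw_distance are our own illustrative choices.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Local cost is the absolute difference; the warping path may repeat
    samples of either sequence (no window constraint).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance a, repeat b[j-1]
                                 cost[i][j - 1],      # advance b, repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


template = [0, 1, 3, 1, 0]             # hypothetical event template
slower = [0, 0, 1, 3, 3, 1, 0]         # same shape, performed more slowly
distance = dtw_distance(template, slower)
```

Because the second series is only a slowed-down copy of the template, the warping path absorbs the repeated samples and the distance is zero, which is why DTW suits event templates performed at different speeds.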

The main contribution of this work is to evaluate the performance of multiple combinations of machine learning algorithms and Android smartphone sensors in the task of detecting aggressive driving events in a real-world experiment. We performed a data collection phase in which we drove a car and gathered data from several different sensors while performing different maneuvers. We show how machine learning methods can be employed in this task and evaluate the accuracy of each combination of sensor and technique, aiming to find the best sensor/technique match for each class of behavior.

The remainder of this paper is organized as follows. In Section 2 we present a comprehensive set of works related to our proposal, followed by Section 3, in which we present concepts of the employed techniques. Section 4 describes the methodology, presenting the data-gathering phase and the details of how we model the proposed machine learning application. It is followed by results and discussion (Section 5). Finally, Section 6 presents our conclusions and points out potential future work.

2 Related work

In this section we describe recent driver behavior profiling work. It is worth noting that several driver behavior profiling solutions are commercially available nowadays, mostly in the insurance telematics and freight management domains. Examples include Aviva Drive, Greenroad, Ingenie, Snapshot, and SeeingMachines. However, technical details of these solutions are not publicly available.

Nericell, proposed by Mohan et al. [17], is a Windows Mobile smartphone application to monitor road and traffic conditions. It uses the smartphone accelerometer to detect potholes/bumps and braking events. It also employs the microphone to detect honking, and GPS or Global System for Mobile Communications (GSM) localization to obtain vehicle location and speed. Braking, bumps, and potholes are detected by comparing a set of empirically predefined thresholds to abrupt variations of accelerometer data or to their mean over a sliding window of N seconds. No MLA is employed in the detection of such events. Reported event detection results in terms of False Positives (FPs) and False Negatives (FNs) include: 4.4% FN and 22.2% FP for braking events; 23% FN and 5% FP for bumps/potholes detection at low speed (<25 km/h); and 0% FN and FP for honk detection on an exposed vehicle (e.g., a motorbike).
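A minimal Python sketch of the sliding-window-mean variant of the threshold test described for Nericell [17] follows. It assumes a series of longitudinal acceleration samples in m/s² in which deceleration is negative; the function name detect_braking, this sign convention, and any threshold value passed to it are hypothetical, since [17] tunes its thresholds empirically.

```python
from collections import deque


def detect_braking(samples, rate_hz, window_s, threshold):
    """Return indices where the mean deceleration over the trailing window
    exceeds a fixed threshold.

    samples: longitudinal acceleration in m/s^2 (negative = decelerating).
    threshold: positive deceleration magnitude in m/s^2 (hypothetical value).
    """
    size = max(1, int(round(window_s * rate_hz)))
    window = deque(maxlen=size)
    total = 0.0
    flagged = []
    for idx, value in enumerate(samples):
        if len(window) == size:
            total -= window[0]  # the oldest sample is evicted by the append below
        window.append(value)
        total += value
        # Only test once the window is full, i.e., covers window_s seconds.
        if len(window) == size and -total / size > threshold:
            flagged.append(idx)
    return flagged


# Toy signal at 10 Hz: 1 s of steady driving, then 1 s of hard braking.
signal = [0.0] * 10 + [-4.0] * 10
hits = detect_braking(signal, rate_hz=10, window_s=0.5, threshold=3.0)
```

Averaging over the window delays detection slightly (here until four of the five samples show braking) but suppresses isolated spikes, which is the usual trade-off of this kind of filter.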

Dai and colleagues [9] propose an Android application aimed at real-time detection of, and alerts about, dangerous driving events typically related to Driving Under the Influence (DUI) of alcohol. The application uses the smartphone accelerometer and orientation (yaw, pitch, and roll angles) sensors to detect Abnormal Curvilinear Movements (ACM) and Problems in Maintaining Speed (PMS), which are the two main categories of drunk-driving-related behaviors. A series of equations is used to determine lateral and longitudinal acceleration vectors. An ACM event is detected if the difference between the maximum and minimum values of lateral acceleration within a 5-second time window exceeds an empirical threshold. A PMS event is detected if longitudinal acceleration exceeds positive or negative fixed empirical thresholds at any given time. Similarly to [17], no MLA is employed for event detection. Experimental results include: 0% FN and 0.49% FP for abnormal curvilinear movements; and 0% FN and 2.90% FP for problems in maintaining speed.
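The two rules attributed to [9] can be sketched in Python as follows, assuming lateral and longitudinal acceleration series (m/s²) have already been derived from the device sensors. The function names, the trailing-window formulation, and the threshold arguments are hypothetical; [9] sets its thresholds empirically.

```python
def detect_acm(lateral, rate_hz, threshold, window_s=5.0):
    """Abnormal curvilinear movement: max - min lateral acceleration within
    a trailing window of window_s seconds exceeds threshold.
    Returns the index of the last sample of each offending window."""
    size = max(1, int(round(window_s * rate_hz)))
    hits = []
    for end in range(size, len(lateral) + 1):
        window = lateral[end - size:end]
        if max(window) - min(window) > threshold:
            hits.append(end - 1)
    return hits


def detect_pms(longitudinal, upper, lower):
    """Problem in maintaining speed: longitudinal acceleration leaves the
    fixed band [lower, upper] at any sample."""
    return [i for i, a in enumerate(longitudinal) if a > upper or a < lower]


# Toy data at 1 Hz: a swerve (+2 then -2 m/s^2) inside otherwise calm driving.
acm_hits = detect_acm([0, 0, 0, 0, 0, 2, -2, 0, 0, 0], rate_hz=1, threshold=3.0)
pms_hits = detect_pms([0.5, 3.5, -4.0, 1.0], upper=3.0, lower=-3.0)
```

Note that the ACM rule reports every window that still contains the swerve, so consecutive hits would normally be merged into one event.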

An iPhone application called MIROAD was created by Johnson and Trivedi [10]. MIROAD uses a smartphone based sensor-fusion of magnetometer, accelerometer, gyroscope, and GPS to detect aggressive driving events and accordingly classify driver’s style into aggressive or nonaggressive. Aggressive events are detected by a single classifier based on the DTW algorithm. All processing is executed in real-time on the smartphone. Experimental analysis shows that 97% of the aggressive events were correctly detected.

WreckWatch, proposed by White et al. [18], is an Android smartphone-based client/server application to detect car accidents. The client application detects accidents, records related data, and sends them to the server application, which can notify relevant authorities. The WreckWatch client uses data from the accelerometer, GPS, and microphone, with threshold-based filtering, to detect an accident. The accident prediction framework is composed of an 11-tuple model of the phone state and a function that evaluates the model to signal whether it represents an accident. Model variables include the maximum acceleration experienced in any direction and an indication of whether a loud sound has occurred. One scenario that triggers accident detection is when acceleration, sound, and vehicle speed are all higher than empirical thresholds. Similarly to [9, 17], no MLA is employed for accident detection. Experimental results show that dropping a phone is unlikely to cause FPs, that some accidents may not be detected with smartphones, and that acoustic data alone is not enough to detect accidents.

Araujo and colleagues [15] present a smartphone application to evaluate fuel consumption efficiency as a function of driver behavior. The application also provides real-time driving hints such as “shift gear earlier” and “accelerating too high”. Instead of collecting data from smartphone sensors, this application uses an OBD Bluetooth adapter and the Torque Pro smartphone app to collect data (e.g., speed, acceleration, RPM) from vehicle sensors. After data collection, the app extracts a series of features and applies three classifiers (one linear discriminant and two fuzzy systems) to them. All processing is executed in real time on the smartphone.

Eren et al. [11] propose an iPhone application to classify driver behavior as either safe or risky based on risky driving events. The application detects sudden turns, lane departures, braking and acceleration events. The sensors used for event detection are the smartphone accelerometer, gyroscope, and magnetometer. The application uses the endpoint detection algorithm to demarcate start and end times of an event. The demarcated event is then compared with template event data by means of the DTW algorithm. Finally, a Bayesian classifier labels a driver’s behavior as safe or risky based on the number of events over time. Experimental results show that 14 out of 15 drivers (93.3%) were correctly classified. It is worth noting that the paper only provides driver classification results. Hence, event classification performance results are not provided.

The work by Fazeen and colleagues [19] uses an Android smartphone accelerometer and GPS to identify vehicle conditions (speed and shifting), driving patterns (acceleration/deceleration and changing lanes), and road condition (smooth, uneven, rough, or containing a bump or pothole). Events are mainly detected by calculating the time duration, difference, and slope between successive accelerometer readings on certain axes and comparing them to empirical fixed/dynamic thresholds. For example, the work states that safe acceleration and deceleration never reach a g-force of more than ±0.3g on y-axis. Similarly to [9, 17, 18], no MLA is employed for event detection. Experimental results include the following road anomaly classification accuracy: 81.5% for bumps, 72.2% for potholes, 75% for rough roads, 91.5% for smooth roads, and 89.4% for uneven roads.

Castignani et al. [12] propose a driver behavior profiling mobile tool based on fuzzy logic that makes use of the accelerometer, magnetometer, and gravity sensors of Android smartphones. This tool classifies driver behavior as Normal, Moderate, or Aggressive, which correspond to a driving score between 0 (best) and 100 (worst). The classification and score are not computed in real time: sensor data are collected by the UBI-Meter mobile application and stored locally on the smartphone, and later sent to a remote server on the Internet for processing. The work by Saiprasert and collaborators [13] employs GPS, accelerometer, and magnetometer smartphone sensors to profile driver behavior as Very Safe, Safe, Aggressive, or Very Aggressive. This profile is calculated in real time by detecting relevant driving events. Event detection is performed by a variation of the DTW algorithm [20].

SenseFleet, proposed by Castignani et al. [14], is a driver behavior profiling platform for Android smartphones that is able to detect risky driving events independently of the mobile device and vehicle. The mobile application collects data from the accelerometer, magnetometer, gravity sensor, and GPS smartphone sensors and uses a fuzzy system to detect risky events, such as excessive speed, turning, acceleration, and braking, that might occur during a trip. The application calculates a driving score between 0 (worst) and 100 (best) for each trip as a function of the detected risky events. All processing is done in real time on the smartphone. The authors performed several experiments; in one of them, more than 90% of risky events were correctly identified by the application.

Wahlstrom, Skog and Handel [21] propose a framework for the detection of dangerous vehicle cornering events based on the theoretical likelihood of tire slippage (slipping) and vehicle rollover. The Global Navigation Satellite System (GNSS) receiver of Android smartphones (GNSS being the general term that covers GPS and similar systems) is the sole sensor used for event detection. By employing classical mechanics equations, theoretical thresholds for slipping and vehicle rollover are defined. The slipping threshold is defined with respect to tangential and rotational velocities, tangential acceleration, and the coefficient of friction between the vehicle’s tires and the road. The vehicle rollover threshold is defined with respect to the vehicle’s center of gravity, track width, and height. Similarly to [9, 17–19], no MLA is employed in event detection. Experimental analysis with GNSS data collected from three different Android smartphones showed, as the best result, an average rate of 13% for both FPs and FNs.
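The exact equations of [21] are not reproduced here. As a rough sketch under textbook assumptions (a friction-circle model for slipping and a quasi-static rigid-body model for rollover), the two checks could look like the Python below, where the centripetal acceleration is the product of tangential speed and yaw rate. These formulas may differ in detail from those in [21]; the function names and the parameter values in the usage are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def slip_risk(speed, yaw_rate, tangential_acc, mu):
    """Friction-circle check: total horizontal acceleration exceeds mu * g.

    speed: tangential velocity (m/s); yaw_rate: rotational velocity (rad/s);
    tangential_acc: m/s^2; mu: tire-road friction coefficient.
    """
    centripetal = speed * yaw_rate
    return math.hypot(tangential_acc, centripetal) > mu * G


def rollover_risk(speed, yaw_rate, track_width, cog_height):
    """Quasi-static rigid-vehicle check: lateral acceleration exceeds
    g * track_width / (2 * cog_height)."""
    return abs(speed * yaw_rate) > G * track_width / (2.0 * cog_height)


# Illustrative numbers: 20 m/s (72 km/h) through a bend at 0.5 rad/s.
slipping = slip_risk(speed=20.0, yaw_rate=0.5, tangential_acc=0.0, mu=0.8)
rolling = rollover_risk(speed=20.0, yaw_rate=0.5, track_width=1.5, cog_height=0.55)
```

For these illustrative numbers the slip threshold (0.8 g) is reached before the rollover threshold (about 1.36 g), so the sketch flags slipping but not rollover.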

We have identified some noteworthy points in the related work presented here: (i) all papers except [21] use sensor-fusion data as input for the event detection algorithms; (ii) the employed MLAs come down to fuzzy logic [12, 14, 15] or variations of DTW [10, 11, 13]; (iii) most recent papers use smartphone sensors instead of vehicle or telematics box sensors; and (iv) some papers [9, 17–19, 21] do not use MLAs to detect driving events; instead, they employ physical equations, fixed/dynamic thresholds, or a combination of both. In our work, we run the machine learning algorithms locally, although future versions might run in a cloud-based fashion, as presented in [22, 23]. Other approaches to driver behavior focus on social mechanisms, such as the evaluation performed by Song and Smith [24], in which driver behavior is investigated in a high-occupancy toll lane. A broader approach to smart and connected communities can be found in the work by Sun and colleagues [25]. For a more thorough comparison of smartphone-based sensing in vehicles, please refer to the work by Engelbrecht and colleagues [26].

3 Machine learning algorithms

In our evaluation, we compare the performance of four MLAs: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Bayesian Network (BN). These MLAs were chosen given their wide use in the classification literature and the fact that they represent different machine learning “tribes” [27], which ensures algorithmic diversity. In this section, the basic concepts of these MLAs are explained.

3.1 Artificial neural networks

Artificial Neural Networks (ANN) are composed of several computational elements (neurons) that interact through weighted connections. Inspired by the human brain, neural networks exhibit features such as the ability to learn complex patterns from data and to generalize learned information [28]. A widely used form of ANN is the Multilayer Perceptron (MLP), which in its simplest form consists of three layers: the input layer, the hidden layer, and the output layer.

Haykin [29] states that the learning process of an artificial neural network is determined by how its parameters are changed. Thus, learning in an ANN can be divided into three parts: (i) the network is stimulated by examples extracted from an environment; (ii) its weights are modified through an iterative process in order to minimize the ANN output error; and (iii) the network responds in a new way as a result of these changes. Parameter configuration directly affects the learning process of an ANN. Examples of such parameters are the learning rate, momentum rate, stopping criteria, and training method.
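The three learning steps above can be made concrete with a deliberately tiny pure-Python MLP (one sigmoid hidden layer, one sigmoid output) trained by backpropagation with a learning rate and momentum. This is an illustrative sketch, not WEKA’s MultilayerPerceptron; the class name TinyMLP and the logical-AND toy data are hypothetical, and the learning rate (0.3) and momentum (0.2) values were chosen only to mirror WEKA’s defaults.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output neuron."""

    def __init__(self, n_in, n_hidden, lr=0.3, momentum=0.2, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for the bias input (constant 1.0).
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # momentum buffers
        self.v_o = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w_h] + [1.0]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target):
        """One backpropagation update on the squared error 0.5 * (y - target) ** 2."""
        xb, hb, y = self.forward(x)
        delta_o = (y - target) * y * (1.0 - y)
        # Hidden deltas must use the output weights before they are updated.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j]) for j in range(len(self.w_h))]
        for j in range(len(self.w_o)):
            self.v_o[j] = self.momentum * self.v_o[j] - self.lr * delta_o * hb[j]
            self.w_o[j] += self.v_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.v_h[j][i] = self.momentum * self.v_h[j][i] - self.lr * delta_h[j] * xb[i]
                row[i] += self.v_h[j][i]


def total_error(net, data):
    return sum(0.5 * (net.forward(x)[2] - t) ** 2 for x, t in data)


# Toy usage: learn logical AND (hypothetical data, unrelated to driving events).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = TinyMLP(n_in=2, n_hidden=2)
error_before = total_error(net, AND_DATA)
for _ in range(3000):  # epochs: present every example, adjust weights, repeat
    for x, t in AND_DATA:
        net.train_step(x, t)
error_after = total_error(net, AND_DATA)
```

Raising the learning rate speeds up each update but can make training oscillate; momentum smooths successive updates, which is why both appear among the configuration parameters listed above.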

3.2 Support vector machines

Support Vector Machines (SVM) [30] are a supervised learning method used for regression and classification. The algorithm tries to find an optimal hyperplane that separates the d-dimensional training data into their classes. The optimal hyperplane is the one that maximizes the margin, i.e., the distance between the hyperplane and the closest training examples of each class. These closest examples, which lie on the border of the margin, are the so-called “support vectors”.

Since training data are often not linearly separable, SVM maps the data into a high-dimensional feature space through some nonlinear mapping, where an optimal separating hyperplane is constructed. To reduce computational cost, this mapping is not computed explicitly; instead, kernel functions compute inner products in the feature space directly from input-space variables. The most commonly used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
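The four kernels listed above are commonly written as in the LIBSVM documentation; a small Python sketch follows. The parameter values are illustrative, not the ones used in our grid search. The explicit feature map phi_degree2 (a hypothetical helper) shows the point of the kernel trick: for 2-D inputs, the degree-2 polynomial kernel with gamma = 1 and coef0 = 0 equals an inner product in a 3-D feature space that the kernel never builds explicitly.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def phi_degree2(x):
    """Explicit 2-D -> 3-D feature map whose inner product equals (u . v) ** 2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


u, v = (1.0, 2.0), (3.0, -1.0)
implicit = polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=2)  # computed in input space
explicit = dot(phi_degree2(u), phi_degree2(v))                      # computed in feature space
```

For higher degrees or the RBF kernel (whose feature space is infinite-dimensional), building the feature map explicitly becomes expensive or impossible, which is exactly the cost the kernel avoids.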

3.3 Random forests

Random Forests (RF) are ensembles of decision trees that vote together in a classification. Each tree is trained on a random subset of the data points (a bootstrap sample), and at each split only a randomly chosen subset of the features is considered. The data points left out of a tree’s sample (the “out-of-bag” points) are used to evaluate that tree. Random Forests are known to be resistant to overfitting.

Random forests were proposed by Leo Breiman [31]. Their main features are: (i) they are easy to implement; (ii) they have good generalization properties; (iii) the algorithm outputs more information than just the class label; (iv) they run efficiently on large databases; (v) they can handle thousands of input variables without variable deletion; and (vi) they provide estimates of which variables are important in the classification.
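To make bootstrap sampling, random feature selection, and out-of-bag evaluation concrete, here is a compact pure-Python forest. It is a simplified sketch, not Breiman’s full algorithm or WEKA’s RandomForest: every tree is a depth-1 stump (so its single split is also the only place where features are sampled), splits use Gini impurity, and the names fit_stump and fit_forest are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, n_features_to_try, rng):
    """Depth-1 tree: best Gini split over a random subset of the features."""
    features = rng.sample(range(len(X[0])), n_features_to_try)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class everywhere
        label = majority(y)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, n_features_to_try=1, seed=0):
    """Return (predict, oob_accuracy) for a forest of bootstrap-trained stumps."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        tree = fit_stump([X[i] for i in boot], [y[i] for i in boot], n_features_to_try, rng)
        trees.append(tree)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # out-of-bag: this tree never saw point i
                oob_votes[i][tree(X[i])] += 1
    scored = [(votes.most_common(1)[0][0], y[i]) for i, votes in enumerate(oob_votes) if votes]
    oob_accuracy = sum(p == t for p, t in scored) / len(scored) if scored else float("nan")

    def predict(row):
        return majority([tree(row) for tree in trees])  # majority vote of all trees

    return predict, oob_accuracy
```

The out-of-bag accuracy comes for free: each point is scored only by the trees that did not train on it, which gives a built-in estimate of generalization without a separate test set.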

3.4 Bayesian networks

According to Ben-Gal [32], Bayesian Networks (BNs) belong to the family of probabilistic graphical models. These graph structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies are often estimated using known statistical and computational methods. Thus, Bayesian networks combine principles of graph theory, probability theory, and statistics.
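A minimal Python sketch of how a Bayesian network’s graph factorizes the joint distribution and supports inference by enumeration follows. The three-node chain Rain -> WetRoad -> Skid and its conditional probability tables are entirely hypothetical and unrelated to our data.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (1 = true).
P_RAIN = {1: 0.2, 0: 0.8}
P_WET_GIVEN_RAIN = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.1, 0: 0.9}}   # [rain][wet]
P_SKID_GIVEN_WET = {1: {1: 0.3, 0: 0.7}, 0: {1: 0.05, 0: 0.95}}  # [wet][skid]


def joint(rain, wet, skid):
    """Factorization implied by the graph: P(R) * P(W | R) * P(S | W)."""
    return P_RAIN[rain] * P_WET_GIVEN_RAIN[rain][wet] * P_SKID_GIVEN_WET[wet][skid]


def posterior_rain_given_skid(skid=1):
    """P(Rain = 1 | Skid = skid), summing out the unobserved WetRoad node."""
    numerator = sum(joint(1, wet, skid) for wet in (0, 1))
    evidence = sum(joint(rain, wet, skid) for rain, wet in product((0, 1), repeat=2))
    return numerator / evidence


total_probability = sum(joint(*values) for values in product((0, 1), repeat=3))
p_rain = posterior_rain_given_skid(1)
```

Observing a skid raises the probability of rain from the prior 0.2 to about 0.48, because the graph lets evidence propagate backwards through WetRoad.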

4 Methodology

We modeled this work as a multi-class supervised learning classification problem in which the classes are driving event types. The goal of this work is to identify the best combination of motion sensor (and its axes), learning algorithm (and its parameters), and number of frames in the sliding window (nf) for detecting each driving event type. To this end, we define an evaluation assembly in the form EA = {1:sensor, 2:sensor axis(es), 3:MLA, 4:MLA configuration, 5:nf}.

An assembly is evaluated by training, testing, and assessing the performance of the classifier generated by the specified MLA (element #3) with its configuration parameters (element #4) over a data set identified by the sensor (element #1), its axis(es) (element #2), and the number of frames in the sliding window (element #5). Changing the value of any element in this assembly may yield a different driving event detection performance. Therefore, this assessment evaluates several combinations of element values in order to reveal the best performing ones for each driving event type.

Fig 1 shows a high-level view of our evaluation pipeline. In the first step of the pipeline, raw smartphone sensor data are sampled and translated from the device coordinate system to Earth’s coordinate system (Fig 2). This translation is necessary in order to make the data independent of the device’s position inside the vehicle. Translated sensor data are then stored in the smartphone file system. In the second step, translated sensor data files are retrieved from the smartphone and used as input to generate attribute vector data sets. In the third step, attribute vector data sets are used to train, test, and assess the performance of the MLAs. As depicted in Fig 1, the first pipeline step is executed on the smartphone, whereas the second and third steps are executed on a regular computer.
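The paper does not detail how the device-to-Earth translation is computed. One common approach on Android is SensorManager.getRotationMatrix, which builds a rotation matrix from gravity and geomagnetic readings; the Python sketch below mirrors that construction under that assumption. The helper names are hypothetical, and the world axes are East (x), magnetic North (y), and Up (z).

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def rotation_matrix(gravity, geomagnetic):
    """Rows are the world axes (East, North, Up) expressed in device coordinates.

    gravity: accelerometer/gravity reading at rest (points up, about +9.81 m/s^2).
    geomagnetic: magnetometer reading in microtesla.
    East = geomagnetic x gravity, Up = gravity direction, North = Up x East.
    """
    east = normalize(cross(geomagnetic, gravity))
    up = normalize(gravity)
    north = cross(up, east)  # already unit length: up and east are orthonormal
    return (east, north, up)


def to_world(r, v):
    """Rotate a device-frame vector v into the world frame (world = R v)."""
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


# Hypothetical readings from a phone tilted in its car mount.
gravity = (0.0, 6.0, 8.0)
magnetic = (1.0, 20.0, -30.0)
R = rotation_matrix(gravity, magnetic)
gravity_world = to_world(R, gravity)    # close to (0, 0, 10)
magnetic_world = to_world(R, magnetic)  # East component close to 0
```

Because the East axis is built as a cross product involving the magnetic field vector, the magnetic field always has a near-zero East (x) component in this frame, which is consistent with the absence of a magnetometer_x data set described in Section 4.1.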

The metric we use to evaluate assembly performance for each driving event type is the area under the ROC curve (AUC) [33, 34]. The AUC of a classifier ranges from 0.0 (worst) to 1.0 (best); an AUC of 0.5 is equivalent to random guessing, so no realistic classifier should have an AUC below 0.5. Hence, the closer an evaluation assembly’s AUC is to 1.0, the better it is at detecting a particular driving event type.
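AUC can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. The pure-Python sketch below uses that Mann-Whitney formulation for a single event type treated one-vs-rest; it is not WEKA’s implementation, and the function name auc is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: classifier scores (higher = more likely positive).
    labels: 1 for the event type of interest, 0 for all other types.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
random_like = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
mixed = auc([0.9, 0.3, 0.6, 0.2], [1, 1, 0, 0])
```

In the mixed case, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.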

In the remainder of this section, Subsection 4.1 details the evaluation assembly of machine learning algorithms and sensors. Subsection 4.2 describes the proposed attribute vector, used as input for the machine learning algorithms. Finally, Subsection 4.3 presents how we performed the data collection in a real-world experiment.

4.1 Evaluation assembly

The sensor is the first element of the evaluation assembly. It represents one of the following Android smartphone motion sensors: accelerometer (Acc), linear acceleration (LinAcc), magnetometer (Mag), and gyroscope (Gyr). The accelerometer measures the acceleration applied to the device, including the force of gravity, in meters per second squared (m/s²). The linear acceleration sensor is similar to the accelerometer, but excludes the force of gravity. The magnetometer measures the strength of the magnetic field applied to the device in microtesla (μT) and can be used much like a compass. The gyroscope measures the rate of rotation around the device’s axes in radians per second (rad/s). These sensors provide a 3-dimensional (x, y, and z) temporal series with nanosecond timestamp precision in the standard sensor coordinate system (relative to the device).

The second element of the evaluation assembly is the sensor axis(es). Available values for this element are (i) all 3 axes; (ii) x axis alone; (iii) y axis alone; and (iv) z axis alone. For example, the accelerometer originates the following data sets: accelerometer (with data from all three axes), accelerometer_x, accelerometer_y, and accelerometer_z. The only exception to this rule is the magnetometer, whose x axis values are always zero or close to zero after translation to Earth’s coordinate system. For that reason, there is no magnetometer_x data set. As we evaluate data from 3 sensors that originate 4 data sets each, and 1 sensor that originates 3 data sets, there is a total of 3 * 4 + 3 = 15 data sets. We separated sensor axes into distinct data sets to observe whether any single axis would emerge as the best for detecting a particular driving event type.

The MLA is the third element of the evaluation assembly. As detailed in Section 3, we evaluate the classification performance of the MLP, SVM, RF, and BN MLAs. We used the WEKA (version 3.8.0) implementations of these algorithms in conjunction with the LIBSVM [35] library (version 3.17). We trained and tested these classifiers using 10-fold cross-validation, so that performance estimates do not depend on, or overfit to, a single train/test split.
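A minimal Python sketch of the 10-fold split follows: indices are shuffled with a seed, dealt into k folds, and each fold serves once as the test set while the remaining folds form the training set. It is an illustrative, unstratified version; the function name k_fold_indices is hypothetical, and WEKA’s own cross-validation (which, as far as we know, stratifies folds for nominal classes) may assign instances differently.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with the given seed and dealt round-robin into
    k folds, so fold sizes differ by at most one and every index lands in
    exactly one test fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(23, k=10, seed=1))
```

Changing the seed changes how instances are assigned to folds, which is one reason repeating the whole procedure with several random seeds (as done in Section 5) gives a more stable picture of performance.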

Algorithm configuration is the fourth element of the evaluation assembly. We performed a parameter grid search to assess each algorithm with every possible combination of the parameter values in Table 1. We set most of the parameter values experimentally, and followed the guidelines provided in [36] for SVM. We used WEKA default values for parameters not listed in Table 1.
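A grid search simply evaluates the Cartesian product of the candidate parameter values. The Python sketch below assumes a caller-supplied evaluate function (for example, one returning the mean cross-validated AUC); the function name grid_search and the parameter grid in the usage are hypothetical and do not reproduce Table 1.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination of parameter values.

    param_grid: dict mapping parameter name -> list of candidate values.
    evaluate: callable(params_dict) -> score, where higher is better.
    Returns (best_score, best_params, number_of_configurations).
    """
    names = sorted(param_grid)
    best_score, best_params, count = float("-inf"), None, 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        count += 1
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params, count


# Hypothetical SVM-like grid and a toy scoring function standing in for CV AUC.
grid = {"C": [0.5, 2, 8], "gamma": [0.01, 0.1]}
score, best, n_configs = grid_search(
    grid, lambda p: -abs(p["C"] - 2) - abs(p["gamma"] - 0.1))
```

The number of configurations is the product of the list lengths (here 3 × 2 = 6), which is how per-algorithm configuration counts such as those in Table 1 arise.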

Raw sensor data are basically composed of 3-axis values and a nanosecond timestamp indicating the instant the sample was collected. However, we do not feed raw sensor data to the classifiers. Instead, we group sensor time series samples into one-second frames to compose a sliding time window, which is later summarized into an attribute vector. As time passes, the window slides over the temporal series in 1-frame increments, as depicted in Fig 3. We consider f0 as the frame of the current second, f−1 as the frame of the previous second, and so forth down to f−(nf − 1), where nf (the fifth element of the evaluation assembly) is the number of frames that compose the sliding time window. We used the following nf values in this evaluation: 4, 5, 6, 7, and 8. These values were defined experimentally so that the sliding window can accommodate the length of the collected driving events, which range from 2 to 7 seconds depending on how aggressive the event is.
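The framing step can be sketched in Python as below, assuming sorted nanosecond timestamps and frames aligned to the first sample of a trip (the alignment rule and the helper names are our assumptions). Each window lists its frames from f−(nf − 1) to f0, and consecutive windows overlap by nf − 1 frames.

```python
def frames_by_second(timestamps_ns):
    """Group sample indices into consecutive one-second frames.

    Assumes timestamps are sorted; frame 0 starts at the first sample.
    Seconds with no samples yield empty frames.
    """
    if not timestamps_ns:
        return []
    t0 = timestamps_ns[0]
    frames = {}
    for i, t in enumerate(timestamps_ns):
        frames.setdefault((t - t0) // 1_000_000_000, []).append(i)
    return [frames.get(s, []) for s in range(max(frames) + 1)]


def sliding_windows(frames, nf):
    """Yield windows of nf consecutive frames (oldest first, f0 last),
    advancing one frame at a time."""
    for end in range(nf - 1, len(frames)):
        yield frames[end - nf + 1:end + 1]


# Toy trip: one sample every 0.5 s for 5 s.
timestamps = [i * 500_000_000 for i in range(10)]
frames = frames_by_second(timestamps)
windows = list(sliding_windows(frames, nf=4))
```

A trip of F full frames therefore yields F − nf + 1 windows; here, 5 frames and nf = 4 give 2 windows.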

Fig 3. Time window composed of nf one-second frames which group raw sensor data samples.

The time window slides in 1-frame increments as time passes. f0 is the frame of the current second, f−1 is the frame of the previous second, and so forth down to f−i, where i = nf − 1.

In this work, we assessed the performance of several evaluation assemblies to find the ones that best detect each driving event type. The number of assemblies is the result of all combinations of 15 data sets, 5 different values for nf, 4 configurations (Table 1) of the BN algorithm, 5 of the MLP, 6 of the RF, and 36 of the SVM. This results in a total of 15 * 5 * 4 + 15 * 5 * 5 + 15 * 5 * 6 + 15 * 5 * 36 = 3825 evaluation assemblies.
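The assembly count can be checked by enumerating the assemblies directly. In the Python sketch below, the sensors, axes, nf values, and per-MLA configuration counts are those stated in this section; representing a configuration by its index is our own simplification.

```python
from itertools import product

SENSOR_AXES = {
    "Acc": ["all", "x", "y", "z"],
    "LinAcc": ["all", "x", "y", "z"],
    "Gyr": ["all", "x", "y", "z"],
    "Mag": ["all", "y", "z"],  # no x data set: ~0 after translation to Earth's frame
}
NF_VALUES = [4, 5, 6, 7, 8]
CONFIGS_PER_MLA = {"BN": 4, "MLP": 5, "RF": 6, "SVM": 36}

# Elements 1 and 2: the 15 data sets.
data_sets = [(sensor, axes) for sensor, axes_list in SENSOR_AXES.items() for axes in axes_list]

# Elements 1-5: every (sensor, axes, MLA, configuration index, nf) combination.
assemblies = [
    (sensor, axes, mla, cfg, nf)
    for (sensor, axes), nf, (mla, n_cfg) in product(data_sets, NF_VALUES, CONFIGS_PER_MLA.items())
    for cfg in range(1, n_cfg + 1)
]
```

Equivalently, 15 * 5 * (4 + 5 + 6 + 36) = 75 * 51 = 3825.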

4.2 Attribute vector

An attribute vector is the summarization of the sliding window depicted in Fig 3. One instance of the attribute vector is generated for every time window that contains a driving event. Correspondingly, if there is no driving event in a particular time window, no attribute vector instance is generated.

We create an instance of the attribute vector by calculating the mean (M) Eq (1), median (MD) Eq (2), standard deviation (SD) Eq (3), and increase/decrease tendency (T) Eq (4) over sensor data samples in the frames composing the time window. The number of attributes in the vector depends on the number of frames in the sliding window (nf): there are nf mean, median, and standard deviation attributes, and nf − 1 tendency attributes. Fig 4 depicts the structure of an attribute vector for a single axis of sensor data. When the data set is composed of more than one axis, the attribute vectors for each axis are simply concatenated, and only the class label attribute of the last vector is preserved, as they all have the same value.

The attributes of the vector are calculated as given in Eqs (1)–(4), where i = [0..(nf − 1)], SF(fj) is a summarizing function (mean, median, or standard deviation) applied over the samples of the jth frame, and SF(fj, fk) is a summarizing function applied over the samples from the jth to the kth frame (j < k).
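The following Python sketch builds one attribute vector under explicit assumptions. Because Eqs (1)–(4) are not reproduced in this text, it computes each summary per frame and uses the difference between the means of consecutive frames as a hypothetical stand-in for the tendency T; the actual equations may instead summarize ranges of frames, as the SF(fj, fk) notation suggests. It does preserve the attribute counts stated above: nf means, medians, and standard deviations plus nf − 1 tendencies per axis, with axes concatenated and a single class label at the end.

```python
import statistics


def frame_attributes(window):
    """Attributes for one axis from a window of frames ordered f_-(nf-1) .. f_0.

    Each frame is a non-empty list of samples for that axis.
    Returns nf means + nf medians + nf standard deviations + (nf - 1) tendencies.
    HYPOTHETICAL choices: population SD, and tendency = difference between the
    means of consecutive frames (a stand-in for Eq (4)).
    """
    means = [statistics.mean(f) for f in window]
    medians = [statistics.median(f) for f in window]
    sds = [statistics.pstdev(f) for f in window]
    tendency = [later - earlier for earlier, later in zip(means, means[1:])]
    return means + medians + sds + tendency


def attribute_vector(windows_per_axis, label):
    """Concatenate per-axis attributes and keep a single class label at the end."""
    vector = []
    for window in windows_per_axis:
        vector.extend(frame_attributes(window))
    return vector + [label]


# Toy window with nf = 4 frames for one axis (made-up sample values).
window = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 4.0], [5.0]]
single_axis = frame_attributes(window)
three_axes = attribute_vector([window, window, window], "aggressive_braking")
```

With nf = 4, each axis contributes 4 * 4 − 1 = 15 attributes, so a three-axis data set yields 45 attributes plus the class label.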

The class label attribute of the vector comes from the driving events ground-truth, as described in Section 4.3. The class label is the driving event whose start timestamp is between the first timestamp of the samples in frame f−(nf − 1) and the last timestamp of the samples in frame f0. Note that attribute vectors originating from different driving trips can be grouped together in the same data set, as they are time-independent.

It is important to highlight that each collected sample of a driving event generates nf attribute vector instances. This occurs because the frame that contains the start timestamp of the event changes its position in the window as the window slides. Therefore, if there are s samples of a particular driving event type, there will be s * nf attribute vector instances for that event type. This allows multiple windows to capture different portions, or signatures, of the same event.
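The s * nf relationship can be checked with a small Python sketch that labels sliding windows of one-second frames. It assumes events are indexed by the frame containing their start timestamp, lie at least nf − 1 frames from the trip boundaries, and are spaced so that no window contains two event starts; the function name and event labels are hypothetical.

```python
def window_labels(n_frames, nf, events):
    """Label each sliding window with the event whose start frame falls inside it.

    events: list of (start_frame, label) pairs, frames indexed from 0.
    Windows containing no event start produce no instance.
    Returns a list of (first_frame, last_frame, label) instances.
    """
    instances = []
    for last in range(nf - 1, n_frames):
        first = last - nf + 1
        for start, label in events:
            if first <= start <= last:
                instances.append((first, last, label))
                break  # assumption: at most one event starts inside a window
    return instances


# Hypothetical 30-frame trip with two events (s = 2) and nf = 5.
events = [(10, "aggressive_braking"), (20, "aggressive_left_turn")]
instances = window_labels(30, 5, events)
```

Each event start appears in exactly nf windows (once in each frame position from f−(nf − 1) to f0), so two events with nf = 5 yield 10 labeled instances.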

4.3 Data collection in a real-world experiment

We performed a real-world experiment in order to collect sensor data for driving events. In this experiment, an Android application recorded smartphone sensor data while a driver executed particular driving events. We also recorded the start and end timestamps of the driving events to generate the ground-truth for the experiment.

We performed the experiment in 4 car trips of approximately 13 minutes each on average. The experiment conditions were the following: (i) the vehicle was a 2011 Honda Civic; (ii) the smartphone was a Motorola XT1058 with Android version 5.1; (iii) the smartphone was fixed on the car’s windshield by means of a car mount, and was neither moved nor operated while collecting sensor data; (iv) the motion sensors’ sampling rate varied between 50 and 100 Hz, depending on the sensor; (v) two drivers with more than 15 years of driving experience executed the driving events; and (vi) the weather was sunny and the roads were dry and paved with asphalt.

The driving event types we collected in this experiment were based on the events in [13]. Our purpose was to establish a set of driving events that represents usual real-world events such as braking, acceleration, turning, and lane changes. Table 2 shows the 7 driving event types we used in this work and their number of collected samples. Fig 5 shows sensor data for an aggressive left lane change event as captured by the four sensors used in this evaluation.

5 Results

We executed all combinations of the 4 MLAs and their configurations described in Table 1 over the 15 data sets described in Section 4.1, using 5 different nf values. We trained, tested, and assessed every evaluation assembly with 15 different random seeds. Finally, we calculated the mean AUC over these executions, grouped the results by driving event type, and ranked the 5 best performing assemblies in the boxplot displayed in Fig 6. This figure shows the driving events on the left-hand side and the 5 best evaluation assemblies for each event on the right-hand side, with the best ones at the bottom. The assembly text identification in Fig 6 encodes, in this order: (i) the nf value; (ii) the sensor and its axis (if there is no axis indication, then all sensor axes are used); and (iii) the MLA and its configuration identifier.
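The aggregation described above (mean AUC over the seeded runs, then a per-event ranking) can be sketched in Python as follows. The function name top_assemblies and the input format are assumptions, and the assembly identifiers and AUC values used to exercise it would be made up, not our results.

```python
from collections import defaultdict
from statistics import mean


def top_assemblies(results, k=5):
    """Rank assemblies per event type by mean AUC over repeated runs.

    results: iterable of (event_type, assembly_id, auc), one entry per run/seed.
    Returns {event_type: [(mean_auc, assembly_id), ...]} with the best first,
    truncated to k entries.
    """
    per_pair = defaultdict(list)
    for event, assembly, auc_value in results:
        per_pair[(event, assembly)].append(auc_value)
    ranked = defaultdict(list)
    for (event, assembly), aucs in per_pair.items():
        ranked[event].append((mean(aucs), assembly))
    return {event: sorted(rows, key=lambda r: r[0], reverse=True)[:k]
            for event, rows in ranked.items()}
```

Averaging over seeds before ranking keeps a single lucky split from promoting an assembly, and the boxplot in Fig 6 then shows the spread that the mean hides.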

Fig 6. Top 5 assemblies by mean AUC, grouped by driving event type, as the result of 15 MLA train/test executions with different random seeds.

Values closer to 1.0 are better. Driving events are on the left-hand side and assemblies are on the right-hand side. Assemblies with the best mean AUC are closer to the bottom.

In light of these results, we can draw a few conclusions in the context of the performed experiment. Firstly, MLAs perform better with higher nf values (i.e., bigger sliding window sizes). Of the 35 best performing assemblies, 23 have nf = 8, 6 have nf = 7, 5 have nf = 6, and only 1 has nf = 4.

Secondly, the gyroscope and the accelerometer are the most suitable sensors to detect the driving events in this experiment. On the other hand, the magnetometer alone is not suitable for event detection, as none of the 35 best assemblies use that sensor. Also, using all sensor axes performs better in general than using a single axis. The only exception is the z axis of the gyroscope, which alone best detects aggressive left turns.

Thirdly, RF is by far the best performing MLA, with 28 out of the 35 best assemblies. The second best is MLP, with 7 best results. RF dominates the top 5 performances for nonaggressive events and for aggressive turns, braking, and acceleration. However, MLP is better at aggressive lane changes. BN and SVM were not ranked among the 35 best performing assemblies.

Fourthly, MLP configuration #1 was the best performing. In this configuration, the number of neurons in the hidden layer is defined as (#attr. + #classes)/2. This is also the default WEKA configuration. For RF, configurations #6 (# of iterations = 200; # of attributes to randomly investigate = 15), and #5 (# of iterations = 200; # of attributes to randomly investigate = 10) gave the best results.
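As a worked example of the hidden-layer rule above, the Python sketch below computes the hidden-layer size for the largest windows used here. It assumes integer truncation of the division and counts input attributes from the structure in Section 4.2 (4nf − 1 attributes per axis); the function names are hypothetical. Whether the class attribute is also counted does not change the result in this example, since 93 and 94 inputs both give 50 hidden neurons with 7 classes.

```python
def n_input_attributes(nf, n_axes):
    """Per axis: nf means + nf medians + nf SDs + (nf - 1) tendencies (Section 4.2)."""
    return n_axes * (4 * nf - 1)


def hidden_neurons(n_attributes, n_classes):
    """MLP configuration #1: (#attr. + #classes) / 2, truncated to an integer."""
    return (n_attributes + n_classes) // 2


attrs = n_input_attributes(nf=8, n_axes=3)   # all axes of one sensor
hidden = hidden_neurons(attrs, n_classes=7)  # 7 driving event types
```

For a single-axis data set with nf = 8, the same rule gives (31 + 7) / 2 = 19 hidden neurons, so the network size scales with the attribute vector.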

Finally, we found a satisfactory and equivalent performance in the top 35 ranked evaluation assemblies: the difference between the worst mean AUC (0.980, for the aggressive braking event) and the best one (0.999, for the aggressive right lane change event) is only 0.019, a difference we consider negligible in the context of this experiment.

6 Conclusions and future work

In this work we presented a quantitative evaluation of the performance of 4 MLAs (BN, MLP, RF, and SVM) with different configurations applied to the detection of 7 driving event types using data collected from 4 Android smartphone sensors (accelerometer, linear acceleration, magnetometer, and gyroscope). We collected 69 samples of these event types in a real-world experiment with 2 drivers. The start and end times of these events were recorded to serve as the experiment ground-truth. We also compared the performance obtained with different sliding time window sizes.

We performed 15 executions, with different random seeds, of each of 3825 evaluation assemblies of the form EA = {1:sensor, 2:sensor axis(es), 3:MLA, 4:MLA configuration, 5:number of frames in sliding window}. As a result, we found the top 5 performing assemblies for each driving event type. In the context of our experiment, these results show that (i) bigger window sizes perform better; (ii) the gyroscope and the accelerometer are the best sensors to detect our driving events; (iii) as a general rule, using all sensor axes performs better than using a single one, except for aggressive left turn events; (iv) RF is by far the best performing MLA, followed by MLP; and (v) the performance of the top 35 combinations is both satisfactory and equivalent, with mean AUC values ranging from 0.980 to 0.999.

As future work, we plan to collect a greater number of driving event samples using different vehicles, Android smartphone models, road conditions, weather, and temperature. We also plan to add more MLAs to our evaluation, including approaches based on fuzzy logic and dynamic time warping (DTW). Finally, we intend to use the best evaluation assemblies observed in this work to develop an Android smartphone application that can detect driving events in real time and calculate the driver behavior profile.
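Dynamic time warping (DTW), named above as a candidate technique, compares two time series while letting one be locally stretched or compressed relative to the other, which suits maneuvers performed at different speeds. The sketch below is the classic unconstrained dynamic-programming form with absolute difference as the local cost. Using it to match, say, a window of gyroscope readings against a recorded event template is our illustrative assumption, not something evaluated in this work, and `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic (unconstrained) dynamic time warping distance between two 1-D series.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step of the
    warping path advances in a only, in b only, or in both.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


if __name__ == "__main__":
    # The same shape sampled at a different pace aligns perfectly.
    print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))  # 0.0
```

A real-time detector would typically add a warping-window constraint to bound the quadratic cost; that refinement is left out here for clarity.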

Author Contributions

  1. Conceptualization: JFJ CdS YS AP GP.
  2. Data curation: JFJ EC BVF.
  3. Investigation: JFJ EC BVF.
  4. Methodology: JFJ CdS YS AP GP.
  5. Software: JFJ GP.
  6. Writing – original draft: JFJ EC BVF CdS YS AP GP.


References

  1. Xiaoqiu F, Jinzhang J, Guoqiang Z. Impact of Driving Behavior on the Traffic Safety of Highway Intersection. In: Measuring Technology and Mechatronics Automation (ICMTMA), 2011 Third International Conference on. vol. 2; 2011. p. 370–373.
  2. Evans L. Traffic safety. Science Serving Society; 2004.
  3. Blincoe L, Miller TR, Zaloshnja E, Lawrence BA. The Economic and Societal Impact of Motor Vehicle Crashes, 2010 (Revised). National Highway Traffic Safety Administration; 2015.
  4. Haworth N, Symmons M. Driving to Reduce Fuel Consumption and Improve Road Safety. Road Safety Research, Policing and Education Conference, 2001, Melbourne, Victoria, Australia. 2001;(5):7.
  5. Van Mierlo J, Maggetto G, Van de Burgwal E, Gense R. Driving style and traffic measures-influence on vehicle emissions and fuel consumption. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering. 2005;218:43–50.
  6. Paefgen J, Kehr F, Zhai Y, Michahelles F. Driving Behavior Analysis with Smartphones: Insights from a Controlled Field Study. In: Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia. MUM’12. New York, NY, USA: ACM; 2012. p. 36:1–36:8.
  7. Skog I, Händel P, Ohlsson M, Ohlsson J. Challenges in smartphone-driven usage based insurance. In: Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE; 2013. p. 1135–1135.
  8. Händel P, Ohlsson J, Ohlsson M, Skog I, Nygren E. Smartphone-Based Measurement Systems for Road Vehicle Traffic Monitoring and Usage-Based Insurance. Systems Journal, IEEE. 2014;8(4):1238–1248.
  9. Dai J, Teng J, Bai X, Shen Z, Xuan D. Mobile phone based drunk driving detection. In: 2010 4th International Conference on Pervasive Computing Technologies for Healthcare; 2010. p. 1–8.
  10. Johnson DA, Trivedi MM. Driving style recognition using a smartphone as a sensor platform. In: Intelligent Transportation Systems (ITSC), 2011 14th International IEEE Conference on; 2011. p. 1609–1615.
  11. Eren H, Makinist S, Akin E, Yilmaz A. Estimating driving behavior by a smartphone. In: Intelligent Vehicles Symposium (IV), 2012 IEEE; 2012. p. 234–239.
  12. Castignani G, Frank R, Engel T. Driver behavior profiling using smartphones. In: Intelligent Transportation Systems (ITSC), 2013 16th International IEEE Conference on; 2013. p. 552–557.
  13. Saiprasert C, Thajchayapong S, Pholprasit T, Tanprasert C. Driver behaviour profiling using smartphone sensory data in a V2I environment. In: Connected Vehicles and Expo (ICCVE), 2014 International Conference on; 2014. p. 552–557.
  14. Castignani G, Derrmann T, Frank R, Engel T. Driver Behavior Profiling Using Smartphones: A Low-Cost Platform for Driver Monitoring. Intelligent Transportation Systems Magazine, IEEE. 2015;7(1):91–102.
  15. Araujo R, Igreja A, de Castro R, Araujo RE. Driving coach: A smartphone application to evaluate driving efficient patterns. In: Intelligent Vehicles Symposium (IV), 2012 IEEE; 2012. p. 1005–1010.
  16. Sakoe H, Chiba S. Dynamic programming algorithm optimization for spoken word recognition. Acoustics, Speech and Signal Processing, IEEE Transactions on. 1978;26(1):43–49.
  17. Mohan P, Padmanabhan VN, Ramjee R. Nericell: Rich Monitoring of Road and Traffic Conditions Using Mobile Smartphones. In: Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems. SenSys’08. New York, NY, USA: ACM; 2008. p. 323–336.
  18. White J, Thompson C, Turner H, Dougherty B, Schmidt DC. WreckWatch: Automatic Traffic Accident Detection and Notification with Smartphones. Mob Netw Appl. 2011;16(3):285–303.
  19. Fazeen M, Gozick B, Dantu R, Bhukhiya M, González MC. Safe Driving Using Mobile Phones. IEEE Transactions on Intelligent Transportation Systems. 2012;13(3):1462–1468.
  20. Saiprasert C, Pholprasit T, Pattara-Atikom W. Detecting driving events using smartphone. In: Proceedings of the 20th ITS World Congress; 2013.
  21. Wahlström J, Skog I, Händel P. Detection of Dangerous Cornering in GNSS-Data-Driven Insurance Telematics. IEEE Transactions on Intelligent Transportation Systems. 2015;16(6):3073–3083.
  22. Shojafar M, Cordeschi N, Baccarelli E. Energy-efficient adaptive resource management for real-time vehicular cloud services. IEEE Transactions on Cloud Computing. 2016;PP(99):1–1.
  23. Wei W, Fan X, Song H, Fan X, Yang J. Imperfect information dynamic Stackelberg game based resource allocation using hidden Markov for cloud computing. IEEE Transactions on Services Computing. 2016;PP(99):1–1.
  24. Song H, Smith BL. Empirical investigation of the impact of high-occupancy-toll operations on driver behavior. In: Transportation Research Board 88th Annual Meeting. 09-1462; 2009.
  25. Sun Y, Song H, Jara AJ, Bie R. Internet of things and big data analytics for smart and connected communities. IEEE Access. 2016;4:766–773.