Linear, nonlinear, and monotonic relationships - Minitab
In a linear relationship, one quantity increases or decreases at a constant rate as the other changes; in the strictly proportional case, doubling one causes the other to double as well. A nonlinear relationship is one in which a change in one quantity does not correspond to a constant change in the other. The analysis below uses the U.S. National Health Interview Survey-Linked Mortality Files to identify potential nonlinear relationships between education and mortality.
Nonlinearity in the Relationship between Education and Mortality
While it is common to assume a nonlinear relationship between health outcomes and income, few studies have systematically examined the shape of the education-mortality slope.
Rather, the relationship is most often estimated by categorizing years of schooling into a series of dummy variables (see Kitagawa and Hauser; Molla, Madans, and Wagener; Rogers et al.). Using a linear term is problematic because much research has shown that the relationship between years of education and mortality risk is nonmonotonic (Ross and Mirowsky; Backlund, Sorlie, and Johnson; Montez et al.).
While using dummy variables for groups of more than one year of education skirts the issue of nonlinearity, it often obscures potential differences in the shape of the mortality curve as well as potential threshold differences between populations. More recently, Montez and colleagues used 13 different logistic regression model specifications to examine the functional form of the education-mortality gradient.
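The dummy-variable coding described above can be sketched as follows. This is a minimal illustration, not the coding used in any of the cited studies: the category names and cut points in `CATEGORIES` are hypothetical and chosen only to show how grouping several years into one indicator hides within-group differences.

```python
# Hypothetical categories and cut points, chosen only for illustration.
CATEGORIES = [
    ("less_than_high_school", 0, 11),
    ("high_school", 12, 12),
    ("some_college", 13, 15),
    ("college_or_more", 16, 18),
]


def dummy_code(years, reference="high_school"):
    """Return 0/1 indicators for every category except the reference."""
    if not 0 <= years <= 18:
        raise ValueError("years of education must be between 0 and 18")
    return {
        name: int(low <= years <= high)
        for name, low, high in CATEGORIES
        if name != reference
    }


# People with 13 and 15 years of schooling receive identical indicators,
# so a model built on these dummies cannot tell them apart.
print(dummy_code(13))
print(dummy_code(15))
print(dummy_code(16))
```

Because every year inside a category shares one coefficient, any change in slope within that category, or a threshold that falls at a different year in a different population, cannot be detected with this coding.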
Neither of these studies found a step-change at 16 years of education. While theirs is the most thorough examination to date, their models are not semiparametric and thus are constrained by the specified model. Other studies have used semiparametric nonlinear models to explore the relationships among income, biomarkers of coronary heart disease, and overall mortality (Rehkopf et al.).
Semiparametric models also allow a formal statistical test of nonlinearity and are generally more stable with smaller sample sizes than models with dummy variables and slopes; thus they allow examination of relationships within population subgroups (Ramsey and Ripley; Wood). The use of a penalized spline on education allows us to use all available information on years of education and create more stable estimates for each year of education.
We test two research questions. The NHIS Linked Mortality Files data set is ideal for our purposes because it is extremely large, is nationally representative, and allows us to examine the link between education and mortality among several subpopulations. To provide a public-use version of these data, NCHS perturbed the dates or cause of death for a small number of records to ensure that individuals could not be identified.
Lochner and colleagues demonstrated that the public-use and restricted data sets produce equivalent results for overall mortality. We restrict our sample to respondents ages 35 to 65, approximately the working-age population. Focusing on non-Hispanic whites and blacks results in a sample of [...] respondents, of whom 56,[...] died during the follow-up period. We use Cox proportional hazard models to examine mortality by race and gender.
Education is coded as a continuous variable that ranges from 0 to 18 or more years of education (that is, it is top-coded at 18 or more years). During the survey period, NHIS changed its coding strategy to make educational attainment a categorical variable, asking whether respondents had completed a given number of years of school, graduated from high school, or obtained an associate's, bachelor's, master's, professional, or doctoral degree.
GEDs are coded as 12 years of education, a limitation of our study given the body of work that suggests differences in the health returns to GEDs compared to high school diplomas (Rogers et al.).
To adjust for this coding change, we converted the categories into comparable years of education and include a dummy variable for whether persons were surveyed before or after the change. We graphed the mean education level by year and did not discern any breaks due to the coding change, which suggests that the measurement change has little effect on the study results. Other studies have used a similar recoding strategy (Everett et al.).
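The recoding step can be sketched roughly as below. Only two details come from the text: GEDs count as 12 years, and education is top-coded at 18 years. Every other value in `CATEGORY_TO_YEARS`, and the function and field names, are assumptions made for illustration; the source does not list its actual conversion table.

```python
# Hypothetical category-to-years mapping. Only "ged" = 12 and the top code
# of 18 are stated in the text; the other values are illustrative guesses.
CATEGORY_TO_YEARS = {
    "high_school_graduate": 12,
    "ged": 12,
    "associate": 14,
    "bachelor": 16,
    "master": 18,
    "professional": 18,
    "doctoral": 18,
}
TOP_CODE = 18  # education is top-coded at 18 or more years


def years_of_education(response):
    """Convert a survey response (years or a degree category) to years."""
    if isinstance(response, str):
        years = CATEGORY_TO_YEARS[response]
    else:
        years = int(response)
    if years < 0:
        raise ValueError("years of education cannot be negative")
    return min(years, TOP_CODE)


def recode(response, surveyed_after_change):
    """Return the education term plus a dummy for the coding period."""
    return {
        "education_years": years_of_education(response),
        "new_coding": int(surveyed_after_change),
    }


print(recode(10, surveyed_after_change=False))
print(recode("ged", surveyed_after_change=True))
print(recode(21, surveyed_after_change=False))
```

Keeping the coding-period dummy alongside the converted years lets a model absorb any level shift introduced by the questionnaire change rather than attributing it to education.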
Additionally, we include a control for the year of the survey.
Method
We use Cox proportional hazard models with a penalized spline on the education term. The penalized spline model has substantial advantages over simpler models where the user specifies one or more inflection points or one or more categories to model the relationship. Categories must be prespecified by the user, and they limit power to detect differences.
The primary emphasis of this analysis is to determine the extent of non-linearity in different population groups.
The penalized spline algorithm allows our models to adjust for potential nonlinearity by fitting a model in two steps. First, the relationship between education and mortality is estimated using a large number of knots to form splines, or separate slopes for the relationship between each year of education and mortality risk.
That is, at each year of education, the slope is allowed to vary. Second, the model uses the Akaike Information Criterion (AIC) to smooth the spline, choosing a penalized smoothing parameter that balances over-fitting the model with excessive degrees of freedom against a more restricted model that still allows for nonlinear variations in the relationship between education and mortality. For example, if the relationship between education and mortality were a straight line, allowing for changes in slopes at every year of education would unnecessarily increase the degrees of freedom used in the model and result in an overfitting of the data.
Ruppert, Wand, and Carroll have shown that this method is insensitive to the number of knots initially chosen. The penalized spline approach, therefore, allows for the results to reflect nonlinear variations in the relationship between education and mortality while optimizing model fit (Eilers and Marx; Hurvich, Simonoff, and Tsai). We estimate a series of Cox proportional hazard models with different specifications to ensure that we are correctly modeling the relationship between education and mortality, and to examine how robust our results are to different model specifications.
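The two-step idea described above, a flexible fit followed by AIC-guided smoothing, can be illustrated with a much simpler stand-in than the Cox model used in the study. The sketch below is not the authors' R code. It assumes an ordinary least-squares outcome, gives every year of education its own coefficient, and shrinks second differences of neighbouring coefficients, in the spirit of a penalized spline. The data are simulated, and all names (`fit`, `penalty`, `lambdas`) are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def penalty(k):
    """Second-difference penalty matrix D'D for k coefficients."""
    p = [[0.0] * k for _ in range(k)]
    for i in range(k - 2):
        d = {i: 1.0, i + 1: -2.0, i + 2: 1.0}
        for r, dr in d.items():
            for c, dc in d.items():
                p[r][c] += dr * dc
    return p


def fit(x, y, k, lam):
    """Return (coefficients, effective df, AIC) for smoothing parameter lam."""
    counts, sums = [0.0] * k, [0.0] * k
    for xi, yi in zip(x, y):
        counts[xi] += 1
        sums[xi] += yi
    a = [[lam * v for v in row] for row in penalty(k)]
    for i in range(k):
        a[i][i] += counts[i]
    beta = solve(a, sums)
    # Effective degrees of freedom: trace of the smoother (hat) matrix.
    edf = 0.0
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        edf += solve(a, unit)[i] * counts[i]
    rss = sum((yi - beta[xi]) ** 2 for xi, yi in zip(x, y))
    aic = len(y) * math.log(rss / len(y)) + 2 * edf
    return beta, edf, aic


# Simulated data: 10 people at each of 0..18 years, with a bend at 12 years.
rng = random.Random(0)
K = 19
x = [year for year in range(K) for _ in range(10)]
y = [1.0 - 0.02 * xi - 0.05 * max(xi - 12, 0) + rng.gauss(0, 0.1) for xi in x]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5, 1e6]
results = {lam: fit(x, y, K, lam) for lam in lambdas}
best = min(lambdas, key=lambda lam: results[lam][2])
for lam in lambdas:
    print(f"lambda={lam:g} edf={results[lam][1]:.2f} aic={results[lam][2]:.1f}")
print("lambda with the lowest AIC:", best)
```

With no penalty the fit spends one degree of freedom per year of education. As the penalty grows, the effective degrees of freedom fall toward 2, which corresponds to a straight line. AIC selects a value between those extremes.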
Thus, we include results from Cox proportional hazard models with several model specifications, including GEE and frailty specifications. The frailty model can be thought of as a random effect that allows individuals to deviate from the mean hazard, raising or lowering their individual risk relative to the population.
The GEE specification does not specify a functional form of the random effect, whereas frailty models do. Following Korn, Graubard, and Midthune, we use age to indicate the time to death, which ensures that our mortality analyses are age-adjusted.
We adjust for sample weights and strata to account for NHIS's complex sampling frame. All analyses are completed in R version 2. The proportional hazard models use the coxph command, and penalized splines are fit with pspline (Ramsey and Ripley).
The link between jet fuel cost and flight cost, for example, can be described as a linear relationship.
Plot 1: Strong positive linear relationship
Plot 2: Strong negative linear relationship
When both variables increase or decrease concurrently and at a constant rate, a positive linear relationship exists.
The points in Plot 1 follow the line closely, suggesting that the relationship between the variables is strong. When one variable increases while the other variable decreases, a negative linear relationship exists. The points in Plot 2 follow the line closely, suggesting that the relationship between the variables is strong.
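One common way to put a number on the direction and strength described above is the Pearson correlation coefficient. The text does not name this statistic, so it is an assumption introduced here for illustration. The small data sets below are invented and are not the values behind Plot 1 or Plot 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: one series rises with x, the other falls.
x = [1, 2, 3, 4, 5, 6]
rising = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
falling = [12.0, 10.1, 7.9, 6.1, 4.0, 2.2]

print("rising: ", round(pearson_r(x, rising), 3))
print("falling:", round(pearson_r(x, falling), 3))
```

A value close to +1 matches the pattern of a strong positive linear relationship, and a value close to -1 matches a strong negative one. The coefficient only measures linear association, so it can be small even when a strong curved relationship exists.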
Plot 3: Weak linear relationship
Plot 4: Nonlinear relationship
The data points in Plot 3 appear to be randomly distributed. They do not fall close to the line, indicating a very weak relationship, if one exists. If a relationship between two variables is not linear, the rate of increase or decrease can change as one variable changes, causing a "curved pattern" in the data.
This curved trend might be better modeled by a nonlinear function, such as a quadratic or cubic function, or be transformed to make it linear.
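The transformation option mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the curved data follow an exponential trend, so that taking the logarithm of y straightens the pattern. The data and the helper `r_squared` are hypothetical and are not the values behind Plot 4.

```python
import math


def r_squared(xs, ys):
    """Share of variance in ys explained by a straight line in xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Invented curved data: y grows exponentially with x.
x = list(range(9))
y = [3.0 * math.exp(0.5 * v) for v in x]

raw_fit = r_squared(x, y)
log_fit = r_squared(x, [math.log(v) for v in y])

print("straight line on y:     ", round(raw_fit, 3))
print("straight line on log(y):", round(log_fit, 3))
```

A straight line fits the raw values poorly but fits the log-transformed values almost exactly. Other curved patterns may call for a different transformation, or for a quadratic or cubic term instead.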
Plot 4 shows a strong relationship between two variables. This relationship illustrates why it is important to plot the data in order to explore any relationships that might exist.