Weibull Notes – not ready to publish
Observational data can be usefully summarized by fitting it with a probability distribution.
The most common – and useful – are the Normal (or Gaussian), the LogNormal, and the Weibull distributions.
(Software for constructing these plots is available.)
Unfortunately, the Weibull model is often used not because it is the best tool for the job but because it can approximate many other probability densities, such as the Normal and LogNormal. It has been compared to a mechanic’s crescent wrench, which isn’t as effective as having a complete set of wrenches but is more likely to be available. It is always preferable, however, to use the best tool for the job, not just the easiest.
- Weibull is not a location-scale distribution. That means that a good Weibull model for data in the 100 to 1000 range may be less effective for data having a similar shape but in the 1000 to 10,000 range. Modeling data farther removed from \(x = 0\) may require a \(t_0\) “correction.” This is dangerous – sometimes even doubleplusungood. Why? Because often we are interested in the early occurrences (e.g. failures), those with a low probability but very high consequence. Using a \(t_0\) “correction” defines the probability to be ZERO for ALL occurrences less than \(t_0\). In other words, you have defined away your problem! Dumb!
- compare 3-parm Weibull to 2-parm lognormal.
- assigning probability zero to stuff you’re really interested in.
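A minimal sketch of the \(t_0\) point above: under a three-parameter Weibull the CDF is exactly zero for every value at or below \(t_0\), while the two-parameter model still assigns those early occurrences a small but nonzero probability. The function name `weibull3_cdf` and the parameter values are illustrative assumptions, not fitted results.

```python
import math

def weibull3_cdf(x, beta, eta, t0=0.0):
    """Three-parameter Weibull CDF: 1 - exp(-((x - t0)/eta)**beta) for x > t0, else 0."""
    if x <= t0:
        return 0.0  # everything at or below t0 is declared impossible
    return 1.0 - math.exp(-(((x - t0) / eta) ** beta))

# Hypothetical parameter values, chosen only for illustration.
beta, eta, t0 = 2.0, 1000.0, 500.0

for x in (100.0, 499.0, 500.0, 600.0):
    p2 = weibull3_cdf(x, beta, eta)       # two-parameter model (t0 = 0)
    p3 = weibull3_cdf(x, beta, eta, t0)   # three-parameter model with t0 "correction"
    print(f"x={x:7.1f}  2-parm P={p2:.6f}  3-parm P={p3:.6f}")
```

Any occurrence observed below `t0` contradicts the three-parameter model outright, which is exactly the region the analysis is usually about.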
- over-fitting (i.e. – talking yourself into thinking you know more than the data told you)
- you can have both left and right censoring with, say, a signal response, limited on the left by background noise and on the right by maximum signal output, but fatigue failure data will be right censored only, caused by parts removed from testing before they fail.
- “Weibayes” isn’t Bayesian. It’s opportunistic marketing gone awry that treats engineers as statistical rubes, who will believe anything some “expert” tells them. The unfortunate cobbling-together of two surnames is insulting to both men, and the ad hoc “methodology” is specious. “Weibayes” simply assumes that you already know the exact value of the Weibull shape parameter, beta (“slope”), and uses the data to estimate only the scale parameter, eta (the characteristic life).
- Real Bayesian analysis assumes that you know only approximate values for the shape and scale and also have some idea about their variabilities. These “priors” are then updated in light of the data to provide more accurate estimates of both eta and beta. “Weibayes” leaves the unsuspecting user with the mistaken notion that he can compute what he needs to know (an early failure percentile, for example) based on a guess. While that might work if you are interested in, say, the 0.1th percentile, it is laughable for anything smaller. Are you betting your company’s future on a guess?
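To make the “Weibayes” calculation described above concrete, here is a small sketch. With beta treated as known, the maximum likelihood estimate of eta for right-censored data has the closed form \(\hat\eta = (\sum_i t_i^\beta / r)^{1/\beta}\), where the sum runs over all units and \(r\) is the number of failures. The data and the candidate beta values are invented for illustration, and the function names are hypothetical.

```python
import math

def weibayes_eta(times, failed, beta):
    """Closed-form estimate of eta when beta is ASSUMED known (right-censored data)."""
    r = sum(failed)  # number of observed failures
    if r == 0:
        raise ValueError("need at least one failure")
    total = sum(t ** beta for t in times)  # failures and suspensions both contribute
    return (total / r) ** (1.0 / beta)

def weibull_quantile(p, beta, eta):
    """Time by which a fraction p of units is expected to have failed."""
    return eta * (-math.log1p(-p)) ** (1.0 / beta)

# Invented data: two failures, three units suspended at 500.
times = [320.0, 410.0, 500.0, 500.0, 500.0]
failed = [True, True, False, False, False]

for beta in (1.5, 2.0, 3.0):  # three different guesses for the "known" slope
    eta = weibayes_eta(times, failed, beta)
    q = weibull_quantile(0.001, beta, eta)
    print(f"assumed beta={beta:.1f}  eta={eta:8.1f}  0.1th percentile={q:7.2f}")
```

The early percentile moves substantially with the assumed beta, which is the concern raised above: the answer is driven by the guess, not by the data.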
- side-by-side? plot of cdf and pdf
- Smaller plots look much better on higher resolution monitors.
- Show density plots with wide left tail.
- “straightforward explanations free of confusing statistical jargon” is sometimes also free of statistical validity, viz. the quoted “probabilities” are improbable.
- simultaneous confidence bounds: It is well known that Weibull model parameter estimates are highly correlated. Computing confidence bounds for them individually is therefore misleading. The method suggested by Cheng and Iles (“Confidence Bands for Cumulative Distribution Functions of Continuous Random Variables,” Technometrics, vol. 25, no. 1, pp 77 – 86 (1983)) computes their joint influence and produces simultaneous confidence bounds on the cumulative distribution function.
- Weibull analysis is NOT a regression of plotting positions:
- plotting positions are arbitrary. There are several in common usage.
- the regression treats X as known and the percentile as random. In truth, the percentile is known (after considering possible censored observations) and the X value is a random variable. (Think of \(S\text{-}N\) curves, where \(N = f(\text{stress})\), not \(\text{stress} = f(N)\).)
- The correct method is MLE (maximum likelihood estimation). Anything that disagrees with that is therefore wrong. Plot 3 regressions and MLE.
- There are two valid confidence bounds, binomial and loglikelihood ratio. Regression bounds are wrong because they are based on the observations and how they are plotted, as in OLS (Ordinary Least Squares) regression.
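The contrast between regression on plotting positions and MLE can be sketched as follows. This example regresses \(\ln(-\ln(1-F))\) on \(\ln t\) using three common plotting positions (Benard's median-rank approximation, mean rank, and Hazen) and compares the results with the maximum likelihood estimates, found here by bisection on the profile likelihood equation for beta. The data are invented, the function names are hypothetical, and the bisection bracket is an assumption.

```python
import math

def mle_weibull(times, failed):
    """Weibull MLE (beta, eta) for complete or right-censored data.

    Solves the profile likelihood equation for beta by bisection.
    """
    r = sum(failed)                      # number of observed failures
    if r == 0:
        raise ValueError("need at least one failure")
    t_max = max(times)
    u = [t / t_max for t in times]       # rescale so u**beta cannot overflow
    mean_log_fail = sum(math.log(x) for x, f in zip(u, failed) if f) / r

    def score(beta):
        s0 = sum(x ** beta for x in u)
        s1 = sum(x ** beta * math.log(x) for x in u)
        return s1 / s0 - 1.0 / beta - mean_log_fail  # increasing in beta

    lo, hi = 1e-3, 1e3                   # assumed bracket for beta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(x ** beta for x in u) / r) ** (1.0 / beta)
    return beta, eta

def regress_weibull(times, position):
    """OLS of ln(-ln(1 - F)) on ln(t) for a complete sample (no censoring)."""
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(1.0 - position(i, n))) for i in range(1, n + 1)]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sxy / sxx                     # slope of the fitted line
    intercept = ybar - beta * xbar       # equals -beta * ln(eta)
    return beta, math.exp(-intercept / beta)

# Three plotting positions in common use; each gives a different "answer".
POSITIONS = {
    "Benard median rank": lambda i, n: (i - 0.3) / (n + 0.4),
    "mean rank": lambda i, n: i / (n + 1.0),
    "Hazen": lambda i, n: (i - 0.5) / n,
}

# Invented complete sample of failure times.
times = [105.0, 180.0, 240.0, 310.0, 390.0, 480.0, 600.0, 790.0]

for name, pos in POSITIONS.items():
    b, e = regress_weibull(times, pos)
    print(f"{name:20s} beta={b:.3f}  eta={e:.1f}")
b, e = mle_weibull(times, [True] * len(times))
print(f"{'MLE':20s} beta={b:.3f}  eta={e:.1f}")
```

The regression helper handles only complete samples; `mle_weibull` accepts a `failed` flag per unit, so right-censored observations enter the likelihood directly instead of through an adjusted rank.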
- situations where XXX software is not appropriate: multiple censoring, interval censoring. Contact an expert: Me.
- created a menu for Weibull topics pointing to individual pages.
- if Time has a Weibull distribution, then log(Time) has a SEV distribution.
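The last note can be checked numerically. If Time is Weibull with shape beta and scale eta, then log(Time) follows a smallest extreme value (SEV) distribution with location \(\mu = \ln\eta\) and scale \(\sigma = 1/\beta\), so the two CDFs agree at matching points. The sketch below uses assumed parameter values and only the Python standard library; the simulation step relies on `random.weibullvariate(alpha, beta)`, whose first argument is the scale and second the shape.

```python
import math
import random

def weibull_cdf(t, beta, eta):
    """Two-parameter Weibull CDF."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def sev_cdf(y, mu, sigma):
    """Smallest extreme value (Gumbel minimum) CDF."""
    return 1.0 - math.exp(-math.exp((y - mu) / sigma))

# Hypothetical Weibull parameters and the implied SEV parameters.
beta, eta = 2.0, 1000.0
mu, sigma = math.log(eta), 1.0 / beta

# Analytic check: F_Weibull(t) equals F_SEV(log t).
for t in (50.0, 500.0, 1000.0, 2000.0):
    print(t, weibull_cdf(t, beta, eta), sev_cdf(math.log(t), mu, sigma))

# Simulation check: the mean of log(Time) should be close to the SEV mean,
# which is mu - gamma * sigma, with gamma the Euler-Mascheroni constant.
rng = random.Random(1)
logs = [math.log(rng.weibullvariate(eta, beta)) for _ in range(20000)]
euler_gamma = 0.5772156649015329
sample_mean = sum(logs) / len(logs)
print(sample_mean, mu - euler_gamma * sigma)
```

This is why log(Time) is a location-scale quantity even though Time itself is not: shifts in \(\ln\eta\) move the SEV distribution and \(1/\beta\) stretches it.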