Why? Monads [from category theory] were [applied in comp sci] as a way to give imperative programs formal semantics. Making use of them to retain referential transparency in the presence of effects is probably Haskell's greatest success. [Edited for clarity].
Monads are just one of many ways of formalizing notions of state. All of them have their tradeoffs. For example, in temporal logic it's much more obvious that "state does not compose" due to the frame problem.
Please don't say "referntial transparency" because it doesn't mean what some Haskellers think it does. It means the following: an expression e is referentially transparent if any sub-term in it can be replaced with any other having the same reference (aka denotation) without changing e's reference. This is true for Java, and most programming languages, really, but, incidentally, not true for Template Haskell (or any language with macros). Referential opacity or opaqueness increases the expressiveness of the language, which is why Lisp, which is not transparent because of macros, is more expressive than the lambda calculus (which is), and why modal logic is so expressive. It's an interesting property, but it's not what pure-FP fans think it means. What they mean is that the language is referentially transparent (like most language) and a term's reference (denotation) is an object value in the language -- this is not true in imperative language where terms refer to something called "predicate transformers". This quality is also called "value semantics" which is pretty much a synonym for "pure functional." In other words, some FP fans have redefined referential transparency to mean pure functional, and then use "referential transparency" incorrectly as some more general thing, when they actually just mean "pure functional."
And yes, monads are a way to express some non-deterministic computations in lambda-calculus-based languages, as are linear types.
> Haskell's greatest success
Success in what sense? In expressing what amounts to side-effects in a language based on a lambda calculus as monads? Sure.
Monads are from category theory, so at most they were adopted, not developed.
They are a workaround, because by definition purity implies that nothing changes in the world.
I can run the same Haskell program multiple times, and it might eventually produce different values as output given the same input values, e.g. an URL or file handle.
Are you referring to how an online document or a file might change so that Haskell program might have a different output given the same URL as an input?
As @willtim explains, monads were applied to PL theory to model effects. The IO monad models the effect of state, i.e. imperative programs. The type `IO a` is essentially a "wrapper" or "alias" for the type `RealWorld -> (RealWorld, a)`. You should think of the entire world as being the input to a Haskell program.
Haskell does have impure backdoors into IO such as unsafePerformIO, but the IO monad itself is perfectly pure.
P.S. For more information, check out the work of Eugenio Moggi [0], who started using monads to give semantics to programming languages, and Philip Wadler [1], who applied the idea on a programmer-facing level.