First of all, I’d like to remind you that the term optimisation comes from mathematics, where it means selecting one of the available options from a certain set so as to maximise a certain function. Of course, the term was misappropriated by computer “scientists” in the same way physicists misappropriated the biological term plasma, and the first association people have with the word nowadays is “something that modern games lack”.
And the crucial part of the original definition is that there is a metric by which the choice is judged. So even the software you complain about may be optimal; it’s just that the goals set during its creation are different from what you expect (and were most likely set by the marketing team).
Hence you have the first kind of bloat: creeping featuritis, caused by the desire to earn more money from the product by making it appealing to the widest possible audience. It often does not matter that most of those features are unpolished or even non-existent: as long as the product sells, you can keep lying that they’ll be fixed or improved in the next update or in an upgraded version of the product. I’m still old enough to remember the times when this was called Korean cell-phone syndrome (after electronics manufacturers from a certain country notorious for such tactics) instead of being the normal product cycle.
Other kinds of bloat come from the trade-offs that everybody has to make. For instance, there’s the sediment accumulating from previous features that are hard to remove or even refactor: it sits there and may break things if you try to take it out (as other components rely on it in unpredictable ways). The most obvious example of that is the human genome; what humans create on the macroscopic level is usually much, much more efficient (no need to list the exceptions).
And finally, there’s the conscious sacrifice of efficiency for other goals. For example, a car engine can be made more efficient in, say, kilometres per litre of fuel consumed, but then it won’t let you accelerate as well or won’t pass emission standards (which is a known problem with diesel engines; even Volkswagen had to admit it eventually).
The same applies to developing programs: you may write a program without superfluous features that requires minimal storage and RAM and runs extremely fast, but you’ll finish writing it sometime next century. And it’ll be easier to write it anew than to port it to another system. Let’s not even think about debugging it.
That’s the story of programming languages, from low-level to high-level: they offer ever more composability and convenience in exchange for being slower. You can write a program in machine code, but then you have to calculate all the jump and call offsets yourself. That’s why the assembler was created, and it also introduced macros, which let you avoid repeating common instruction sequences at the cost of a bit of efficiency (since sometimes you could have omitted or improved an instruction or two that were not needed in that particular place). But even that is not good enough when you have to port your program to a different OS or a different CPU, so C was invented, earning the nickname “portable assembly” (which is no longer true thanks to the standardisation committee, but that’s another story). With C you can write a program that not only can often be compiled and run on another system, but is also modular enough that you can understand what it does without going insane staring at one huge unstructured file. Of course this has its cost too, both in compilation time and in efficiency losses that even optimisation passes can’t fully compensate for.
At the same time there were high-level programming languages, which took a radically different approach: “we don’t care about the details, we want this calculation to run on the machine and produce results”. The main advantage of C++ over C is that it allows better encapsulation of different objects and types (so you can, for example, define your own matrix class and operations on it, writing the more natural mat_a + mat_b instead of mat_c = matrix_add(mat_a, mat_b) and worrying whether you passed the right types). Of course this does not come for free either (virtual tables for class methods take space, you know), but it’s often a small price to pay (especially if you’re using a language that has actually learned something from C++’s mistakes).
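To make that concrete, here is a minimal sketch of what such a class might look like (my own illustration, not taken from any real library; the Matrix name and its layout are assumptions):

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    // A minimal matrix type for illustration only; it has no virtual methods,
    // so there is not even a vtable to pay for in this particular case.
    class Matrix {
    public:
        Matrix(std::size_t rows, std::size_t cols)
            : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

        double& at(std::size_t r, std::size_t c)       { return data_[r * cols_ + c]; }
        double  at(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

        // Operator overloading: the call site reads like ordinary arithmetic,
        // and mismatched dimensions are caught in one place.
        Matrix operator+(const Matrix& other) const {
            if (rows_ != other.rows_ || cols_ != other.cols_)
                throw std::invalid_argument("matrix dimensions do not match");
            Matrix result(rows_, cols_);
            for (std::size_t i = 0; i < data_.size(); ++i)
                result.data_[i] = data_[i] + other.data_[i];
            return result;
        }

    private:
        std::size_t rows_, cols_;
        std::vector<double> data_;
    };

    int main() {
        Matrix mat_a(2, 2), mat_b(2, 2);
        mat_a.at(0, 0) = 1.0;
        mat_b.at(0, 0) = 2.0;
        Matrix mat_c = mat_a + mat_b;   // instead of mat_c = matrix_add(mat_a, mat_b)
        return mat_c.at(0, 0) == 3.0 ? 0 : 1;
    }

The convenience lives in the source code: the compiler treats operator+ much like it would treat matrix_add, so what you buy here is readability and type checking rather than speed.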
The same applies to reusable components, a.k.a. libraries. You save time on writing code (and sometimes on debugging it as well) while sacrificing speed (since those components are not tailored to your specific use case) and space (because they often drag in dependencies you don’t really need; web development is full of egregious examples of this).
At least here it is your choice: you decide how you’re going to build your project and what you’re willing to sacrifice for which goals. With the applications forced onto you, that’s not the case.
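To make the library trade-off a bit more concrete, here is a toy sketch (entirely my own illustration, using nothing beyond the standard library; the function names are made up): a generic routine works everywhere, while a routine tailored to one narrow case can beat it by exploiting knowledge the generic code cannot have.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Reusable component: std::sort works for any comparable element type,
    // but as a comparison sort it cannot exploit what we know about the data.
    void sort_generic(std::vector<std::uint8_t>& data) {
        std::sort(data.begin(), data.end());
    }

    // Tailored replacement: a counting sort that only handles 8-bit values,
    // but for exactly this case it runs in linear time.
    void sort_tailored(std::vector<std::uint8_t>& data) {
        std::size_t counts[256] = {0};
        for (std::uint8_t v : data)
            counts[v]++;
        std::size_t pos = 0;
        for (int v = 0; v < 256; ++v)
            for (std::size_t i = 0; i < counts[v]; ++i)
                data[pos++] = static_cast<std::uint8_t>(v);
    }

    int main() {
        std::vector<std::uint8_t> samples = {3, 1, 2, 1, 0};
        sort_generic(samples);   // good enough almost everywhere
        sort_tailored(samples);  // worth writing only where this path actually matters
        return samples.front() == 0 ? 0 : 1;
    }

Most of the time the generic version is the sensible choice; writing, debugging and maintaining the tailored one is exactly the development time mentioned above.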
In conclusion I want to give an example from my own field. My hobby multimedia project, written from scratch, does not strive to be the fastest (except occasionally, when it happens to have the fastest open-source encoder or decoder, but that’s due to the lack of competition), and I work on it from time to time mostly to make sure I’ve understood some concept correctly; when I try to make something faster, it’s usually only to make it fast enough for my practical needs. And there’s Paul M., whose project I haven’t advertised enough (here’s the link just in case): he took an existing codebase and added or improved various bits, striving to be better than the competition (and often succeeding). You can argue that either approach resulted in bloated software, but I’d say that as long as we’re happy with the results, it’s not bloat in either case. It’s optimal in the original meaning of the term; you just haven’t considered which metric was used to pick that solution.