The spec for loading a media element is quite huge. This implements just
enough to parse the attribute, fetch the corresponding media object, and
decode the media object (if it is a video). While doing so, this also
implements most network state tracking and firing DOM events that may be
observed via JavaScript.