
Digital media standards for improved user experience


Standards are a vital component of communication. In 1984, while working for CSELT, the research centre of what is today Telecom Italia, I proposed RACE IVICO (Integrated Video Codec), a project aimed at developing a European microelectronic technology for digital video in partnership with representatives of the most relevant European industries. The project was approved, but two years later it was terminated because of its jarring differences with the European audio-visual policy of the time (which envisaged digital audio-video playing a role only in the first decade of the 21st century), and also because digital audio-visual was assumed to play a major role in the broadband strategy of a telco like Telecom Italia.

A year later, seeing that it was not possible to develop a European microelectronic technology for audio-visual, I decided that an international standard could at least be developed and, in 1988, I established the Moving Picture Experts Group (MPEG), a working group in ISO/IEC JTC 1 (Information Technology).

“Standard” is a well-used and misused word, but not all standards are the same. In the case of audio and video there should be neither separate standards for audio and for video, nor separate standards for each client industry: Broadcasting, Consumer Electronics, IT and Telecommunications should all share a single standard for the digital representation of audio-visual information, kept separate from the potentially different delivery standards. I believe this is the main reason for the success of MPEG standards.

Several definitions of standard can be found:

  • Webster’s
    • A conspicuous object (as a banner) formerly carried at the top of a pole and used to mark a rallying point especially in battle or to serve as an emblem
    • Something that is established by authority, custom or general consent as a model or an example to be followed
  • Encyclopaedia Britannica
    • (Technical specification) that permits large production runs of component parts that are readily fitted to other parts without adjustment
  • My definition
    • Codified agreement between parties who recognise the advantage of all doing certain things in the same way

One common claim made against standards is that they are anti-competitive and stop innovation. This may be true in other fields, but not in MPEG, as can be seen from the performance verification tests carried out in 1995 on MPEG-2 Video, which showed that coding was subjectively transparent at 6 Mbit/s for composite (PAL) and at 8 Mbit/s for component (YUV) signals. The bitrate originally selected for operation was 4 Mbit/s, but today MPEG-2 is typically used at 2 Mbit/s, without changing the decoders.

This was possible because MPEG standards specify the decoder (which provides the ability to reach customers) but are silent on the encoder, whose only constraint is the ability to produce conforming bitstreams.
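To make this principle concrete, here is a toy sketch with an invented syntax, not any real MPEG specification: the normative part is a trivial run-length decoder, and two different encoders, one naive and one smarter, both produce bitstreams that this decoder reconstructs exactly. Deployed decoders stay unchanged while encoders keep improving.

```python
# Toy sketch of "specify the decoder, leave the encoder free" (invented
# syntax, not any real MPEG specification). The normative part is a trivial
# run-length decoder; any encoder is conforming as long as its bitstream
# decodes back to the source exactly.

def decode(bitstream):
    """The 'standard': a bitstream is a sequence of (count, value) pairs."""
    out = []
    for count, value in bitstream:
        out.extend([value] * count)
    return out

def encoder_v1(data):
    """Naive conforming encoder: one (1, value) pair per sample."""
    return [(1, v) for v in data]

def encoder_v2(data):
    """Smarter conforming encoder: merges runs of equal samples."""
    runs = []
    for v in data:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1
        else:
            runs.append([1, v])
    return [tuple(r) for r in runs]

data = [7, 7, 7, 3, 3, 9]
for encoder in (encoder_v1, encoder_v2):
    bitstream = encoder(data)
    assert decode(bitstream) == data      # both bitstreams conform
    print(encoder.__name__, "->", len(bitstream), "symbols")
```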

Standards are an important component in the chain that brings innovation to consumers. An innovator can file a patent that has value per se, but the patent has greater value if it is part of a standard. Since the goal of MPEG is to produce standards yielding maximum performance, licences are typically required to exercise an MPEG standard. Royalties allow an innovator to continue innovating and to file further patents for use in new standards. Indeed, MPEG standards do not stop innovation.

Many industrial users are concerned by the amount they have to pay to exercise a patent in a standard, but that should not necessarily be their highest concern, because often it is not so much a matter of “how much” as of “how”.

In the analogue world patent remuneration used to be typically per piece and, in the digitised MPEG-2 world, remuneration is still per piece of “electronics” (but also per piece of “content” on a DVD). In the MPEG-4 Visual world remuneration is per piece of electronics but also per hour of streamed pay content. For years this licence clause prevented adoption of the standard for pay video services on the web.

Use of digital technologies was hampered for many years by the large bitrates involved in digital audio and video, as shown by the tables below, which give indicative bitrates:

 

Video

                  VHS     SD      HD      4k      8k
#lines            288     576     1,080   2,160   4,320
#pixels/line      360     720     1,920   3,840   7,680
Frame freq. (Hz)  25      25      25      50      50
Mbit/s            41      166     829     6,636   26,542

Audio

                      Speech   CD      Stereo   5.1     22.2
Sampling freq. (kHz)  8        44.1    48       48      48
bits/sample           8        16      16       16      16
#channels             1        2       2        5.33    22.66
Mbit/s                0.064    1.411   1.536    4.093   17.403

(In the 5.1 and 22.2 columns, low-frequency-effects channels are counted as 1/3 of a channel, consistently with the Mbit/s figures.)
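The figures in both tables follow from simple arithmetic, as the short sketch below shows. The 16 bits/pixel for video (8-bit 4:2:2 sampling) and the fractional LFE channel counts are assumptions chosen to be consistent with the figures above, not stated in the text.

```python
# Reproduces the uncompressed bitrates of the two tables above.

video = {  # name: (#lines, #pixels/line, frames/s)
    "VHS": (288, 360, 25),
    "SD": (576, 720, 25),
    "HD": (1080, 1920, 25),
    "4k": (2160, 3840, 50),
    "8k": (4320, 7680, 50),
}
for name, (lines, pixels, fps) in video.items():
    # assumed 16 bits/pixel: 8-bit luma plus 8 bits of chroma (4:2:2)
    print(f"{name:>3}: {lines * pixels * fps * 16 / 1e6:,.0f} Mbit/s")

audio = {  # name: (sampling freq. Hz, bits/sample, effective #channels)
    "Speech": (8_000, 8, 1),
    "CD": (44_100, 16, 2),
    "Stereo": (48_000, 16, 2),
    "5.1": (48_000, 16, 5.33),    # LFE counted as 1/3 channel
    "22.2": (48_000, 16, 22.66),  # two LFEs counted as 1/3 each
}
for name, (freq, bits, channels) in audio.items():
    print(f"{name:>6}: {freq * bits * channels / 1e6:.3f} Mbit/s")
```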

 

Fortunately, there has been constant progress in compressing digital audio and video while preserving the original quality, as shown by the table below:

                Base       Scalable   Stereo   Depth   Selectable viewpoint
MPEG-1          ~VHS       -          -        -       -
MPEG-2          2 Mbit/s   -10%       -15%     -       -
MPEG-4 Visual   -25%       -10%       -15%     -       -
MPEG-4 AVC      -30%       -25%       -25%     -20%    5/10%
HEVC            -60%       -25%       -25%     -20%    5/10%

In the “Base” column, percentage figures give the compression improvement over the previous generation of compression technology. The percentage figures in the “Scalable”, “Stereo” and “Depth” columns give the compression improvement relative to the technology in the column immediately to the left. “Selectable viewpoint” refers to the ability to select and view an image from a viewpoint that was not transmitted.
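Reading the “Base” column as compounded improvements gives a feel for this progress. A minimal sketch, taking the MPEG-2 operating point of 2 Mbit/s as the starting point:

```python
# Compounding the "Base" improvements from the MPEG-2 operating point of
# 2 Mbit/s: each generation needs the stated fraction less bitrate for
# comparable quality.
bitrate = 2.0  # Mbit/s (MPEG-2)
for codec, gain in [("MPEG-4 Visual", 0.25), ("MPEG-4 AVC", 0.30), ("HEVC", 0.60)]:
    bitrate *= 1 - gain
    print(f"{codec}: ~{bitrate:.2f} Mbit/s")
# MPEG-4 Visual: ~1.50, MPEG-4 AVC: ~1.05, HEVC: ~0.42 Mbit/s
```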

In this context, it is interesting to estimate the bitrate between eye/ear and brain. About 1.2 million nerve fibres connect the retina to the brain, and about 30 thousand nerve fibres connect the cochlea to the brain. One nerve fibre can transmit a new impulse every ~6 ms, i.e. it can generate about 160 spikes/s. Assuming that 16 spikes are needed to convey one bit, one eye sends ~12 Mbit/s and one ear ~300 kbit/s to the brain.
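The same back-of-envelope arithmetic, as a minimal sketch:

```python
# Bitrate from retina/cochlea to brain, per the estimate in the text:
# fibres * (spikes per second) / (spikes per bit).
def neural_bitrate(fibres, spikes_per_s=160, spikes_per_bit=16):
    return fibres * spikes_per_s / spikes_per_bit  # bit/s

print(f"one eye: ~{neural_bitrate(1_200_000) / 1e6:.0f} Mbit/s")  # ~12 Mbit/s
print(f"one ear: ~{neural_bitrate(30_000) / 1e3:.0f} kbit/s")     # ~300 kbit/s
```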

Video can take many forms:

  • Scalable video gives the possibility to extract streams at different bitrates from a single bitstream (a toy sketch of the idea follows this list)
  • Multiview video is a signal generated by an array of cameras capturing a scene, so that a user can see the scene from different viewpoints (possibly by interpolating existing views to create a view that was not captured and transmitted)
  • Screen content is natural video mixed with computer graphics
  • High Dynamic Range seeks to extend the maximum brightness achievable on today’s displays beyond the usual 100 nits (cd/m²) towards several thousand nits
  • Wide Colour Gamut is a system capable of reproducing a much larger set of colours than is possible today
  • Augmented Reality is the integration of 3D natural and synthetic video and audio (and more)
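As promised above, here is a toy sketch of the scalable-video idea (not a real codec or bitstream syntax): a single stream carries a base layer plus enhancement layers, and a receiver keeps only the layers that fit within its bitrate budget. The layer names and bitrates are hypothetical.

```python
# Toy sketch of scalability: one bitstream, several extractable substreams.
layers = [
    ("base, 480p", 1.0),           # Mbit/s (hypothetical figures)
    ("enhancement 1, 1080p", 2.0),
    ("enhancement 2, 2160p", 5.0),
]

def extract(layers, budget_mbps):
    """Return the decodable prefix of layers fitting the bitrate budget."""
    kept, total = [], 0.0
    for name, rate in layers:
        if total + rate > budget_mbps:
            break
        kept.append(name)
        total += rate
    return kept, total

print(extract(layers, 3.5))  # (['base, 480p', 'enhancement 1, 1080p'], 3.0)
```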

We have seen that human eyes perform sophisticated processing to convert Pbit/s of visual input information into an output of some Mbit/s. Compact Descriptors for Visual Search (CDVS), a standard for visual search, analysis and detection applications being developed by MPEG, tries to do something conceptually similar. Applications for this standard are manifold and extend to mobile, automotive, Smart TV, surveillance, equipment maintenance, robotics, infomobility, tourism services, cultural heritage, etc.
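CDVS itself defines a compact descriptor bitstream; as a loose illustration of the underlying idea, matching compact binary local descriptors between images, here is a sketch using OpenCV's ORB descriptors (an analogy only, not the CDVS format or its reference software):

```python
# Extract compact binary local descriptors from two images and match them.
# Requires: pip install opencv-python numpy
import cv2
import numpy as np

# A synthetic corner-rich "database" image and a shifted "query" view of it.
db = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(db, (60, 60), (160, 140), 255, -1)
cv2.circle(db, (230, 170), 30, 180, -1)
query = np.roll(db, (10, 15), axis=(0, 1))  # crude stand-in for a new view

orb = cv2.ORB_create(nfeatures=200)  # 256-bit binary descriptors, 32 B each
kp_db, des_db = orb.detectAndCompute(db, None)
kp_q, des_q = orb.detectAndCompute(query, None)

if des_db is not None and des_q is not None:
    # Hamming distance is the natural metric for binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_q, des_db)
    print(f"{len(matches)} matches; query payload ~{des_q.size} bytes")
```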

On the 10th of June, from 14:00 to 17:00, at Via Sannio 2, Milan, the Italian standardisation institute UNI will host an event (http://www.uninfo.it/) titled “ISO/IEC artificial vision standards for new services and industrial applications”, organised by UNINFO, the entity federated with UNI and delegated to handle Information Technology and its applications.

In conclusion, it is important to remember that standards are (just) enablers: how to benefit from them is the real issue. To address this question, it is important to ask ourselves whether Italy is able to

  • exploit the Intellectual Property of standards;
  • capitalise on the standards-enabled (hardware and software) manufacturing;
  • have a holistic view of the entire process.

I suggest having a look at how Digital Media in Italia (http://www.dmin.it/) has endeavoured to “define and propose action areas that enable Italy to acquire a primary role in the exploitation of the global digital media phenomenon”.

