Published on 29/05/2014. Read time: 6 mins

Standards are a vital component of communication. In 1984, while working for CSELT, the research center of what is today Telecom Italia, I submitted RACE IVICO (Integrated Video Codec), a project aimed at developing a European microelectronic technology for digital video in partnership with representatives of most relevant European industries. The project was approved but was terminated two years later, partly because it clashed with the European audio-visual policy of that time (which expected digital audio-video to play a role in the first decade of the 21st century), and partly because of the major role that digital audio-visual was assumed to play in the broadband strategy of a telco like Telecom Italia.

A year later, seeing that it was not possible to develop a European microelectronic technology for audio-visual, I decided that an international standard could at least be developed and, in 1988, I established the Moving Picture Experts Group (MPEG), a working group in ISO/IEC JTC 1 Information Technology.

“Standard” is a much-used and misused word, and not all standards are the same. In the case of audio and video, there should not be separate standards for each, but unified audio-visual standards for all client industries (Broadcasting, Consumer Electronics, IT and Telecommunications): a single standard for the digital representation of audio-visual information, separate from potentially different delivery standards. I believe this is the main reason for the success of MPEG standards.

Several definitions of standard can be found:

Webster’s
- A conspicuous object (as a banner) formerly carried at the top of a pole and used to mark a rallying point especially in battle or to serve as an emblem
- Something that is established by authority, custom or general consent as a model or an example to be followed
Encyclopaedia Britannica
- (Technical specification) that permits large production runs of component parts that are readily fitted to other parts without adjustment
My definition
- Codified agreement between parties who recognise the advantage of all doing certain things in the same way

One common claim made against standards is that they are anti-competitive and stop innovation. This may be true in other fields but not in MPEG as can be seen from the performance verification tests carried out in 1995 on MPEG-2 Video, which showed that coding was subjectively transparent at 6 Mbit/s for composite (PAL) and at 8 Mbit/s for components (YUV). The bitrate originally selected for operation was 4 Mbit/s, but today MPEG-2 is typically used at 2 Mbit/s, without changing decoders.

This was possible because MPEG standards specify the decoder (which provides the ability to reach customers) but are silent on the encoder, whose only constraint is the ability to produce conforming bitstreams.
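The split between a fixed decoder and a free encoder can be illustrated with a toy sketch. The run-length format below is purely hypothetical and has nothing to do with actual MPEG syntax; the names `decode`, `encode_naive` and `encode_improved` are assumptions made for illustration. The point is only that two different encoders can feed the same unchanged decoder, and the better one needs fewer symbols.

```python
# Toy illustration only: a made-up run-length format, NOT MPEG syntax.
# The "standard" fixes decode(); encoders are free as long as the
# bitstream they produce is conforming.

def decode(bitstream):
    """Normative part: expand (count, symbol) pairs into samples."""
    samples = []
    for count, symbol in bitstream:
        if count < 1:
            raise ValueError("non-conforming bitstream: count must be >= 1")
        samples.extend([symbol] * count)
    return samples

def encode_naive(samples):
    """First-generation encoder: one pair per sample (conforming, wasteful)."""
    return [(1, s) for s in samples]

def encode_improved(samples):
    """Later encoder: merges runs, same decoder, shorter bitstream."""
    bitstream = []
    for s in samples:
        if bitstream and bitstream[-1][1] == s:
            count, _ = bitstream[-1]
            bitstream[-1] = (count + 1, s)
        else:
            bitstream.append((1, s))
    return bitstream

samples = list("aaaabbbcca")
naive = encode_naive(samples)
improved = encode_improved(samples)

# Both bitstreams decode to the same samples with the unchanged decoder.
assert decode(naive) == decode(improved) == samples
print(len(naive), len(improved))  # prints "10 4"
```

In this sketch the improvement comes entirely from the encoder side, which is the mechanism the paragraph above credits for MPEG-2 bitrates falling without any change to deployed decoders.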

Standards are an important component in the chain that brings innovation to consumers. An innovator is in a position to file a patent that has value per se, but the patent has greater value if it is within a standard. Since the goal of MPEG is to produce standards yielding maximum performance, licences are typically required to exercise an MPEG standard. Royalties allow an innovator to continue innovating and filing other patents for use in new standards. Indeed, MPEG standards do not stop innovation.

Many industrial users are concerned by the amount they have to pay to exercise a patent in a standard, but that should not necessarily be the highest concern, because often it is not so much a matter of “how much” as of “how”.

In the analogue world, patent remuneration was typically “apiece”. In the digital world of MPEG-2, remuneration is still per piece of “electronics” (but also per piece of “content” on a DVD). In the MPEG-4 Visual world, remuneration is per piece of electronics but also per hour of pay streamed content. For years, this licence clause prevented adoption of the standard for pay video services on the web.

Use of digital technologies was hampered for many years by the large bitrates involved in digital audio and video, as shown by the tables below, which give indicative bitrates:

Video

	VHS	SD	HD	4k	8k
#lines	288	576	1,080	2,160	4,320
#pixels per line	360	720	1,920	3,840	7,680
Frame freq. (Hz)	25	25	25	50	50
Mbit/s	41	166	829	6,636	26,542
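
The Mbit/s row can be reproduced from the other three rows. The sketch below assumes 16 bits per pixel (for example 8-bit 4:2:2 sampling); that figure is not stated in the article, but it is the value that matches the table. The function name `raw_video_mbps` is chosen here for illustration.

```python
# Raw (uncompressed) video bitrate from the table's parameters.
# ASSUMPTION: 16 bits per pixel; not stated in the article, but it
# reproduces the Mbit/s row of the table.
BITS_PER_PIXEL = 16

FORMATS = {
    # name: (lines, pixels per line, frames per second)
    "VHS": (288, 360, 25),
    "SD": (576, 720, 25),
    "HD": (1080, 1920, 25),
    "4k": (2160, 3840, 50),
    "8k": (4320, 7680, 50),
}

def raw_video_mbps(lines, pixels_per_line, frame_rate,
                   bits_per_pixel=BITS_PER_PIXEL):
    """Bitrate in Mbit/s = lines x pixels/line x frames/s x bits/pixel."""
    return lines * pixels_per_line * frame_rate * bits_per_pixel / 1e6

for name, (lines, pixels, fps) in FORMATS.items():
    print(f"{name}: {raw_video_mbps(lines, pixels, fps):,.0f} Mbit/s")
```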

Audio

	Speech	CD	Stereo	5.1	22.2
Sampling freq. (kHz)	8	44.1	48	48	48
bits/sample	8	16	16	16	16
#channels	1	2	2	5.33	22.66
Mbit/s	0.064	1.411	1.536	4.093	17.403
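
The audio figures follow the same pattern: sampling frequency times bits per sample times number of channels. The sketch below takes the fractional channel counts (5.33 and 22.66) exactly as given in the table, without interpreting them; `raw_audio_mbps` is a name chosen here for illustration.

```python
# Raw (uncompressed) audio bitrate from the table's parameters.
# Channel counts, including the fractional ones, are copied from the table.
FORMATS = {
    # name: (sampling frequency in kHz, bits per sample, channels)
    "Speech": (8, 8, 1),
    "CD": (44.1, 16, 2),
    "Stereo": (48, 16, 2),
    "5.1": (48, 16, 5.33),
    "22.2": (48, 16, 22.66),
}

def raw_audio_mbps(sampling_khz, bits_per_sample, channels):
    """Bitrate in Mbit/s = samples/s x bits/sample x channels."""
    return sampling_khz * 1e3 * bits_per_sample * channels / 1e6

for name, (khz, bits, channels) in FORMATS.items():
    print(f"{name}: {raw_audio_mbps(khz, bits, channels):.3f} Mbit/s")
```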

Fortunately, there has been constant progress in compressing digital audio and video while preserving the original quality, as shown by the table below:

	Base	Scalable	Stereo	Depth	Selectable viewpoint
MPEG-1	~VHS	-	-	-	-
MPEG-2	2Mbit/s	-10%	-15%	-	-
MPEG-4 Visual	-25%	-10%	-15%	-	-
MPEG-4 AVC	-30%	-25%	-25%	-20%	5/10%
HEVC	-60%	-25%	-25%	-20%	5/10%

In the “Base” column, percentage numbers refer to the compression improvement compared to the previous generation of compression technology. The percentage numbers in the “Scalable”, “Stereo” and “Depth” columns refer to the compression improvement compared to the technology on the immediate left. “Selectable viewpoint” refers to the ability to select and view an image from a viewpoint that was not transmitted.
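
Because each “Base” percentage is relative to the previous generation, the savings compound. As a rough sketch, assuming the 2 Mbit/s MPEG-2 figure as the starting point and equal subjective quality across generations (an assumption, not a claim made in the table), the chain works out as follows; `chain_bitrates` is a hypothetical helper name.

```python
# Compounding the "Base" column: each saving applies to the previous
# generation's bitrate. ASSUMPTION: same content and same subjective
# quality across generations, starting from the table's 2 Mbit/s for MPEG-2.
MPEG2_MBPS = 2.0
SAVINGS = [
    ("MPEG-4 Visual", 0.25),
    ("MPEG-4 AVC", 0.30),
    ("HEVC", 0.60),
]

def chain_bitrates(start_mbps, savings):
    """Return {standard: bitrate} after applying each saving in turn."""
    rates = {"MPEG-2": start_mbps}
    current = start_mbps
    for name, saving in savings:
        current *= 1.0 - saving
        rates[name] = current
    return rates

for name, mbps in chain_bitrates(MPEG2_MBPS, SAVINGS).items():
    print(f"{name}: {mbps:.2f} Mbit/s")
```

Under these assumptions the chain runs 2.00, 1.50, 1.05 and 0.42 Mbit/s, that is, roughly a fifth of the MPEG-2 bitrate by the time HEVC is reached.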

In this context, it is interesting to inquire about the bitrate between the eye or ear and the brain. There are about 1.2 million nerve fibres connecting the retina to the brain and about 30 thousand nerve fibres connecting the cochlea to the brain. One nerve fibre can transmit a new impulse every ~6 ms, i.e. it can generate roughly 160 spikes/s. Assuming that 16 spikes are needed to make a bit, we see that one eye sends ~12 Mbit/s and one ear sends ~300 kbit/s to the brain, as depicted in the figure below:
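
The estimate above is simple arithmetic and can be checked directly. The constants below are the ones quoted in the text; the function name `nerve_bitrate_bps` is chosen here for illustration.

```python
# Back-of-the-envelope bitrate between sense organ and brain,
# using the figures quoted in the text.
SPIKES_PER_SECOND = 160   # one impulse roughly every 6 ms
SPIKES_PER_BIT = 16       # the text's assumption

def nerve_bitrate_bps(fibres):
    """Bits per second carried by a bundle of nerve fibres."""
    return fibres * SPIKES_PER_SECOND / SPIKES_PER_BIT

eye_bps = nerve_bitrate_bps(1_200_000)  # optic nerve fibres
ear_bps = nerve_bitrate_bps(30_000)     # auditory nerve fibres

print(f"eye: {eye_bps / 1e6:.0f} Mbit/s")   # prints "eye: 12 Mbit/s"
print(f"ear: {ear_bps / 1e3:.0f} kbit/s")   # prints "ear: 300 kbit/s"
```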

Video can take many forms:

- Scalable video makes it possible to extract different streams at different bitrates from a single bitstream
- Multiview video is a signal generated by an array of cameras capturing the scene, so that a user can see the scene from different viewpoints (possibly by interpolating existing views to create a view that was not captured and transmitted)
- Screen content is a type of natural video that is mixed with graphics
- High Dynamic Range seeks to extend the maximum brightness achievable on today’s displays beyond the usual 100 nits (cd/m²) to several thousand nits
- Wide Colour Gamut is a system capable of reproducing a much larger set of colours than is possible today
- Augmented Reality is the integration of 3D natural and synthetic video and audio (and more)

We have seen that human eyes perform sophisticated processing to convert Pbit/s of visual input information to an output of some Mbit/s. Compact Descriptors for Visual Search (CDVS), a standard for video search, analysis and detection applications being developed by MPEG, tries to do something conceptually similar. Applications for this standard are manifold and extend to mobile, automotive, SmartTV, surveillance, equipment maintenance, robotics, infomobility, tourism services, cultural heritage, etc.

On the 10th of June, from 14:00 to 17:00, at Via Sannio 2, Milan, the Italian standardisation institute UNI will host an event (http://www.uninfo.it/) titled “ISO/IEC artificial vision standards for new services and industrial applications”, organised by UNINFO, the entity federated with UNI and delegated to handle Information Technologies and their applications.

In conclusion, it is important to remember that standards are (just) enablers: the real issue is how to benefit from them. To address this question, we should ask ourselves whether Italy is able to

- exploit the Intellectual Property of standards;
- capitalise on the standards-enabled (hardware and software) manufacturing;
- have a holistic view of the entire process.

I suggest having a look at how Digital Media in Italia (http://www.dmin.it/) endeavoured to “define and propose action areas that enable Italy to acquire a primary role in the exploitation of the global digital media phenomenon”.


Digital media standards for improved user experience
