The Audio Engineering research team here submit a lot of conference papers. In our internal reviewing and when we review submissions by others, certain things come up again and again. I’ve compiled all this together as some general advice for putting together a research paper for an academic conference, especially in engineering or computer science. Of course, there are always exceptions, and the advice below doesn’t always apply. But it’s worth thinking of this as a checklist to catch errors and issues in an early draft.
Make sure the abstract is self-contained. Don’t assume the person reading the abstract will read the paper, or vice-versa. Avoid acronyms. Be sure to actually say what the results were and what you found out, rather than just saying you applied the techniques and analysed the data that came out.
The abstract is part summary of the paper, and part an advertisement for why someone should read the paper. And keep in mind that far more people read the abstract than read the paper itself.
Make clear what the problem is and why it is important. Why is this paper needed, and what is going to distinguish this paper from the others?
In the last paragraph of the introduction, outline the structure of the rest of the paper. But make sure this outline is specific to the paper, rather than boilerplate that could describe almost any paper’s structure.
Background/state of the art/prior work – this could be a subsection of introduction, text within the introduction, or its own section right after the introduction. What have others done, what is the most closely related work? Don’t just list a lot of references. Have something to say about each reference, and relate them to the paper. If a lot of people have approached the same or similar problems, consider putting the methods into a table, where for each method, you have columns for short description, the reference(s), their various properties and their assumptions. If you think no one has dealt with your topic before, you probably just haven’t looked deep enough 😉 . Regardless, you should still explain what the closest work is, perhaps highlighting how it overlooked your specific problem.
Problem formulation – after describing state of the art, this could be a subsection of introduction, text within the introduction, or its own section. Give a clear and unambiguous statement of the problem, as you define it and as it is addressed herein. The aim here is to be rigorous, and remove any doubt about what you are doing. It also allows other work to be framed in the same way. When appropriate, this is described mathematically, e.g., we define these terms, assume this and that, and we attempt to find an optimal solution to the following equation.
The structure of this, the core of the paper, is highly dependent on the specific work. One good approach is to have quite a lot of figures and tables. Then most of the writing is mainly just explaining and discussing these figures and tables, and the ordering of these should be mostly clear.
A typical ordering is
Describe the method, giving block diagrams where appropriate
Give any plots that analyse and illustrate the method, but aren’t using the method to produce results that address the problem
Present the results of using your method to address the problem. Keep the interpretation of the results here short, unless detailed explanation of a result is needed to justify the next result that is presented. If there is lengthy discussion or interpretation, then leave that to a discussion section.
Equations and notation
For most papers in signal processing and related fields, at least a few equations are expected. The aim with equations is always to make the paper more understandable and less ambiguous. So avoid including equations just for the sake of it, avoid equations that are just an obvious intermediate step, and avoid equations that aren’t really used in any way (e.g. ‘we use the Fourier transform, which, by the way, can be given by this equation. Moving on…’). Do use equations if they clear up any confusion where a technical concept is explained only in text.
Make sure every equation can be fully understood. All terms and notation should be defined, right before or right after they are used in the text. The logic or process of going from one equation to the next should be made clear.
Tables and figures
Where possible, these should be somewhat self-contained. So one should be able to look at a figure and understand it without reading the paper. If that isn’t possible, then it should be understood just by looking at the figure and figure caption. If not, then by just looking at the figure, caption and a small amount of text where the figure is described.
Figure captions typically go immediately below figures; table captions typically go above tables.
Label axes in figures wherever possible, and give units. If units are not appropriate, make clear that an axis is unitless. For any text within a figure, make sure that the font size used is close to the font size of the main text in the paper. Often, if you import a figure from software intended for viewing on a screen (like Matlab), the font can appear minuscule when the figure is imported into a paper.
Make sure all figures and tables are numbered and are all referenced, by their number, in the main text. Position them close to where they are first mentioned in the text. Don’t use phrasing that refers to their location, like ‘the figure below’ or ‘the table on the next page’, partly because their location may change in the final version.
Make sure all figures are high quality. Print out the paper before submitting and check that it all looks good, is high resolution, and nicely formatted.
Discussion and future work may be separate sections or part of the conclusion. Discussion is useful if the results need to be interpreted, but is often kept very brief in short papers where the results may speak for themselves.
Future work is not about what the author plans to do next. It’s about research questions that arose or were not addressed, and research directions that are worth pursuing. The answers to these research questions may be pursued by the author or others. Here, you are encouraging others to build on the work in this paper, and suggesting to them the most promising directions and approaches. Future work is usually just a couple of sentences or paragraphs at the end of the conclusion, unless there is something particularly special about it.
The conclusion should not simply repeat the abstract or summarise the paper, though there may be an element of that. It’s about getting across the main things that the reader should take away and remember. What was found out? What was surprising? What are the main insights that arose? If the research question is straightforward and directly addressed, what was the answer?
The most important criterion for references is to cite wherever a citation justifies a claim, clarifies a point, identifies that an idea comes from someone else, or helps the reader find pertinent additional material. If you’re dealing with a very niche or underexplored topic, you may wish to give a full review of all existing literature on the subject.
Aim for references to come from high impact, recent peer reviewed journal articles, or as close to that as possible. So for instance, choose a journal over a conference article if you can, but maybe a highly cited conference paper over an obscure journal paper.
Avoid using web site references. If the reference is essentially just a URL, then put that directly in the text or as a footnote, but not as a citation. And no one cares when you accessed the website, so no need to say ‘accessed on [date]’. If it’s a temporary record that may have been there only for a short period before the paper submission date, it’s probably not a reliable reference, won’t help the reader, and you should probably find an alternative citation.
Check your reference formatting, especially if you use someone else’s reference library or some automatically generated citations. For instance, some citations will have a publisher and a conference name, so it reads as ‘the X Society Conference, published by the X Society’.
Be consistent. So for instance, have all references use author initials, or none of them. Always use journal abbreviations, or never use them. Always include the city of a conference, or never do it. And so on.
The 142nd AES Convention was held last month in the creative heart of Berlin. The four-day program drew more than 2000 attendees and covered several workshops, tutorials, technical tours and special events, all related to the latest trends and developments in audio research. But as much as scale, it’s attention to detail that makes AES special: there is as much emphasis on the research side of audio topics as on panels of experts discussing a range of provocative and practical topics.
3D Audio: Recording and Reproduction, Binaural Listening and Audio for VR were the most popular topics among the workshops, tutorials, papers and engineering briefs. However, a significant portion of the program was also devoted to core audio topics such as digital filter design, live audio, loudspeaker design, recording, audio encoding, microphones, and music production techniques, to name a few.
For this reason, here at the Audio Engineering research team within C4DM, we bring you what we believe were the highlights, the key talks or the most relevant topics that took place during the convention.
What better way to start AES than with a workshop of mastering experts discussing the future of the field? Jonathan Wyner (iZotope) introduced us to the current challenges that this discipline faces, relating to the demographic, economic and target-formatting issues that are constantly evolving and changing due to advances in the music technology industry and among its consumers.
When discussing the future of mastering, the panel was reluctant to embrace a fully automated future, but pointed out that the main challenge for assistive tools is to understand artistic intentions and genre-based decisions without needing the expert knowledge of the mastering engineer. They concluded that research efforts should go towards the development of an intelligent assistant, able to function as a smart preset that provides mastering engineers with a starting point.
This paper described a method to digitally model an analogue dynamic range compressor. Based on the analysis of processed and unprocessed audio waveforms, a generic model of dynamic range compression is proposed and its parameters are derived via iterative optimization.
Audio samples were played back, and the quality of the audio produced by the digital model was demonstrated. However, it should be noted that the parameters of the digital compressor cannot be changed; addressing this could be an interesting path for future work, as could the inclusion of other audio effects such as equalizers or delay lines.
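To give a flavour of the general idea (this is a minimal sketch, not the authors’ actual model), one can fit the parameters of a simple static compressor gain curve to matched unprocessed/processed level measurements by iterative optimization; the hypothetical hard-knee model and parameter names below are my own assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def static_gain_db(level_db, threshold_db, ratio):
    """Gain reduction (in dB) of an idealised hard-knee compressor."""
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def fit_compressor(dry_db, wet_db):
    """Estimate (threshold, ratio) from matched per-frame levels in dB."""
    def loss(params):
        threshold_db, ratio = params
        pred = dry_db + static_gain_db(dry_db, threshold_db, ratio)
        return np.mean((pred - wet_db) ** 2)
    # Derivative-free search, since the hard knee makes gradients awkward
    return minimize(loss, x0=[-20.0, 4.0], method="Nelder-Mead").x

# Synthetic check: recover a 6:1 compressor with a -24 dB threshold
rng = np.random.default_rng(0)
dry = rng.uniform(-60.0, 0.0, 1000)
wet = dry + static_gain_db(dry, -24.0, 6.0)
threshold_db, ratio = fit_compressor(dry, wet)
```

A real analogue unit would of course also have time-varying attack/release behaviour, which is exactly what makes the modelling problem interesting.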
In the paper ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface‘, an evaluation of different graphical track visualization styles is presented. Multitrack visualizations included text only, different colour conventions for circles containing text or icons indicating the type of instrument, circles with opacity mapped to audio features, and a traditional channel-strip mixing interface.
Efficiency was tested, and it was found that subjects preferred the instrument icons as well as the traditional mixing interface. Taking into account the several existing works and proposals on alternative (2D and 3D) mixing interfaces, there is still a lot of scope to explore in building an intuitive, efficient and simple interface capable of replacing the well-known channel strip.
This tutorial was based on the engineering brief ‘Quantization Noise of Warped and Parallel Filters Using Floating Point Arithmetic’, in which warped parallel filters, which aim to match the frequency resolution of the human ear, are proposed.
Via Matlab, we explored various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters. This provides a very useful tool for applications such as room EQ, physical modelling synthesis and, perhaps, improving existing intelligent music production systems.
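The basic structure of a fixed-pole parallel filter can be sketched as follows (a rough illustration in Python rather than the authors’ Matlab code; the pole spacing and radius here are arbitrary assumptions): a bank of two-pole resonators with logarithmically spaced centre frequencies, run in parallel and summed.

```python
import numpy as np
from scipy.signal import lfilter

def fixed_pole_parallel(x, fs, n_sections=16, pole_radius=0.98):
    """Sum of parallel two-pole resonators with log-spaced centre frequencies."""
    # Logarithmic spacing roughly follows the ear's frequency resolution
    freqs = np.geomspace(50.0, 0.45 * fs, n_sections)
    y = np.zeros(len(x))
    for f in freqs:
        theta = 2.0 * np.pi * f / fs
        # Denominator for a complex pole pair at radius r and angle theta
        a = [1.0, -2.0 * pole_radius * np.cos(theta), pole_radius ** 2]
        # In a real design the numerator (zero) coefficients of each section
        # are fitted by least squares to a target response; here they are
        # left as unity for simplicity.
        y += lfilter([1.0], a, x)
    return y

fs = 44100
x = np.zeros(512)
x[0] = 1.0  # impulse input
y = fixed_pole_parallel(x, fs)
```

The engineering brief’s contribution concerns the quantization noise of such structures under floating-point arithmetic, which this toy sketch does not address.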
Abbey Road’s James Clarke presented a great poster on the actual algorithm that was used for the remixed, remastered and expanded version of The Beatles’ album Live at the Hollywood Bowl. The method managed to isolate the crowd noise, separating everything that Paul McCartney, John Lennon, Ringo Starr and George Harrison played live in 1964 into clean tracks.
The results speak for themselves (audio comparison). Based on a Non-negative Matrix Factorization (NMF) algorithm, this work provides a great research tool for source separation and reverse-engineering of mixes.
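The core NMF idea behind such source separation work can be sketched in a few lines (a toy illustration, not Abbey Road’s actual algorithm): factorise a non-negative magnitude spectrogram V into spectral templates W and activations H using the standard Lee–Seung multiplicative updates.

```python
import numpy as np

def nmf(V, n_components, n_iter=300, eps=1e-9):
    """Factorise a non-negative matrix V (freq x time) as W @ H."""
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Euclidean cost;
        # these preserve non-negativity of W and H by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy 'spectrogram': two sources with fixed spectra and distinct activations
V = (np.outer([1.0, 0.0, 2.0, 0.0], [1, 1, 0, 0])
     + np.outer([0.0, 3.0, 0.0, 1.0], [0, 0, 1, 1]))
W, H = nmf(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

For separation, each learned component’s contribution W[:, k] @ H[k, :] would typically be used to build a soft mask applied to the complex spectrogram before inverting back to audio.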
Other keynotes worth mentioning:
The rest of the paper proceedings are available in the AES E-library.
Last week saw the 2017 International Conference on Semantic Audio by the Audio Engineering Society. Held at Fraunhofer Institute for Integrated Circuits in Erlangen, Germany, delegates enjoyed a well-organised and high-quality programme, interleaved with social and networking events such as a jazz concert and a visit to Erlangen’s famous beer cellars. The conference was a combined effort of Fraunhofer IIS, Friedrich-Alexander Universität, and their joint venture Audio Labs.
As the topic is of great relevance to our team, Brecht De Man and Adán Benito attended and presented their work there. With 5 papers and a late-breaking demo, the Centre for Digital Music in general was the most strongly represented institution, surpassing even the hosting organisations.
Adán Benito presented “Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields“, a machine learning approach to automatic application of a reverb effect to musical audio.
Brecht De Man demoed the “Mix Evaluation Browser“, an online interface to access a dataset comprising several mixes of a number of songs, complete with corresponding DAW files, raw tracks, preference ratings, and annotated comments from subjective listening tests.
Still from the Centre for Digital Music, Delia Fano Yela delivered a beautifully hand-drawn and compelling presentation about source separation in general and how temporal context can be employed to considerably improve vocal extraction.
Rodrigo Schramm and Emmanouil Benetos won the Best Paper award for their paper “Automatic Transcription of a Cappella recordings from Multiple Singers”.
Emmanouil further presented another paper, “Polyphonic Note and Instrument Tracking Using Linear Dynamical Systems”, and coauthored “Assessing the Relevance of Onset Information for Note Tracking in Piano Music Transcription”.
Several other delegates were frequent collaborators or previously affiliated with Queen Mary. The opening keynote was delivered by Mark Plumbley, former director of the Centre for Digital Music, who gave an overview of the field of machine listening, specifically audio event detection and scene recognition. Nick Jillings, formerly research assistant and master project student at the Audio Engineering group, and currently a PhD student at Birmingham City University cosupervised by Josh Reiss, head of our Audio Engineering group, presented his paper “Investigating Music Production Using a Semantically Powered Digital Audio Workstation in the Browser” and demoed “Automatic channel routing using musical instrument linked data”.
Other keynotes were delivered by Udo Zölzer, best known from editing the collection “DAFX: Digital Audio Effects”, and Masataka Goto, a household name in the MIR community who discussed his own web-based implementations of music discovery and visualisation.
Paper proceedings are already available in the AES E-library, free for AES members.
The next Audio Engineering Society convention is just around the corner, May 20-23 in Berlin. This is an event where we always have a big presence. After all, this blog is brought to you by the Audio Engineering research team within the Centre for Digital Music, so it’s a natural fit for a lot of what we do.
These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. The attendees fit loosely into five categories: the companies, the professionals and practitioners, students, enthusiasts, and the researchers. That last category is where we fit.
I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.
On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied, but important part of the mixing process. Dave surveyed 10 award winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.
From 2:45 to 4:15 pm there is a workshop on ‘Perception of Temporal Response and Resolution in Time Domain.’ Last year we published an article in the Journal of the Audio Engineering Society on ‘A meta-analysis of high resolution audio perceptual evaluation.’ There’s a blog entry about it too. The research showed very strong evidence that people can hear a difference between high resolution audio and standard, CD quality audio. But this brings up the question of why. Many people have suggested that the fine temporal resolution of oversampled audio might be perceived. I expect that this workshop will shed some light on this as yet unresolved question.
Overlapping that workshop, there are some interesting posters from 3 to 6 pm. ‘Mathematical Model of the Acoustic Signal Generated by the Combustion Engine‘ is about synthesis of engine sounds, specifically for electric motorbikes. We are doing a lot of sound synthesis research here, and so are always on the lookout for new approaches and new models. ‘A Study on Audio Signal Processed by “Instant Mastering” Services‘ investigates the effects applied to ten songs by various online, automatic mastering platforms. One of those platforms, LandR, was a high tech spin-out from our research a few years ago, so we’ll be very interested in what they found.
For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.
Then there’s some interesting posters from 9:30 to 12:30. We’ve done a lot of work on new interfaces for audio mixing, so will be quite interested in ‘The Mixing Glove and Leap Motion Controller: Exploratory Research and Development of Gesture Controllers for Audio Mixing.’ And returning to the subject of high resolution audio, there is ‘Discussion on Subjective Characteristics of High Resolution Audio,’ by Mitsunori Mizumachi. Mitsunori was kind enough to give me details about his data and experiments in hi-res audio, which I then used in the meta-analysis paper. He’ll also be looking at what factors affect high resolution audio perception.
From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.
From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.
Sunday evening at 6:30 is the Heyser lecture. This is quite prestigious, a big talk by one of the eminent people in the field. This one is given by Jorg Sennheiser of, well, Sennheiser Electronic.
Monday morning 10:45-12:15, there’s a tutorial on ‘Developing Novel Audio Algorithms and Plugins – Moving Quickly from Ideas to Real-time Prototypes,’ given by Mathworks, the company behind Matlab. They have a great new toolbox for audio plugin development, which should make life a bit simpler for all those students and researchers who know Matlab well and want to demo their work in an audio workstation.
Again in the mixing interface department, we look forward to hearing about ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface‘ on Tuesday, 11-11:30. The authors gave us a taste of this work at the Workshop on Intelligent Music Production which our group hosted last September.
In the same session – which is all about ‘Recording and Live Sound‘ so very close to home – a new approach to acoustic feedback suppression is discussed in ‘Using a Speech Codec to Suppress Howling in Public Address Systems‘, 12-12:30. With several past projects on gain optimization for live sound, we are curious to hear (or not hear) the results!
Featuring contributions from Dave Moffat and Brecht De Man
As you might know, or can guess, we’re heavily involved in the Audio Engineering Society, which is the foremost professional organisation in this field. We had a big impact at two of their recent conferences.
The 60th AES conference on Dereverberation and Reverberation of Audio Music and Speech took place in Leuven, Belgium, 3-5 February 2016. The conference was based around a European-funded project of the same name (DREAMS – http://www.dreams-itn.eu/) and aimed to bring together all expertise in reverb and reverb removal.
The conference started out with a fantastic overview of reverberation technology, and how it has progressed over the past 50 years, by Vesa Välimäki. The day then went on to present work on object-based coding and reverberation, and computational dereverberation techniques.
Day two started with Thomas Brand discussing sound spatialisation and how participants are much more tolerant of reverberation in binaural listening conditions. Further work then presented on physical modelling approaches to reverberation simulation, user perception, and spatialisation of audio in the binaural context.
Day three began with Emanuël Habets presenting on the past 50 years of reverberation removal, noting that for as long as we have been modelling reverberation, we have also been trying to remove it from audio signals. Work was then presented on multichannel dereverberation and computational sound field modelling techniques.
The Audio Engineering group from C4DM were there in strength, presenting two papers and a demo session. David Moffat presented work on the impact dereverberation can make when combined with state of the art source separation technologies. Emmanouil Theofanis Chourdakis presented a hybrid model which, based on machine learning technologies, can intelligently apply reverberation to an audio track. Brecht De Man presented his latest research, as part of the demo session and again in a plenary lecture, on analysis of studio mixing practices, focused on analysing the perception of reverberation in multitrack mixes.
The following week was the AES Audio for Games conference in London. This is the fifth game audio conference they’ve had, and we’ve been involved in this conference series since its inception in 2009. C4DM researchers Dave Moffat, Will Wilkinson and Christian Heinrichs all presented work related to sound synthesis and procedural audio, which is becoming a big focus of our efforts (more to come!).
Brecht De Man put together an excellent report of the conference, where you can find out a lot more.