[josm-dev] Audio user interface

Thu Feb 28 11:07:31 GMT 2008

Comments below, but I thought it might be useful to say that I have 
documented the current audio mapping procedures in some detail in the 
JOSM help: Help > Help in JOSM, then click "Audio mapping", or go 
directly to 
http://josm.openstreetmap.de/wiki/Help/HowTo/AudioMapping
I have removed the burgeoning detail in the main wiki 
(http://wiki.openstreetmap.org/index.php/JOSM#Automatically_Matching_Sound_Recordings_to_GPS_traces) 
and instead just summarized the techniques and put a link in to the 
detailed procedure.

On 28/02/2008 10:35, Chris Morley wrote:
> Being able to record a commentary hands-free while surveying and then 
> use it in JOSM is a great liberation and a big advance by David Earl. 
> But, like the rest of JOSM, it has been developed incrementally, which 
> doesn't always make for the best user interface. So forgetting about the 
> technical difficulties, what would the user (e.g. me) like to see? This 
> is about the no way-point mode only, which I see as likely to be used 
> more, if its interface was easy.

You're right - I built on what was there, which was the concept of Audio 
Markers, and some code to play sound. I also started with the waypoint 
method, which has rather coloured the later additions to support the 
voice method.

> The orange progress marker is a great concept. 

I've christened it the "play head" by the way.

> Not only is it fun to see 
> it moving, it also encapsulates the user understanding "I was here when 
> I said this". So to provide "what I said when I was here" the obvious 
> way is to move the marker - drag the orange marker, which would snap to 
> a GPX track point. This would avoid the current annoying interference 
> between the audio interface (clicking on an audio icon) and the normal 
> JOSM interface (selection,  making nodes etc). It would also get rid of 
> the blizzard of automatic audio markers (see 
> http://wiki.openstreetmap.org/index.php/Image:JOSMAudioMarkers2.png), 
> which have positions that are arbitrary to the user, and which occupy 
> more space than is justified by their usefulness.

Indeed. I think it ought to be possible to drag the play head, and had 
wondered about doing that. What I don't think I can do, without 
introducing a mode, is to click anywhere on the track to start playing 
there. But so long as you don't have to follow the track while moving 
the marker, I don't think that's a big deal.

So I think this one would be quite straightforward to do, and I'll work 
on it in due course.
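
To make the "snap to a GPX track point" idea concrete, here is a minimal 
sketch in Python. It is not JOSM code: the function name snap_to_track, 
the list-of-tuples track and the flat-earth distance are all assumptions 
made for illustration.

```python
from math import cos, hypot, radians

def snap_to_track(track, lat, lon):
    """Return the index of the track point closest to (lat, lon).

    `track` is a list of (lat, lon) tuples in degrees (a hypothetical
    representation). Distances use a local flat-earth approximation,
    which is adequate over the short distances covered by a drag.
    """
    if not track:
        raise ValueError("track is empty")
    k = cos(radians(lat))  # longitude degrees shrink away from the equator
    return min(
        range(len(track)),
        key=lambda i: hypot(track[i][0] - lat, (track[i][1] - lon) * k),
    )

# Dropping the play head near the middle point snaps to index 1.
track = [(52.2050, 0.1200), (52.2055, 0.1210), (52.2060, 0.1225)]
index = snap_to_track(track, 52.2056, 0.1212)
```

A linear scan like this is probably fine for a single track. A very long 
trace might want a spatial index, but that is beyond this sketch.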

In the meantime you can (a) dispense with the markers once you have 
synchronized - use the Show/Hide Text/Icons context menu entry, and (b) 
reduce their frequency to just a single one at the beginning if you like 
by setting the sampling rate in preferences to a very large number of 
seconds (say 100000), or to a gentle snowstorm instead of a blizzard, 
say 600 - one every 10 minutes.
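
As a rough illustration of how the interval affects the number of 
markers, the sketch below assumes markers are placed at fixed intervals 
counted from the start of the recording. That placement rule and the 
name marker_offsets are assumptions, not a description of what JOSM does 
internally.

```python
def marker_offsets(duration_s, interval_s):
    """Offsets, in seconds from the start, of evenly spaced markers.

    Assumes one marker at offset 0 and then one every `interval_s`
    seconds until the recording ends (a simplification).
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, int(duration_s), int(interval_s)))

two_hours = 2 * 60 * 60
snowstorm = marker_offsets(two_hours, 600)     # one every 10 minutes
single = marker_offsets(two_hours, 100000)     # interval longer than the recording
```

Under these assumptions a two-hour recording gets 12 markers at 600 
seconds and a single marker at the start with 100000.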

Don't forget that you can now fast and very fast forward and back and 
recognize sound of interest as you do it.

> Calibration is currently complicated and difficult to remember. With 
> this proposal, the audio would be started and paused at the place you 
> said NOW. The orange marker would then be SHIFT dragged to the 
> geographical position that NOW corresponds to. It is easy to 
> understand "I *was* here when I said this", with the SHIFT implying 
> coercion.

(This is 'synchronization' in my terminology - 'calibration' makes a 
tiny scaling adjustment for difference in clock speeds, 
'synchronization' aligns a point on the audio with a point on the 
GPS track)
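
The difference between the two terms can be written as one small 
formula. The sketch below is illustrative only: the function and 
parameter names are hypothetical, and it assumes a simple linear 
relationship between the recorder clock and the GPS clock.

```python
def audio_to_gps_time(audio_s, sync_audio_s, sync_gps_s, clock_ratio=1.0):
    """Map an offset into the recording to a time on the GPS track.

    sync_audio_s, sync_gps_s: one known matching pair of times
        (this is the synchronization).
    clock_ratio: GPS seconds elapsed per recorder second
        (this is the calibration); 1.0 means both clocks agree.
    """
    return sync_gps_s + (audio_s - sync_audio_s) * clock_ratio

# Synchronized at 30 s into the audio = 1000 s on the track.
# One minute later in the audio is one minute later on the track.
t = audio_to_gps_time(90, 30, 1000)   # 1060.0
```

Synchronization fixes the offset, and calibration only changes the 
slope, which is why it matters little over a short recording and more 
over a long one.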

I'm not convinced that's so much easier. I think using the play head, 
which follows from the above, makes sense, but I'm not sure about using 
a modifier, as you have to remember that, as opposed to a menu entry - 
but they needn't be mutually exclusive I suppose. But obviously changing 
how you navigate the soundtrack would change how you do the synchronization.

Again, I'll look at this when I look at manipulating the play head.

> It would be good if the interface could show where you were speaking and 
> where you were silent. Background noise makes this a challenge, but 
> assuming that it could be achieved, the information could be presented 
> as a slight enlargement of the gpx points or thickening of any line 
> connecting them, so that the information can be found if it is looked 
> for, but the presentation is not too overwhelming if it is not.

The background noise is so overwhelming in my experience - not just wind 
and traffic, but also my own heavy breathing while biking - that there 
is no chance this would work.

What might stand a chance is trying to identify one particular pattern 
in the sound ("MARK!"), maybe trained: in effect an audio waypoint. Just 
seeking a single recognizable sound is less of a challenge than more 
general recognition, but even so this is a whole order of magnitude more 
difficult to implement than simply manipulating where you play the 
soundtrack from.
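
For what it is worth, the most naive form of "seek one recognizable 
sound" is template matching by normalized cross-correlation, sketched 
below on plain lists of samples. This is only a toy under stated 
assumptions: best_match is a hypothetical name, and real recordings 
with wind, traffic and varying speech would need far more than this.

```python
from math import sqrt

def best_match(signal, template):
    """Return (offset, score) where the template best fits the signal.

    The score is the normalized cross-correlation, between -1 and 1;
    a value near 1 means the window has the same shape as the template.
    """
    n = len(template)
    if n == 0 or len(signal) < n:
        raise ValueError("template must be non-empty and fit in signal")
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]
    t_norm = sqrt(sum(x * x for x in t))
    best = (0, -1.0)
    for off in range(len(signal) - n + 1):
        window = signal[off:off + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = sqrt(sum(x * x for x in w))
        if w_norm == 0 or t_norm == 0:
            continue  # flat window or flat template: correlation undefined
        score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
        if score > best[1]:
            best = (off, score)
    return best

# A made-up "MARK!" shape hidden in low-level noise.
mark = [0, 3, -2, 5, -1]
recording = [0.1, -0.1, 0.0, 0.2] + mark + [0.1, 0.0, -0.2]
offset, score = best_match(recording, mark)
```

Here the template is found at offset 4. The hard part in practice is 
that the spoken word never repeats exactly, so a usable threshold on 
the score is not obvious.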

Audio markers would then be useful in this way of working too - though 
how you represent them is also debatable of course (I didn't invent 
these by the way, they were already in JOSM, but to use them you had to 
have a GPX file that knew about audio files).

Don't hold your breath on that one! Equally, I think it is not impossible.

David