Malcolm Slaney
Publications and Pointers
I now work in the Conversational
Systems Laboratory at Microsoft in Mountain View, CA.
I used to work at Yahoo!
Research and
IBM's Almaden Research Center. My IBM work is described on
this page.
Before
that I worked for Interval
Research, Apple Computer's Advanced Technology Group, and
Schlumberger
Palo Alto Research.
Several of my technical reports and papers are available on the
net
for downloading. The following is a brief list. I have a personal
web page for the fun stuff. Many of my papers can be found
online via the IEEE
or the ACM
portals.
This page shows my multimedia
analysis work, auditory
modeling
work, my signal processing work,
some
of
my software tools, and pointers to other work.
Note! My tomography
book
is now online. Get more information here.
The book is back in print and you can order it now from
SIAM. SIAM
honored us by including it in their "Classics
in Applied Mathematics" series of books!
Multimedia Analysis
I spent a few years investigating an algorithm
known as Locality Sensitive Hashing (LSH) that is used to
efficiently find nearest neighbors. I wanted to understand
how to make LSH more efficient. I wrote a tutorial
with Michael Casey and Christoph Rhodes. Then with
colleagues at Yahoo I wrote a "definitive" article about how
to choose the optimum parameters. Both the Matlab
(optimization) and Python (implementation) code are online
too.
|
A tutorial
about LSH.
How to optimize
LSH
GIT
Repository for Matlab and Python Code
|
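The idea behind LSH can be sketched in a few lines. This is a minimal illustration, assuming the common random-projection scheme for Euclidean distance; the class name `EuclideanLSH` and its parameters (`num_tables`, `num_projections`, `bucket_width`) are hypothetical and are not taken from the Matlab or Python code in the repository.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Random-projection LSH index (illustrative sketch, not the published code)."""

    def __init__(self, dim, num_tables=8, num_projections=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.points = []
        self.tables = []
        # Each table gets its own random Gaussian directions and offsets.
        for _ in range(num_tables):
            directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                          for _ in range(num_projections)]
            offsets = [rng.uniform(0.0, bucket_width)
                       for _ in range(num_projections)]
            self.tables.append((directions, offsets, defaultdict(list)))

    def _key(self, directions, offsets, x):
        # Project onto each direction and quantize into buckets of fixed width.
        return tuple(
            math.floor((sum(a * v for a, v in zip(d, x)) + b) / self.bucket_width)
            for d, b in zip(directions, offsets)
        )

    def add(self, x):
        index = len(self.points)
        self.points.append(list(x))
        for directions, offsets, buckets in self.tables:
            buckets[self._key(directions, offsets, x)].append(index)
        return index

    def query(self, x):
        """Return the index of the closest candidate, or None if no bucket matches."""
        candidates = set()
        for directions, offsets, buckets in self.tables:
            candidates.update(buckets.get(self._key(directions, offsets, x), ()))
        if not candidates:
            return None
        # Only the candidates are compared exactly; that is where the speedup comes from.
        return min(candidates, key=lambda i: math.dist(self.points[i], x))
```

More tables raise the chance that a true neighbor shares a bucket with the query, while more projections per table shrink the candidate set. Choosing those values well is the parameter-optimization question discussed above.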
I'm writing a column for IEEE Multimedia
Magazine about my vision of the multimedia world. The
columns are online. |
Vision and
Views |
I get to
work with lots of wonderful
image data and some very smart computer-vision people.
For the last couple of years, I've been working with Rainer
Lienhart
and Eva Hoerster on image classification in large databases.
|
With Eva and Rainer: CIVR2008,
DAGM2008,
CIVR2007,
ICASSP2007
With Srinivasan: Bipartite
Models
|
I've been working on finding similar
songs in large music
databases with Michael
Casey at Dartmouth and Goldsmiths College, University
of
London. We want to
find matches that are similar, but not exact (fingerprinting
finds
exact matches). Michael wrote a great overview of music-information
retrieval, and I helped edit a special
issue
of IEEE Transactions on Audio, Speech and Language
Processing.
I've also been working with William
White
from Yahoo's media group to better understand how to deliver
music. We've characterized the diversity
of people's musical interests, studied item-to-item
similarity (using 480,000 subjects), and, most
recently, surveyed
several techniques for content-based similarity.
And work with Benjamin Marlin when he was an intern at
Yahoo! Research
turned into a nice paper about modeling
uncertainty
in rating data.
|
Best overview of our music-similarity work is
in IEEE
TASLP. See earlier work at ICASSP
2007, ICASSP
2006
and ISMIR
2006.
2006 - Music
Diversity
2007 - Item
Similarity
2008 - Content
Similarity
|
Auditory Modeling
There is now a new version of
the Auditory Toolbox. It
contains Matlab
functions to implement many different kinds of auditory
models. The
toolbox
includes code for Lyon's passive longwave model,
Patterson's gammatone
filterbank, Meddis' hair cell model, Seneff's auditory
model,
correlograms
and several common representations from the
speech-recognition world
(including
MFCC, LPC and spectrograms). This code has been tested on
Macintosh,
Windows,
and Unix machines using Matlab 5.2.
Note: This toolbox was originally published as Apple
Computer
Technical
Report #45. The old technical report ( PDF
and Postscript
) and old code ( Unix
TAR and Macintosh
BinHex ) are available for historical reasons.
|
Auditory
Toolbox
(Version
2.0) |
My primary scientific goal is to
understand how our
brains perceive
sound.
My role in this research area is that of a modeler: I build models
that explain
the neurophysiological and psychoacoustic data. Hopefully
these models
will help other researchers understand the mechanisms
involved and
result
in better experiments. My latest work in this area is titled
"Connecting
Correlograms to Neurophysiology and Psychoacoustics" and was
presented
at the XIth
International
Symposium on Hearing in Grantham England from 1-6
August,
1997. Two correlograms, one computed using autocorrelation
and the
other
computed using AIM, are shown on the left. |
Abstract |
I have written
several papers describing how to convert
auditory representations
into sounds. I have built models of the cochlea and
central auditory
processing,
which I hope both explain auditory processing and will
allow us to
build
auditory sound separation tools. These papers describe the
process of
converting
sounds into cochleagrams and correlograms, and then
converting these
representations
back into sounds. Unlike the printed versions of this
work, the web
page
includes audio file examples. It includes better
spectrogram inversion
techniques, a description of how to invert Lyon's passive
cochlear
model,
and a description of correlogram inversion. This material
was first
presented
as part of the Proceedings of the ATR Workshop on "A
Biological
Framework
for Speech Perception and Production" published in
September 1994.
A more refined version of this paper was an invited talk
at the 1994
NIPS
conference. The image on the left shows the
spectrogram of one
channel of cochlear output; one step in the correlogram
inversion
process. |
ATR
(Kyoto)
Workshop Web Reprint with Sound Examples
Keynote
NIPS
Conference Paper (Postscript)
|
Pattern Playback is
the term used by Frank
Cooper to describe his
successful
efforts to paint spectrograms on plastic and then convert
them into
sound.
I wrote about Pattern Playback techniques, from Frank
Cooper's efforts to
my own efforts with auditory model inversion, in a paper
which was
published
at the 1995 IEEE International Conference on Systems,
Man, and
Cybernetics.
My paper is titled "Pattern Playback from 1950 to 1995".
The image at
the
left shows a portion of one of Cooper's spectrograms. |
Web
Version
Postscript
(1.8M)
Adobe
PDF
(227k)
|
The following
are publications during my time at Apple. The Mathematica notebooks
are
designed to be self-documenting and in each case the postscript and
PDF files are also available. Those files that are Matlab toolboxes
include
source and documentation. All these files are available with the
gracious permission of Apple.
| "Auditory Model Inversion for Sound
Separation"
is the
first paper to describe correlogram inversion techniques.
We also
discuss
improved methods for inverting spectrograms and a cochlear
model
designed
by Richard F. Lyon. This paper was published at ICASSP
'94. |
Postscript
(1.5M)
Adobe
PDF
(243k)
Online
patent
|
| "A Perceptual Pitch Detector" is a paper
that
describes
a model of human pitch perception. It is similar to work
done by Meddis
and Hewitt and published in JASA, but this paper has more
real-world
examples.
This paper was published at ICASSP '90. |
Postscript
(3M)
Adobe
PDF
(315k)
|
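The autocorrelation step at the heart of a correlogram-style pitch estimate can be sketched as follows. This is a simplified illustration and not the method of the paper: it skips the cochlear model entirely and autocorrelates the raw waveform, and the function names and the 60 to 500 Hz search range are assumptions chosen for the example.

```python
import math


def autocorrelation_pitch(samples, sample_rate, min_hz=60.0, max_hz=500.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak, or None."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(samples) - 1, int(sample_rate / min_hz))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The lag of the strongest peak is taken as the pitch period.
    return sample_rate / best_lag if best_lag is not None else None


def sine(frequency_hz, sample_rate, num_samples):
    """Generate a test tone (helper for the example only)."""
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate)
            for n in range(num_samples)]


tone = sine(200.0, 8000, 800)
estimate = autocorrelation_pitch(tone, 8000)
```

A correlogram applies the same autocorrelation to each channel of a cochlear filterbank, rather than to the waveform, and then combines the channels.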
| "On the importance of time" is an invited
chapter
by Dick
Lyon and myself in the book Visual
Representations
of Speech Signals (edited by Martin Cooke,
Steve
Beet and Malcolm Crawford, John Wiley & Sons). This
tutorial
describes
the reason that we think time-domain processing is
important when
modeling
the cochlea and higher-level processing. |
Postscript
Adobe
PDF
|
| A software package called MacEar implements
the
latest
version of Lyon's Cochlear Model. MacEar is written in
very portable C
for Unix and Macintosh computers. This link points to the
last
published
version (2.2). (Note the README file included has old
program results.
The names of the output files have changed and there are a
couple of
extra
channels being output. I'm sorry for the confusion.) |
Unix
Shell
Archive with Sources |
Signal Processing
| I recently finished some nice work
establishing a
linear
operator connecting the audio and video of a speaker. A
paper
describing
this work has been accepted for presentation at the
NIPS'2000
conference. |
PDF
Paper
(600k) |
Chris
Bregler, Michele Covell, and I developed a technique we
call
Video
Rewrite to automatically synthesize video of talking
heads. This
technology
is cool because we use a purely data driven approach
(concatenative
triphone
video synthesis) to create new video of a person speaking.
Given new
audio,
we concatenate the best sequence of lip images and morph
them into a
background
sequence. We can automatically create sequences like the
Kennedy and
Johnson
scenes in the movie "Forrest Gump." |
Original
SIGGRAPH
'97 Paper (with examples)
Audio
Visual
Speech Perception Workshop
|
We studied how adults convey
affective messages to
infants using
prosody.
We did not attempt to recognize the words, let alone to
distill more
nebulous
concepts such as satire or irony. We analyzed speech with
low-level
acoustic
features and discriminated approval, attentional bids, and
prohibitions
from adults speaking to their infants. We built automatic
classifiers
to
create a system, Baby Ears, that performs the task that
comes so
naturally
to infants. The image on the left shows one of the
decision surfaces
which
classifies approval, attention and prohibition utterances
on the basis
of their pitch. |
Web
Page
Postscript
(189k)
Adobe
PDF
(42k)
|
| We wrote a more detailed article describing
this
work for the journal Speech
Communications. We can't post that article,
but I can send you
a copy if you send me email. |
Send email
for a copy of journal
article. |
| I was able to help Michele Covell do some
neat
work on
time-compression of audio. Lots of people know how to
compress a speech
utterance by a constant amount. But if you want to do
better, which
parts
of the speech signal can be compressed the most? This
paper describes a
good technique and shows how to test the resulting
comprehension. |
Conference
Paper
Technical
Report
with Audio Samples
|
Eric
Scheirer and I worked on a system for discriminating
between
speech
and music in an audio signal. This paper describes a large
number of
features,
how they can be combined into a statistical framework, and
the
resulting
performance on discriminating signals found on radio
stations. The
results
are better than anybody else's results. (That comparison
is not
necessarily
valid since there are no common testing databases. We did
work hard to
make our test set representative.) This paper was
published at the 1997
ICASSP in Munich. The image on the left shows clouds of
our data. |
Web
Page
Postscript
(349k)
Adobe
PDF
(263k)
|
Work we've done to morph between
two sounds is described
in a paper at
the 1996 ICASSP. This work is new because it extends
previous audio
morphing
work to include inharmonic sounds. This paper uses results
from
Auditory
Scene Analysis to represent, match, warp, and then
interpolate between
two sounds. The image on the left shows the smooth
spectrogram, one of
two independent representations used when morphing audio
signals. |
Web
Page
Postscript
(3M)
Adobe
PDF
(237k)
Patent
|
| I wrote an article describing my experiences writing
"intelligent"
signal processing documents. My Mathematica notebook "Lyon's
Cochlear
Model"
was the first large document written with Mathematica. While
I don't
use
Mathematica as much as I used to, I still believe that
intelligent
documents
are a good way to publish scientific results. These ideas
were also
published
in a book titled "Knowledge Based Signal Processing" that
was published
by Prentice Hall. |
KBSP
Book
Chapter in Adobe PDF (3M)
IEEE
Signal
Processing Article in Adobe PDF (2M)
|
Software Publications
I
have written Matlab m-functions that read and write
QuickTime movies.
The
WriteQTMovie code is more general than previous solutions
for creating
movies in Matlab. It runs on any platform that Matlab runs
on. It also
lets you add sound to the movie. The ReadQTMovie code reads
and parses
JPEG compressed movies. |
Matlab
Source
Code |
Chris
Bregler and I coded an implementation of an image
processing
technique
known as snakes. There are two m-files that implement a type
of dynamic
contour following popular in computer vision. First proposed
by Kass,
Witkin
and Terzopoulos in 1987, snakes are a variational technique
to find the
best contour that aligns with an image. The basic routine,
snake.m,
aligns
a sequence of points along a contour to the maximum of an
array or
image.
Provide it with an image, a set of starting points, limits
on the
search
space and it returns a new set of points that better align
with the
image.
The second m-file is a demonstration script. Using your own
array of
image
data, or a built-in default, a demo window is displayed
where you can
click
to indicate points and see the snake program in action. |
Matlab
Source
Code
Matlab
Demonstration
Source
|
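The interface described above (an image, starting points, and a search limit in; better-aligned points out) can be illustrated with a greedy sketch. This is a deliberate simplification and is not the algorithm in snake.m: each point moves to the nearby pixel that best trades off image strength against stretching the contour. The function name `greedy_snake` and the parameters `search_radius` and `elasticity` are hypothetical.

```python
def greedy_snake(image, points, search_radius=1, elasticity=0.01, iterations=10):
    """Move contour points toward image maxima while keeping them close together.

    image  -- 2-D list of numbers indexed as image[row][col]
    points -- list of (row, col) starting positions along an open contour
    """
    rows, cols = len(image), len(image[0])
    points = [tuple(p) for p in points]
    for _ in range(iterations):
        changed = False
        for k, (r, c) in enumerate(points):
            neighbors = [points[j] for j in (k - 1, k + 1)
                         if 0 <= j < len(points)]

            def score(candidate):
                cr, cc = candidate
                # Penalize squared distance to the adjacent contour points.
                stretch = sum((cr - nr) ** 2 + (cc - nc) ** 2
                              for nr, nc in neighbors)
                return image[cr][cc] - elasticity * stretch

            best, best_score = (r, c), score((r, c))
            for dr in range(-search_radius, search_radius + 1):
                for dc in range(-search_radius, search_radius + 1):
                    cand = (r + dr, c + dc)
                    if 0 <= cand[0] < rows and 0 <= cand[1] < cols:
                        s = score(cand)
                        if s > best_score:
                            best, best_score = cand, s
            if best != (r, c):
                points[k] = best
                changed = True
        if not changed:
            break  # No point moved, so the contour has settled.
    return points


# A 7x7 test image with a bright ridge along row 3.
ridge_image = [[1.0 if r == 3 else 0.0 for _ in range(7)] for r in range(7)]
aligned = greedy_snake(ridge_image, [(2, 1), (2, 3), (2, 5)])
```

With a small `elasticity` the image term dominates and the points snap onto the ridge. Larger values keep the contour tighter at the cost of following the image less closely.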
| Michele Covell and I wrote some Matlab code
to
compute
multi-dimensional scaling (MDS). MDS allows you to
reconstruct an
estimate
of the position of points, given just relative distance
data. These
routines
handle both metric (where you know distances) and non-metric
(where you
just
know the order of distances) data. |
Technical
report
containing the code (no documentation). |
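The metric case can be sketched with classical (Torgerson) MDS. This is a minimal illustration and not the code from the technical report: it double-centers the squared distances and extracts the leading eigenvectors by power iteration so that it needs only the standard library. The function name `classical_mds` and its parameters are assumptions, and the non-metric case is not covered.

```python
import math
import random


def classical_mds(distances, dims=2, iterations=200, seed=0):
    """Recover point coordinates (up to rotation and shift) from a distance matrix."""
    n = len(distances)
    squared = [[d * d for d in row] for row in distances]
    # The matrix is symmetric, so row means double as column means.
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    gram = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

    def multiply(vector):
        return [sum(g * v for g, v in zip(row, vector)) for row in gram]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iterations):  # Power iteration for the top eigenvector.
            w = multiply(v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, multiply(v)))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Example: the corners of a 4-by-1 rectangle, described only by their distances.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
rectangle_distances = [[math.dist(p, q) for q in corners] for p in corners]
recovered = classical_mds(rectangle_distances)
```

The recovered coordinates reproduce the input distances, but the configuration may be rotated, reflected, or shifted relative to the original, since distances alone cannot fix those.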
Apple Publications
| The SoundAndImage toolbox is a collection of Matlab
tools to make it easier to work with sounds and images. On
the
Macintosh,
tools are provided to record and playback sounds through the
sound
system,
and to copy images to and from the scrapbook. For both
Macintosh and
Unix
systems, routines are provided to read and write many common
sound
formats
(including AIFF). Only 68k MEX files are included. Users on
other
machines
will need to recompile the software. This toolbox is
published as Apple
Computer Technical Report #61. |
Postscript
Documentation
(153k)
Adobe
PDF
Documentation (20k)
Macintosh
Archive
|
I
created a Hypercard stack to make it easier for people
with a
Macintosh
and CDROM drive to interact with the Acoustical Society of
America's Auditory
Demonstrations
CD. This CD is a wonderful collection of auditory
effects
and principles. The ASA Demo Hypercard stack includes the
text and
figures
from the book and lets you browse the Audio CD. |
Macintosh
Archive |
I
wrote a program for the Macintosh 660/AV and 840/AV
computers that uses
the DSP (AT&T3210) to monitor audio levels. VUMeters
runs on any
Macintosh
with the AT&T DSP chip. Source and binaries are
included. |
Macintosh
Archive |
Bill
Stafford and I wrote TCPPlay to allow us to play sounds
from a
Unix
machine over the network to the Macintosh on our desks.
This archive
includes
Macintosh and Unix source code and the Macintosh
application. There are
other network audio solutions, but this works well on the
Macintosh. |
Macintosh
Archive |
Previous Publications
In
a past life, I worked on medical imaging. A book on
tomographic imaging
(cross-sectional x-ray imaging) was published by IEEE Press:
Avinash C.
Kak and Malcolm Slaney, Principles of Computerized
Tomographic
Imaging,
(New York : IEEE Press, c1988). The software used to
generate many of
the
tomographic images in this book is available. The parallel
beam
reconstruction
on the left was generated with the commands
gen n=100 k=100 if=lib.d.s
filt n=100 k=100
back n=100 k=100
disn min=1.0 max=1.05
|
Tomographic
Software
(Unix TAR format)
Tomographic
Software
(Shell archive)
The book is now online. Download
the
PDF or order the book from SIAM.
|
| Carl Crawford, Mani Azimi and I wrote a simple
Unix plotting
package called qplot. Both two-dimensional and 3d-surface
plots are
supported. |
Compressed
Unix
TAR File |
| Now obsolete code to implement a DITroff
previewer under
SunView is available. This program was called suntroff
and is
an
ancestor of the X Window System Troff previewer. It was
written while I
was an employee of Schlumberger Palo Alto Research. All
files are
compressed
Unix TAR files. |
Source
LaserWriter
fonts
Complete
package
|
Other Research Pointers
I organize the Stanford CCRMA Hearing Seminar.
Just
about any topic related to auditory perception is considered fair
game
at the seminar. An archive of seminar announcements can be found at
Stanford
(organized
as a table) or at UCSC
as a chronological listing of email announcements. Send email to
hearing-seminar-request@ccrma.stanford.edu
if you would like to be added to the mailing list.
For More Information
I can be reached at
Malcolm Slaney
The best way to reach me is to send email.
This page last updated on September 3, 2012.
Malcolm Slaney (malcolm@ieee.org)