Have you checked that it's set up to use North American CWCID signaling (Bellcore GR-30-CORE)? I listened to the recording, and it didn't sound like the right sequence at all. I don't have anything handy to slow the playback down, but it sounds like the CW signaling is being interspersed with the audible ringback as your call is being connected. Does it sound the same during a normal conversation?
There's a great document that used to be floating around (http://www.testmark.com/develop/tml_callerid_cnt.html), but it has since gone dark. It explains the whole process, including the ACK that the phone sends back once it has successfully muted the handset, before the actual FSK is sent (which may be the difference between your slimline and Panasonic sets).

Here's an excerpt:
The sequence of events that displays the information about the caller begins when the central office temporarily removes the far end party of the current call and sends a Subscriber Alerting Signal (SAS) to the near end party. The SAS is a single frequency of 440 Hz that is applied for approximately 300 ms. This is the tone that is heard when a call is in progress and call waiting beeps to indicate a second call. The SAS tone is mainly for the user and is not required for the CPE to receive the CID information. The SAS tone is followed by a CAS to alert the CPE that it has CID information to send. The CAS is a dual tone signal combination of 2130 Hz and 2750 Hz that is 80 ms in length. Once the CPE hears the CAS it mutes the handset of the telephone and returns an ACK signal to the central office. This ACK signal has a nominal tone duration of 60 ms and is either a Dual-Tone Multifrequency (DTMF) "A" or a DTMF "D". The DTMF "D" is the most common ACK signal and it consists of the frequencies 941 Hz and 1633 Hz. A DTMF "A" consists of the frequencies 697 Hz and 1633 Hz. Once the central office receives the ACK signal it in turn sends the CID information[7][8][10]. The CPE un-mutes the handset as soon as it finishes receiving the FSK signal.
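
If it helps to compare against your recording, here is a rough Python sketch that synthesizes the tones named in the excerpt and does a crude check for the CAS pair. The frequencies and durations come from the excerpt; the 8 kHz sample rate, the function names, and the 10x threshold are my own assumptions for illustration, and a real CPE detector needs far more care than this to avoid false triggers from speech.

```python
import math

SAMPLE_RATE = 8000  # assumed telephony sample rate; not stated in the excerpt

# Frequencies (Hz) and nominal durations (ms) as quoted in the excerpt.
SIGNALS = {
    "SAS": ((440,), 300),
    "CAS": ((2130, 2750), 80),
    "ACK_D": ((941, 1633), 60),
    "ACK_A": ((697, 1633), 60),
}

def synthesize(name, sample_rate=SAMPLE_RATE):
    """Return the named signal as a list of float samples in [-1, 1]."""
    freqs, duration_ms = SIGNALS[name]
    n = sample_rate * duration_ms // 1000
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs) / len(freqs)
        for i in range(n)
    ]

def goertzel_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Relative power of `samples` at `freq`, via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def looks_like_cas(samples, ratio=10.0):
    """Crude test: both CAS tones are much stronger than the 440 Hz SAS tone.

    The ratio of 10 is an arbitrary illustrative threshold.
    """
    reference = goertzel_power(samples, 440)
    return all(goertzel_power(samples, f) > ratio * reference for f in (2130, 2750))

if __name__ == "__main__":
    for name in SIGNALS:
        tone = synthesize(name)
        print(name, len(tone), "samples, CAS-like:", looks_like_cas(tone))
```

Running `looks_like_cas` over 80 ms windows of your recording (decoded to 8 kHz mono samples) should give a rough idea of whether a CAS is present at all, and whether it overlaps the ringback.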