...making Linux just a little more fun!

Mailbag


Piet Programming
[TAG] Locking down a Linux box
Talkback links should CC article author
Confusion about linux fonts
FVWM Kiosk - a different approach
How to convert RedHat9 to Gentoo over SSH on a live system
State of the anti-spam regime, July 2006 edition
Wiring a house with ethernet: Success
Linux driver question
USB Hard Drives
New URL for GLUE - Groups of Linux Users Everywhere
Copyright Notice
Nullmodem
The all new Ubuntu.... Did I say something wrong?
I will nead a little help or more!!!!!!
Kernel tweaking
Invisible Read!!
Port Linux on DSP
Which process wrote that line into syslog?
Which process wrote that line into syslog? [2]

Piet Programming

Thomas Adam (thomas at edulinux.homeunix.org)
Mon May 29 18:46:27 PDT 2006

Answered by: Kapil, Pedja, Thomas

Hello,

I was reading the esoteric hello-world page [1] when I came across a very obscure language called Piet [2]. I have to say I had never heard of it until now -- little pictures to represent programs. And they're colourful. :D Go take a look; it's quite clever.

-- Thomas Adam

[1] http://en.wikipedia.org/wiki/Hello_world_program_in_esoteric_languages
[2] http://www.dangermouse.net/esoteric/piet.html

[Kapil] - How about "Ook!"? That would make all readers of Terry Pratchett happier.

[Thomas] - Nah -- I never did like him. This does look interesting however:

http://en.wikipedia.org/wiki/Chef_programming_language

:)

[Pedja] - If you'd like to see Web or e-mail as spoken by Chef, check out the Bork Bork Bork extension for Firefox/Thunderbird :)

http://www.snert.com/
https://addons.mozilla.org/firefox/507/


[TAG] Locking down a Linux box

Faber J. Fedor (faber at linuxnj.com)
Sun Jun 11 19:43:05 PDT 2006

Answered by: Francis, Kapil, Thomas

Hey Guys,

Well, maybe "locking down" isn't the right phrase, but I'm not sure what the right phrase is, which is why I'm stumped.

I want to boot up a Linux box, go into X and run my application, let's call it FaberOffice, and run nothing else. Nada. Zip. FaberOffice is to be the only thing running and the only thing that can run.

KDE Kiosk immediately sprang to mind but then I thought it might be overkill. I'd have to turn off all the KDE applications and the KDE Desktop functionality (the latter being the reason to use KDE over straight WMs). I don't need a desktop, I need a window manager. And a light one at that (Ice is Nice :-).

But what am I trying to do with the WM? Am I replacing the root window with FaberOffice? Am I simply maximizing the FaberOffice window and disabling the window decorations and Alt-Tab? Or... :-?

I know what I want to do; I just don't know what I need to do.

Suggestions?

[Francis] - What's your understanding of how a typical Linux system boots?

The short version is init runs and does whatever it says in /etc/inittab, and that's pretty much it.

For an application to run, it is either explicitly listed in inittab, or started by a program which itself is listed there. Typically, a program called "rc" is run, which searches through a specific directory and runs all scripts found. Changing the contents of that directory is the usual way to adjust what runs on boot, without having to edit inittab.

So, pick your distribution, trace what it does (most things are shell scripts, handily enough), and decide where you want to change things.

As an example, I'll describe what I recall of a version of a redhat-derived distro where the intention was to do something similar to what you describe.

In that version, inittab included a call to "prefdm", which chose the right X display manager to use. It also checked whether any display manager at all should be used; the alternative being to run a script to do a no-interaction login. I'm not avoiding the specific filenames in order to be coy; I'm just not certain I remember them exactly, and anyway you should be able to read through the scripts on your distribution to find how it does the equivalent.

One file "autologin" was a flag to bypass the display manager.

Another file "autologin" was the script, and that contained something like

su - $user -c "$(which startx)"

with $user set to be the non-root user as whom the application should run.

Depending on the X configuration, "-- -nolisten tcp" may be a useful addition to the commandline.

"startx" above is also a script; you can read it to see what it does. The straightforward thing to do is to make a ~$user/.xinitrc file which runs all of the programs you want. When the last one exits so does the X invocation.
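A minimal sketch of the single-program ~/.xinitrc described above, written as a small helper that generates the file. The helper name, the target path, and the "faberoffice" command are placeholders, not anything from Faber's system; the only idea taken from the text is that X exits when the last program in the file exits.

```shell
# Sketch only: write a one-application .xinitrc for a kiosk user.
# write_kiosk_xinitrc and "faberoffice" are hypothetical names.
write_kiosk_xinitrc() {
    target=$1      # file to write, e.g. /home/kiosk/.xinitrc
    app=$2         # the one command that should run under X
    cat > "$target" <<EOF
#!/bin/sh
# The X session ends when this last (and only) client exits.
exec $app
EOF
    chmod 755 "$target"
}

# Example use (placeholder path and command):
#   write_kiosk_xinitrc /home/kiosk/.xinitrc faberoffice
```

Using exec means no shell is left waiting behind the application, so closing the application is the same thing as ending the X session.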

> KDE Kiosk immediately sprang to mind but then I thought it might be
> overkill. I'd have to turn off all the KDE applications and the KDE
> Desktop functionality (the latter being the reason to use KDE over

Yes, it's much easier not to turn things on than to go and turn them off afterwards.

> straight WMs). I don't need a desktop, I need a window manager. And a
> light one at that (Ice is Nice :-).

Are you sure you need a window manager? You may well -- if your application uses multiple windows, or if you really want two applications running, or want any of the facilities the window manager provides.

> But what am I trying to do with the WM?  Am I replacing the root window
> with FaberOffice? Am I simply maximizing the FaberOffice window and
> disabling the window decorations and Alt-Tab? Or... :-?

If you're disabling everything the WM provides, then you possibly don't need it. Depending on how much you control the environment, you may be able to configure the app to start full-screen and never show a second window.

> I know what I want to do; I just don't know what I need to do.

Forgive me if I'm talking down to you, but you should be very clear on what the application-startup procedure for your distribution is; and when it comes to X, you should be very clear on what facility is provided by X, what by a WM, and what by your application.

With that understanding, you may be able to phrase "what I want to do" in such a way as to make "what I need to do" obvious.

Within reason, you want to aim for the "ps -ef | wc -l" low score.

And "netstat -pantu" should have no lines you don't understand.
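One rough way to read that netstat output is to reduce it to the listening sockets and the programs that own them. The filter below is a sketch that assumes the usual net-tools column layout (protocol, queues, local address, foreign address, state, PID/program); the function name is made up for illustration.

```shell
# Sketch only: summarise `netstat -pantu` output read from stdin.
# Prints "protocol local-address pid/program" for TCP listeners and
# for every UDP socket (UDP rows normally have an empty State column).
list_listeners() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" { print $1, $4, $7 }
         $1 ~ /^udp/                   { print $1, $4, $NF }'
}

# Example use (run as root so the program column is filled in):
#   netstat -pantu | list_listeners
#   ps -ef | wc -l        # the "low score" count; includes the header line
```

Each line that comes out should be something you can name and justify; anything else is a candidate for switching off.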

> Suggestions?

You may also want to consider options within the X config file like DontZoom, DontZap, and DontVTSwitch. They may limit confusion for the user at the cost of limiting convenience for the maintainer. Of course, depending on the user's experience, they may instead increase confusion for the user.
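For illustration, the three options named above live in the "ServerFlags" section of the X config file. The helper below only prints such a section so it can be reviewed before being merged by hand; the function name is hypothetical, and whether all three options are honoured depends on the X server version in use.

```shell
# Sketch only: print a ServerFlags section with the three options
# mentioned above. Nothing is written to the real config file.
print_kiosk_serverflags() {
    cat <<'EOF'
Section "ServerFlags"
    Option "DontZap"      "true"   # ignore Ctrl+Alt+Backspace
    Option "DontVTSwitch" "true"   # ignore Ctrl+Alt+Fn console switching
    Option "DontZoom"     "true"   # ignore Ctrl+Alt+Keypad-Plus/Minus
EndSection
EOF
}

# Example use: print_kiosk_serverflags > /tmp/serverflags.snippet
```

With DontVTSwitch set, the maintainer loses the text consoles as well, which is one more reason to keep some other way in, such as ssh.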

Using ~$user/anything as an important file may be unwise if the user is able to edit files. For single user, the system xinitrc might be more appropriate. You know your system better than I do :-)

Don't forget to allow your maintenance guy some means of getting at a shell prompt without using a screwdriver, if you think that's useful.

[Thomas] - On Mon, Jun 12, 2006 at 11:57:21AM +0100, Francis Daly wrote:

> What's your understanding of how a typical Linux system boots?

http://www.hantslug.org.uk/cgi-bin/wiki.pl?LinuxHints/RunLevels

> In that version, inittab included a call to "prefdm", which chose the

Yes, which is why Redhat suck. What happens, if for some reason the display manager couldn't launch? /etc/inittab would continually try and respawn it, hence the spurious error messages to the console one used to find. It's a stupid way of working.

> "startx" above is also a script; you can read it to see what it does.
> The straightforward thing to do is to make a ~$user/.xinitrc file
> which runs all of the programs you want. When the last one exits so
> does the X invocation.

I'd use ~/.xsession here, since startx will read ~/.xsession if ~/.xinitrc is missing, and it has the benefit that some display managers honour it too.

> Forgive me if I'm talking down to you, but you should be very clear
> on what the application-startup procedure for your distribution is;

I disagree -- and that was never the question asked. If Faber had wanted an autologin, he'd have asked for one. The question was about kiosks and restricting modes of applications.

[Francis] - On Mon, Jun 12, 2006 at 01:03:32PM +0100, Thomas Adam wrote:

> On Mon, Jun 12, 2006 at 11:57:21AM +0100, Francis Daly wrote:
> > In that version, inittab included a call to "prefdm", which chose the
> 
> Yes, which is why Redhat suck.  What happens, if for some reason the
> display manager couldn't launch?  /etc/inittab would continually try and
> respawn it, hence the spurious error messages to the console one used to
> find.  It's a stupid way of working.

Anything in inittab potentially has the same looping failure mode. I haven't examined the Redhat setup to see if it only adds that entry after it has confirmed that the display manager is currently working.

It's a choice Redhat made. Presumably they decided it was better for them than the alternatives. It does make an autologin setup straightforward :-)

> > The straightforward thing to do is to make a ~$user/.xinitrc file
> 
> I'd use ~/.xsession here, since startx will read ~/.xsession if
> ~/.xinitrc is missing, and it has the benefit that some display
> managers honour it too.

That'll work too. The admin can configure the system to work as seen fit.

> > Forgive me if I'm talking down to you, but you should be very clear
> > on what the application-startup procedure for your distribution is;
> 
> I disagree -- and that was never the question asked.  If Faber had wanted
> an autologin, he'd have asked for one.  The question was about kiosks and
> restricting modes of applications.

Fair enough, I was answering a different question to you.

You read the question as, approximately, "how do I limit the use of the user account". I read it as, approximately, "how do I limit the use of the system". Both are valid. Either or neither could match the original intention. Now the OP can pick an answer, or wait for more, or offer a fuller description of "what I want to do" that will allow us to point at more specific reading material.

To revert to the original problem description, on a straight reading the answer is a simple "so do that".

Do click the FaberOffice icon. Don't click the nethack icon. Don't type "xterm -e wump" into any "run" box. Don't hit control-alt-backspace and wonder what just happened.

But as that is reasonably obvious, and very silly, it probably isn't the answer wanted. And it probably isn't the question intended either.

One guess at the OP's intention is "how do I get *someone else* not to run any other application". Which becomes "tell them 'Do click etc.'". Which is also reasonably obvious, and also probably not wanted.

So there probably needs to be some degree of compulsion or encouragement in the set-up, depending on whether the user is considered hostile or inquisitive, or whether they can be relied on not to try to run anything unwanted.

[Thomas] - It's funny -- I am writing a Kiosk article using FVWM as we speak that's going to appear in LG. This may or may not happen now in light of this reply.

[ Thomas' post in this thread was indeed turned into an article, in LG 128. -- Kat ]

[Faber] - On 12/06/06 12:40 +0100, Thomas Adam wrote:

> It's funny -- I am writing a Kiosk article using FVWM as we speak that's
> going to appear in LG.  This may or may not happen now in light of this
> reply.

Based on what you've written here, I'm looking forward to the article.

> OK, so this application of yours (FaberOffice) is proprietary I assume,
> since most people would have given the full name of it.  No matter.

What? You think my ego isn't big enough for me to name applications after myself?! O ye of little faith! :-)

> In the best case, what you'll probably want to do is something like the
> following:

[ lots of good stuff elided ]

> I hope that at least gives you some ideas to what you can do.  I've
> rambled on a bit -- I hope it helps.

Dude! You did everything but install it for me! Thanks. I'll let you know how it turns out!

[Thomas] - On Mon, Jun 12, 2006 at 03:12:50PM -0400, Faber J. Fedor wrote:

> Based on what you've written here, I'm looking forward to the article.

I probably won't bother with it -- or if I do, it will only be very similar to what I have written here. So your question came as a means for me to write it anyway, it just means it's in the form of TAG, and not as an article.

> Dude! You did everything but install it for me! Thanks. I'll let you
> know how it turns out!

I'd appreciate that, since I hadn't actually tested any of what I had written. :)

[Kapil] -

  1. You want to run only one application and its main window should run in fullscreen mode. That suggests "ratpoison".
  2. I presume you want the transient windows to emerge with focus in the centre of the screen. If your app has transient windows that don't behave well with WM_HINTS then you must exclude ratpoison. (For example GIMP and ratpoison do not get along).
  3. You want to disable all key-bindings. You might be able to configure or hack ratpoison to do that.
  4. Finally you want to disable the running of any other applications. This suggests that the path be restricted using "rbash" as the shell.
  5. An alternative to ratpoison is "ion" with minimal features and modules loaded. You may be able to configure IceWM or FVWM to do this as well.
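Point 4 can be sketched as follows: give the account "rbash" as its shell and point its PATH at a directory that holds only the permitted commands. The helper name and the paths are hypothetical. Note that rbash restricts only what the shell itself will start, so an application that can launch other programs is not contained by this alone.

```shell
# Sketch only: build a directory of symlinks to the few commands a
# restricted (rbash) account may run. make_restricted_bin is a
# made-up helper name; the paths in the example are placeholders.
make_restricted_bin() {
    bindir=$1; shift
    mkdir -p "$bindir" || return 1
    for cmd in "$@"; do
        # Resolve each allowed command to its full path.
        path=$(command -v "$cmd") || { echo "not found: $cmd" >&2; return 1; }
        ln -sf "$path" "$bindir/$cmd"
    done
}

# Example use (placeholders):
#   make_restricted_bin /home/kiosk/bin faberoffice
#   then set PATH=/home/kiosk/bin in a root-owned profile for the account
```

A restricted bash refuses to change PATH, to cd, or to run command names containing a slash, which is why the contents of that one directory end up defining what can be started from the shell.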

[Thomas] - Eh? Then ratpoison can't be ICCCM compliant in that case -- or if it claims to be, then it's deluded. The ICCCM is quite clear about how transient windows are to be handled. Note also that WM_HINTS (as an XAtom) has nothing to do with a window set as transient -- that's what the "WM_TRANSIENT_FOR(WINDOW)" XAtom details.

> 4. Finally you want to disable the running of any other applications.
> This suggests that the path be restricted using "rbash" as the shell.

Or, as my reply alludes to, just don't allow a terminal emulator to run at all.

[Kapil] - On Tue, 13 Jun 2006, Thomas Adam wrote:
> Eh?  Then ratpoison can't be ICCCM compliant in that case -- or if it
> claims to be, then it's deluded.  The ICCCM is quite clear about how
> transient windows are to be handled.  Note also that WM_HINTS (as an XAtom)
> has nothing to do with a window set as transient -- that's what the
> "WM_TRANSIENT_FOR(WINDOW)" XAtom details.

Sorry. This is more a case of my misrepresentation of "ratpoison" rather than any fault of "ratpoison" per se. I used the term "WM_HINTS" without understanding it fully.

However, it is a fact that "ratpoison" and "gimp" do not get along---I do not know enough to assign blame to either or both.

> Or, as my reply alludes to, just don't allow a terminal emulator to run at
> all.

True enough.

I looked at your detailed reply to Faber and it looks far more complete than the half-baked stuff I wrote up. It is interesting to see that FVWM lives up to its promise of being the "one wm to bind them all".

[Thomas] - On Tue, Jun 13, 2006 at 04:20:28PM +0530, Kapil Hari Paranjape wrote:

> Sorry. This is more a case of my misrepresentation of "ratpoison"
> rather than any faults of "ratpoison" per se. I used the term
> "WM_HINTS" without understanding it fully.

That's OK. I won't bore you with the details of how it all works -- that's what the ICCCM attempts to do. :)

> However, it is a fact that "ratpoison" and "gimp" do not get along---I
> do not know enough to assign blame to either or both.

I actually fired up ratpoison in Xnest to see what all the fuss was about -- and it seems you're right. Although interestingly enough, I don't think this is ratpoison's fault -- but GIMP's. For instance, the "Open" dialogue window isn't transient. It ought to be marked as such -- hence as far as the WM is concerned, it treats it as a normal window. This is probably where half of the issues lie. Because of the way ratpoison works, it likes to completely consume any pane it's started in, and this includes the toolbar which looks ugly in this way.
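A quick way to repeat that check on any window is to look for the WM_TRANSIENT_FOR property in xprop's output. The filter below is a sketch: the function name is made up, and it assumes xprop prints the property on a line beginning "WM_TRANSIENT_FOR(WINDOW)".

```shell
# Sketch only: succeed (exit status 0) if the xprop output on stdin
# contains a WM_TRANSIENT_FOR property, i.e. the window has been
# marked as a transient (dialogue) window for some other window.
is_transient() {
    grep -q '^WM_TRANSIENT_FOR(WINDOW)'
}

# Example use: run this, then click on the dialogue to inspect.
#   xprop | is_transient && echo transient || echo "normal window"
```

A dialogue that reports "normal window" here gives the window manager no hint to treat it differently from a main window.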

> I looked at your detailed reply to Faber and it looks far more
> complete than the half-baked stuff I wrote up. It is interesting to
> see that FVWM lives up to its promise of being the "one wm to bind them
> all".

I suppose. :)


Talkback links should CC article author

Thomas Adam (thomas at edulinux.homeunix.org)
Sun Jun 18 08:28:23 PDT 2006

Answered by: Ben

I missed when the talkback parts were set up, but it seems to me that when the links for it are generated, a CC field should be added with the respective author's email address as well. For those authors who do not provide an email address, or write under the guise of "Anon", then no CC should be used.

Currently, whilst the model in use is quite successful, it does leave any kudos for the author unnoticed, unless they're subscribed to TAG which is unlikely for most of the authors we have.

[Ben] - That would be one of the bits of Python coding that's been on my stack for a LONG time - but has also been put off for a long time. It wouldn't be all that hard, but it would expose the authors' email addresses to spambots, something that is not currently the case. Yes, there would be large benefits to doing this - for the moment, whenever there's a worthwhile Talkback followup, I forward it to the author from my mailbox, screamingly manual and inefficient as it may be - but there's also a detrimental effect, as well as a need for someone to get their hands grimy in Python bowels.

This week's classes are done (*WHEW*... especially since we were doing double the usual, 8 hours of class/welding per day), which means that I'll have a little brainpower available before the systems crash at the end of the day. I'll give this a bit of thought while hoping that a Python-savvy person will volunteer to help with this - as an option, if nothing else - while I'm cogitating. :)

[Thomas] - Cc Mike Orr -- he's not subscribed to TAG at the moment, but I hear he knows a little about Python. ;)


Confusion about linux fonts

J. Bakshi (j.bakshi at icmail.net)
Wed Jun 28 01:54:21 PDT 2006

Answered by: Kapil

Dear list,

I am really confused about where Linux actually searches for fonts. I have looked through my system (Debian sarge) and found two locations:

1] /etc/fonts and 2] /usr/share/fonts, where location #1 has three html files and #2 has font files.

I have a TrueType font server called xfs-xtt. Below is the fonts part of my XF86Config-4:

Section "Files"
        FontPath        "unix/:7100"         #local font server
        # if the local font server has problems, we can fall back on these
        FontPath        "/usr/lib/X11/fonts/misc"
        FontPath        "/usr/lib/X11/fonts/cyrillic"
        FontPath        "/usr/lib/X11/fonts/100dpi/:unscaled"
        FontPath        "/usr/lib/X11/fonts/75dpi/:unscaled"
        FontPath        "/usr/lib/X11/fonts/Type1"
        FontPath        "/usr/lib/X11/fonts/CID"
        FontPath        "/usr/lib/X11/fonts/Speedo"
        FontPath        "/usr/lib/X11/fonts/100dpi"
        FontPath        "/usr/lib/X11/fonts/75dpi"
EndSection

All these locations have font files. Now, when fonts are already there in /usr/share/fonts, why do we need the font directories mentioned above?

There is also a hidden folder named .fonts in my home directory.

*xset -q | grep font* shows

/usr/lib/X11/fonts/misc,/usr/lib/X11/fonts/100dpi/:unscaled,/usr/lib/X11/fonts/75dpi/:unscaled,
/usr/lib/X11/fonts/Type1,/usr/lib/X11/fonts/Speedo,/usr/lib/X11/fonts/100dpi,/usr/lib/X11/fonts/75dpi,
/home/joy/.fonts

So my hidden .fonts directory is being searched, but /usr/share/fonts is nowhere in the list. So what is the use of having /usr/share/fonts?

I am really confused. Can anyone help? Thanks for your time, and kindly CC: me.

[Kapil] - Only two? Where fonts are searched for depends on which program is doing the searching and the device for which the fonts are intended. Hence there are X fonts, TeX fonts, GhostScript fonts, etc. which are further divided according to bitmaps or scalable (vector) fonts. Bitmap fonts are further divided according to resolution.
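To see one of those search lists in isolation, the X server's own font path can be pulled out of "xset q". The filter below is a sketch with a made-up function name; it assumes the path is printed as a comma-separated list on the line after "Font Path:". Programs that use fontconfig keep a separate list, usually configured under /etc/fonts, which on a typical setup (an assumption about this system) is where /usr/share/fonts and ~/.fonts come in.

```shell
# Sketch only: print the X server's font path, one element per line,
# from `xset q` output read on stdin.
x_font_path() {
    awk '/^Font Path:/ {
             getline                    # the path list is on the next line
             gsub(/^[ \t]+/, "")        # drop the leading indentation
             n = split($0, p, ",")
             for (i = 1; i <= n; i++) print p[i]
         }'
}

# Example use:
#   xset q | x_font_path
#   fc-list | head        # fonts seen by fontconfig clients instead
```

Comparing the two listings shows which directories serve the X server's core fonts and which serve fontconfig-based applications.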

Hope this clarifies a bit.

[ Thomas' reply was published in LG 128 at Ben's suggestion. His original post and Ben's response have been elided. -- Kat ]


FVWM Kiosk - a different approach

stomfi (stomfi at bigpond.com)
Fri Jul 7 01:06:04 PDT 2006

Answered by: Ben

This is another way of achieving user lock down.

Computerbank Queensland (CBQ) in partnership with a Work for the Dole team implemented a Blue Care Computers for the Third Age project at Salvin Park in the last 3 months of 2003.

This is a short description of the project which uses oroborus as the kiosk window manager, but which could use FVWM in a similar fashion.

I was the coordinator of the project.

#################################################################
The users are aged and have limited faculties, although they can all
see. They could easily do the wrong thing and forget what they
learnt from day to day, so the system has to be relatively fool
proof, while still delivering the required applications in a simple
and straightforward manner.

CBQ donated equipment and built a special Linux system for this
project. The installed systems consisted of two separate facilities
in the park.

They are comprised of an Internet, and print server with a simple
masquerade iptables script to the Internet.

The servers use LPD for the printer and a dialup ppp connection.
The clients use cups and have the Internet nameservers in
/etc/resolv.conf.

One site has a server and three clients connected with a 10Mbps hub,
the other a server/client connected by cross over cable to the client.

The servers bring up ppp0 at boot, which stays up till the system is
closed down or rebooted.

The clients are configured with 4 users, being guestx, help, Internet and
letters.  Letters, Internet and help all use oroborus as the backgrounded
window manager which does nothing except display windows. There are no
active key sequences except CTRL-q which normally kills the window manager
but does nothing since it is running in the background. After backgrounding
the window manager the user .xinitrc file execs the required program, ie
gedit letters.txt, firebird help.html and firebird Internet.html.

The user is prompted to print typed letters as they are not saved
past the session.

Closing the program sends the user back to the GDM user login screen.

No files are saved on the clients and gconf and other configuration
files have their ownerships changed so that user session changes
cannot be saved and are forgotten with each new logon.

Guestx is a normal user with a few apps such as GIMP and abiword,
and some games.

None of the user login names have passwords, except guestx whose
password is guestx. This is for the "sophisticated" user just in
case they have one or two.

The Internet home page has pictorial links to news, email, search,
net games, and the off line help. Email is web mail only.

Each client has the mouse keypad accessibility option configured.

All systems run on RedHat which was chosen by the team as it was the
most consistent and robust and had a good selection of end user
apps. Due to performance issues on P1s, Gnome is the guestx window
manager and GDM the logon manager.

The servers are P2s. One is running RedHat 7.3 and the P1 clients
and server/client are running a cut down RedHat 8. RedHat kindly
donated the software to CBQ which we used for this project.

Many thanks go to the work for the dole team for learning the
necessary routines needed to complete this project on time and at a
minimal cost to Blue Care.
###################################################################

Hope this shows you a robust way of locking users out of a system.
Kind regards
Tom Russell
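The client setup described above (start the window manager in the background, then exec the one program that user is allowed) can be sketched as a generator for each user's .xinitrc. The user names and commands come from the description; the function name is made up, and the real files at the site may well have looked different.

```shell
# Sketch only: print a .xinitrc for one of the three kiosk users.
kiosk_xinitrc() {
    user=$1
    case $user in
        letters)  prog='gedit letters.txt' ;;
        help)     prog='firebird help.html' ;;
        Internet) prog='firebird Internet.html' ;;
        *) echo "no kiosk program for user: $user" >&2; return 1 ;;
    esac
    cat <<EOF
#!/bin/sh
# Window manager in the background: it only displays windows.
oroborus &
# The session ends when this program exits, which returns the
# user to the login screen.
exec $prog
EOF
}

# Example use: kiosk_xinitrc letters > /home/letters/.xinitrc
```

Because the window manager is not the last process, killing it does not end the session; only closing the exec'd program does.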

[Ben] - The project sounds interesting, Tom, but I'm a little unclear about some of the specifics as well as being curious about some of the results.

> #################################################################
> 
> CBQ donated equipment and built a special Linux system for this
> project. The installed systems consisted of two separate facilities
> in the park.
> 
> They are comprised of an Internet, and print server with a simple
> masquerade iptables script to the Internet.

I'm not clear on what you mean by the above. Do you mean that you had a workstation connected to the Internet via a NAT router, with 'iptables' used to set up the masquerading?
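For readers unfamiliar with the term, a masquerading setup of the kind being asked about usually comes down to two steps: enable IP forwarding and add a MASQUERADE rule on the Internet-facing interface. The sketch below only prints those commands; the function name is made up, ppp0 is taken from the description, and nothing here is known about what CBQ's actual script contained.

```shell
# Sketch only: print (do not run) the commands for a simple
# masquerading gateway on the given outbound interface.
masq_rules() {
    wan=${1:-ppp0}   # Internet-facing interface; ppp0 per the description
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    echo "iptables -t nat -A POSTROUTING -o $wan -j MASQUERADE"
}

# Example use: review the output first, then apply it as root:
#   masq_rules ppp0
#   masq_rules ppp0 | sh
```

With this in place, the clients reach the Internet through the server's dialup link while appearing to the outside as the server itself.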

> The servers use LPD for the printer and a dialup ppp connection.
> The clients use cups and have the Internet nameservers in
> /etc/resolv.conf.
> 
> One site has a server and three clients connected with a 10Mbps hub,
> the other a server/client connected by cross over cable to the client.
> 
> The servers bring up ppp0 at boot, which stays up till the system is
> closed down or rebooted.

So, it sounds like you had two dialup lines. Was there a particular reason that the two kiosks were so different (3 workstations plus a server versus 1 workstation plus a server)?

> No files are saved on the clients and gconf and other configuration
> files have their ownerships changed so that user session changes
                              ^^^^^^^

I presume that means "changed to root" or something similar.

> cannot be saved and are forgotten with each new logon.
> 
> Guestx is a normal user with a few apps such as GIMP and abiword,
> and some games.
> 
> None of the user login names have passwords, except guestx whose
> password is guestx. This is for the "sophisticated" user just in
> case they have one or two.

Presumably, this has the window manager "running in the foreground", to use your terminology, so that the apps are accessible and can be individually started - or do you mean something else?

> All systems run on RedHat which was chosen by the team as it was the
> most consistent and robust and had a good selection of end user apps.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^

That is, perhaps, an arguable point. :)

> Due to performance issues on P1s, Gnome is the guestx window
> manager and GDM the logon manager.

This seems odd as well. If I was concerned about performance issues, I would have gone with, say, FVWM or IceWM; Gnome is quite resource-intensive when compared with either of those (or many other simple WMs.)

> Many thanks go to the work for the dole team for learning the
> necessary routines needed to complete this project on time and at a
> minimal cost to Blue Care.
> ###################################################################

Indeed; it sounds quite the public-spirited project, and I'd be interested to hear more about it. Tom, I'd like to suggest that you write an article about this for LG - perhaps detailing the larger surrounding issues (i.e., how Work for the Dole and the Third Age project came up with this undertaking, how the project was coordinated, lessons learned, whether there are plans to do more of this kind of thing in the future, etc.) I believe that our readers would be very interested to hear about this; I certainly would like to see this myself.


How to convert RedHat9 to Gentoo over SSH on a live system

Suramya Tomar (security at suramya.com)
Thu Jul 20 10:34:06 PDT 2006

Answered by: Rick, Thomas

Found this HOWTO that describes how to take a stock RedHat9 system and convert it to Gentoo, remotely over ssh and while it is running.

http://www.darkridge.com/~jpr5/doc/rh-gentoo.html

Haven't tried it yet as I don't have access to a RedHat system that I can experiment on. But this sounds very interesting. I wonder if it's possible to convert other Linux distributions to another flavor of Linux (like Debian, maybe?)

[Thomas] - Google for 'debtakeover'.

[Rick] - Certainly.

http://twiki.iwethey.org/twiki/bin/view/Main/DebianChrootInstall
http://www.hadrons.org/~guillem/debian/debtakeover/
http://trilldev.sourceforge.net/files/remotedeb.html
http://www.starshine.org/SysadMoin/DebootstrapInstallation

[Suramya] - Thanks for a great set of links. Now I have something to waste time on over the week...


State of the anti-spam regime, July 2006 edition

Rick Moen (rick at linuxmafia.com)
Fri Jul 21 11:37:44 PDT 2006

Answered by: Ben, Martin

Quoting Benjamin A. Okopnik (ben at linuxgazette.net):

> That's how I figured you were using it, Rick. If you wanted money, you'd
> have said so - I have this feeling that you got over being all shy and
> retiring a while ago. :) I just thought that you might still be in the
> planning/buying stages.

The newer machine in question's still, to my way of thinking, pretty nice: Single-proc PIII/800 or so, VA Linux Systems 2230 2U rackmount chassis, Intel L440GX+ "Lancewood" motherboard, 1.5TB RAM, 2 x 73GB RAID1 pair (Linux software RAID) for the important filesystems, 16GB? boot drive.

That's pretty snazzy -- for 2001. ;->

All the filesystems are built, and it's loaded with Debian "etch" 4.0 and a rough cut of all the necessary software, not yet fully configured. Data files haven't yet been copied over (IIRC).

The last time I worked on it, I'd fetched a new Debian-packaged binary kernel 2.6.x and blithely removed the previous, believed-bootable installed 2.6.x kernel. And then rebooted, found that I'd just shot myself in the foot, lost patience / ran out of time, and quit for the day. I've not yet gotten back to it, and meantime other things have been keeping me away.

You know, it's also possible the old box has developed a bad spot of RAM, or something like that. Look at this kernel "oops" from /var/log/messages, which is typical of process blowouts, lately:

Jul 21 11:20:38 linuxmafia kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
Jul 21 11:20:38 linuxmafia kernel:  printing eip:
Jul 21 11:20:38 linuxmafia kernel: c0153ca5
Jul 21 11:20:38 linuxmafia kernel: Oops: 0000
Jul 21 11:20:38 linuxmafia kernel: CPU:    0
Jul 21 11:20:38 linuxmafia kernel: EIP:    0010:[prune_icache+53/464] Not tainted
Jul 21 11:20:38 linuxmafia kernel: EFLAGS: 00210213
Jul 21 11:20:38 linuxmafia kernel: eax: 74756564   ebx: 00000000   ecx: 00000006   edx: 00000004
Jul 21 11:20:38 linuxmafia kernel: esi: fffffff8   edi: 00000000   ebp: 00000383   esp: c4e0dddc
Jul 21 11:20:38 linuxmafia kernel: ds: 0018   es: 0018   ss: 0018
Jul 21 11:20:38 linuxmafia kernel: Process exim4 (pid: 32387, stackpage=c4e0d000)
Jul 21 11:20:38 linuxmafia kernel: Stack: cb7b3e20 00000000 c4e0dde4 c4e0dde4 00000009 c1046310 c025ecd8 00001951
Jul 21 11:20:38 linuxmafia kernel:        c0153e64 00000383 c01353eb 00000006 000001d2 ffffffff 000001d2 00000009
Jul 21 11:20:38 linuxmafia kernel:        0000001e 000001d2 c025ecd8 c025ecd8 c01357bd c4e0de50 000001d2 0000003c
Jul 21 11:20:38 linuxmafia kernel: Call Trace: [shrink_icache_memory+36/64] [shrink_cache+379/944] [shrink_caches+61/96] [try_to_free_pages_zone+98/256] [locate_hd_struct+56/160]
Jul 21 11:20:38 linuxmafia kernel:   [balance_classzone+66/480] [__alloc_pages+376/640] [do_anonymous_page+92/256] [handle_mm_fault+119/256] [do_page_fault+456/1337] [e100:__insmod_e100_O/lib/modules/2.4.27-2-686/kernel/drivers/net+-687130/96]
Jul 21 11:20:38 linuxmafia kernel:   [process_timeout+0/80] [bh_action+34/64] [tasklet_hi_action+70/112] [do_IRQ+154/160] [do_page_fault+0/1337] [error_code+52/60]
Jul 21 11:20:38 linuxmafia kernel: 
Jul 21 11:20:38 linuxmafia kernel: Code: 8b 5b 04 8b 86 08 01 00 00 a8 38 0f 84 1c 01 00 00 81 fb a8

Off hand, I'm uncertain of the root cause.
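When several of these pile up in the log, it can help to pull out just the faulting function from each one and see whether they agree. The filter below is a sketch with a made-up function name; it assumes the 2.4-style format shown above, where the "EIP:" line carries symbol+offset/size in square brackets.

```shell
# Sketch only: print the symbol+offset/size from every "EIP:" line
# of a 2.4-style kernel oops found in the syslog text on stdin.
oops_symbol() {
    sed -n 's/.*EIP:[[:space:]]*[0-9a-fA-F]*:\[\([^]]*\)].*/\1/p'
}

# Example use:
#   oops_symbol < /var/log/messages | sort | uniq -c
```

For the oops above this yields prune_icache+53/464, meaning the fault was 53 bytes into a 464-byte function. Oopses scattered across unrelated functions point more towards hardware than towards one kernel bug.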

[Martin] - On 21/07/2006 Rick Moen wrote:

> The newer machine in question's still, to my way of thinking, pretty 
> nice:
> Single-proc PIII/800 or so, VA Linux Systems 2230 2U rackmount 
> chassis,
> Intel L440GX+ "Lancewood" motherboard, 1.5TB RAM, 2 x 73GB RAID1 pair
> (Linux software RAID) for the important filesystems, 16GB? boot drive

Rick, do you mean 1.5GB memory?? ;) Just thinking that terabytes of memory is going a bit OTT... I also thought that P3's usually only go up to something like 2GB memory, not sure off hand though.

[Rick] - D'oh! Yes, I only wish I had 1.5 TB of RAM. I'm willing to accept that even in PC-100. Just leave it in a brown bag on my doorstep, please, and nobody need get hurt. ;->

Yes, the old box is back online. Absent-minded members of my household (myself certainly included) had closed up all the doors leading into the garage that is the temporary home of our servers. Today, my town had record high temperatures of 35 degrees (95, if using last millennium's Fahrenheit scale) -- which meant it was probably closer to 40 inside the sealed garage. And the machine simply was unhappy, that way.

There will be a fresh backup, soonish -- and I'll devote serious attention to the long-delayed hardware migration, -and- to creating the planned server-shelf space in the foundation crawlspace, under my house. For now, there's also an electric fan blowing additional air at the server.

For the record, anyway, if you see segfaults and kernel oopses, it may indicate a runaway heat problem. I didn't know that, before.

[Ben] - One of the laptops that I tested before buying the HP that I have as my backup machine ran ridiculously hot - I actually got to see the kernel spit out a "thermal shutdown" message and halt (I didn't realize it had such a goodie in it until it did that.) During the short time that I ran it - and I actually tried two different machines of the same make and model - a number of the sessions terminated either in a thermal cutout or a segfault.

In general, when I see a segfault that wasn't caused by a known factor (e.g., a just-compiled, highly experimental kernel), I immediately suspect either a) bad hardware or b) overtemp conditions. I suppose you could make the case that b) really resolves to a) - I've always considered memory to be analog hardware, anyway... It sorta works within parameters when the moon is in the right phase, but tends to wander outside of them whenever anything (like the price of pork bellies on the commodities market, or the percentage of carbon dioxide on Mars) changes.
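
For anyone who wants to test the heat theory before blaming the hardware, here is a minimal sketch. It assumes the sysfs thermal interface of current kernels (/sys/class/thermal/thermal_zone*/temp, in thousandths of a degree Celsius), which the 2.4 kernel in the oops above would not provide. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the sysfs thermal interface of modern kernels.

# Convert whole degrees Celsius to Fahrenheit (integer arithmetic).
c_to_f() {
    echo $(( $1 * 9 / 5 + 32 ))
}

# Print each thermal zone's temperature. The directory is a parameter
# so that the function can be pointed at a test tree.
show_temps() {
    dir=${1:-/sys/class/thermal}
    for f in "$dir"/thermal_zone*/temp; do
        [ -r "$f" ] || continue          # no zones, or not readable
        milli=$(cat "$f")
        c=$(( milli / 1000 ))
        echo "$(basename "$(dirname "$f")"): ${c} C ($(c_to_f "$c") F)"
    done
}

show_temps
```

As a cross-check on the figures in the thread, "c_to_f 35" gives 95 and "c_to_f 40" gives 104.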


Wiring a house with ethernet: Success

Jason Creighton (jcreigh at gmail.com)
Fri Aug 4 23:45:20 PDT 2006

Answered by: Bradley, Jason

Hi Gang,

Way back in October of last year, I asked TAG about how to go about pulling Ethernet in a new house. I pulled all the cable, punched down the ends and hoped for the best. We moved in early this year, but the only networking we needed right away was with two, physically adjacent computers, so I just used a crossover cable.

So the Ethernet built into the house was totally untested until last week when I needed to tie a computer at the other end of the house into the network. Every single (okay, seven total, but still...) drop worked correctly on the first try. Maybe I just got lucky. :)

Anyway, I just wanted to report success and thank you guys for helping out. Chalk another one up for TAG.

[Bradley] - On 8/5/06, Jason Creighton <jcreigh at gmail.com> wrote:

> [...] Maybe I just got lucky. :)

Maybe, maybe not ;-)

I just finished wiring a house for Ethernet as well - the house was built with three UTP cables running to three of the bedrooms, as well as a single unused box in the kitchen. After wiring up the RJ45 jacks in the bedrooms, and pulling a new cable in the kitchen and punching an RJ45 jack onto that, I now have a decent Ethernet network as well.

What sort of terminal arrangement did you use? I used a double-gang switch box with a 12-port faceplate as a "patch panel" for all of my lines (for expandability).

[Jason] - On Sun, Aug 06, 2006 at 04:16:58PM -0400, Bradley Chapman wrote:

> What sort of terminal arrangement did you use? I used a double-gang
> switch box with a 12-port faceplate as a "patch panel" for all of my
> lines (for expandability).

Very similar to yours, actually. I don't have a proper patch panel, just two single-gang[1] low-voltage wiring boxes coming into a closet with a wall-mounted 8-port Netgear switch. Were I to do it again, I think I would:

  1. Do everything in the wiring closet in some sort of flush mounted wiring box. (I think this is called a "structured wiring box")
  2. For futureproofing and flexibility, pull two Cat6 drops to every room in the house, punch them both down to RJ45, and then use one for ethernet and one for phone.

[1] When you mentioned that you used a 2-gang box, I thought "why didn't I do that?" D'oh! (I'm hoping there was some actual reason, like not being able to find a faceplate for a 2-gang box, but I can't remember now.)


Linux driver question

J. Bakshi (j.bakshi at icmail.net)
Sat Aug 5 10:11:01 PDT 2006

Answered by: Karl-Heinz, Peter

Hi list,

hope everyone is well here :-)

I have a question about linux usb driver. Is there any driver which allows attaching a serial device to usb port under linux ? My old PI PC has a serial port so I don't have problem to use JDM programmer (PIC Programmer). But my new mother board of AMD doesn't have any COM port. Hence I am looking such an arrangement that I can still use serial devices by a USB-to-Serial adapter. But does linux has any driver/feature to support this ?

Thanks in advance.

[Karl-Heinz] - "J. Bakshi" <j.bakshi at icmail.net> wrote:

> I have a question about linux usb driver. Is there any driver which allows 
> attaching a serial device to usb port under linux ? 
> [...]
> But does linux has any driver/feature to support this ?

seems so....

/lib/modules/2.6.5-7.111-default/kernel/drivers/usb/serial> ls
belkin_sa.ko        empeg.ko        io_ti.ko   keyspan.ko      kobil_sct.ko  pl2303.ko       visor.ko
cyberjack.ko        ftdi_sio.ko     ipaq.ko    keyspan_pda.ko  mct_u232.ko   safe_serial.ko  whiteheat.ko
digi_acceleport.ko  io_edgeport.ko  ir-usb.ko  kl5kusb105.ko   omninet.ko    usbserial.ko

but from that list I would assume there are usb-to-serial adapters which are better supported than others -- but maybe safe_serial and usbserial offer a lowest-common-denominator feature set?

[Peter] - The "Edgeport" boxes from http://www.ionetworks.com/ seem to enjoy good Linux support from the "io_edgeport" and "io_ti" drivers:
http://www.kroah.com/linux/usb/edgeport/

Disclaimer: I haven't tried this myself (yet). Instead I'm using an old "Annex" box from Bay Networks http://www.ofb.net/~jheiss/annex/ to access serial ports over plain old TCP/IP. Not sure if this would be sufficient for your purpose, though.
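
Once an adapter is plugged in, it usually shows up as /dev/ttyUSB0 or similar. A minimal sketch for finding out which of the drivers listed above claimed it follows. It assumes the sysfs layout of reasonably recent kernels, where /sys/class/tty/ttyUSB0/device/driver is a symlink to the driver; the function name is invented for this example.

```shell
#!/bin/sh
# Sketch only: report which kernel driver claimed each USB serial port.
# Assumes /sys/class/tty/ttyUSBn/device/driver is a symlink whose last
# path component is the driver name (e.g. pl2303, ftdi_sio).
usb_serial_drivers() {
    root=${1:-/sys/class/tty}
    for port in "$root"/ttyUSB*; do
        [ -e "$port/device/driver" ] || continue   # no adapters present
        drv=$(basename "$(readlink "$port/device/driver")")
        echo "/dev/$(basename "$port"): $drv"
    done
}

usb_serial_drivers
```

If nothing is printed, no USB serial port has been registered, and the kernel log is the next place to look.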


USB Hard Drives

Bob van der Poel (bvdp at xplornet.com)
Sat Aug 12 19:16:06 PDT 2006

Answered by: Ben, BobV, Faber, Lew

I'm thinking of getting a USB (external) hard drive to use for backup purposes. Is there anything to look out for on these, or should I just try to get the best dollar/meg deal? I'm thinking that something in the 150 to 150 meg size would be perfect.

I'm assuming that all (most?) of these drives will work just fine with Linux :)

Thanks.

[Faber] - On 12/08/06 19:16 -0700, Bob van der Poel wrote:

> I'm thinking of getting a USB (external) hard drive to use for backup 
> purposes. 

I bought a USB enclosure for ~50 USD for a HD I already owned and I recently bought a Maxtor 300G model for ~200 USD. I have had no problems with either. I use them as everyday storage; the former for /home and the latter for my mp3s, VMware files and media files.

> Is there anything to look out for on these, or should I just 
> try to get the best dollar/meg deal? I'm thinking that something in
> the 150 to 150 meg size would be perfect.

My rule of thumb on buying anything: buy the most expensive thing you can afford, but don't buy the cheapest thing on the market and don't buy the most expensive.

> I'm assuming that all (most?) of these drives will work just fine with 
> Linux :)

That's been my experience. One person emailed me about problems he was having with a USB enclosure and Linux. We believe his problem turned out to be the enclosure was just too cheap(ly made).

[Lew] - I wouldn't know about all or most of these drives, but I suspect that they will all work with Linux, especially a recent kernel.

A couple of weeks ago, I bought a Vantec NexStar:GX "USB 2.0 External 3.5 inch Hard Drive Enclosure", which I used with an old 60GB HD that I had lying around. The drive was correctly recognized by a 2.4.29 kernel, and I had no problems using it at all.

I'm certain that once I upgrade the drive to a more modern 300Gb device, the NexStar:GX will make a good external backup medium.

[Ben] - I didn't want to just jump in with "metoo!", but - me too. After ages of making mental notes to do it Sometime Soon, I finally bought myself a USB-to-HD adapter and shuffled through my ancient hard drive collection (ye ghods... some of these things had capacities in the megabyte range. What year was that, 1920 or so?) Except for several drives that wouldn't even spin up, I was able to read all of them - and I had quite a variety.

Amusing note: I tend to be a pack rat when it comes to information, and when I get rid of a machine, I usually keep the HD. Then, at some point when I've completely forgotten what the hell those things contain, I work my way through them, copying off stuff that looks interesting and throwing away the now-ancient hardware. It has been my experience that whatever drive I'm using at the time is plenty and more than plenty to hold the contents of all the drives I'd saved until then.

[BobV] - Thanks for the comments on this, guys. I was at my not-so-local Staples earlier today and they had 250gig USB "My Book" drives on for $119 US. So, I got one. (I really do remember thinking that my first box of 5.25" floppy disks each holding about 160K would be enough storage for the REST OF MY LIFE).

It works just fine. Reformatted to ext2 FS and got rid of a bunch of windows files :)

Just copied my base file system over. 92,000 files; 24 gig in about 28 minutes. No screamer, but no slouch either. And, really, for backups who cares about speed.
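
To put a number on "no screamer, but no slouch", the figures above can be turned into an average transfer rate. This is only a sketch: it assumes "24 gig" means 24 x 1024 MB, and the function name is made up.

```shell
#!/bin/sh
# Average throughput in MB/s, given a size in MB and a duration in seconds.
throughput_mb_s() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# 24 gig (taken here as 24 * 1024 MB) in 28 minutes
throughput_mb_s $((24 * 1024)) $((28 * 60))    # prints 14.6
```

That is roughly 14.6 MB/s, or about 55 files per second for the 92,000 files.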

Seems to be very little noise or heat on this. And it has an automatic 10 minute power-down built in. So, I'll just leave it plugged in.

I'll have to see if it will auto-mount, etc. But, I don't see any serious problems.


New URL for GLUE - Groups of Linux Users Everywhere

B. E. Irwin (beirwin at shaw.ca)
Sun Aug 13 17:19:10 PDT 2006

Answered by: Rick

I'm trying to track down where "GLUE - Groups of Linux Users Everywhere" lives now. As you can see from my email below, this valuable resource is no longer hosted by LinuxGazette.com. I recall you guys started a new LinuxGazette.net (and linked on my site, btw) a while back. I searched linuxgazette.net and could not find GLUE. Do you know where I might find it? Is it on linuxgazette.net and I missed it? A Google search turned up nothing.

Thanks for your help.

---------- Forwarded message ----------
Date: Fri, 04 Aug 2006 11:41:36 -0600
From: Keith Daniels <[email protected]>
To: Barbara E. Irwin <[email protected]>
Subject: Re: New URL for GLUE - Groups of Linux Users Everywhere => Was hosted
     here: http://newglue.linuxgazette.com/

Barbara E. Irwin wrote:
> I am one of the contributors for the Loads of Linux Links project
> (http://loll.sourceforge.net/linux/links/index.html).  We have a link to
> GLUE - Groups of Linux Users Everywhere however, this url is no longer valid.
> Is this link somewhere on Linux Journal?

No, Glue has been discontinued and will not be put back up.  Sorry about that 
but management decided to kill Glue, linuxgazette.com and linuxresources.com.

> 
> FYI, this is a GPL'd database of 5000+ bookmarks of important URLs about
> Linux and the Open Source movement.  It was originally a project started for
> the Victoria Linux Users' Group and is now hosted by SourceForge.
> 
> Thanks for any info. about your link!

I will tell the newsletter editor about your site and he might put it in one of 
the newsletters.


Keith Daniels
--
Webmaster
SSC Publications, Inc.
www.ssc.com

Publishers of:

Linux Journal - www.linuxjournal.com
TUX Magazine  - www.tuxmagazine.com
Doc's IT Garage - www.docsearls.com
A42 - www.A42.com

[Rick] - Greetings, Barbara!

I'd love to see GLUE resurrected[1], and we at Linux Gazette were every bit as surprised and dismayed to see it disappear (with apparently no advance notice or consultation with anyone) as you were. Heck, if I had the right software for it, I'd even host it on my home aDSL line, though it would require some effort to prevent it being overrun by comment spam.

[1] As opposed to being one of the ever-growing stable of dead Linux-community-Web-site URLs that now redirect to the commercial www.linuxjournal.com site. I'm struggling to be polite, here.


Copyright Notice

Mahesh Aravind (ra_mahesh at yahoo.com)
Tue Aug 15 22:02:32 PDT 2006

Answered by: Ben, Rick

--- Kristian Orlopp <kristianorlopp at web.de> wrote:

> The script http://linuxgazette.net/129/misc/mail/colors.sh may be a
> modification of the public domain (?) 
> http://www.linux-magazin.de/Artikel/ausgabe/1997/08/Tips/ls.html
> (Farbtest)
> 
> Even if that the script published in linux-gazette is simple I think a
> notice should refer to that page.

Kristian,

Thank you very much for pointing out the (striking) similarity between the scripts. I think the page you pointed to was created in Aug 1997.

I really haven't seen/copied that page (Hell, I can't even read German), but I assure you, that it wasn't my fault.

You can, if you want, drop a line to Ben Okopnik and ask him to include a copyright notice or something. And I'll do that if I release a v2 of the script.

I wasn't aware of the page, and I haven't copied anything. Sorry if I hurt anyone's feelings...

Thank you once again for pointing out the link. I believe it was rather a coincidence.

-- Mahesh Aravind

[Ben] - Hi, Kristian -

I'm probably the last guy in the world to ignore a copyright violation or omit credit for an author - but Mahesh's code, other than the necessary and obvious common points, is clearly different from the code you cite. It's true that the output is much the same, but any script designed to display console colors - and that includes my own version in the Bash tutorial that I wrote, as well as the later versions that I wrote in reply to Mahesh's post in the LG Answer Gang - is going to have a similar-looking output; otherwise, it will have failed in its purpose.

If you take a look at the two scripts, even the programming structures are different - except where they both use the escape codes as detailed in the Bash-Prompt-HOWTO. If there's any credit that should be given, both scripts should be crediting that document. :)

Again, if I considered it a copyright infringement of any sort, or an omission of due credit, I would update our archives, even though - as I often have to explain to people - the change would not propagate to our mirror sites, since they've already downloaded the published issue. In this case, however, it's not a matter of credit or copyright, and making changes in published material without a powerful reason, especially when those changes will only be seen by a vanishingly small percentage of people, does not seem reasonable.

[Rick] - I concur. There's a common misconception that any similarity shows "copying", and that any "copying" constitutes copyright infringement. In fact, by law, copyright arises only in the "expressive elements" of a creative work, first of all. "Functional elements", e.g., portions of code that embody the obvious, or only, or required-for-compatibility way of doing things, are deemed to not be copyrightable at all.

That aside, if one has copied any substantive amount of something, it's simple good manners to give acknowledgement -- but that doesn't seem to apply here, either. (I should hasten to add that all this was implied by your wording; I'm just agreeing with you and making the point more explicit.)

[ Kristian's response is contained below in Ben's next post. --Kat ]

[Ben] - On Thu, Aug 17, 2006 at 12:47:10AM +0200, Kristian Orlopp wrote:

> Hi !
> 
> > If you take a look at the two scripts, even the programming structures
> > are different - except where they both use the escape codes as detailed
> > in the Bash-Prompt-HOWTO. If there's any credit that should be given,
> > both scripts should be crediting that document. :)
> 
> Sorry, I did not want to be captious ;-)

[smile] No problem at all, Kris; I didn't take it that way.

> Oh yes, you are right, I studied both codes in a closer way.
> So I just learned the usage of a sequence in bash
> via your "for j in $(seq 40 47)"-construct.

Fun stuff, isn't it? There's a bunch of cool tiny utilities in the 'coreutils' (used to be 'shellutils') package that most people aren't even aware of; 'seq' is only one of them. Most of them make a shell programmer's life much, much easier.
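
For readers who have not met 'seq' yet, here is a minimal sketch of the kind of loop being discussed. It is not either of the scripts under discussion, just an illustration: one row per foreground colour code (30-37) and one cell per background code (40-47), using the standard ANSI escape sequences. The function name is invented.

```shell
#!/bin/bash
# Sketch: one row per foreground colour (30-37), one cell per
# background colour (40-47), with 'seq' generating both ranges.
color_table() {
    local i j
    for i in $(seq 30 37); do
        for j in $(seq 40 47); do
            # ESC[<fg>;<bg>m sets the colours, ESC[0m resets them
            printf '\033[%s;%sm %s;%s \033[0m' "$i" "$j" "$i" "$j"
        done
        printf '\n'
    done
}

color_table
```

Each cell is labelled with the pair of codes that produced it, so the output doubles as a lookup table.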

> We in german say: "lots of ways lead to Rome" :)

We have the same thing in English - except we say "Alle Wege führen nach Rom". :) [*]

In fact, I figure that's what Alaric said back in 410AD...

> I am happy to read lots of scripting-examples, as I am not a programmer.
> Here I want to say "thank you" for your work at linux-gazette.
> I read it since 1999. Very good job !

Thank you! I blame The Answer Gang and our staff and authors. :)

[*]
[Rick] - A mere 401 years after Publius Quinctilius Varus said "D'oh!" (That was on the occasion of Augustus Caesar losing the ability to count beyond XVI.)


Nullmodem

cssutto at attglobal.net (cssutto at attglobal.net)
Wed Aug 16 14:29:11 PDT 2006

Answered by: Kapil, Rick, Thomas

Rick:

I looked at your recommendations and the list was long enough to be confusing.

Since I operate from a laptop, it looked like this one might be the right one for me.

nbSMTP (no-brainer SMTP) 

Is this OK or do you have a better suggestion?

[Kapil] - Hello,

On Wed, 16 Aug 2006, cssutto at attglobal.net wrote:

> Since I operate from a laptop, it looked like this one might be the
> right one for me.
> 
> nbSMTP (no-brainer SMTP) 
> 
> Is this OK or do you have a better suggestion?

Here is an alternative (which I probably learned from the LinuxMafia Knowledge Base) which I use:

1. Setup a local MTA (aka sendmail alternative) using any simple to
   configure MTA. Ensure that it is configured not to send mail
   to the internet. This is so the system can send you messages if
   it notices some configuration problems etc.

2. In your user account setup msmtp to send mail via different "accounts"
   depending on which network neighbourhood you find yourself in.

This works well for me.

[Rick] - Quoting Kapil Hari Paranjape (kapil at imsc.res.in):

> Here is an alternative (which I probably learned from the LinuxMafia
> Knowledge Base) which I use:

Heh, what do those guys know? ;->

> 1. Setup a local MTA (aka sendmail alternative) using any simple to
>    configure MTA. Ensure that it is configured not to send mail
>    to the internet. This is so the system can send you messages if
>    it notices some configuration problems etc.

Reminds me of something that I used to be confused about: I used to mistakenly believe that you needed an MTA daemon running in order to process purely-local mail such as system automated notices, logfile analysis, etc. It turns out you don't: Just having any MTA able to run as an on-demand mailer, only long enough to process the mail and then terminate, is more than good enough.

On Debian, this program by default is Exim. (On 2.2 "potato", it was Exim3; in recent versions, it's the Exim4 rewrite-from-scratch series.)

As is traditional on Unixes, this program is callable by the name "sendmail", even though it's not actually the sendmail program at all:

  # ls -l $(which sendmail)
  lrwxrwxrwx 1 root root 5 2006-04-22 08:47 /usr/sbin/sendmail -> exim4
  #

Non-sendmail MTAs such as Exim3, Exim4, Postfix, and Courier-MTA all honour all of sendmail's command-line options as well. Dan Bernstein's qmail of course honours only a subset of them, because this is the easygoing Dan we all know and love. ;->
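
As a rough illustration of the on-demand case, the sketch below hands a purely local notice to whatever program answers to the name "sendmail". The function names, the recipient and the message text are made up; the only option relied on is -t (take the recipients from the message headers), which is part of the sendmail command-line interface that the MTAs named above emulate.

```shell
#!/bin/sh
# Sketch: hand a purely local notice to whatever MTA answers to the
# name "sendmail"; no daemon has to be listening for this to work.

# Build a minimal message: headers, a blank line, then the body.
build_notice() {
    printf 'To: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3"
}

# -t tells a sendmail-compatible MTA to read recipients from the headers.
send_notice() {
    build_notice "$@" | "${SENDMAIL:-/usr/sbin/sendmail}" -t
}

# Example (commented out so that running this file sends nothing):
# send_notice root "disk check" "All filesystems look fine."
```

The MTA runs just long enough to deliver the message and then exits, which is the behaviour described above.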

[Rick] - Quoting cssutto at attglobal.net (cssutto at attglobal.net):

> I looked at your recommendations and the list was long enough to be
> confusing.

It's a bit of an exaggeration to call that recommendations. That page is just a compendium of descriptions of all known examples. I've never personally run any of them, and so can't really speak to choice of nullmailer.

> Since I operate from a laptop, it looked like this one might be the
> right one for me.
> 
> nbSMTP (no-brainer SMTP) 
> 
> Is this OK or do you have a better suggestion?

Looking from a distance, it looks as good as any of the others -- which is a fancy way of saying "I really don't know, but it's probably worth trying."

My history with MTAs is as follows:

1.  I started with sendmail, because I was young and foolish.  ;->

2.  I switched my personal mail machines from sendmail to Exim3
    because it was an emergency rebuild, and Exim3's what came by
    default in then-current releases of Debian, and was dead-easy 
    to configure.

3.  While working as chief sysadmin at $FIRM, a briefly famous 
    professional Linux support and services company in San Francisco 
    that shall go nameless ;-> , I was obliged to administer qmail, 
    and didn't enjoy the experience much.

4.  Upon another rushed rebuild under (again) emergency conditions,
    found myself with a mostly well-functioning Exim4 installation, 
    and tend to not fool with it much because it's a production mail
    system and the Cost Plus rule applies.  ("You break it; you buy it.")

None of those are nullmailers; they're all very full-service MTAs.

[Thomas] - I've always used 'nullmailer' per se ( http://lists.suse.com/archive/suse-linux-uk-schools/2005-Jan/0049.html ).


The all new Ubuntu.... Did I say something wrong?

clarjon1 (clarjon1 at gmail.com)
Wed Aug 23 11:22:59 PDT 2006

Answered by: David

Hey gang!

I'm going to keep this as short, and concise, as I can. Here goes:

I got some Ubuntu CDs yesterday! 10 Dapper Drake 6.06 LTS, with 2 sheets of 4 bumper stickers! Got them from shipit.ubuntu.com

Of course, I booted it up... :D They really improved the look and feel of it. Of course, I want to install it, right? I'm done with gaming for now, now that most of the games I play (emulators) work well under Linux natively or via Wine. So, I start the install program. Those who have used Breezy Badger would expect me to use the install CD, but Dapper Drake allows you to install from the live CD -- only one CD required to ship now. So, I started the install. It asked me for a language, my timezone, and then my keyboard layout. I thought to myself, This is so easy! Boy, did I let Murphy and his law come in there.

After the correct (i.e., default) keyboard layout was selected, I clicked Next. And that's the end of the story. Not a single error message; the non-greyed-out buttons and text input boxes are still allowing input. But on that window, anywhere other than the text input, is the little "I'm thinking" mouse pointer. But nothing happens. The CD isn't being read from, and the swap partition isn't showing any signs of being used. Any help would be appreciated.

Thanks!

[David] - "Over 300 post-release updates have been pre-applied, so that fewer updates will need to be downloaded after installation, and a number of bugs in the installation system have been corrected."

I suspect that you may have been bitten by one of the installer bugs. May I suggest trying a 6.06.1 install?

[clarjon1] - Yeah, I think I may have been bitten by one of the bugs, all right. I've tried reinstalling with a different keyboard layout, and that seems to have worked.


I will nead a little help or more!!!!!!

Nico Teiu (nicoleta_teiu at hotmail.com)
Fri Aug 25 08:55:11 PDT 2006

[ Nico's initial post was embedded in a lot of html gobbledygook; I hope that the readership will forgive my taking the easy way out and starting the thread with Neil Youngman's initial response. As a note to future querents to TAG, please send your inquiries in plain text, not html. -- Kat ]

On or around Friday 25 August 2006 16:55, Nico Teiu reorganised a bunch of electrons to form the message:

> Hello!
>  
> i'm Nicoleta from Roumania!

Hi Nicoleta

I'm Neil from England

> I have a little problem, i wanted to make to partition from C and now it's
> not working anything, my laste operated system does not work, i dont know
> the bios password because i will want to reinstall my windows. 

I'm not sure what the BIOS password has to do with reinstalling Wind0ws, unless you're saying it won't boot from CDROM?

> I have a Acer Laptop Travel Mate 552tx.
>  
> I dont know what to do

OK. Start with a clear explanation.

Why were you trying to repartition your disk? Were you trying to install Linux?

What tools did you use to repartition the system?

What error messages do you get when booting the system?

Do you want to reinstall Wind0ws alongside Linux, or just restore it to the way it was?

> Please help me!!!

We'll try, but this is the Linux Gazette and reinstalling Wind0ws isn't our speciality. A Wind0ws support forum may prove more productive?

Neil Youngman

[ Nico additionally followed up with e-mail sent solely to Neil. Note to future querents: Please do ensure that your e-mail is sent to [email protected]! -- Kat ]

On or around Friday 25 August 2006 19:03, Nico Teiu reorganised a bunch of electrons to form the message:

Nico, please use "reply all" to keep emails going to the whole gang. You've got a better chance of getting the help you need if everyone's involved.

> I wanted to make another partition for instaling Linux

Good idea.

> Now i had format my pc with a boot diskete of Wind0ws 98 and i dont know
> how i can install the operating system. I had made the primary partition c,
> so i dont know how to do more from this point I have on a dvd a version of
> Red hat linux

That's still not very clear.

Are you saying you used Wind0ws 98's version of fdisk to repartition it?

Did you defragment the disk first?

Did you reformat any partitions with the format command?

Did it have Wind0ws 98 on to start with or a different Wind0ws version?

It would have been better to use the tools on the Linux disk to do this. The Wind0ws tools aren't very good.

Assuming that C: took up the whole disk beforehand, or you know what the partitions were, it may be that putting the partitions back to the way they were will allow you to boot the old system and start again. If other data has been written to the partitions, that may not be possible.

> I want to install it because i want to learn linux
> you can help me, on this dvd i have the 3 images of red hat linux
> How can i install it?

Do you want to install just RedHat, or RedHat and Wind0ws? I can offer a little help, but the best thing to do is to read the installation documentation on the RedHat site and follow the instructions carefully.

[ Again, I've used Neil's response rather than Nico's html-ful post to TAG. -- Kat ]

On or around Friday 25 August 2006 19:42, Nico Teiu reorganised a bunch of electrons to form the message:

> From:  Neil Youngman <ny at youngman.org.uk>
> >Are you saying you used Wind0ws 98's version of fdisk to repartition it?
>
> yes
>
> >Did you defragment the disk first?
>
> i dont know
>
> >Did you reformat any partitions with the format command?
>
> i just fallow the steps

Without knowing what steps you followed, it's going to be hard to figure out what happened and how best to fix it.

> >Did it have Wind0ws 98 on to start with or a different Wind0ws version?
>
> I had window 2000

You probably had an NTFS file system, which the Wind0ws 98 tools wouldn't recognise. That may complicate things.

> >It would have been better to use the tools on the Linux disk to do this.
> >The Wind0ws tools aren't very good.
>
> i dont know how to get at the linux tools
>
> i can not boot from the cd because i dont have the bios password, and
> floppy is put first there, on my laptopt is working anything in this moment

OK. It sounds as though you need to install Linux with a boot floppy. There are instructions at

https://www.redhat.com/docs/manuals/linux/RHL-9-Manual/install-guide/s1-steps-install-cdrom.html#S2-STEPS-MAKE-DISKS

> >Do you want to install just RedHat, or RedHat and Wind0ws? I can offer a
> >little help, but the best thing to do is to read the installation
> >documentation on the RedHat site and follow the instructions carefully.
>
> Yes i will want to insall both, but i will need more Linux

I'm afraid you'll need to ask someone else for help with reinstalling Wind0ws.


Kernel tweaking

Benjamin A. Okopnik (ben at linuxgazette.net)
Mon Aug 28 08:46:41 PDT 2006

Answered by: Pedro

----- Forwarded message from Joris Lambrecht <jl_post at telenet.be> -----

Hello tag,

As I've been a Debian user for some years and feel at home with this distro, I recently took on the challenge of moving to the testing distro (etch), thus reviving a desktop PC which I had barely used for about 6 months.

Admittedly, I took the gorilla approach but hey, it works. And it looks to be one of the better releases the Debian team and community are about to deliver (Dec-2006).

But of course, the kernel image 2.6.16-2-k7 is preventing the proprietary nvidia module from loading; using m-a to rebuild it from scratch failed on a 'rivafb enabled' message. After reinstalling some previously removed versions of gcc (sigh) that message (rivafb ...) disappeared but still the module wouldn't compile.

So I tried to figure out how to disable this part of the kernel without rebooting. Since I couldn't find any information related to this matter, I figured this is not possible; still, somewhere in my memory the idea persists.

As such, I'd like to ask you people for a final opinion. Can a root user disable certain (external or compiled-in) modules in a running kernel, or in this kernel at boot time?

And, should you be able to spare the resources, why the plunk won't my nvidia kernel module compile correctly without giving any other reason than 'could not be built'?

Best of Regards,

Joris

[Pedro] - Hi Joris,

Talking about modules, if what you are trying to achieve is preventing a module from loading automatically at boot time, I think there are at least two things you may do:

1) Add a file under /etc/modprobe.d (or modify an existing one), with the following content:

blacklist <your module name here>

If I have understood correctly, this line prevents the automatic loading of a module based on its internal alias list. However, the module may still be manually loaded with a "modprobe <module name>".

2) Again, under /etc/modprobe.d, put the following line in a file:

install <your module name here> /bin/true

This effectively disables the module load.

Both keywords ("install" and "blacklist") are explained in more detail in the manual page of modprobe.conf.
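
The two directives can be combined in one file. The sketch below writes such a file into a directory given as an argument, so it can be tried without root. The module name "rivafb" is borrowed from the question purely as an example, and the function and file names are made up. Note that neither directive affects a driver that is compiled into the kernel or a module that is already loaded; they only change what modprobe does.

```shell
#!/bin/sh
# Sketch: write both of the directives described above into one file.
# The directory is a parameter so that this can be tried without root;
# "rivafb" and the file name are only examples.
write_module_block() {
    dir=$1
    module=$2
    {
        echo "# keep $module from being loaded automatically via its aliases"
        echo "blacklist $module"
        echo "# make an explicit 'modprobe $module' run /bin/true instead"
        echo "install $module /bin/true"
    } > "$dir/$module-disable.conf"
}

# Real use would be (as root): write_module_block /etc/modprobe.d rivafb
```

Removing the file again restores the normal behaviour.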


Invisible Read!!

Vikas Mohan (vikas-m at chintech.org)
Wed Aug 30 15:16:48 PDT 2006

Answered by: Ben, Neil, Thomas

Sir,
iam trying to emulate a login session.

login:<some login>
NOTE----------------------->password:<the text typed here should not be visible> how to do this with shell script BASH.

Please this is my assignment and iam a student.

your's faithfully,
Vikas Mohan.

[Ben] - It is indeed your assignment, and you are a student.

That being the case, why aren't you *studying*? The idea of going to school is for you to gain knowledge; when you try to cheat instead of actually studying, you waste money, time, and effort - and that includes other people's as well as your own.

Go study, Vikas Mohan from chintech.org [1]. I hope that your professor reads Linux Gazette and gives you a poor grade in this class for attempting to cheat; perhaps that will turn out to be the most valuable part of your current education.

[1] Chinmaya Institute of Technology Govindagiri, Chala PO Thottada, Kannur 670007

[Neil] - Ben, maybe a small credit is due for honesty here? Or am I being too nice?

Vikas -

We don't do people's homework here, but we do occasionally point them in the right direction. May I suggest that you read the man page for bash, specifically the part relating to the "read" builtin command and its options. (RTFM is good advice in these situations.)

[Ben] - On Thu, Aug 31, 2006 at 01:28:55PM +0100, Neil Youngman wrote:

> Ben, maybe a small credit is due for honesty here? Or am I being too nice?

You are a nice guy, Neil; me, well, I'm afraid that I see no redeeming qualities in his request. Instead, I see either 1) a student going down a bad path, or 2) a proto-skript-kiddie who wants to learn how to fake a login session in order to steal login info. He clearly knows that it's all about Bash - but just as clearly, he hasn't troubled himself to look it up (a Net search with the keywords he used in his email is quite instructive, BTW.)

I'm always happy to help someone with a real question - students included - but basic stuff like this, where the OP just has to lift a hand? Nope, no credit from me.

There are a bunch of routes to getting the necessary information; heck, a net search alone turns up thousands of hits. My diagnosis is acute grade-chasing, severely exacerbated by a laziness infection.

[Thomas] - I assume you're referring to something like the following:

echo "Input secret: "
stty -echo
read -k key
[ "$key" == "$SomethingElse" ] && stty echo

Have fun.

[Ben] - Actually, I think it was more like

echo "Be vewy vewy quiet - I'm hunting GWADES!"
stty --turn_off_the_noise
read --please_please_keep_it_quiet dont_let_anyone_find_out
stty --not_crazy
echo "I didn't study..."|mail -s 'Please fail me!' professor at chintech.org

[Neil] - Just "read -s key" should do it, if I've read the man page correctly.
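For shells without Bash's "read -s", the stty route hinted at earlier in the thread can be made reasonably safe. The following is a minimal sketch, not anything posted by the Gang: read_secret is a made-up helper name, and it assumes a POSIX sh with stty available. It switches echo off only when standard input is a terminal and restores the saved terminal settings afterwards.

```shell
# Read one line into $secret without echoing it.
# read_secret is a hypothetical helper name used only for this sketch.
read_secret() {
    if [ -t 0 ]; then
        # Only touch terminal settings when stdin really is a terminal,
        # and restore them if the read is interrupted.
        saved=$(stty -g)
        trap 'stty "$saved"' INT TERM
        stty -echo
        printf 'password: ' >&2
        IFS= read -r secret
        stty "$saved"
        trap - INT TERM
        printf '\n' >&2
    else
        # Piped or redirected input: nothing to hide, just read the line.
        IFS= read -r secret
    fi
}
```

In Bash itself, "read -r -s -p 'password: ' secret" covers the same ground in one line, as Neil says.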


Port Linux on DSP

Neil Youngman (ny at youngman.org.uk)

[ My thanks again to Neil Youngman for his quoting messages otherwise unreadable to me. -- Kat ]

On or around Wednesday 30 August 2006 13:59, [,,,] reorganised a bunch of electrons to form the message:

> Dear sir:
>         I want port uclinux on freescale's 56800 DSP,can you give some
> advice on how to deal with it and please give some materials on that.
>                                                           Thank you!

That's a very specialised question and I doubt that this list has the expertise to answer it. I would suggest trying the ucLinux mailing lists, see http://www.uclinux.org/maillist/


Which process wrote that line into syslog?

Ville (v+tag at iki.fi)
Tue Jun 27 06:37:26 PDT 2006

Answered by: Ben, Thomas

Hi,

On an odd day, these began to pop into /var/log/messages:

                Jun 25 05:01:15 servername out of memory [19164^N\213^M at out]
                Jun 25 05:49:06 servername out of memory [25038^N\213^M at out]
                Jun 25 07:01:53 servername out of memory [10600^N\213^M at out]
                Jun 25 07:51:05 servername out of memory [16145^N\213^M at out]
                Jun 25 09:05:53 servername out of memory [24702^N\213^M at out]
                Jun 25 09:56:24 servername out of memory [30349^N\213^M at out]
                Jun 25 11:13:14 servername out of memory [7752^N\213^M at out ]
                Jun 25 12:05:04 servername out of memory [14101^N\213^M at out]
                Jun 25 13:23:53 servername out of memory [23758^N\213^M at out]
                Jun 25 14:17:00 servername out of memory [29815^N\213^M at out]
                Jun 25 15:37:52 servername out of memory [9325^N\213^M at out ]
                Jun 25 16:32:25 servername out of memory [16081^N\213^M at out]
                ....

(where 'servername' is the hostname of the server.)

Notice the absence of a colon (':').

[Ben] -

That's pretty odd. It looks like a hand-crufted message sent by "logger" or something similar, not an actual system report (which makes me very, very suspicious of where it may have come from.) In fact, an error message from the kernel that dealt with this kind of issue would look more like this:

Jun 25 05:01:15 localhost kernel: oom-killer: gfp_mask=0x1d6

[Ville] -

Exactly. I've seen those pesky kernel oom-rambo messages more than I wanted. This was not it.

I had a couple of "usual" suspects, a closed-source UPS monitoring program which is not exactly robustly coded and a closed-source virus scanner. I haven't been able to conclusively link them to this problem, though.

[Ben] -

That is, the kernel knows what to do about "out of memory" conditions; it's not just going to tell you and wait for you to do something about it. :)

[Ville] -

Well, I've found that out the hard way and several other ways. It's just too easy to have an oom problem. The kernel used to be even worse-behaved; nowadays it seems to sometimes axe the actual culprit process, not just innocent bystanders (like in the early 2.4 days). The overcommit setting in /proc also helps.

No cronjob (my usual suspect) seemed to fit the bill. Google didn't give an easy answer.

The first number could have been the PID, but then again, it could also be pure junk. It was probably a shortlived process, or at least it probably kept /dev/log open only for a short while.

This one almost^W drove me nuts. Which process and executable was littering my syslog? Was there a real emergency somewhere? It was almost like receiving bottle mail - pretty hard to answer...

[Ben] -

Heh. Yeah, it's sorta like trying to troubleshoot intermittent problems in electronics. There's more than one e-tech in a padded room due to those.

[Ville] -

I've done some programming, and it tends to happen there as well. The most tenacious ones only happen once a month, with a 3GB data set, in another country, and when you have it in a reproducible state just waiting for a remote debugger to get set up, the machine must be booted...

I did:

        [1] grep -rsHU "out of memory \[" /usr/{sbin,bin,local/bin,local/sbin}

==> "out of memory \[" only matched one binary which I was able to rule out. "out of memory " matched 93 files. I couldn't find "@out".
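A search like [1] can be wrapped up as below. This is only a sketch: files_containing is a hypothetical helper name, and it assumes a grep that supports -r (GNU and BSD greps do). Matching the text as a fixed string with -F avoids having to escape the "[", but, as the thread notes later, it cannot find a message that the program assembles at run time from separate format strings.

```shell
# List files under the given directories that contain a fixed string.
# files_containing is a hypothetical helper name for this sketch.
files_containing() {
    pattern=$1
    shift
    # -r recurse, -l print only file names, -s hide unreadable-file errors,
    # -F match the pattern literally instead of as a regular expression.
    grep -rlsF -e "$pattern" "$@"
}
```

For example: files_containing 'out of memory [' /usr/sbin /usr/bin /usr/local/bin /usr/local/sbin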

        [2] strace -p $(pidof syslogd) -o /root/logi

and

            tail -f /root/logi | grep "out of memory" | grep writev | while
            read line; do
                  date >> /root/oom-trace
                  fuser -uv /dev/log >> /root/oom-trace
                  POSPID=$(echo $line|sed 's,.*y \[,,; s,\^N.*,,')
                  ps $POSPID >> /root/oom-trace
            done

==> Does not work, since the second | buffers the pipeline so heavily that the payload (fuser et al.) didn't even trigger within the same minute (it required several lines of input to get triggered).

[Thomas] -

Of course that won't work. You want --line-buffered to grep (if your version is GNU and supports it) or you can use the 'unbuffer' expect program, or if you had used awk (which would have greatly reduced your entire ugly pipeline above), that has the fflush() function.
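To make the buffering point concrete, here is a small sketch of the awk variant. The helper name oom_lines is invented for illustration, and it assumes an awk that provides fflush() (gawk, mawk and BusyBox awk all do). One awk process replaces the two greps and flushes its output after every matching line, so the next stage of the pipeline sees each hit immediately.

```shell
# Pass through only the syslogd writev() lines that mention the message,
# flushing after every match so the next pipeline stage sees it at once.
# oom_lines is a hypothetical helper name for this sketch.
oom_lines() {
    awk '/writev/ && /out of memory/ { print; fflush() }'
}
```

It would slot into the original attempt as "tail -f /root/logi | oom_lines | while read -r line; do ...; done". With GNU grep, "grep --line-buffered" on each grep stage should have the same effect.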

[Ville] -

Hmm, how's awk any better than the perl I used in the second version (which did get rid of the buffer problem)? I anticipated there was something like --line-buffered in existence, but it was quicker to redo it with perl, which I knew was going to work.

[Thomas] -

YMMV on this. Choice is such a wonderful thing.

[Ville] -

The first option got ugly and lengthy incrementally as I added more filters. Doesn't that ever happen to you? I never meant it to be pretty, I just wanted to solve the problem...

        [3] strace -p $(pidof syslogd) -o /root/logi
              and
            tail -f /root/logi |
            perl -nle '
              next unless /^writev.*out of memory \[(\d+)/;
              print `date`;
              $pospid = $1;
              print "Possible pid = $pospid";
              print `echo \$\$`;
              print `fuser -uv /dev/log`;
              print `ps le $pospid`;
            ' | tee /root/oom-trace

==> This one triggered, but as I suspected, fuser was executed too late (the process had already closed /dev/log) and ps likewise (and there's no evidence that $pospid was really the pid).

        [4] Encouraged by http://www.linux.com/howtos/Secure-Programs-HOWTO/sockets.shtml I hacked the following

--- syslogd.c~  Tue Jun 27 09:38:03 2006
+++ syslogd.c   Tue Jun 27 10:06:02 2006
@@ -1104,8 +1104,33 @@ int main(argc, argv)
 #ifdef SYSLOG_UNIXAF
                for (i = 0; i < nfunix; i++) {
                    if ((fd = funix[i]) != -1 && FD_ISSET(fd, &readfds)) {
+                       struct ucred cr;
+                       int cl=sizeof(cr);
+                       int ret;
+                       
+                       ret = getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
+                       
                        memset(line, '\0', sizeof(line));
                        i = recv(fd, line, MAXLINE - 2, 0);
+                       
+                       dprintf("ret=%i  Peer's pid=%d, uid=%d, gid=%d\n",
+                               ret, cr.pid, cr.uid, cr.gid);
+                       if (ret == 0 && strstr(line, "out of memory"))
+                       {
+                            char tmp[1024];
+                            int t;
+                            snprintf(tmp, sizeof(tmp), "fuser -vu /dev/log >> /root/syslog.log");
+                            system(tmp);
+                            snprintf(tmp, sizeof(tmp), "ps le %d >> /root/syslog.log", cr.pid);
+                            system(tmp);
+                            memset(tmp, 0, sizeof(tmp));
+                            t = snprintf(tmp, sizeof(tmp),
+                                "ret=%d   Peer's pid=%d, uid=%d, gid=%d\n",
+                                ret, cr.pid, cr.uid, cr.gid);
+                             printchopped(LocalHostName, tmp, t + 1, fd);
+                       }
+                                     
+                                     
                        dprintf("Message from UNIX socket: #%d\n", fd);
                        if (i > 0) {
                                line[i] = line[i+1] = '\0';

into sysklogd-1.4.1.

I thought this was easier than writing a Unix domain reader/writer proxy to read /dev/log and feed the real syslogd.

==> That one triggered, but I only got

                ret=0   Peer's pid=0, uid=-1, gid=-1

and naturally, fuser & ps showed nothing interesting. This also happened when I tested the hack with

                initlog -s "out of memory [foo"

Later, the messages stopped appearing.

Some notes:

  - This was a production server => the most reckless stunts were out of the question
  - 2.4.32-rc1 kernel => hence no dnotify/inotify goodness (nevermind dprobes)
  - getsockopt(SO_PEERCRED) was introduced in 2.2, so that one WAS supposed to 
    work

[Ben] -

Well, SO_PEERCRED depends on the socket being created by socketpair() (see 'man 7 socket'); I'm not willing to dig through the syslog code to find out if that's the case, but it should be easy enough to hack up a short test prog to see if it works or not.

[Ville] -

Sure. The syslogd end just does

  sunx.sun_family = AF_UNIX;
  strcpy(sunx.sun_path, "/dev/log");
  fd = socket(AF_UNIX, SOCK_DGRAM, 0);
  bind(fd, (struct sockaddr*) &sunx, sizeof(sunx.sun_family)+strlen(sunx.sun_path));

Somewhat like http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzab6/uafunix.htm -- no socketpair().

The problem of course is that I don't know what the client does - because I don't know who the client is in this case...

[Ben] -

[Nod] I suspect that you're out of luck with that approach, then.

[Ville] -

  - sysklogd-1.4.1-14.legacy.7x (the source code I hacked was sysklogd-1.4.1-19 or 
    so from the Debian archive, the first I was able to find)
  - syslogd was NOT accepting crap from network (verified several times, and supported
    by strace - the message came from the /dev/log unix domain socket.)

The odds are, I'm never going to find out where the heck they came from. I'm still rather curious how a real linux admin is supposed to solve this sort of thing... I know Solaris has dtrace, but unfortunately I couldn't transfer the problem onto Solaris.

[Ben] -

I suppose you could try attaching 'strace' to the syslog process, if you prefer that approach.

[Ville] -

If you mean "strace -p $(pidof syslogd)", I did that in alternatives [2] and [3]. The only thing I found out was that someone writes the message to /dev/log and syslogd picks it up from there.

[Ben] -

Whoops! Sorry, I don't know how I managed to miss that.

[Thomas] -

Looking at your initial log from /var/log/messages (and assuming that was an accurate verbatim copy, as opposed to having your editor or some other program mangle it)

[Ville] -

I did change the server name to protect the innocent, but nothing else.

[Thomas] -

I would have said it was some weird error message from a program that perhaps logger(1) obligingly shunted there -- this could explain the lack of the rigid structure (c.f. no colon) you noticed.

[Ville] -

logger(1) and initlog(1) both seem to add ':' there, but I did not try all the command line options.

At this point the alternatives I can think of are

  - writing a kernel module to overwrite connect() syscall with logging variant
  - modifying glibc to do the same (provided the culprit is not statically linked)
  - instrumenting the kernel by some other means (perhaps via /dev/kmem)

[Thomas] -

Overkill and completely unnecessary.

[Ville] -

I do realize that (if nothing else) :).

[Thomas] -

If you can't reliably reproduce it, it can't be much of a problem.

[Ville] -

What ever happened to the pioneer spirit of finding out the solution just for the sake of it? ;-)

[Thomas] -

Depends how much time you have on your hands, I suppose. The output is indeed not similar to any output I would expect a standard process to generate -- so either it's a faulty process, or more likely some rogue process was running on your system.

[Ville] -

... which is why I initially got curious.

What I meant to ask from TAG is, in more general terms, how does one find out which process/executable is responsible for an odd syslog line. I didn't mean to ask TAG to solve all my problems, sorry if it came out that way.

[Thomas] -

"With difficulty" is the answer. PIDs change each time a process is spawned, and any kind of persistent data one observes within any of the log files is a snapshot in time usually.

[Ben] -

The question is, a snapshot of what time? Ville is definitely trying the right thing, since part of the SO_PEERCRED definition is "The returned credentials are those that were in effect at the time of the call to connect(2) or socketpair(2)" - not at the time of the call to 'getsockopt'.

[Ville] -

Trying, perhaps, but probably not having enough knowledge to do so :) (See my other message.)

[Thomas] -

Since this output is dubious at best, and points to something which is non-standard (it has to be since it's not easily identifiable) then you have two choices:

- Sit at your computer endlessly rotating logfiles to see if it happens again. (Comes with a health-warning though).

[Ville] -

Hmm, I find myself sitting in front of the computer anyway, and come to think of it I'm not sure about my health either. But again, that's a separate question.

[Thomas] -

- Use a reporting tool that monitors your /var/log/messages file for any occurrences of a regexp such that it also generates other data and emails you the report.

[Ben] -

Too slow, as his experience so far has shown - and would tell him nothing useful, since the message itself does not contain a pointer back to the process. As is indeed the case, sometimes.

[Ville] -

Actually, I've been using logcheck and LogWatch. No matter how delirious this question might sound, it didn't exactly come to me as a revelation in a premonition dream...

LogWatch does generate other data, and logcheck is mostly a filter. Both of these scan the log files periodically - woefully late to dig any further information of the then-gone mysterious process.

There might be other tools that scan the log file continuously, but I think they still suffer from the latency I faced in my try [2] in the original problem report - you just can't execute ps(1) or anything quite fast enough, the process is already gone then. Remember that both ps and fuser scan /proc on linux - not exactly lightning fast.

[Thomas] -

I'd go with the second choice. :) I can't remember off the top of my head the names of programs which do that, but they do exist (that's some homework for you to do). Whether the reporting mechanism suffers from any latency in terms of the message appearing in /var/log/messages, and any subsequent data you might rely on thereafter (such as a 'ps' snapshot) is unclear; you'd have to see.

[Ville] -

Well, I'll have a look, but I do fear that's a dead end.

I think the answer (if any) would have to be something that reliably gets the information when the process is still connected to /dev/log. That's exactly what I tried to achieve by hacking syslogd to save the PID of the process that's at the other end of the /dev/log unix domain socket.

[Thomas] -

Going down the root-kit avenue is probably the better option still, even if you do consider it another question in its own right.

Note that you might want to install some root-kit detectors to be sure it's not some h4x0r.

[Ville] -

Thanks, I did consider that (but that's a separate question...)

[Thomas] -

How do you mean? It's something you should look into.

[Ville] -

I mean "I am naturally looking into that, but that is a separate question."

None of these seem even remotely feasible (except for, perhaps, the glibc alternative.)

Please, hit me with a cluestick!

[Ben] -

Ville, I don't think that such a cluestick exists; you've definitely got a Very Large Clue, and have done a bunch of right things in pursuit of that elusive beast. You may, in fact, win the "most clued querent ever" award in TAG - [ ... ]

[Ville] -

... the crucial thing to realize, of course, is that

  - the MOST clued querents never get to ask TAG, because they already have the problem solved
  - the SECOND MOST clued ones know where to stop and don't bang their heads against the wall endlessly.

[Ben] -

Sure; my point was that, out of the querents we get, we're most likely to see a) the completely lost, b) the moderately clued looking for "the next step", and c) highly clued but with a complex, subtle problem. a) is fine, b) is interesting, and we don't get enough of it, and c) can be frustrating by its nature but the search to find the answer is usually fascinating (where it's not so application/situation/querent specific that it's of no help to anyone other than the querent.) Your question manages to pass that hurdle without even ticking the top bar - I see the answer to it as something that would be very useful to admins and other system people everywhere.

[Ben] -

[ ... ] with the oak cluster and the maple leaf. It's just that, given your bug's intermittent (and now completely absent) nature, there's nothing left to trace.

[Ville] -

Exactly. That's the problem.

But when you've already walked such a long way, and the rainbow vanishes and there's no longer a gold pot to hunt -- the only thing you can do is to look around and try to learn. That's what I was trying to do here. The actual problem might never show up, but perhaps I'll be just a little better prepared for the next one. I already got to know about grep --line-buffered (although it is not present in the grep-2.4.2-5 I have on the server), getsockopt(SO_PEERCRED) and its socketpair() limitation. And it got me thinking about Solaris dtrace, which actually now does sound useful.

[Ben] -

I agree with you, and I appreciate that motivation highly - since I believe that this is how the best types of learning happen. Being able to fix the problem is very, very important - but gleaning knowledge of how to fix that whole category of problems is miles better than that, especially if it can be propagated to others.

[Ville] -

It did appear for about two days, perhaps once an hour on average.

The system was not that short of memory at that time (it's been much worse at times), so it might have had something to do with the data the program was chewing (for example email spam/virus scanner trying to bite a too big mail.)

[Ben] -

As to how I would go about tracing such a bug if it was present on my system - I think I'd note how often it occurred (often, hopefully), and start killing all non-essential processes to narrow down the list of what it could be. Next, I'd see if I could replace the essential processes with similar programs, one at a time, and look for the messages to disappear.

[Ville] -

The problem with an oldish production server is, that not all the programs can be replaced. Thankfully, this box doesn't run Oracle, but for example the UPS monitoring program has been kind of shaky, but that's the only thing that talks to this brand of UPSes. A few years ago I did trace an unexplained log message back to this very software.

I actually did something remotely like this. I weeded some spam from the mail queue and one large mail that was generated by an overly keen logger. That might have caused the problem to disappear, but then again it might be completely unrelated.

[Ben] -

You did mention that it's a production server... yeah, but what do you do with a production server if it's got (e.g.) a rootkit on it? The answer is the same as with any other system: you take it off-line (in the case of the server, hopefully by replacing it with a working machine) and fix it. That part of the scenario doesn't change regardless of how "critical" that machine is; the problem you're describing supersedes that critical need, since it implies a far more dangerous problem than the one on the surface.

[Ville] -

This is a very tough call to make. Obviously, even given infinite time for system administration you can't just format and redo a server each time something unexplained happens. Many times, I've traced unexplained events for hours, and eventually found a perfectly understandable (if not valid) reason for them. In case of security problems, there's usually also been a strong clue, having looked hard enough. That doesn't mean there's a decisive clue in all cases, and that's when it's hard to decide what to do. Wipe out and reinstall? Forget about it?

[Ben] -

It depends on your security policy, of course. Most places don't care enough about it to do something like that; those that do set up systems that make the "wipe/reinstall" cycle a trivial, nearly-automatic procedure and don't consider it a problem to do so.

[Ville] -

I don't agree 100%. I still think it implies a possibility of a far more dangerous problem. This is about risk management. Obviously the magnitude of the threat here is very severe, but so is, say, crashing a car on a highway. If you hear strange sounds from a wheel, you might stop and investigate, but if you (and mechanics) can't spot the problem and it doesn't happen again, you might forget about it and go on instead of buying a new car.

[Ben] -

Not if you're driving a Formula1 car, though. :) At that point, swapping in a new steering system is essentially a "standard" task; the stakes are very high, the car is made for it, and "heck, I dunno" is not an acceptable answer.

[Ville] -

Now, the magnitude and the likelihood of the threat vary, but you still must draw the line somewhere. Even though the Right Thing to do would be to re-install.

(Or at least tell me why getsockopt(SO_PEERCRED) failed...)

[Ben] -

Write a test program and let us know. :)

[Ville] -

I'll try getsockopt(SO_PEERCRED) on socketpair() vs. socket()-bind()-listen() a la

http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzab6/uafunix.htm

and let you know.

[Ben] -

Better yet, if you discover a bug, let the developers know.

[Ville] -

I'm sure it's in my code - this was my first getsockopt(SO_PEERCRED) hack, after all. Although there have been bugs: http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as-s390/RELEASE-NOTES-U1-s390-en.html

[Thomas] -

Most likely it was syslogd trying to use that call to ascertain the user/process who opened the logfile in the first place.

[Ville] -

Umh, come again?

I couldn't find getsockopt(... SO_PEERCRED ...) anywhere in the sysklogd source in the first place [1] -- that's the very reason I added it there.

Thank you for your insights.

[1] Most likely because getsockopt(SO_PEERCRED) is linux only, and sysklogd is older than getsockopt(SO_PEERCRED) in the linux kernel.

[ Some time passes... ]

[Ville] -

Adding the http://iki.fi/v/tmp/syslogd-peercred.patch snippet to the (small and well-commented) server example from above:

      sd2 = accept(sd, NULL, NULL);
      if (sd2 < 0)
      {  
         perror("accept() failed");
         break;
      }
      
+      {   
+          struct ucred cr;
+          int cl=sizeof(cr);
+          int ret;
+          
+          ret = getsockopt(sd, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
+          printf("ret=%i  Peer's pid=%d, uid=%d, gid=%d\n",
+                  ret, cr.pid, cr.uid, cr.gid);
+          ret = getsockopt(sd2, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
+          printf("ret=%i  Peer's pid=%d, uid=%d, gid=%d\n",
+                  ret, cr.pid, cr.uid, cr.gid);
+      }


% ./server& ./client     
Ready for client connect().
[3] 12836
ret=0  Peer's pid=12836, uid=1414, gid=100
ret=0  Peer's pid=12837, uid=1414, gid=100
250 bytes of data were received

so it works with socket()-bind()-listen()-accept(), not just with socketpair(). (To be completely frank, I found socket(7) and unix(7) a tad vague about these things.) The output does reveal that you need to getsockopt() that for the accepted connection.

But syslogd.c doesn't do accept(), it just select()'s and then recv()'s.

I believe the crucial difference is explained in a syslogd.c comment

    * Changed: unixm is gone, since we now use datagram unix sockets.
    * Hence we recv() from unix sockets directly (rather than
    * first accept()ing connections on them), so there's no need 
    * for separate book-keeping.  --okir

That's probably why it doesn't work.

More precisely, the boulder.ibm.com example does

  sd = socket(AF_UNIX, SOCK_STREAM, 0);

whereas syslogd.c does

  socket(AF_UNIX, SOCK_DGRAM, 0);

[Ville] -

On Wed, Jun 28, 2006 at 10:53:45AM +0300, you [Ville]  wrote:
> 
> Yes .../lib/ likewise. I actually tried to grep more directories, but I
> concluded it couldn't 100% solve the problem, since "out of memory" string
> occurs in so many places, and the rest might just as well be a random
> argument to sprintf("%s").

But looking closer, I noticed f-prot (one of my usual prime suspects) has " [%s]" just above "out of memory" in strings(1) output.

One of the first things I tried was in fact sth like

 cat > /usr/local/f-prot/f-prot.WRAP <<END 
 #!/bin/sh

 TMP=/var/tmp/fprot.last.$$
 strace -o $TMP /usr/local/f-prot/f-prot $*

 if grep -q "out of memory" $TMP; then
   free | mutt -a $TMP -s "f-prot out of memory" root
 fi

 rm $TMP
 END

 mv /usr/local/f-prot/f-prot /usr/local/f-prot/f-prot.REAL
 ln -s /usr/local/f-prot/f-prot.WRAP /usr/local/f-prot/f-prot

I trust you realize why that was not a great idea (hint: that's when I last had an appointment with the kernel oom killer...) Eventually I got it right of course, but it didn't trigger soon, so I took it off.

(I didn't list this step in the original message, because it was just a shot in the dark, and the results were not so great.)
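One plausible reading of the hint is that the symlink makes the wrapper call itself, since /usr/local/f-prot/f-prot now points back at f-prot.WRAP. The sketch below shows roughly what a non-recursive version might look like; it is not Ville's corrected script, which was never posted. write_wrapper and the REAL and status variables are names invented here, while the strace, grep, free and mutt invocations are carried over from the original. It also quotes the here-document delimiter so that $$, $TMP and "$@" are expanded when the wrapper runs rather than when it is written.

```shell
# Write a wrapper script that traces the *renamed* binary, so the wrapper
# never calls itself. write_wrapper and its arguments are hypothetical.
write_wrapper() {
    real=$1      # e.g. /usr/local/f-prot/f-prot.REAL
    wrapper=$2   # e.g. /usr/local/f-prot/f-prot.WRAP
    {
        printf '#!/bin/sh\n'
        printf "REAL='%s'\n" "$real"
        # The quoted delimiter keeps the wrapper's own variables literal
        # until the wrapper itself is executed.
        cat <<'END'
TMP=/var/tmp/fprot.last.$$
strace -o "$TMP" "$REAL" "$@"
status=$?
if grep -q "out of memory" "$TMP"; then
    free | mutt -a "$TMP" -s "f-prot out of memory" root
fi
rm -f "$TMP"
exit "$status"
END
    } > "$wrapper"
    chmod +x "$wrapper"
}
```

After "mv f-prot f-prot.REAL", calling "write_wrapper /usr/local/f-prot/f-prot.REAL /usr/local/f-prot/f-prot.WRAP" and creating the symlink as before would leave the scanner's exit status and arguments intact.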

Trying to make f-prot run out of memory _on purpose_ turns out to be surprisingly difficult. Without purpose, it really has not been a problem in the past. Larger .ZIPs have caused it to wake up kernel oom rambo, which in turn has killed a lot of innocent daemons. Now, no matter what I try, I can't seem to get it to run oom.

[Ben] -

Well, if they have somehow managed to handle the "huge ZIP" problem, great for them - but, just in case, see the mime-encoded chunk below (it's a 238-byte long result of double-bzipping 1 terabyte of nulls.) That tends to break most AV filters, so I'm not sending it as an attachment; you can always decode it and send it to yourself, though. :)

---------------------------------------------------------------------
--vkogqOf2sHV7VnPd
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="1TB_of_nulls.bz2.bz2"
Content-Transfer-Encoding: base64

QlpoOTFBWSZTWcaRHXYC7Jt//2LxQgjDAWCkcQIIMMBAQABEEUSAYCEACFAAAAC0ADABcABg
NGQ0GEA0A00AAJqqoADQDQA0AYj0JoYDRkNBhANANNAAYFKIWwUohYilELZ65SiFyaZSiF5b
m6UohYpSiF+ylELblKIXjm7tXMUohYYSlELeKUQtHEUohaJSiFkx5ceEpRC9SlELNzylEL6K
UQvwpRC9ilELKUohfZSiFl4ZSiF8ZsMxSiF8ylELRrwlKIWfPyylELjlKIWnThKUQsmnzKUQ
tWjXKUQtf9KUQtf+LuSKcKEhjSI67A==

--vkogqOf2sHV7VnPd--
---------------------------------------------------------------------

[Ville] -

Good idea. I recall ridiculing f-prot with such bombs a few years ago, when they got some publicity. Back then, f-prot failed miserably (had a meeting with the kernel oom rambo). But a few years ago, f-prot had trouble even with 'normal' large .zip's. Now I'm surprised to see how small a footprint it has chewing large archives. They must have done something, although I admit I had lost hope.

To be frank, I was unable to base64 decode your attachment. I did feed it to 'base64 -d', which said

 base64 1.2
 Copyright 2004, 2005 Simon Josefsson.
 Base64 comes with NO WARRANTY, to the extent permitted by law.
 You may redistribute copies of Base64 under the terms of the GNU
 General Public License.  For more information about these matters,
 see the file named COPYING.
 BZh91AY&SY?v??b???0?@@DD^P?0p^Base64: invalid input

The starting (BZ) is at least right, but bzip2 says:

 base64 -q -d < oo |bzip2 -d > /dev/null
 base64: invalid input

 bzip2: Compressed file ends unexpectedly;
         perhaps it is corrupted?  *Possible* reason follows.

Anyway, I did:

  f=foo
  mkdir $f
  for i in $(seq 1 10); do 
       touch $f/$f$i; 
       perl -e 'truncate "'$f/$f$i'", 1024**2'; 
  done 
  cp ~/test-virus.gz foo/foo.gz

  for j in bar zot urf goo zik; do 
      mkdir $j
      zip -9 $j/$j.zip $f/$f* 
      for i in $(seq 2 10); do ln $j/$j.zip $j/$j$i.zip; done
      f=$j
  done

(test-virus.gz contains the standard Eicar test virus in gzipped form.)

That should contain 100GB of zero and 10000 virii, if I counted right.
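The count can be checked with a little shell arithmetic. In this sketch (bomb_totals is a made-up helper name), level 1 is bar.zip, which holds the ten 1 MiB sparse files plus one gzipped test virus, and every further level zips a directory holding ten hard-linked copies of the previous archive, so both totals grow tenfold per level.

```shell
# Print the uncompressed payload of each nesting level of the archive.
# bomb_totals is a hypothetical helper name for this sketch.
bomb_totals() {
    levels=$1 fanout=$2
    mb=10        # foo/ holds ten 1 MiB sparse files
    viruses=1    # plus one gzipped EICAR file
    level=1
    while [ "$level" -le "$levels" ]; do
        echo "level $level: ${mb} MiB of zeros, ${viruses} test viruses"
        # The next level contains `fanout` copies of this archive.
        mb=$((mb * fanout))
        viruses=$((viruses * fanout))
        level=$((level + 1))
    done
}
```

"bomb_totals 5 10" ends with "level 5: 100000 MiB of zeros, 10000 test viruses" for zik.zip, which is roughly the 100GB quoted and matches the 10000 infections f-prot reports.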

Surprisingly enough, it found the test virii:

  test2/zik/zik.zip->goo/goo5.zip->urf/urf8.zip->zot/zot4.zip->bar/bar9.zip->foo/foo.gz->test-virus
  Infection: EICAR_Test_File
  (...)

  f-prot zik/zik.zip | grep -c Infection:
  10000

and only consumed a tad over 6000k while doing so;

12296 test      20   0  6044 6044   400 R    39.7  0.6   0:09 f-prot
                        ^^^^

Since adding more zero seemed to add more cpu time than memory use, I first went ballistic with the recursion, making it 20 branches wide and 30 levels deep.

  f=foo
  mkdir $f
  for i in $(seq 1 10); do
       touch $f/$f-$i;     
       perl -e 'truncate "'$f/$f-$i'", 1024';
  done
  cp ~/test-virus.gz foo/foo.gz
  
  for j in $(seq 1 30); do
      mkdir $j                    
      zip -9 $j/$j-1.zip $f/$f*
      for i in $(seq 2 20); do ln $j/$j-1.zip $j/$j-$i.zip; done
      f=$j                                                   
  done
  zip -9 all.zip $f/$f*

Now, that must hurt... (Okay, 20^30 is far less than a googol, still "much".)

With the largest of these, it segfaulted, until I gave it 8m of memory. With 8m, it happily churned the all.zip (not all of it, since I only have finite time, but still):

  test2/all.zip->30/30-10.zip->29/29-10.zip->28/28-10.zip->27/27-10.zip->26/26-10->25/25-10.zip
  Infection: EICAR_Test_File
  (...)

  20534 test    18   0  7168 7168   412 R    64.5  0.7   5:33 f-prot

But limiting the memory caused it to segfault, not give an oom message.

So if it is ever going to oom in a "normal" situation (with the normal ulimits), it's due to an odd bug, not a pathology in handling recursion.

Now, compare that to

   2393 haldaemo  16   0 72136 3364  636 S  0.0  0.7  14:54.79 hald                                                       
  23738 user      20   5 86908 9736 2504 R  0.3  1.9  14:27.96 xmms                                                       
  15806 user      15   0 30804 1604 1096 S  0.0  0.3  35:14.98 gnome-settings-daemon 
  15866 user      15   0 32168 4016 2820 S  0.7  0.8 744:28.97 gkrellm                                                    

And so on and so on. Not bad from f-prot, I'd say.

[Ben] -

You could also try setting 'ulimit' to squeeze f-prot down to a small footprint and see what that does.
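
In bash or plain sh, one way to try this is to set the limit inside a subshell, so that the cap applies only to the scanner and not to the login shell. A minimal sketch: `run_limited` is a made-up helper name, `ulimit -v` takes kilobytes (it should correspond to the 'addressspace' limit in zsh), and the f-prot invocations are only examples modelled on the commands above.

```shell
#!/bin/sh
# run_limited KBYTES COMMAND [ARGS...]
# Run COMMAND with its virtual address space capped at KBYTES kilobytes.
# The parentheses start a subshell, so the calling shell keeps its own limits.
run_limited() {
    (
        ulimit -v "$1" || exit 125   # give up if the limit cannot be set
        shift
        exec "$@"
    )
}

# Examples (commented out, since f-prot may not be installed):
# run_limited 6144 f-prot zik/zik.zip    # 6 MB cap
# run_limited 5120 f-prot zik/zik.zip    # 5 MB cap
```

Stepping the first argument down until the program stops working gives a rough idea of its real footprint.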

[Ville] -

I've been trying to do that. As I said, with 'addressspace', 'memoryuse' and 'datasize' limited to 5m, f-prot fails to even load the virus db - if I up it to 6m, f-prot churns through almost everything.

[Ville] -

The process 'addressspace', 'memoryuse' and 'datasize' limits seem to have megabyte granularity, and with 5m f-prot can't even read the virus db (it fails with a different message - I wouldn't be surprised if it doesn't properly check every malloc()), while with 6m it happily churns through everything I throw at it. And it can be just one allocation that triggers the log message, not every allocation that might fail in it. But I'm still trying. (Oh, the joy of having source code vs. a closed source program...)

Thank you for your insights! Much appreciated!


Which process wrote that line into syslog? [2]

Ville (v+tag at iki.fi)
Sat Jul 1 13:01:10 PDT 2006

Answered by: Ben

[ This thread resulted from an earlier one of the same name, but covers rather different ground - so it got an entry of its own. -- Ben ]

[Ben] -

Sure; my point was that, out of the querents we get, we're most likely to see a) the completely lost, b) the moderately clued looking for "the next step", and c) highly clued but with a complex, subtle problem.

a) is fine,

[Ville] -

I can imagine it is - especially, when the querent shows some respect to the ones who answer and willingness to learn, not just get rid of the problem.

[Ben] -

b) is interesting, and we don't get enough of it,

[Ville] -

An optimist would perhaps presume that this is because the linux system and documentation is in such a good shape that a clued person rarely runs into a dead-end. A pessimist might find other reasons...

[Ben] -

and c) can be frustrating by its nature but the search to find the answer is usually fascinating (where it's not so application/situation/querent specific that it's of no help to anyone other than the querent.) Your question manages to pass that hurdle without even ticking the top bar - I see the answer to it as something that would be very useful to admins and other system people everywhere.

[Ville] -

If you are in a dead-end, I think you can often (maybe not always) find a pattern or a generalization of the problem, if you step back and look at it from a distance.

In this case it is "how do I find which process wrote that line into syslog", which in turn divides into several other less and more general questions. While exploring the alternatives to get a grip on the problem, one surely finds a set of general patterns that can be useful in situations other than the one at hand. If they are not familiar yet, looking closer at them might be valuable - the next time you'll know their possibilities and limitations right from the start.

[Ben] -

I agree with you, and I appreciate that motivation highly - since I believe that this is how the best types of learning happen. Being able to fix the problem is very, very important - but gleaning knowledge of how to fix the category of that kind of problems is miles better than that, especially if it can be propagated to others.

[Ville] -

Yes, the 'category' metaphor describes quite accurately what I tried to say.

[Ben] -

I assume you know about Perl's "$|" variable, then. If you don't, 'perldoc perlvar' will be highly enlightening. :)

[Ville] -

Actually, I do.

It was just that with the perl variation, buffering was not a problem, since there was only one pipe in the equation. In hindsight, it might have been, and if I had come to think of $|, I would have used it proactively. Good idea.

[Ben] -

I will say that Solaris tools for this kind of thing do come to my mind a bit more readily than anything similar in Linux. For one thing, in Solaris, you could always just enable BSM (be sure to have LOTS of disk capacity for logging, though!) and beat on that machine with every software hammer you've got until it does produce one of those warnings - then, examine that microsecond-by-microsecond log that BSM produces. You will definitely know who and what did X at Y time. I know that there's got to be something like that for Linux, since I recall hearing that Linux can pass the DoD "C2" certification - but I don't know what that app would be.

[Ville] -

BSM sounds fascinating, but perhaps overkill, in case Solaris DTrace is available...

http://www.sun.com/bigadmin/content/dtrace/
http://users.tpg.com.au/adsln4yb/dtrace.html
http://daemons.net/~matty/articles/solaris.dtracetopten.html
http://www.sun.com/software/solaris/howtoguides/dtracehowto.jsp

I think something like the following DTrace script:

      dtrace -n 'syscall::connect:entry 
                 / arg1->sun_path == "/dev/log" /
                 { printf("%s %s", execname, copyinstr(arg0)); }'

(Completely untested as I have no Solaris installation around)

would have solved the problem. (Add a time stamp and pid printing to that).

There is something like that for Linux; kprobes & systemtap:
http://sourceware.org/systemtap/
http://www.redhat.com/magazine/011sep05/features/systemtap/
(And Frysk, Oprofile and LTT).

Systemtap wasn't available for the problem server, though: its kernel was too old.

I gather it would have been something like

   stap -p2 -e 'probe kernel.function("sys_connect") { 
       log "connect /dev/log called: " . 
           execname() . string(pid()) . 
           " at " . string(gettimeofday_s()) }'

to solve the problem with systemtap.

I haven't yet been able to test that, since on the box I first installed systemtap, I couldn't get it working.

[Ben] -

It depends on your security policy, of course. Most places don't care enough about it to do something like that; those that do set up systems that make the "wipe/reinstall" cycle a trivial, nearly-automatic procedure and don't consider it a problem to do so.

[Ville] -

Yes, the cost of reinstall is one crucial variable.

[Ben] -

Not if you're driving a Formula1 car, though. :) At that point, swapping in a new steering system is essentially a "standard" task; the stakes are very high, the car is made for it, and "heck, I dunno" is not an acceptable answer.

[Ville] -

For the Mercedes F1 team it seems to be ;) (I don't know if you follow the F1 series, but Finland's Kimi Räikkönen has been plagued with mechanical problems for the past few years...)

Anyway, I so agree with you on that.

One more thing I tried was the LD_PRELOAD trick:

--8<-----------------------------------------------------------------------
/* 
   gcc -shared -fPIC -o libwrap_connect.so wrp.c -ldl
   LD_PRELOAD=/path/to/libwrap_connect.so initlog -s "Test"
*/
#include <stdlib.h>
#include <stdio.h> 
#include <limits.h>     /* PATH_MAX */
#include <time.h>       /* time(), ctime() */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/stat.h>
#include <unistd.h>


static int same_inode(char* file1, char* file2)
{
    struct stat st1, st2;
    if (stat(file1, &st1) != 0 ||
        stat(file2, &st2))
        return 0;

    return st1.st_dev == st2.st_dev && 
           st1.st_ino == st2.st_ino;
}

static int read_file(char* file, char* buf, int sz_buf)
{
    int ret;
    FILE* f = fopen(file, "r");
    if (!f) return 0;
    ret = fread(buf, 1, sz_buf, f);
    fclose(f);
    
    return ret;
}
       
int connect(int sockfd, const struct sockaddr *sa, socklen_t addrlen)
{
    struct sockaddr_un* serv_addr = (struct sockaddr_un*)sa;
    if (serv_addr->sun_family == AF_UNIX &&
        same_inode(serv_addr->sun_path, "/dev/log"))
    {
        /* Note: should use file locking or at least sth like /tmp/LOG.<pid> 
           here */

        FILE* f = fopen("/tmp/LOG", "a");
        if (f)
        {
             int s;
             time_t t;
             char exename[PATH_MAX] = { 0 };
             char cmdline[512] = { 0 };
             /* leave room for the terminating NUL: readlink() doesn't add one */
             readlink("/proc/self/exe", exename, sizeof(exename) - 1);
             s = read_file("/proc/self/cmdline", cmdline, sizeof(cmdline) - 1);
             for (--s; s > 0; s--) 
                 if (cmdline[s] == '\0')
                     cmdline[s] = ' ';

             time(&t);
             fprintf(f, "%s\t%s [%d] calling connect(\"/dev/log\")\n\t%s\n",
                     ctime(&t),
                     exename,
                     getpid(),
                     cmdline);
             fclose(f);
        }
    }

    /* N.B. Simply replacing <symbol> with __<symbol> to access the original 
            function doesn't always work. See the output of "nm -D <lib>" to
            check if __<symbol> is available.

            You could do something like

               typedef int (*connect_t)(int sockfd, const struct sockaddr *sa, socklen_t addrlen);
               static connect_t real_connect;
        
               if (!real_connect)
               {
                   real_connect = (connect_t)dlsym(RTLD_NEXT, "connect");
                   if (!real_connect) exit(EXIT_FAILURE);
               }

            (with #define _GNU_SOURCE and #include <dlfcn.h> at the top, for
            RTLD_NEXT) to reach the original symbol more reliably. */

    return __connect(sockfd, sa, addrlen);
}
--8<-----------------------------------------------------------------------

I got

  Fri Jun 30 09:02:11 2006
          /sbin/initlog [10383] calling connect("/dev/log")
          initlog -s Test

That was actually surprisingly easy to do (for a quick and dirty hack, you could shorten the above a bit; same_inode -> strcmp etc).

It gives you a lot of possibilities for debugging, but it also has downsides:

- doesn't work with statically linked executables
- hard to enable globally
- doesn't affect daemons that are already running

I did have a look at syscall wrapper kernel modules, but since 2.4 sys_call_table hasn't been exported, so that's no longer feasible (and was never encouraged.)

I also tried inotify-tools (http://rohanpm.net/inotify-tools), but inotify doesn't seem to note unix domain socket connect() as file access.

As a side-note, a friend of mine tested the viral test.zip I created to stress F-Prot with a handful of anti-virus programs and sent it to a couple of anti-virus vendors. While F-Prot actually did pretty well with it, not all anti-virus programs fared that well. According to him, at least Norman already fixed up their product somewhat :)

[Ben] -

On Thu, Jul 06, 2006 at 12:02:12PM +0300, Ville wrote:
> On Sat, Jul 01, 2006 at 11:01:10PM +0300, you [Ville] wrote:
> > 
> >    stap -p2 -e 'probe kernel.function("sys_connect") { 
> >        log "connect /dev/log called: " . 
> >            execname() . string(pid()) . 
> >            " at " . string(gettimeofday_s()) }'
> 
> I actually tried this on a Fedora 5 system, and systemtap seems pretty cool
> indeed.
> 
> It requires these:
>    % yum install systemtap kernel-devel
> And 'kprobes' enabled in kernel (Fedora and RHEL have that by default.)
> 
> and after that, you can do 
> 
>    % (sleep 10; mkdir test)&
>    [1] 9086                                         
>    % stap -v -e 'probe kernel.function("sys_mkdir") 
>                  { log("mkdir() called: "); 
>                    log(execname()); 
>                    log(string(pid()));
>                    log(string(gettimeofday_s())); }'

Actually, this is quite similar to the BSM config file - except for the 'log()' syntax. You'd just tell it what to log - reads, writes, etc. Thanks for writing this up, by the way: that's an end of Linux that I have never explored, myself, and I'm very, very chuffed to hear that there's good tools available for it.

> So I definitely could have solved the problem with systemtap, if the kernel
> and distro had been new enough.
> 
> The only downside is this:
> 
>    % rpm -qi kernel-debuginfo
>    Name        : kernel-debuginfo             Relocations: (not relocatable)
>    Size        : 1730037086                       License: GPLv2
>                  ^^^^^^^^^^
>    % rpm -qi kernel-devel
>    Name        : kernel-devel                 Relocations: (not relocatable)
>    Size        : 13954129                         License: GPLv2
>                  ^^^^^^^^

Yikes. Well, that's generally the case with leaving all the debug info in an executable - and, of course, doing it with the kernel verges on the ridiculous.

r!calc 1730037086-13954129
1716082957

2GB versus 14MB, wow. Pretty impressive. Well, if you go hunting elephants, you definitely need a big-bore rifle...

[Ville] -

On Thu, Jul 06, 2006 at 10:48:27AM -0400, you [Benjamin A. Okopnik] wrote:
> >    % (sleep 10; mkdir test)&
> >    [1] 9086                                         
> >    % stap -v -e 'probe kernel.function("sys_mkdir") 
> >                  { log("mkdir() called: "); 
> >                    log(execname()); 
> >                    log(string(pid()));
> >                    log(string(gettimeofday_s())); }'
> 
> Actually, this is quite similar to the BSM config file - except for the
> 'log()' syntax. 

That's very similar to Solaris DTrace (which I gather is slightly newer?)

I haven't actually tried it (I have no access to Solaris), but I have drooled over several praising articles about it (see my earlier mail for links). It is definitely nice that Linux is gaining something similar.

> You'd just tell it what to log - reads, writes, etc. Thanks for writing
> this up, by the way: that's an end of Linux that I have never explored,
> myself, and I'm very, very chuffed to hear that there's good tools
> available for it.

Great :) I thought I was not the only one who hadn't yet explored it. I won't paste the systemtap URLs again, since you probably spotted them in my earlier mail.

> >    % rpm -qi kernel-debuginfo
> >    Size        : 1730037086                       License: GPLv2
> >    % rpm -qi kernel-devel
> >    Size        : 13954129                         License: GPLv2
> 
> Yikes. Well, that's generally the case with leaving all the debug info
> in an executable - and, of course, doing it with the kernel verges on
> the ridiculous.
>
> 2GB versus 14MB, wow. Pretty impressive. 

But you need to install both for systemtap. :-P

> Well, if you go hunting elephants, you definitely need a big-bore rifle...

Sure, but it might feel like hunting for a fruit-fly riding a mammoth...

[Ben] -

On Thu, Jul 06, 2006 at 09:21:28PM +0300, Ville wrote:
> On Thu, Jul 06, 2006 at 10:48:27AM -0400, you [Benjamin A. Okopnik] wrote:
> > >    % (sleep 10; mkdir test)&
> > >    [1] 9086                                         
> > >    % stap -v -e 'probe kernel.function("sys_mkdir") 
> > >                  { log("mkdir() called: "); 
> > >                    log(execname()); 
> > >                    log(string(pid()));
> > >                    log(string(gettimeofday_s())); }'
> > 
> > Actually, this is quite similar to the BSM config file - except for the
> > 'log()' syntax. 
> 
> That's very similar to Solaris DTrace (which i gather is slightly newer?)

Well, BSM is actually a full-time "recorder" that tracks exactly who does what and at what time; it's one of the reasons that full-spec C2 systems are such a huge admin load. DTrace has similar capabilities, as I understand it, but is more of a specific-case troubleshooting tool.

> I haven't actually tried it (I have no access to Solaris), but I have
> drooled over several praising articles about it (see my earlier mail for
> links). It is definitely nice Linux is gaining something similar.

Yeah, DTrace was quite a leap in the state of the art when it first came out. Again, I don't usually deal with that end of administration myself, but I agree - it's wonderful to see similar tools made available for Linux.

> > 2GB versus 14MB, wow. Pretty impressive. 
> 
> But you need to install both for systemtap. :-P

[laugh] Well, disk space is cheap nowadays. I still recall the days when a 21MB hard drive cost over $200 - i.e., ~$10/MB. At those prices, the above would have been a problem indeed.

> > Well, if you go hunting elephants, you definitely need a big-bore rifle...
> 
> Sure, but it might feel like hunting for a fruit-fly riding a mammoth...

Ah - the "use a tiny-caliber rifle but DON'T miss" scenario. Very familiar. :)

[Ville] -

On Thu, Jul 06, 2006 at 09:21:28PM +0300, you [Ville] wrote:
>  
> > >    % rpm -qi kernel-debuginfo
> > >    Size        : 1730037086                       License: GPLv2
> > >    % rpm -qi kernel-devel
> > >    Size        : 13954129                         License: GPLv2
> > 
> > Yikes. Well, that's generally the case with leaving all the debug info
> > in an executable - and, of course, doing it with the kernel verges on
> > the ridiculous.
> 
> > 2GB versus 14MB, wow. Pretty impressive. 
> 
> But you need to install both for systemtap. :-P

Well, as it happens, it seems Roland McGrath, Dave Jones and the other Fedora kernel fellows have managed to slim down the elephant:

http://kernelslacker.livejournal.com/43037.html

--8<-----------------------------------------------------------------------
                 -debuginfo, now with 86% more awesome!   
Before:

-rwxr-xr-x 1 48 48 693332940 Jun 18 02:53  kernel-debuginfo-2.6.17-1.2136_FC5.i686.rpm

After:

-rwxr-xr-x 11 48 48 163280651 Jul 13 22:24 kernel-debuginfo-2.6.17-1.2396.fc6.i686.rpm
-rwxr-xr-x 11 48 48 27167808 Jul 13 22:07  kernel-debuginfo-common-2.6.17-1.2396.fc6.i686.rpm
-rwxr-xr-x 11 48 48 172196950 Jul 13 22:12 kernel-kdump-debuginfo-2.6.17-1.2396.fc6.i686.rpm
-rwxr-xr-x 11 48 48 163326708 Jul 13 22:18 kernel-PAE-debuginfo-2.6.17-1.2396.fc6.i686.rpm

We'd all be downloading a lot more bits if it wasn't for the efforts of
Roland McGrath on this one.
The amount of change in the kernel specfile isn't as much as I'd feared,
which was one reason I had procrastinated over this (besides constantly
seemed to find something more important to be tackling, like "my kernel
doesn't boot").

It may still need some slight tweaks, but it's getting there. Longterm,
hopefully we can bring down the size of the individual rpms further too.
--8<-----------------------------------------------------------------------

[Ville] -

On Sun, Jul 16, 2006 at 09:26:31PM +0300, you [Ville] wrote:

> > But you need to install both for systemtap. :-P
> 
> Well, as it happens, it seems Roland McGrath, Dave Jones and the other
> Fedora kernel fellows have managed to slim down the elephant

Not just that, but now there's a GUI / IDE for systemtap as well:
http://stapgui.sourceforge.net/features.shtml

[Ben] -

On Fri, Jul 21, 2006 at 02:12:47PM +0300, Ville wrote:
> On Sun, Jul 16, 2006 at 09:26:31PM +0300, you [Ville] wrote:
> > > But you need to install both for systemtap. :-P
> > 
> > Well, as it happens, it seems Roland McGrath, Dave Jones and the other
> > Fedora kernel fellows have managed to slim down the elephant
> 
> Not just that, but now there's a GUI / IDE for systemtap as well:
> http://stapgui.sourceforge.net/features.shtml

I've just been offered an opportunity to be certified as a Solaris-10 "Operating System Internals" instructor (it would require my going to a week-long, open-skull/install-firehose type of training seminar in Boston.) A part of the training involves lots of heavy-duty work with 'mdb', 'kmdb', and 'dtrace'. I don't think I'm going to do it - it's not really my cuppa tea, even though there's theoretically a bunch of money in it - but having someone who teaches this course and knows Linux take a look at 'systemtap' would make for a very interesting comparison. I'm going to see if a quiet word in the right ear will result in anything publishable... at least eventually, since this class is still three weeks away.

Talkback: Discuss this article with The Answer Gang

Copyright © 2006, . Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 131 of Linux Gazette, October 2006
