Platform: PC / Mac
Category: DIY
Difficulty: Beginner to Intermediate
By Malefico
Nearly everyone knows the heartbreak and frustration of trying to use their computer only to have problems arise, preventing you from getting work done or even worse, playing your favorite games. This article will help guide you through some basic troubleshooting steps and help you determine exactly what’s ailing your friendly neighborhood PC or Mac. Since I know far more about PCs than Macintosh products, I’ll just say that although software issues can manifest themselves differently, most hardware problems present the same symptoms.
First, let’s enumerate some of the more common computer illnesses:
- POST (Power On Self Test)
- Boot
- Stop codes (Blue Screen Of Death)
- OS (Operating System)
- Application (all programs other than OS) includes malware and spyware
- Hardware
- Memory
- Motherboard
- Power supply
- Processor
- Storage (magnetic or solid state drives)
- Video
- Audio
POST issues usually present with a Beep code or Stop code and failure to boot to OS, but may lead to a simple BIOS (Basic Input/Output System) error message with beeps rather than a Stop code. When you power on a computer, the first thing it does is check all vital internal components to make sure they are functioning correctly. If any part fails the POST, your system will notify you of the failure, often with a series of beeps, and either try to boot again or wait for user activity. To properly test POST issues, you can use one of several test kits, like Passmark Software’s comprehensive suite, which contains not only the diagnostic software but also burn-in and benchmarking applications and test equipment. This option will allow you to definitively identify any issue on any system, but it’s expensive, which is one of the reasons taking your system to a technician may be pricey- the shit you need to test computers ain’t cheap!
If you’re not willing to invest in such products, and most users aren’t with good reason, you can still test individual components, but you’ll need extra, functioning, compatible parts in order to reliably identify the problem. Many computer BIOS programs include Beep Codes that will tell you what’s wrong with your system. You’ll need to correctly identify the developer and version of your BIOS in order to interpret these codes. If you get a Stop code, snap a pic of the display or write down the code, for instance 0x0000000A. There are literally hundreds of Stop codes, so writing down the specific code is the first step in diagnosing the issue manually. This should give you some idea of the nature of the problem.
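To make the write-it-down step concrete, here is a minimal Python sketch that takes a Stop code the way you might jot it down and looks it up in a small table. The table holds only a handful of well-known Windows codes and is nowhere near complete, and the function name describe_stop_code is made up for this illustration.

```python
# A tiny, deliberately incomplete table of well-known Windows Stop codes.
KNOWN_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_stop_code(written_code):
    """Turn a code as written down (e.g. '0x0000000A' or '0xA') into a name."""
    # int() with base 16 accepts an optional '0x' prefix.
    value = int(written_code.strip(), 16)
    name = KNOWN_STOP_CODES.get(value, "not in this short table; search the full code")
    return "0x{:08X}: {}".format(value, name)


print(describe_stop_code("0x0000000A"))  # 0x0000000A: IRQL_NOT_LESS_OR_EQUAL
```

The name alone rarely pinpoints the faulty part, but it narrows the search to a driver, memory, or storage problem.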
Boot issues occur after a system has passed POST, but before the OS has successfully loaded to the desktop. The most common cause is a corrupted or missing OS file that your computer needs in order to boot, but boot issues may also indicate impending hard drive failure. File corruption is usually caused by the installation of a program that overwrites a system file incorrectly. Missing files can be traced to the installation of poorly-coded programs or the deletion by the user of a shared file. Shared files are those that are used by a non-OS application but are part of the OS library of crucial files. Important note: Windows will notify you and ask for permission before deleting a shared file; in fact, the newest versions simply skip these files when uninstalling existing programs.
In most cases, boot issues can be remedied by use of a System Repair Disc. When you install Windows Vista and newer versions, the Action Center will prompt you to set up the Backup feature. One of the other options in the Backup dialog window is to create a system repair disc. I highly recommend you do so. The files will fit on a standard CD-R or any larger storage media, and contain everything Windows needs to successfully boot to desktop. The repair disc will replace any missing or corrupted files during the repair process. If you are working on a new system, creating the disc has the ancillary benefit of testing the write function of your optical drive.
If the boot problems persist, one or more sectors on your hard drive may have failed. In this case you will usually be notified as described below in the hard drive section.
Stop Codes show up as the dreaded Blue Screen Of Death (BSOD) and can indicate hardware or software issues. As stated above, there are hundreds of stop codes and they are not always easy to interpret. However, some BSOD issues can be readily diagnosed as they coincide with the installation of a new hardware component or software application. If you install a new component and get a BSOD, remove the part to ensure the system then boots and runs normally. You can then concentrate on the new part, which is either faulty or incompatible with your system, or attempt to diagnose a related component. For instance, if you install new memory, your system may go BSOD, fail to boot, or recognize the new DIMM either incorrectly or not at all. In this situation, the cause may be faulty DIMM(s), faulty motherboard slots, or incompatible memory. Even something as simple as plugging in a new USB accessory can cause a BSOD, in which case you have an incompatible or faulty accessory, a corrupt, missing or outdated driver, or other related issue.
If you install a new program and then get a stop code, the new application may be incompatible with your OS, or it may just be a case of faulty coding on the part of the developer (the shizzle don’t work). If the problem persists after restarting your system, the easiest way to resolve it is to enter Safe Mode (accessed by pressing a specific key during POST), uninstall the program and then visit the developer’s site to look for known issues, software patches or other updates. In the worst case scenario, you may have to just chalk it up as a learning experience since most software, once purchased is non-refundable.
Operating System issues can be caused by hardware or software incompatibility, or corrupt or missing OS files and often take the form of system freezes or crashes. If you recently installed a new component or application, start your search there. First, check to make sure the component drivers are in place. To do so, navigate to Control Panel > System > Hardware > Device Manager. Any system piece that is missing a driver will have a yellow warning icon on the component line. Especially if you are having trouble with a new system, this is usually because one or more drivers were not installed when the system was assembled. If you just added a new program, follow the directions above and go to the developer’s site for support.
Through numerous iterations, Windows has become pretty reliable at warning you against doing things that will harm your system. Windows UAC (User Account Control) was introduced with Vista and Server 2008 and will confirm your desire to allow new programs to open/run or be installed on your system. If you don’t recognize a program, disallow the installation. Unfortunately, in many cases malware attached to a known file type can slip through the security protocols so again, make sure you know where a file came from before you open it.
Although you now have to work at deleting system files accidentally, it’s much easier to delete important registry entries through the use of CCleaner or any of a number of “registry-cleaning” applications. The system registry is like a journal of all the elements of your system: hardware, OS, applications, user settings, etc. Although this is a source of debate among professionals, I think that these programs, while certainly not malicious, can result in user error by deleting registry entries necessary for a healthy system. They do no harm in and of themselves, but are not necessary and can cause problems as a result of their injudicious use. Luckily, the system repair disc can usually rescue a system that’s hobbled by missing registry entries, but in some cases only reinstalling the OS will fix the problem.
Application errors can cause system slowdowns, freezes, or crashes. Common causes of application failure are using the app with an incompatible OS, or with the right OS but the wrong version. Certain programs may also conflict with one another, for instance two anti-virus programs. If a newly installed app is slowing the system down, you can verify that the new program is the culprit by starting Task Manager (Ctrl + Shift + Esc, or Ctrl + Alt + Delete and then choose Task Manager). Then, click on the Processes tab and identify the suspect program’s process. In most cases this is intuitive, for instance AvastUI.exe (AVAST anti-virus) or Explorer.exe (Windows Explorer). Highlight the process and click the End Process button. Acknowledge the warning and end the process. Evaluate your system’s operation; if it returns to normal, the new program is causing the instability.
A special section on Malware– Unfortunately, the world being what it is, some asswipes write programs or scripts with the intent to cause instability or damage to computer systems. Others write applications that hide inside your computer and transmit sensitive data to remote computers. Moreover, certain sites have stockpiles of malware so even those who lack the coding acumen to write their own malware (these folks are called script kiddies and have done more to proliferate malware than the actual authors themselves) can download ready-made mischief. As the complexity of viruses, other malware and spyware has increased, more and more programs and files have become entry points for malicious code. Although a good anti-virus program is your best defense, none of them can protect you if you’re not smart about where you go, and what you download and open on your system. Currently there are many file types that can include malware/spyware including but not limited to emails, text files, word-processing documents, spreadsheets, databases, music, photo and video files, and others. Aside from maintaining your defense applications, you have to follow some simple rules. Don’t open anything that doesn’t come from a recognized source (and even be wary of odd emails from people you do know if the subject line is blank or unusual, or if the email contains attachments), and don’t download “free” software like toolbars or browser add-ons, which almost always include at least tracking cookies and often more invasive code.
One of the best tools I’ve found for dealing with malicious programs is McAfee’s AVERT Stinger. It is not an anti-virus program and should not be used in place of one. Rather, Stinger is a standalone scanning tool that can be used to remove infections from PCs already afflicted with malware. It is free, and the definitions are updated frequently so check the page at least every few months, or whenever a particularly nasty bug breaks out. Another good, free program is Malwarebytes. This app can be used to detect spyware and other suspicious programs that, while not viruses, nonetheless may be affecting system performance or collecting info without your knowledge.
Virus symptoms vary, but often result in extreme slowdown as the program eats up processor and memory power by replicating its process until the system is overwhelmed and crashes. They can also present by generating pop-ups repeatedly, again until the entire screen is populated, and further until the system again crashes from the process congestion. There are others, but if your computer is suddenly acting up after a visit to a new website, opening an email or any unrecognized file, chances are you’ve unleashed a virus.
All “well-written” viruses have a few things in common. First, the virus will cut you off from help by disabling network adapters or interfering with the launch of web browsers, etc. This has the effect of rendering you unable to get programs that can help you rid your system of the infection. That’s why in addition to security software, a standalone scanner ready in advance is a must to make short work of these issues.
Next, the malicious software will perform its main task, which usually includes “spreading” through various system areas and causing unwanted functionality, or may be as severe as destroying hardware like storage, corrupting/deleting vital OS files and traveling beyond your system by accessing your email contacts and sending itself on to all your friends and family. If this happens, you’ll certainly be the toast of whatever gathering is upcoming.
Users can usually halt the spread of a virus by restarting in Safe Mode and running Stinger or another scanner. Perform a thorough scan including any removable media (flash cards or drives) and give the scanner time to run through everything. If any viruses are found Stinger will quarantine and delete them. In many cases, even after a short time the malicious program will have replicated itself into a number of system folders. If the problem is less severe, or just as a best practice one can run Malwarebytes periodically to eliminate less-dangerous software before it becomes a real issue.
If the virus has done other damage before you notice it, you may have to use your system repair disc to replace OS files. If it has damaged or destroyed system hardware, the only thing you can do is bite the bullet and replace whatever is lost.
Hardware Issues
The basic rule of hardware failures is that the components that contain moving parts or generate substantial heat usually fail much more often than those that don’t. Parts that break most often are power supplies, magnetic storage devices, video cards, etc. Other components that can fail but don’t as frequently are processors (due to incorrect installation or overclocking), memory DIMMs and the motherboard itself (provided you take some precautions when working with it!).
Memory malfunctions are annoying, since the system may not boot if your RAM fails entirely. Luckily, this is very uncommon. Since RAM uses almost no power and generates little heat, RAM modules do not break very often without help from the owner. The best way to avoid memory issues is to check the board manufacturer’s QVL (Qualified Vendor List) and choose a brand/model of RAM that has already been tested with that specific board, and in general don’t try to save money buying Brand X modules. Some brands I have used and recommend include ADATA, Corsair, Crucial, G.Skill, Hynix, Kingston, Mushkin and Patriot. Look for a lifetime warranty on the memory, and again check compatibility before you buy.
When your system starts, pay close attention to the screens displayed before the OS boots. During POST, the system should detect the correct number of memory modules, as well as their size and speed information.
If you notice that your computer is not correctly identifying memory, you can begin your troubleshooting by restarting the system and entering BIOS (commonly called Setup). Nearly all BIOS allow you to focus on the system memory. Once you navigate to this function, you can often manually adjust variables like clock speed, timing, and voltage. In many cases, this will resolve the memory issue. As an example, on the old POS (when it was first assembled) the board was recognizing the correct amount of memory, but was listing it as PC 8500 or 1066MHz instead of the correct 1333MHz. I had to manually configure the memory in order for it to function at the correct speed. If your board will not allow you to configure RAM manually, it’s possible a BIOS update will remedy the situation, or you may have mistakenly ordered incompatible modules, or the QVL list is wrong. Important: Don’t mess with the memory variables unless you know the accepted baseline values for your RAM. If necessary, remove one of the modules and verify the base timing or visit the manufacturer’s support site before you attempt to configure RAM manually. It’s better to have memory that’s running, even if not at the correct speed, than to guess at timings and make the problem worse. For a more detailed discussion of RAM, see my article here.
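The link between the “PC 8500” label and the 1066MHz figure is simple arithmetic, sketched below in Python. It assumes a standard 64-bit memory bus, which the article does not state, and the function name is made up for illustration. A module’s rating is roughly its peak transfer rate in MB/s, which is the data rate multiplied by 8 bytes per transfer.

```python
def peak_bandwidth_mb_s(data_rate_mt_s, bus_width_bits=64):
    """Peak transfer rate in MB/s for one memory channel."""
    return data_rate_mt_s * bus_width_bits // 8


for rate in (1066, 1333):
    print(rate, "->", peak_bandwidth_mb_s(rate), "MB/s")
# 1066 -> 8528 MB/s
# 1333 -> 10664 MB/s
```

Manufacturers round these figures for the label, so 8528 becomes “8500” and the 1333 part is sold with a rating of about 10600. If POST reports the lower label for modules you bought at the higher speed, the board is running them below their rating.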
If the system is not detecting the correct amount of memory, you may have a more serious problem on your hands. Failure to recognize the correct size of memory modules is often the result of using incompatible memory, or because of a faulty DIMM or motherboard slot. Remember that all the troubleshooting described below requires that you unplug your system and discharge any current before you remove or replace RAM. In any event, if the issue arises after you install additional memory, remove the new stick(s) and make sure your PC correctly identifies the RAM that was already there. If you are dealing with a new build, remove all the DIMMs except for the one installed in the slot your board needs populated to boot. To determine which slot that is, consult your mobo manual or the manufacturer’s support site. If at this point the RAM module is recognized correctly, switch out the DIMMs using the primary slot until you have tested all the RAM sticks. If you get a recognition error, set that DIMM aside and continue until you have tested them all. If all the DIMMs test good in the primary slot, leave one in that slot and test the remaining modules in additional slots. By using trial and error, you can determine whether you have one or more bad DIMMs, bad mobo slots or incompatible memory.
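The trial-and-error procedure above boils down to a small piece of bookkeeping. This Python sketch is one hypothetical way to record the results of each boot test and read off the suspects. The DIMM and slot labels, and the function name diagnose_memory, are invented for illustration.

```python
def diagnose_memory(results, primary_slot):
    """results maps (dimm, slot) -> True if the module was recognized correctly.

    Returns (bad_dimms, bad_slots, note).
    """
    # Step 1: every DIMM is tested on its own in the primary slot.
    dimms = sorted({d for d, s in results if s == primary_slot})
    good = [d for d in dimms if results[(d, primary_slot)]]
    bad_dimms = [d for d in dimms if not results[(d, primary_slot)]]

    if not good:
        # Nothing works in the primary slot, so modules and slot cannot be told apart.
        return [], [], "no DIMM works in the primary slot: suspect incompatible memory or the slot"

    # Step 2: any other slot that rejects a DIMM already proven good is suspect.
    bad_slots = sorted({s for (d, s), ok in results.items()
                        if d in good and s != primary_slot and not ok})
    return bad_dimms, bad_slots, "ok"


tests = {
    ("A", "slot1"): True,   # DIMM A passes in the primary slot
    ("B", "slot1"): False,  # DIMM B fails there, so set it aside
    ("A", "slot2"): True,
    ("A", "slot3"): False,  # a known-good DIMM fails here, so blame the slot
}
print(diagnose_memory(tests, "slot1"))  # (['B'], ['slot3'], 'ok')
```

Writing the results down this way keeps you from retesting the same combination and makes the pattern obvious once every stick has had its turn.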
Motherboard issues do crop up from time to time, whether through rough handling, improper grounding during board installation/system assembly, age, power supply failure or heat/moisture damage. The main thing to remember about a motherboard is that it has a huge number of circuits (half a billion or more), it carries a substantial current load, it is fairly delicate and easy to damage through rough handling, excess pressure while installing components or static electrical discharge, and it is always working when the system is on, much like the power supply and processor.
Some signs your motherboard may be dying are failures of peripheral ports- USB, audio, video, PS/2 (systems based on older architecture)- or of on-board slots like DIMM, PCI or PCI-E. This article ends with a description of the demise of the old POS and shows how I traced the problem to the board (pretty obvious if you are familiar with the symptoms).
To avoid board issues, buy a respected brand like ASUS, ASRock, Biostar, Foxconn, Gigabyte or MSI. Foxconn, though often snubbed by DIY folks, is the OEM provider of many Dell and HP system boards and should not be discounted as a solid choice. Due to their relatively unknown/unpopular status with smaller builders, Foxconn boards usually have attractive prices compared to other brands that are more in-demand.
Next, take care to ground yourself when working inside your system, and unplug and discharge the current stored in the PSU before you work. Static electricity can fry the tiny circuits in a board easily, and you will not know it’s happened until your system fails to work.
Finally, make sure your board has good support when you install it, and use the standoffs (spacers) if required, including any washers that may be included with your case. When seating components, especially the RAM and video card, make sure you don’t try to force them into place. Putting excessive pressure on the board can kill it just as quickly as a spark.
Power supply issues are fairly common, and can manifest as a failure to power on, unexpected system shutdowns with or without Stop codes, POST failures, general stability issues, and video errors among other things. Since the power supply generates a fair amount of heat and the flow of electricity through its circuits degrades components like capacitors through normal use, you should always invest in a decent power supply. This is the first and one of the easiest ways to prevent power issues from ruining your day. Because the PSU carries all the current running through your system, it’s subject to what’s called capacitor aging over time. This means the components that store electricity in the PSU, called capacitors, will degrade and eventually fail as the unit gets older. You can take this into account when you choose the size of your PSU. Other conditions/events that can kill a PSU (or your whole system) are power surges, overheating, and using a PSU that’s not strong enough to support the needs of your system. Use a Power Supply Calculator to understand your system’s needs based on component choice, and remember to include a capacitor aging factor of 25-30% so you get the longest life out of your PSU.
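As a quick worked example of the aging factor, the Python sketch below adds the 25-30% margin to a calculated load and rounds up to the next common PSU size. The 400W load and the 50W rounding step are made-up illustration values, not recommendations, and the function name is hypothetical.

```python
import math


def recommended_psu_watts(load_watts, aging_factor=0.30, step=50):
    """Add the aging margin to the calculated load, then round up to a common size."""
    needed = load_watts * (1 + aging_factor)
    return int(math.ceil(needed / step) * step)


print(recommended_psu_watts(400, 0.25))  # 500
print(recommended_psu_watts(400, 0.30))  # 550
```

So a system that a calculator rates at 400W would call for a 500W to 550W unit once capacitor aging is taken into account.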
Like most components, it’s worth the extra money to get a quality power supply, since this part can kill everything else in your system when it fails. Quality power supplies include protective circuitry to avoid such a calamity, and they use less power to boot (no pun intended). When you are building or repairing a system, always look for PSUs that have an 80-Plus certification and active PFC (Power Factor Correction) circuitry. A lesser-known fact about power supplies is that the amperage rating, especially on the 12V rail(s), is just as important as voltage output. Stick with brands like Antec, Cooler Master, Corsair, EVGA, (some) Rosewill, Seasonic, Silverstone, Superflower and Thermaltake, to name a few. There are literally hundreds of brands of PSUs out there, and if you are unsure about your choice check the tiered list here. Chances are if the PSU you’re looking at doesn’t appear on this list, it’s not worth buying. It’s best to go no lower than tier 3, although if you just have to have the best stuff go with tier 1 or 2. Although you can buy a $12 Logisys PSU that claims a 480W rating, you should know that such supplies do not deliver the rated wattage; in fact most struggle to provide even half their rated capacity. In addition, they lack protective circuitry in all but the most basic form, simple buss fuses that won’t stop them from destroying your board, CPU, memory, video card, hard drives or all of the above when they fail.
If you suspect your power supply may be faulty, follow the directions here to test your PSU. You’ll need a digital multimeter in order to perform the tests. Pay careful attention to the ATX standards for voltage delivery as values outside the accepted norms are a sure sign you have PSU problems.
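To show what “outside the accepted norms” means in practice, here is a small Python sketch that checks multimeter readings against the ATX tolerance bands: plus or minus 5% on the +3.3V, +5V, +12V and +5VSB rails, and plus or minus 10% on the -12V rail. The sample readings are invented, and the function name check_rail is just for illustration.

```python
# Nominal rail voltage and allowed tolerance (as a fraction) from the ATX standard.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(rail, measured_volts):
    """Return True if a multimeter reading is inside the tolerance band."""
    nominal, tolerance = ATX_RAILS[rail]
    return abs(measured_volts - nominal) <= abs(nominal) * tolerance


readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.32}  # made-up sample readings
for rail, volts in readings.items():
    print(rail, volts, "OK" if check_rail(rail, volts) else "OUT OF SPEC")
```

In this sample the +12V rail reads 11.32V, below the 11.4V floor, so that PSU would be suspect even though the other two rails look fine.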
Processor failure is relatively uncommon, unless you practice overclocking without (or even with) proper cooling upgrades, etc. One of the most common failures occurs when a processor is incorrectly installed, resulting in broken or bent pins. Really, it’s pretty tough to screw up when installing a CPU, but it can happen if you’re not paying attention to what you’re doing. If you do notice bent pins, use the technique described here to attempt a repair. After prolonged (five years or more) use, a processor may succumb to circuit degradation, but generally they are pretty robust.
If the processor appears to be intact, you can test it by socketing it in a board you know works properly. Alternately, you can socket a known-good CPU in your board to see if it functions. By process of elimination you’ll soon know whether the CPU, board or both have failed.
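The process of elimination here has only four possible outcomes, which the following Python sketch spells out. It is an illustration of the reasoning only, and the function and argument names are hypothetical.

```python
def swap_test(suspect_cpu_works_in_good_board, good_cpu_works_in_suspect_board):
    """Interpret the two swap tests described above."""
    cpu_ok = suspect_cpu_works_in_good_board
    board_ok = good_cpu_works_in_suspect_board
    if cpu_ok and board_ok:
        return "both parts pass: the fault lies elsewhere"
    if cpu_ok:
        return "board has failed"
    if board_ok:
        return "CPU has failed"
    return "both CPU and board have failed"


print(swap_test(True, False))  # board has failed
```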
Storage (hard drive) issues occur most often due to failure of the IDE (Integrated Drive Electronics) circuitry, read/write head mechanism, or malfunctioning or dead sectors within the magnetic platter(s). Magnetic drives fail much more often than solid state drives. With the introduction of S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), users no longer had to wait for a drive to malfunction before they knew their drive was failing. In the old POS, SMART could be turned on and off in BIOS, but most new boards utilize this feature by default. The goal of SMART is to tell a user that their storage media is on its way out so they can take steps to save data before it’s lost forever (or the user has to send the drive to a forensic data recovery company to get it back). Take heed of any SMART error message that appears during POST, and don’t assume you have a specific time period before your drive fails. Take steps to back up data as soon as you get the error message. You can try to “repair” your hard drive by following the procedure listed here.
Since actual hard drive repair requires special tools and skill, if the software repair doesn’t work, or you continue to get SMART errors you should simply replace the drive either by requesting warranty support or just buying another. If you have some extra cash, consider an enterprise rather than consumer drive as they are built to last much longer than standard hard drives.
Note that some issues which appear to be drive-related may instead be due to motherboard malfunction. Especially in the case of system freezes, your system may provide an indication of the source of the problem. If your system freezes, you may be able to determine the cause just by looking at your case. Check the HDD activity LED when the system freezes. If there is no hard drive activity, your problem may well be hardware-related, either the drive itself or the motherboard. If the indicator is either solid or flashing extremely rapidly, it’s likely the cause is an out-of-control system process or other software-related issue.
Video issues crop up most frequently in systems that use a discrete GPU. Because video cards use more power and generate more heat than any other part, they are subject to heat damage over time or in systems with inadequate cooling. More powerful cards are also heavy, and should preferably be installed with a support bracket to alleviate excess pressure on the motherboard’s AGP/PCI-E slot. Cards of sufficient weight usually include one as an accessory.
Video failure is usually accompanied by display artifacts- misplaced pixels, flickering or other display anomalies including sudden total display failure. The source is either the GPU or the display adapter card slot. To test which, replace the suspected card with one you know is functional and fire up the system. Install drivers as necessary and then check the display. If the problem goes away, your GPU is failing. If not, the board slot is on its way out.
If the card is bad, replace it with another that meets your power supply’s ability to support it, or invest in a new GPU and a PSU that will meet the increased power demands of the system. You can find a good Power Supply Calculator through the link above, and Newegg has one on their site as well.
Audio problems are very rare, and difficulties in this area are usually linked to corrupt, missing or outdated audio drivers or codecs. However, such problems can be indicative of something as simple as broken speakers or headphones, or as serious as impending motherboard failure. Also, if you are using a discrete audio card, that part may be the culprit. Follow a process of elimination if possible to determine whether the output device (speakers), board or audio card is faulty. In the vast majority of actual audio port failures (on-board hardware, as opposed to the output device or software such as drivers or codecs), it’s an indication your board is dying.
Troubleshooting Example- System Freezes
Hardware
During the latter part of the POS’ lifespan, I was plagued by intermittent system freezes. Independent of the application I was using, the system would suddenly fail to respond to input. There was no hard drive activity indicated during these episodes, leading me to suspect either the drive or the motherboard. Moreover, I had SMART activated but I was getting no drive-related error reports, leading me to believe the board was dying a lingering death. Another symptom was the intermittent failure of USB ports. At times, my mouse would suddenly stop responding to input or my wireless adapter would lose its connection to the local network.
All of these things point to the motherboard. Thinking about it logically, what’s the one thing all these devices- hard drive, mouse, and wireless adapter have in common? They are all plugged directly into the system board.
When I assembled the new POS and was ready to start transferring data, I plugged the old POS back in and fired it up, ready to hook my portable drive into the system and move my data.
The first thing I noticed was that the RAM was not recognized correctly. The old POS had 4GB (or 4096MB) of RAM, but now the total amount was listed as 3840MB. Each of two total 2GB DIMMs is made up of eight 256MB chips, so the RAM amount could indicate one of the modules was itself faulty, since 4096 minus 3840 is 256. Again, though, the RAM is plugged directly into the system board…
Next, the system took an inordinate amount of time to list the available storage volumes, and then it only found one, a secondary drive devoted to backup.
Finally, after waiting several minutes I was rewarded with the following error message “NTLDR is missing”. NTLDR is the bootstrapper for Windows XP, the OS on the old POS. The old system was telling me that the primary drive (OS and most of the useful data) was no longer visible to the system despite the fact that it was only about half a year old.
Taken all together the symptoms led me to one conclusion; the ASRock board had given up the ghost.
I pulled one RAM module and tried to boot again. This time the system only recognized 1792MB of RAM (2048 – 256) and still wouldn’t boot. Power off, switch DIMMs and try to reboot. Still only 1792MB of RAM and no joy. Since in dual-channel mode the system was missing 256MB of RAM, and in single-channel mode using either slot the system was still short 256MB, I deduced the RAM was not the issue. This was confirmed when I put the two sticks into the new POS and it recognized all the RAM.
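The arithmetic behind that deduction can be laid out in a few lines of Python. This is only a sketch of the reasoning using the numbers reported above, and the helper names are invented for illustration.

```python
def shortfall_mb(expected_mb, detected_mb):
    """How much memory the system failed to report."""
    return expected_mb - detected_mb


def fault_follows_module(single_dimm_shortfalls):
    """A dead chip on one module would make exactly one solo test come up clean."""
    return single_dimm_shortfalls.count(0) == 1


# (configuration, expected MB, detected MB) as reported in the walkthrough
observations = [
    ("both DIMMs", 4096, 3840),
    ("first DIMM alone", 2048, 1792),
    ("second DIMM alone", 2048, 1792),
]
missing = [shortfall_mb(expected, detected) for _, expected, detected in observations]
print(missing)                             # [256, 256, 256]
print(fault_follows_module(missing[1:]))   # False
```

Because the same 256MB goes missing no matter which stick is installed, the loss does not follow either module, which points back at the board.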
I took the old POS main drive out of the tower and plugged it into the new system and booted to XP just fine, no delays or SMART errors. So, the hard drive is perfectly good. I put the old POS drive back into that system and tried successive boots using all available SATA ports. Turns out there are still two functional ports on the old board.
This was conclusive proof that the old POS system board was the root of the problem. After some shuffling around of drives I was able to get all my data directly onto the new system without further incident.
Troubleshooting Example- System Freezes
Software
Not too long ago, I took a job diagnosing a system freeze problem on an HP laptop. The customer reported that the system was working fine until they tried to download and install a large driver package. They were unable to update the drivers and in addition, the computer now opened Windows Media Center and then froze whenever they tried to open and install the package.
This was actually a nifty little issue and fun to investigate. The first mistake the customer made was to assume the system could open and install the drivers. In that person’s defense, it wasn’t a stupid assumption. However, the drivers were in .rar form. RAR file compression is often used when files are large and downloading the raw form would take considerably more time, and RAR supports file checking so as to avoid file corruption. Unfortunately, Windows has no native feature to open this file format. Also, when the customer downloaded the file it was in a .zip folder. The user opened the folder using WinZip, a commonly available utility for zip files. When they tried to associate the .rar file with a program to open it, they searched for WinZip again thinking it would do the job. Somehow, they associated .rar files with Media Center instead. In addition to opening Media Center, the computer tried to open the .rar file repeatedly, duplicating the failed process over and over again until the system was clogged with the failed attempts.
One of the main ways I knew this was a software issue was that when the system slowed and eventually stopped, the HDD activity LED was going crazy. This indicated the cause was probably a cascading failure in the system processes. When I checked the file associations, I immediately noticed the wrong program was associated with .rar files.
After that, it was a simple matter to download and install WinRAR and use it to open the compressed file and install the new drivers.
Problem solved.
The last and most important piece of advice I can give you is this. Pay attention to what’s going on with your system and as soon as you notice a change for the worse, think about what you were doing just prior to the manifestation of the problem. It’s a sad but true fact that user error still accounts for most computer issues.
Holy crap this is a long article.
Still, I hope it helps you now or in the future. If you ever have any specific questions, please email me, malefico@butthole.nerdbacon.com or take your system to a reputable local technician for help. Enlisting the aid of a professional may well save you money in the long run.