Step by Step Guide on how to use xlrd

xlrd is a module written for Python to manipulate with the data in .xls files. This includes Microsoft Excel files. You can do a basic CRUD – (create, read, update, delete) operation directly on the excel files using this module.

A practical application would be to let’s say extract Mary’s address and postal code from an Excel sheet used to store information about your customers and then put those details automatically in your letter for delivery or perhaps store your own utility bills and minus it off from your savings in another sheet. The possibilities are abundant and it’s up to you how you choose to use it.

To begin: Get the latest xlrd library from the link: https://pypi.python.org/pypi/xlrd

File Type Py Version Uploaded on Size
xlrd-0.9.4.tar.gz (md5) Source 2015-07-15 315KB

0. Unzip it into C:/Python27/

1. Launch cmd and navigate to the location of python.exe file which is usually under C:/Python27/

2. Use the command python.exe ./xlrd-(version number)/setup.py install

3. After that launch idle and type: import xlrd

If the shell does not throw any error at you, that means everything is fine. You can now use the xlrd module in your script.

Let’s now create a sample example.

I have created this excel file and the name of the sheet is People (in red). What we are going to do is find out who earns the highest and then write in a text file, the name and salary he earns. Simple enough.

Open idle and create a script file. Give it a name: info.py

First let’s try printing all the contents from the excel file.

        import xlrd
        from xlrd import cellname

        def open_xlBook(file_name, sheet_name):
            data = xlrd.open_workbook(file_name)
            sheet = data.sheet_by_name(sheet_name)
            return sheet

        def print_contents(sheet):
            for row_index in range(sheet.nrows):
                for col_index in range(sheet.ncols):
                    cell_name = cellname(row_index,col_index)
                    print sheet.cell(row_index,col_index).value

        if __name__=='__main__':
            xlFile = './Information.xlsx'
            sheetName = 'People'
            sheet = open_xlBook(xlFile, sheetName)
            print_contents(sheet)

        


              The output will look like this
            >>> ================================ RESTART ================================
            >>>
            Name
            Age
            Salary
            Kroos
            24.0
            5000.0
            Ronaldo
            29.0
            9000.0
            Reus
            28.0
            6000.0
            Rakitic
            26.0
            4000.0
            Maldini
            24.0
            5000.0
            Kaka
            29.0
            9000.0
            Paulo
            28.0
            6000.0
            Gago
            26.0
            4000.0
            >>>

              Notice how the traversal occurs from the output.
              From left to right starting from the top row and when it reaches the end of each row,
              it continues from the second row

        

Now let’s look at the complete code to execute our objective:

          import xlrd
          #from xlrd import cellname #this is not needed unless you want to use the name of cell e.g A1

          def open_xlBook(file_name, sheet_name):
              data = xlrd.open_workbook(file_name)
              sheet = data.sheet_by_name(sheet_name)
              return sheet

          def print_contents(sheet):
              for row_index in range(sheet.nrows):
                  for col_index in range(sheet.ncols):
                      #cell_name = cellname(row_index,col_index)
                      print sheet.cell(row_index,col_index).value

          def createDictionary(sheet):
              name_sal_dict = {} #create a one to one mapping between names and salaries
              sal = [] #create list to store salaries
              for row_index in range(sheet.nrows):
                  for col_index in range(sheet.ncols):
                      #skip the title row (i.e topmost row) and only focus on column 0
                      #where we get all the names
                      if row_index > 0 and col_index <1:
                          #get the salary from 3rd column and map it to the name from column 0
                          name_sal_dict[sheet.cell(row_index,col_index-col_index).value] = \
                                                      sheet.cell(row_index,col_index+2).value
                          #keep adding the salaries to list
                          sal.append(sheet.cell(row_index,col_index+2).value)
                      else:
                          pass
              return name_sal_dict,sal

          def whoEarnMost(dic, lis):
              #assemble the result string to write to the file
              output_string = ""
              for name, sal in dic.iteritems():
                  if sal == max(lis):
                      output_string = output_string + name + " earns " + str(sal) + " USD."+'\n'
              return output_string


          def writeOut(output_string):
              f = open('output.txt','w')
              f.write(output_string)
              f.close

          if __name__=='__main__':
              xlFile = './Information.xlsx'
              sheetName = 'People'
              #open sheet
              sheet = open_xlBook(xlFile, sheetName)
              #get the name: salary dictionary and the salary list
              name_salary_dict, salary_list = createDictionary(sheet)
              #get the result string
              out_string = whoEarnMost(name_salary_dict, salary_list)
              #output to the file
              writeOut(out_string)

          

This is the result:

These are my 3 files in the working directory:

In Conclusion:

The main idea is to think of the excel sheet as a 2D Matrix. Create a one to one mapping of name:salary pairs and store them in a dictionary data structure. Then we create a list to store all the salaries. We then get the maximum salary from the list by using max() function provided by Python. Then we compare that max with the content in the dictionary and every time there is a match, we write the key and value pair (i.e., name and salary) into text file.

P.S: I used the name of footballers because their names come to me naturally. ;)

The full code can be found at my Github : MY GITHUB PAGE