数据清洗之 Excel文件读写

Excel文件读写

  • 使用read_excel读取,读取后的结果为dataframe格式
  • 读取excel文件和csv文件参数大致一样,但要考虑工作表sheet页
  • 参数较多,可以自行控制,但很多时候使用默认参数
  • 读取excel时,注意编码,常用编码为utf-8、gbk、gbk2312和gb18030等
  • 使用to_excel快速保存为xlsx格式
import pandas as pd
import os

os.getcwd()

'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据清洗之文件读写'

os.chdir('D:\\Jupyter\\notebook\\Python数据清洗实战\\数据')

df1 = pd.read_excel('meal_order_detail.xlsx', encoding='utf-8', sheet_name='meal_order_detail1')

df1.head(5)

<div>

<style scoped>

.dataframe tbody tr th:only-of-type {

    vertical-align: middle;

}

.dataframe tbody tr th {

    vertical-align: top;

}

.dataframe thead th {

    text-align: right;

}

</style>

<table border=”1″ class=”dataframe”>

<thead>

<tr style="text-align: right;">

  <th></th>

  <th>detail_id</th>

  <th>order_id</th>

  <th>dishes_id</th>

  <th>logicprn_name</th>

  <th>parent_class_name</th>

  <th>dishes_name</th>

  <th>itemis_add</th>

  <th>counts</th>

  <th>amounts</th>

  <th>cost</th>

  <th>place_order_time</th>

  <th>discount_amt</th>

  <th>discount_reason</th>

  <th>kick_back</th>

  <th>add_inprice</th>

  <th>add_info</th>

  <th>bar_code</th>

  <th>picture_file</th>

  <th>emp_id</th>

</tr>

</thead>

<tbody>

<tr>

  <th>0</th>

  <td>2956</td>

  <td>417</td>

  <td>610062</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒜蓉生蚝</td>

  <td>0</td>

  <td>1</td>

  <td>49</td>

  <td>NaN</td>

  <td>2016-08-01 11:05:36</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/104001.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>1</th>

  <td>2958</td>

  <td>417</td>

  <td>609957</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒙古烤羊腿\r\n\r\n\r\n</td>

  <td>0</td>

  <td>1</td>

  <td>48</td>

  <td>NaN</td>

  <td>2016-08-01 11:07:07</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/202003.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>2</th>

  <td>2961</td>

  <td>417</td>

  <td>609950</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>大蒜苋菜</td>

  <td>0</td>

  <td>1</td>

  <td>30</td>

  <td>NaN</td>

  <td>2016-08-01 11:07:40</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/303001.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>3</th>

  <td>2966</td>

  <td>417</td>

  <td>610038</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>芝麻烤紫菜</td>

  <td>0</td>

  <td>1</td>

  <td>25</td>

  <td>NaN</td>

  <td>2016-08-01 11:11:11</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/105002.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>4</th>

  <td>2968</td>

  <td>417</td>

  <td>610003</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒜香包</td>

  <td>0</td>

  <td>1</td>

  <td>13</td>

  <td>NaN</td>

  <td>2016-08-01 11:11:30</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/503002.jpg</td>

  <td>1442</td>

</tr>

</tbody>

</table>

</div>

df1 = pd.read_excel('meal_order_detail.xlsx', encoding='utf-8', sheet_name=0)  # sheet_name 可根据索引值取工作簿 等于0即为第一个工作簿

df1.head(5)

<div>

<style scoped>

.dataframe tbody tr th:only-of-type {

    vertical-align: middle;

}

.dataframe tbody tr th {

    vertical-align: top;

}

.dataframe thead th {

    text-align: right;

}

</style>

<table border=”1″ class=”dataframe”>

<thead>

<tr style="text-align: right;">

  <th></th>

  <th>detail_id</th>

  <th>order_id</th>

  <th>dishes_id</th>

  <th>logicprn_name</th>

  <th>parent_class_name</th>

  <th>dishes_name</th>

  <th>itemis_add</th>

  <th>counts</th>

  <th>amounts</th>

  <th>cost</th>

  <th>place_order_time</th>

  <th>discount_amt</th>

  <th>discount_reason</th>

  <th>kick_back</th>

  <th>add_inprice</th>

  <th>add_info</th>

  <th>bar_code</th>

  <th>picture_file</th>

  <th>emp_id</th>

</tr>

</thead>

<tbody>

<tr>

  <th>0</th>

  <td>2956</td>

  <td>417</td>

  <td>610062</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒜蓉生蚝</td>

  <td>0</td>

  <td>1</td>

  <td>49</td>

  <td>NaN</td>

  <td>2016-08-01 11:05:36</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/104001.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>1</th>

  <td>2958</td>

  <td>417</td>

  <td>609957</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒙古烤羊腿\r\n\r\n\r\n</td>

  <td>0</td>

  <td>1</td>

  <td>48</td>

  <td>NaN</td>

  <td>2016-08-01 11:07:07</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/202003.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>2</th>

  <td>2961</td>

  <td>417</td>

  <td>609950</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>大蒜苋菜</td>

  <td>0</td>

  <td>1</td>

  <td>30</td>

  <td>NaN</td>

  <td>2016-08-01 11:07:40</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/303001.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>3</th>

  <td>2966</td>

  <td>417</td>

  <td>610038</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>芝麻烤紫菜</td>

  <td>0</td>

  <td>1</td>

  <td>25</td>

  <td>NaN</td>

  <td>2016-08-01 11:11:11</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/105002.jpg</td>

  <td>1442</td>

</tr>

<tr>

  <th>4</th>

  <td>2968</td>

  <td>417</td>

  <td>610003</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>蒜香包</td>

  <td>0</td>

  <td>1</td>

  <td>13</td>

  <td>NaN</td>

  <td>2016-08-01 11:11:30</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>0</td>

  <td>NaN</td>

  <td>NaN</td>

  <td>caipu/503002.jpg</td>

  <td>1442</td>

</tr>

</tbody>

</table>

</div>

os.getcwd()

'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据'

df1.to_excel('asdf.xlsx', index=False, sheet_name='one')

正文完