Basic Python Syntax and Common Operations

Posted 2019-06-27 | Updated 2025-02-14 | Category: Python

Indentation is the soul of Python.

For a Python tutorial, see Liao Xuefeng's website:
https://www.liaoxuefeng.com/wiki/1016959663602400

1. Logical operators

df[(df['id']>=1) & (df['id']<=2)]

# The expression above is equivalent to:
df.query('id>=1 & id<=2')
# (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html)

2. Comparison operators

>, <, >=, <=, ==, !=

Conditional and combined filtering in pandas

Reference: https://www.cnblogs.com/qxh-beijing2016/p/15499009.html

import pandas as pd

df = pd.DataFrame({'A':[100, 200, 300, 400, 500],'B':['a', 'b', 'c', 'd', 'e'],'C':[1, 2, 3, 4, 5]})

        A    B    C
 0    100    a    1
 1    200    b    2
 2    300    c    3
 3    400    d    4
 4    500    e    5

(1) Find all rows in df where column A is 100, 200, or 300

num_list = [100, 200, 300]
df[df.A.isin(num_list)]     # keep the rows whose A value is in num_list
       A    B    C
0    100    a    1
1    200    b    2
2    300    c    3

(2) Find all rows in df where column A is 100 and column B is 'a'

df[(df.A==100)&(df.B=='a')]
        A    B    C
 0    100    a    1

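Note that df.where is not a drop-in replacement: it keeps the original shape and fills the non-matching rows with NaN (so integer columns become float). Chain dropna() to keep only the matching rows:
df.where((df.A==100)&(df.B=='a')).dropna()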
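
A quick way to see how boolean indexing and df.where differ is to compare the shapes of their results. This is a small sketch that rebuilds the example frame from above; the variable names mask, filtered, and masked are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [100, 200, 300, 400, 500],
    "B": ["a", "b", "c", "d", "e"],
    "C": [1, 2, 3, 4, 5],
})
mask = (df.A == 100) & (df.B == "a")

filtered = df[mask]      # boolean indexing: only the matching rows survive
masked = df.where(mask)  # same shape as df; non-matching rows are filled with NaN

print(filtered.shape)                  # (1, 3)
print(masked.shape)                    # (5, 3)
print(masked.dropna().index.tolist())  # [0]
```

Boolean indexing is usually what a filter needs; where is more useful when the result must stay aligned with the original index.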

(3) Find all rows in df where column A is 100 or column B is 'b'

df[(df.A==100)|(df.B=='b')]
        A    B    C
 0    100    a    1
 1    200    b    2

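Note: when combining multiple conditions, each condition must be wrapped in parentheses '()', because & and | bind more tightly than comparison operators such as == and >=.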
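
A minimal sketch of what goes wrong without the parentheses, using a small made-up frame (the columns A and C and their values are illustrative only). Because | is evaluated before ==, the unparenthesised form is parsed as a chained comparison, and pandas ends up being asked for the truth value of a whole Series.

```python
import pandas as pd

df = pd.DataFrame({"A": [100, 200, 300], "C": [1, 2, 3]})

# With parentheses: each comparison is evaluated first, then combined row by row.
ok = df[(df.A == 100) | (df.C == 2)]
print(ok.index.tolist())  # [0, 1]

# Without parentheses the expression parses as df.A == (100 | df.C) == 2,
# a chained comparison that needs the truth value of a Series.
failed = False
try:
    df[df.A == 100 | df.C == 2]
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__)
```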

2.1 is vs ==

print(True == 1)	#True
print('' == 1)		#False
print([] == 1)		#False
print(10 == 10.0)	#True
print([] == [])		#True
print('1' == 1)		#False
print([1,2,3] == ['1','2','3'])		#False

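== checks for equality of value. True == 1 is True because bool is a subclass of int and True has the integer value 1; no conversion through bool(1) is involved, which is why True == 2 is False. Likewise, 10 == 10.0 is True because int and float compare numerically. '1' == 1, however, is False: Python never converts between str and numbers implicitly. When using ==, compare values of the same type rather than hoping that Python will figure out a type conversion for you.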

print(True is 1)		#False		
print('' is 1)			#False
print([] is 1)			#False
print(10 is 10.0)		#False
print([] is [])			#False
print([1,2,3] is [1,2,3])	#False
print('1' is 1)			#False
print(True is True)		#True
print('1' is '1')		#True

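is checks identity: whether both operands are the same object in memory, not whether their values are equal. [] is [] is False because every list literal creates a new list at a different location. '1' is '1' is True only because CPython reuses identical string constants, which is an implementation detail that should not be relied on. Python 3.8+ also emits a SyntaxWarning when is is used with a literal, as in most of the lines above; in practice, reserve is for singletons such as None.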
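
The difference is easiest to see with names rather than literals. A short sketch, with variable names chosen here for illustration:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal contents, but a separate list object
c = a          # a second name for the same object as a

print(a == b)  # True: same values
print(a is b)  # False: two different objects
print(a is c)  # True: one object, two names

c.append(4)    # mutating through c is visible through a
print(a)       # [1, 2, 3, 4]
print(b)       # [1, 2, 3]

value = None
print(value is None)  # True: the idiomatic use of `is`
```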

3. Double quotes in Python

Single and double quotes behave identically in Python: a string can be written with either.

4. type()

4.1 The type() function

i = 4
print(type(i))
# <class 'int'>

print(type(2/4))
# <class 'float'>

print(type("hello"))
# <class 'str'>

n = None
print(type(n))
# <class 'NoneType'>

4.2 dtype

Check a column's dtype:

print(train["ID"].dtype)
# object

# or: train["ID"].dtypes

4.3 Converting types

import numpy as np

# To convert a single value, call str(), int(), etc. directly
print(type(str(100)))
# <class 'str'>

# To convert a whole column
# Option 1
train["ID"] = train["ID"].astype(np.int32)

# Option 2
train = train.astype({"user_age": "float64"})

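Example:

# Convert columns that became object dtype because of missing values to float
for col in X_train1.columns[X_train1.dtypes=="object"]:
    try:
        X_train1 = X_train1.astype({col: "float64"})
    except (ValueError, TypeError):
        # leave columns that cannot be parsed as numbers unchanged
        pass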

5. Initializing a 2D list

[[0] * n for _ in range(m)]  # a 2D list with m rows and n columns
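
The list comprehension matters here: a shorter-looking form that repeats the outer list with * reuses one inner list for every row. A small sketch with illustrative sizes m = 2 and n = 3:

```python
m, n = 2, 3

# Comprehension: a fresh inner list is built for every row.
grid = [[0] * n for _ in range(m)]
grid[0][0] = 1
print(grid)    # [[1, 0, 0], [0, 0, 0]]

# Repeating the outer list copies references, so all rows are the same object.
shared = [[0] * n] * m
shared[0][0] = 1
print(shared)  # [[1, 0, 0], [1, 0, 0]]
```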

6. Finding the position (index) of the minimum value in a list

First position:

listA.index(min(listA))
or:
np.argmin(listA)

All positions: [i for i,v in enumerate(listA) if v == min(listA)]
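
A runnable version of both lookups, using a made-up list_a. Computing the minimum once, outside the comprehension, avoids calling min() again for every element.

```python
list_a = [4, 1, 7, 1, 9]

smallest = min(list_a)              # computed once
first = list_a.index(smallest)      # position of the first occurrence
all_positions = [i for i, v in enumerate(list_a) if v == smallest]

print(first)          # 1
print(all_positions)  # [1, 3]
```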

7. Taking the top n

# Option 1:
df[['Country name', 'happiness']].sort_values(by='happiness', ascending=False)[:10]

# Option 2:
df[['Country name', 'happiness']].nlargest(10, 'happiness')

nlargest: Return the first `n` rows ordered by `columns` in descending order.

8. Output

References:
https://www.runoob.com/w3cnote/python3-print-func-b.html
https://www.cnblogs.com/penphy/p/10028546.html
https://www.cnblogs.com/lovejh/p/9201219.html

Simple output:

>>>print('My name is', 'John')
My name is John

>>>student = (30,"University of Oxford","1 High Street, London, W1 1PP", "John Cameron")
>>>print(student[3]+ ", "+student[1] + ", " + student[2] + "," + str(student[0]))
John Cameron, University of Oxford, 1 High Street, London, W1 1PP,30

Printing without a newline:

By default, print ends its output with a newline:

>>>for i in range(0,3):
...     print (i)
... 
0
1
2

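To suppress the newline, pass the end argument: print(i, end='') prints nothing after each value, and end=' ' (used below) separates the values with a space: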

>>> for i in range(0,6):
...     print(i, end=" ")
... 
0 1 2 3 4 5
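
Besides end, print also accepts sep (the separator between arguments) and file (the output stream). The sketch below writes into an in-memory buffer so that the exact characters produced can be inspected; the buffer is used here only for illustration.

```python
import io

buffer = io.StringIO()

# sep goes between the arguments, end replaces the trailing newline
print("a", "b", "c", sep="-", end="!\n", file=buffer)

for i in range(3):
    print(i, end=" ", file=buffer)  # values stay on one line
print(file=buffer)                  # finish the line with a newline

output = buffer.getvalue()
print(repr(output))  # 'a-b-c!\n0 1 2 \n'
```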

8.1 The % operator

The % operator supports parameterized formatting, similar to printf in C.

>>>print('Hello, %s. Your age is %d.' % ('John', 20))
Hello, John. Your age is 20.

>>>s = "the length of (%s) is %d" %('runoob',len('runoob'))
>>>print(s)
the length of (runoob) is 6

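Python string formatting conversion types:

Symbol	Description
%c	Single character (accepts an integer code point or a one-character string)
%s	String (the value is converted with str())
%d	Decimal integer
%u	Obsolete; identical to %d
%o	Octal integer
%x	Hexadecimal integer (lowercase)
%X	Hexadecimal integer (uppercase)
%f	Floating-point number; the precision after the decimal point can be specified
%e	Floating-point number in scientific notation (lowercase e)
%E	Same as %e, but with an uppercase E
%g	Chooses between %f and %e depending on the exponent
%G	Chooses between %f and %E depending on the exponent
%p	Pointer address in hexadecimal in C's printf; not supported by Python's % operator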

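Formatting flags and modifiers:

Symbol	Function
*	Reads the width or precision from the argument tuple
-	Left-aligns the value
+	Shows a plus sign ( + ) before positive numbers
`<sp>`	Shows a space before positive numbers
#	Prefixes octal values with '0o' (Python 3) and hexadecimal values with '0x' or '0X' (depending on whether 'x' or 'X' is used)
0	Pads numbers with '0' instead of the default space
%	'%%' outputs a single literal '%'
(var)	Looks up the value by key in a dictionary argument
m.n	m is the minimum total width; n is the number of digits after the decimal point (where applicable)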
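
Several of these flags are not exercised elsewhere in this post, so here is a short sketch that applies each one to made-up values and collects the results in a dictionary named results (a name chosen here for illustration).

```python
results = {
    "alt_octal": "%#o" % 8,       # '#' adds the 0o prefix in Python 3
    "alt_hex": "%#x" % 255,       # '#' adds the 0x prefix
    "alt_hex_upper": "%#X" % 255, # prefix follows the case of the conversion
    "space": "% d" % 5,           # space before positive numbers
    "plus": "%+d" % 5,            # explicit plus sign
    "zero_pad": "%05d" % 42,      # pad with zeros to width 5
    "left": "%-5d|" % 42,         # left-align within width 5
    "star": "%*d" % (5, 42),      # width taken from the tuple
    "percent": "%d%%" % 50,       # literal percent sign
    "mapping": "%(name)s is %(age)d" % {"name": "John", "age": 20},
}

for name, text in results.items():
    print(f"{name}: {text!r}")
```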

8.1.1 Formatting integers

%x --- hexadecimal
%d --- decimal
%o --- octal

>>>nHex = 0xFF
>>>print("nHex = %x,nDec = %d,nOct = %o" %(nHex,nHex,nHex))
nHex = ff,nDec = 255,nOct = 377

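8.1.2 Formatting floats

%f --- six digits after the decimal point (the default precision)
%.3f --- three digits after the decimal point

>>>pi = 3.141592653
>>>print('%10.3f' % pi) # field width 10, precision 3 (the width counts every character, including the decimal point and the decimals)
     3.142
>>>print("pi = %.*f" % (3,pi)) # * reads the width or precision from the tuple
pi = 3.142
>>>print('%010.3f' % pi) # pad with zeros
000003.142
>>>print('%-10.3f' % pi) # left-aligned
3.142     
>>>print('%+f' % pi) # show a plus sign ( + ) before positive numbers
+3.141593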

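8.1.3 Formatting strings

%s
%10s --- right-aligned, field width 10
%-10s --- left-aligned, field width 10
%.2s --- truncated to 2 characters
%10.2s --- field width 10, truncated to 2 characters

>>> print('%s' % 'hello world')  # plain string output
hello world
>>> print('%20s' % 'hello world')  # right-aligned, width 20, padded with spaces
         hello world
>>> print('%-20s' % 'hello world')  # left-aligned, width 20, padded with spaces
hello world         
>>> print('%.2s' % 'hello world')  # first 2 characters
he
>>> print('%10.2s' % 'hello world')  # right-aligned, first 2 characters
        he
>>> print('%-10.2s' % 'hello world')  # left-aligned, first 2 characters
he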

The same works inside a triple-quoted string:
sql = """
	select *
	from dws_base.t_table
	where dt=%(end_date)s
"""%{"end_date":20231203}

Another way to substitute variables:
>>>from string import Template
>>>s = Template('$who likes $what')
>>>s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
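
Template also has a more forgiving variant. The sketch below contrasts substitute, which raises KeyError when a placeholder has no value, with safe_substitute, which leaves the placeholder in place; the variable names are illustrative.

```python
from string import Template

template = Template("$who likes $what")

full = template.substitute(who="tim", what="kung pao")
print(full)  # tim likes kung pao

# substitute() insists on a value for every placeholder
missing = None
try:
    template.substitute(who="tim")
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # what

# safe_substitute() leaves unknown placeholders untouched
partial = template.safe_substitute(who="tim")
print(partial)  # tim likes $what
```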

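8.2 The format() method

Compared with basic %-style formatting, str.format() is more powerful: it treats the string as a template, fills it in from the arguments passed, and uses braces '{}' instead of '%' as the placeholder marker.

Positional matching

(1) Unnumbered: "{}"
(2) Numbered, which allows reordering and reuse: "{0}", "{1}"
(3) Keyword: "{a}", "{tom}"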

>>> print('{} {}'.format('hello','world'))  # unnumbered fields
hello world
>>> print('{0} {1}'.format('hello','world'))  # numbered fields
hello world
>>> print('{0} {1} {0}'.format('hello','world'))  # reordered and reused
hello world hello
>>> print('{a} {tom} {a}'.format(tom='hello',a='world'))  # keyword fields
world hello world

>>> print('{} is {:.2f}'.format(1.123,1.123))  # two decimal places
1.123 is 1.12

>>> '{:,}'.format(1234567890)
'1,234,567,890'

To print "The percentage of female students is 68.85%.":

str_student = "The percentage of female students is {:.2f}%.".format(68.8543)
print(str_student)

Percentages:

>>> points = 1
>>> total = 2
>>> 'Correct answers: {:.2%}'.format(points/total)
'Correct answers: 50.00%'

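Note: in Python 3.6+ this can be written more concisely as an f-string, by prefixing the string with f:
>>> print(f'Correct answers: {points/total:.2%}')
Correct answers: 50.00%

Thousands separators:

df["金额"].apply(lambda x: f"{x:,.0f}")  # "金额" is the amount column

For more on number formatting, see https://www.runoob.com/python/att-string-format.html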
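
The same format specifications work in both str.format() and f-strings. A compact sketch with made-up values, covering separators, percentages, alignment, and padding:

```python
amount = 1234567.891
ratio = 0.6885

formatted = {
    "thousands": f"{amount:,.2f}",  # comma separator, 2 decimals
    "percent": f"{ratio:.2%}",      # multiply by 100 and append %
    "left": f"{'left':<10}|",       # left-align in 10 columns
    "right": f"{'right':>10}|",     # right-align in 10 columns
    "center": f"{'mid':^9}|",       # center in 9 columns
    "zero_pad": f"{42:05d}",        # pad with zeros to width 5
    "hex": f"{255:#x}",             # hexadecimal with 0x prefix
}

for name, text in formatted.items():
    print(f"{name}: {text!r}")
```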

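9. Slicing

Reference: https://www.liaoxuefeng.com/wiki/1016959663602400/1017269965565856

Slicing works on lists as well as on tuples and strings.

It helps to think of every position as having two indices: a non-negative one counting from the start (0 to len(a)-1) and a negative one counting from the end (-len(a) to -1).

b = a[i:j] copies a[i] through a[j-1] into a new list object. For example: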

a = [0,1,2,3,4,5,6,7,8,9]
a[1:3] # [1,2]
a[:-2] # [0,1,2,3,4,5,6,7]
a[-2:-1] # [8]
a[:2] # [0,1], the first 2 elements
a[-2:] # [8,9], the last 2 elements

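When i is omitted it defaults to 0, so a[:3] is equivalent to a[0:3].
When j is omitted it defaults to len(a), so a[1:] is equivalent to a[1:len(a)].
When both are omitted, a[:] makes a full copy of a (equivalent to a.copy(), a shallow copy; see my other post for more on shallow and deep copies).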
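
What "shallow" means for a[:] shows up as soon as the list contains other lists. A small sketch with a made-up nested list:

```python
a = [[0, 1], [2, 3]]
b = a[:]  # new outer list, same inner lists

print(b == a)  # True: equal contents
print(b is a)  # False: a different outer list

b.append([4, 5])  # changes only the copy's outer list
b[0].append(99)   # the inner list is shared, so a sees this change

print(a)  # [[0, 1, 99], [2, 3]]
print(b)  # [[0, 1, 99], [2, 3], [4, 5]]
```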

b = a[i:j:s] works the same way for i and j; s is the step, which defaults to 1 when omitted.

a = [0,1,2,3,4,5,6,7,8,9]
a[::-1] # [9,8,7,6,5,4,3,2,1,0]
a[::-2] # [9,7,5,3,1]
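a[:5:2] # the first 5 elements, taking every second one: [0,2,4]
a[::5] # all elements, taking every fifth one: [0,5]

a[i:j:1] is equivalent to a[i:j].
When s < 0 (a negative step), i defaults to -1 and j defaults to -len(a)-1 when omitted.
a[::-1] is equivalent to a[-1:-len(a)-1:-1]: it copies from the last element back to the first, i.e. it reverses the sequence.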
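
Python can report the defaults it fills in: slice.indices(length) returns the concrete (start, stop, step) for a sequence of that length. Note that the returned values are meant to be passed to range(), so the stop of -1 below means "one step before index 0", not the last element.

```python
a = list(range(10))

forward = slice(None, None, 1).indices(len(a))
backward = slice(None, None, -1).indices(len(a))

print(forward)   # (0, 10, 1)
print(backward)  # (9, -1, -1)

# The explicit negative-index form matches the shorthand
print(a[-1:-len(a) - 1:-1] == a[::-1])                  # True
print([a[i] for i in range(*backward)] == a[::-1])      # True
```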

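Exercise:

python = 'I am PYTHON'

print(python[1:4])
print(python[1:])
print(python[:])
print(python[1:100])
print(python[-1])
print(python[-4])
print(python[:-3])
print(python[-3:])
print(python[::-1])

Results (the first, second, and fourth lines start with a space):

 am
 am PYTHON
I am PYTHON
 am PYTHON
N
T
I am PYT
HON
NOHTYP ma I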

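Also, for a DataFrame (iloc positions are 0-based and the end of a range is exclusive):

To select the columns at positions 2 through 4:
df.iloc[:, 2:5]
To select the columns at positions 6 and 10:
df.iloc[:, [6,10]]
To select the columns at positions 2 through 4 together with positions 6 and 10:

Reference: How to slice continuous and discontinuous index in pandas? - Stack Overflow

Use numpy.r_:
import numpy as np
df.iloc[:, np.r_[2:5,6,10]]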
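
To see what np.r_ actually produces, it can be printed on its own before being handed to iloc. The frame below is made up for illustration, with twelve columns named c0 to c11.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2 rows, 12 columns named c0..c11
df = pd.DataFrame(np.arange(24).reshape(2, 12),
                  columns=[f"c{i}" for i in range(12)])

cols = np.r_[2:5, 6, 10]  # concatenates the range 2..4 with 6 and 10
print(cols.tolist())      # [2, 3, 4, 6, 10]

subset = df.iloc[:, cols]
print(subset.columns.tolist())  # ['c2', 'c3', 'c4', 'c6', 'c10']
```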

10. Working with dates

References:
https://blog.csdn.net/u010159842/article/details/78331490
https://blog.csdn.net/sinat_30715661/article/details/82534033

import datetime
print(datetime.time(5,42,2))
#05:42:02
# ↑ a time object

print(datetime.date.today())
#2023-12-23 (the date on which this was run)

Converting a string to a date:

>import datetime
>d1 = datetime.datetime.strptime('20220920','%Y%m%d')
>d1
datetime.datetime(2022, 9, 20, 0, 0)

>datetime.datetime.strptime('2018-09-08','%Y-%m-%d')
datetime.datetime(2018, 9, 8, 0, 0)

↑ strptime takes two arguments: the date string to parse and the format that the string is written in.

Date arithmetic:

Three days from now:

>import datetime
>now = datetime.datetime.now()
>now
datetime.datetime(2022, 10, 17, 11, 33, 50, 378505)

>delta = datetime.timedelta(days=3)
>n_days = now + delta
>n_days.strftime('%Y-%m-%d %H:%M:%S')      # strftime converts a datetime to a string
'2022-10-20 11:33:50'
>n_days.strftime('%Y-%m-%d')
'2022-10-20'

One day before a given date:

>import datetime
>d1 = datetime.datetime.strptime('20220920','%Y%m%d')		# parse the string into a datetime
>d2 = (d1+datetime.timedelta(days=-1)).strftime('%Y%m%d')   # subtract one day and convert back to a string
>d2
'20220919'
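
The parse, shift, and format steps can be wrapped in one small helper. This is a sketch only: shift_date and its parameters are names made up for this example, not part of the datetime module.

```python
from datetime import datetime, timedelta


def shift_date(date_str, days, fmt="%Y%m%d"):
    """Parse date_str using fmt, move it by `days` days, and format it back."""
    shifted = datetime.strptime(date_str, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


print(shift_date("20220920", -1))                   # 20220919
print(shift_date("20220301", -1))                   # 20220228 (month boundary)
print(shift_date("2022-12-31", 1, fmt="%Y-%m-%d"))  # 2023-01-01 (year boundary)

# Subtracting two datetimes gives a timedelta
gap = datetime(2022, 10, 20) - datetime(2022, 10, 17)
print(gap.days)  # 3
```

Keeping the format string as a parameter lets the same helper serve both the compact '%Y%m%d' form and the dashed '%Y-%m-%d' form used above.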